574 lines
15 KiB
Markdown
574 lines
15 KiB
Markdown
# Requesty.ai API Research
|
|
|
|
## API Overview
|
|
|
|
### Base Information
|
|
|
|
| Property | Value |
|
|
|----------|-------|
|
|
| **Base URL** | `https://router.requesty.ai/v1` |
|
|
| **API Version** | v1 |
|
|
| **Protocol** | HTTPS REST |
|
|
| **Authentication** | Bearer token (API key) |
|
|
| **Request Format** | JSON |
|
|
| **Response Format** | JSON |
|
|
| **Compatibility** | OpenAI SDK drop-in replacement |
|
|
|
|
### API Endpoints
|
|
|
|
Based on documentation and OpenAI compatibility:
|
|
|
|
1. **Chat Completions** - `/v1/chat/completions` (PRIMARY)
|
|
2. **Embeddings** - `/v1/embeddings`
|
|
3. **Models** - `/v1/models` (likely)
|
|
|
|
For agentic-flow integration, we only need **Chat Completions**.
|
|
|
|
## Authentication
|
|
|
|
### API Key Format
|
|
|
|
```
|
|
Authorization: Bearer requesty-<key>
|
|
```
|
|
|
|
### Key Generation
|
|
|
|
1. Visit https://app.requesty.ai/getting-started
|
|
2. Navigate to API Keys section
|
|
3. Generate new key
|
|
4. Copy key starting with `requesty-`
|
|
|
|
### Environment Variable
|
|
|
|
```bash
|
|
export REQUESTY_API_KEY="requesty-xxxxxxxxxxxxx"
|
|
```
|
|
|
|
### Security Considerations
|
|
|
|
- API keys should be kept secret (never commit to git)
|
|
- Use environment variables or .env files
|
|
- Rotate keys periodically
|
|
- Monitor usage for unauthorized access
|
|
|
|
## Chat Completions Endpoint
|
|
|
|
### Request Schema
|
|
|
|
#### Endpoint
|
|
```
|
|
POST https://router.requesty.ai/v1/chat/completions
|
|
```
|
|
|
|
#### Headers
|
|
```http
|
|
Content-Type: application/json
|
|
Authorization: Bearer requesty-xxxxxxxxxxxxx
|
|
HTTP-Referer: https://github.com/ruvnet/agentic-flow # Optional
|
|
X-Title: Agentic Flow # Optional
|
|
```
|
|
|
|
#### Request Body (OpenAI Format)
|
|
|
|
```json
|
|
{
|
|
"model": "openai/gpt-4o",
|
|
"messages": [
|
|
{
|
|
"role": "system",
|
|
"content": "You are a helpful assistant."
|
|
},
|
|
{
|
|
"role": "user",
|
|
"content": "Hello, who are you?"
|
|
}
|
|
],
|
|
"temperature": 0.7,
|
|
"max_tokens": 4096,
|
|
"stream": false,
|
|
"tools": [
|
|
{
|
|
"type": "function",
|
|
"function": {
|
|
"name": "get_weather",
|
|
"description": "Get weather for a location",
|
|
"parameters": {
|
|
"type": "object",
|
|
"properties": {
|
|
"location": {
|
|
"type": "string"
|
|
}
|
|
},
|
|
"required": ["location"]
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
#### Available Parameters
|
|
|
|
| Parameter | Type | Required | Description |
|
|
|-----------|------|----------|-------------|
|
|
| `model` | string | Yes | Model identifier (e.g., "openai/gpt-4o") |
|
|
| `messages` | array | Yes | Array of message objects |
|
|
| `temperature` | number | No | 0.0-2.0, controls randomness (default: 1.0) |
|
|
| `max_tokens` | number | No | Maximum tokens to generate |
|
|
| `stream` | boolean | No | Enable streaming (default: false) |
|
|
| `tools` | array | No | Function calling tools (OpenAI format) |
|
|
| `tool_choice` | string/object | No | Control tool usage |
|
|
| `top_p` | number | No | Nucleus sampling parameter |
|
|
| `frequency_penalty` | number | No | Reduce repetition (-2.0 to 2.0) |
|
|
| `presence_penalty` | number | No | Encourage new topics (-2.0 to 2.0) |
|
|
| `stop` | string/array | No | Stop sequences |
|
|
| `n` | number | No | Number of completions to generate |
|
|
| `user` | string | No | User identifier for tracking |
|
|
|
|
### Response Schema
|
|
|
|
#### Non-Streaming Response
|
|
|
|
```json
|
|
{
|
|
"id": "chatcmpl-xxxxxxxxxxxxx",
|
|
"object": "chat.completion",
|
|
"created": 1704067200,
|
|
"model": "openai/gpt-4o",
|
|
"choices": [
|
|
{
|
|
"index": 0,
|
|
"message": {
|
|
"role": "assistant",
|
|
"content": "I am an AI assistant created by OpenAI...",
|
|
"tool_calls": [
|
|
{
|
|
"id": "call_xxxxx",
|
|
"type": "function",
|
|
"function": {
|
|
"name": "get_weather",
|
|
"arguments": "{\"location\": \"San Francisco\"}"
|
|
}
|
|
}
|
|
]
|
|
},
|
|
"finish_reason": "tool_calls"
|
|
}
|
|
],
|
|
"usage": {
|
|
"prompt_tokens": 25,
|
|
"completion_tokens": 150,
|
|
"total_tokens": 175
|
|
}
|
|
}
|
|
```
|
|
|
|
#### Streaming Response (SSE)
|
|
|
|
```
|
|
data: {"id":"chatcmpl-xxxxx","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"I"},"finish_reason":null}]}
|
|
|
|
data: {"id":"chatcmpl-xxxxx","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"content":" am"},"finish_reason":null}]}
|
|
|
|
data: {"id":"chatcmpl-xxxxx","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
|
|
|
|
data: [DONE]
|
|
```
|
|
|
|
### Finish Reasons
|
|
|
|
| Reason | Description |
|
|
|--------|-------------|
|
|
| `stop` | Natural completion (model decided to stop) |
|
|
| `length` | Max tokens reached |
|
|
| `tool_calls` | Model wants to call a function |
|
|
| `content_filter` | Content filtered by safety system |
|
|
|
|
## Tool/Function Calling
|
|
|
|
### Format
|
|
|
|
Requesty uses **OpenAI function calling format** (same as OpenRouter).
|
|
|
|
#### Request with Tools
|
|
|
|
```json
|
|
{
|
|
"model": "openai/gpt-4o",
|
|
"messages": [...],
|
|
"tools": [
|
|
{
|
|
"type": "function",
|
|
"function": {
|
|
"name": "Read",
|
|
"description": "Read a file from the filesystem",
|
|
"parameters": {
|
|
"type": "object",
|
|
"properties": {
|
|
"file_path": {
|
|
"type": "string",
|
|
"description": "Absolute path to file"
|
|
}
|
|
},
|
|
"required": ["file_path"]
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
#### Response with Tool Calls
|
|
|
|
```json
|
|
{
|
|
"choices": [
|
|
{
|
|
"message": {
|
|
"role": "assistant",
|
|
"content": null,
|
|
"tool_calls": [
|
|
{
|
|
"id": "call_abc123",
|
|
"type": "function",
|
|
"function": {
|
|
"name": "Read",
|
|
"arguments": "{\"file_path\": \"/workspace/file.txt\"}"
|
|
}
|
|
}
|
|
]
|
|
},
|
|
"finish_reason": "tool_calls"
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
### Tool Calling Support by Model
|
|
|
|
Based on OpenAI compatibility:
|
|
|
|
**Native Support** (confirmed):
|
|
- OpenAI models (GPT-4o, GPT-4-turbo, GPT-3.5-turbo)
|
|
- Anthropic models (Claude 3.5 Sonnet, Claude 3 Opus)
|
|
- Google models (Gemini 2.5 Pro, Gemini 2.5 Flash)
|
|
- DeepSeek models (DeepSeek-V3)
|
|
- Llama 3.3+ models
|
|
- Qwen 2.5+ models
|
|
- Mistral Large models
|
|
|
|
**Requires Emulation** (likely):
|
|
- Older Llama 2 models
|
|
- Smaller models (<7B parameters)
|
|
- Non-instruct base models
|
|
|
|
## Model Naming Convention
|
|
|
|
### Format
|
|
```
|
|
<provider>/<model-name>
|
|
```
|
|
|
|
### Examples
|
|
```
|
|
openai/gpt-4o
|
|
anthropic/claude-3.5-sonnet
|
|
google/gemini-2.5-flash
|
|
deepseek/deepseek-chat
|
|
meta-llama/llama-3.3-70b-instruct
|
|
```
|
|
|
|
### Model Categories
|
|
|
|
| Provider | Example Models | Notes |
|
|
|----------|----------------|-------|
|
|
| OpenAI | `openai/gpt-4o`, `openai/gpt-4-turbo` | Premium, expensive |
|
|
| Anthropic | `anthropic/claude-3.5-sonnet` | High quality, medium cost |
|
|
| Google | `google/gemini-2.5-flash` | Fast, cost-effective |
|
|
| DeepSeek | `deepseek/deepseek-chat` | Cheap, good quality |
|
|
| Meta | `meta-llama/llama-3.3-70b-instruct` | Open source |
|
|
| Qwen | `qwen/qwen-2.5-coder-32b-instruct` | Coding-focused |
|
|
|
|
## Rate Limits
|
|
|
|
### Expected Limits (to be confirmed)
|
|
|
|
Based on typical AI gateway providers:
|
|
|
|
| Tier | Requests/min | Requests/day | Token Limit |
|
|
|------|--------------|--------------|-------------|
|
|
| Free | 20 | 1,000 | 100K tokens/day |
|
|
| Starter | 60 | 10,000 | 1M tokens/day |
|
|
| Pro | 300 | 100,000 | 10M tokens/day |
|
|
| Enterprise | Custom | Custom | Custom |
|
|
|
|
### Rate Limit Headers (expected)
|
|
|
|
```http
|
|
X-RateLimit-Limit: 60
|
|
X-RateLimit-Remaining: 59
|
|
X-RateLimit-Reset: 1704067260
|
|
```
|
|
|
|
### Handling Rate Limits
|
|
|
|
When rate limited, expect:
|
|
```json
|
|
{
|
|
"error": {
|
|
"message": "Rate limit exceeded",
|
|
"type": "rate_limit_error",
|
|
"code": "rate_limit_exceeded"
|
|
}
|
|
}
|
|
```
|
|
|
|
**Implementation Strategy:**
|
|
1. Detect 429 status code
|
|
2. Read `Retry-After` header
|
|
3. Implement exponential backoff
|
|
4. Log rate limit events
|
|
5. Optionally fall back to other providers
|
|
|
|
## Pricing
|
|
|
|
### Cost Structure
|
|
|
|
Based on documentation (80% savings vs Claude):
|
|
|
|
**Estimated Pricing** (per million tokens):
|
|
|
|
| Model Class | Input | Output | vs Claude Savings |
|
|
|-------------|-------|--------|-------------------|
|
|
| GPT-4o | $0.50 | $1.50 | 70% |
|
|
| Claude 3.5 Sonnet | $0.60 | $1.80 | 80% |
|
|
| Gemini 2.5 Flash | FREE | FREE | 100% |
|
|
| DeepSeek Chat | $0.03 | $0.06 | 98% |
|
|
| Llama 3.3 70B | $0.10 | $0.20 | 95% |
|
|
|
|
### Cost Tracking Features
|
|
|
|
Requesty includes:
|
|
- Real-time cost monitoring
|
|
- Per-request cost attribution
|
|
- Monthly spending reports
|
|
- Budget alerts
|
|
- Cost optimization recommendations
|
|
|
|
## Unique Requesty Features
|
|
|
|
### 1. Auto-Routing
|
|
|
|
Requesty can automatically route requests to optimal models based on:
|
|
- Cost constraints
|
|
- Performance requirements
|
|
- Availability
|
|
- Load balancing
|
|
|
|
**API Parameter** (if available):
|
|
```json
|
|
{
|
|
"model": "auto",
|
|
"routing_strategy": "cost_optimized"
|
|
}
|
|
```
|
|
|
|
### 2. Caching
|
|
|
|
Intelligent caching to reduce costs:
|
|
- Semantic similarity matching
|
|
- Configurable TTL
|
|
- Cache hit/miss reporting
|
|
|
|
### 3. Analytics
|
|
|
|
Built-in analytics dashboard:
|
|
- Request volume
|
|
- Token usage
|
|
- Cost breakdown
|
|
- Latency metrics
|
|
- Error rates
|
|
- Model performance comparison
|
|
|
|
### 4. Failover
|
|
|
|
Automatic failover if primary model is unavailable:
|
|
- Model-level failover
|
|
- Provider-level failover
|
|
- Custom fallback chains
|
|
|
|
## Error Handling
|
|
|
|
### Error Response Format
|
|
|
|
```json
|
|
{
|
|
"error": {
|
|
"message": "Invalid API key",
|
|
"type": "authentication_error",
|
|
"code": "invalid_api_key"
|
|
}
|
|
}
|
|
```
|
|
|
|
### Common Errors
|
|
|
|
| Status | Error Type | Description | Recovery |
|
|
|--------|------------|-------------|----------|
|
|
| 401 | `authentication_error` | Invalid API key | Check REQUESTY_API_KEY |
|
|
| 429 | `rate_limit_error` | Too many requests | Retry with backoff |
|
|
| 400 | `invalid_request_error` | Malformed request | Fix request format |
|
|
| 500 | `server_error` | Requesty server issue | Retry or fallback |
|
|
| 503 | `service_unavailable` | Model overloaded | Retry or use different model |
|
|
|
|
## API Comparison Matrix
|
|
|
|
### Requesty vs OpenRouter vs Direct Anthropic
|
|
|
|
| Feature | Requesty | OpenRouter | Anthropic Direct |
|
|
|---------|----------|------------|------------------|
|
|
| **API Format** | OpenAI | OpenAI | Anthropic |
|
|
| **Model Count** | 300+ | 100+ | 5 |
|
|
| **Tool Calling** | OpenAI format | OpenAI format | Anthropic format |
|
|
| **Streaming** | SSE (OpenAI) | SSE (OpenAI) | SSE (Anthropic) |
|
|
| **Base URL** | `router.requesty.ai` | `openrouter.ai` | `api.anthropic.com` |
|
|
| **Auth Header** | `Bearer requesty-*` | `Bearer sk-or-*` | `x-api-key: sk-ant-*` |
|
|
| **Cost Tracking** | Built-in dashboard | Manual tracking | Manual tracking |
|
|
| **Auto-Routing** | Yes | No | N/A |
|
|
| **Caching** | Yes | No | No |
|
|
| **Failover** | Automatic | Manual | Manual |
|
|
|
|
## Integration Compatibility
|
|
|
|
### With Existing agentic-flow Architecture
|
|
|
|
**HIGH COMPATIBILITY** - Requesty is almost identical to OpenRouter:
|
|
|
|
1. **Same API Format** - OpenAI `/chat/completions`
|
|
2. **Same Tool Format** - OpenAI function calling
|
|
3. **Same Streaming** - Server-Sent Events (SSE)
|
|
4. **Same Auth Pattern** - Bearer token in header
|
|
|
|
**Required Changes:**
|
|
- New proxy file: `anthropic-to-requesty.ts`
|
|
- Provider detection: Check for `REQUESTY_API_KEY`
|
|
- Base URL change: `router.requesty.ai` instead of `openrouter.ai`
|
|
- Model naming: Use Requesty model IDs
|
|
|
|
**Reusable from OpenRouter:**
|
|
- Request/response conversion logic (~95% identical)
|
|
- Streaming handler
|
|
- Error handling patterns
|
|
- Tool calling conversion
|
|
- Model capability detection (with new model IDs)
|
|
|
|
## Testing Recommendations
|
|
|
|
### Critical Test Cases
|
|
|
|
1. **Basic Chat** - Simple message without tools
|
|
2. **System Prompt** - Test system message handling
|
|
3. **Tool Calling** - Single tool, multiple tools
|
|
4. **Streaming** - Verify SSE format compatibility
|
|
5. **Error Handling** - Invalid key, rate limits
|
|
6. **Model Override** - Test different model IDs
|
|
7. **Large Context** - Test with long messages
|
|
8. **Concurrent Requests** - Test rate limiting
|
|
|
|
### Suggested Test Models
|
|
|
|
Start with these well-supported models:
|
|
|
|
1. `openai/gpt-4o-mini` - Fast, cheap, reliable
|
|
2. `anthropic/claude-3.5-sonnet` - High quality
|
|
3. `google/gemini-2.5-flash` - Free tier
|
|
4. `deepseek/deepseek-chat` - Cost-optimized
|
|
|
|
## Security Considerations
|
|
|
|
### API Key Protection
|
|
|
|
1. **Never hardcode** - Use environment variables
|
|
2. **Gitignore .env** - Prevent accidental commits
|
|
3. **Rotate regularly** - Change keys periodically
|
|
4. **Monitor usage** - Detect unauthorized access
|
|
5. **Use separate keys** - Dev vs production
|
|
|
|
### Data Privacy
|
|
|
|
1. **Request logging** - Be careful with sensitive data
|
|
2. **Model selection** - Some models may store data
|
|
3. **GDPR compliance** - Check Requesty's policies
|
|
4. **Local vs cloud** - Understand data flow
|
|
|
|
## Open Research Questions
|
|
|
|
### Questions to Answer During Implementation
|
|
|
|
1. **Streaming Format** - Exact SSE event format (confirm matches OpenAI)
|
|
2. **Rate Limits** - Actual limits per tier
|
|
3. **Model List API** - Can we fetch available models programmatically?
|
|
4. **Auto-Routing API** - How to control routing programmatically?
|
|
5. **Cache Control** - Can we control caching per-request?
|
|
6. **Failover Config** - Can we specify fallback chains?
|
|
7. **Analytics API** - Programmatic access to usage data?
|
|
8. **Webhook Support** - Async request notifications?
|
|
9. **Batch API** - Batch processing support?
|
|
10. **Free Tier** - Is there a free tier for testing?
|
|
|
|
## Documentation Gaps
|
|
|
|
### Information Not Found in Public Docs
|
|
|
|
- Exact rate limit values per tier
|
|
- Complete model list with pricing
|
|
- Streaming event format details
|
|
- Auto-routing API parameters
|
|
- Cache control headers
|
|
- Failover configuration
|
|
- Webhook integration
|
|
- Batch processing API
|
|
|
|
### Recommended Actions
|
|
|
|
1. **Email Requesty Support** - Ask for technical docs
|
|
2. **Test in Sandbox** - Create test account
|
|
3. **Monitor Network** - Inspect actual API calls
|
|
4. **Join Discord** - Community knowledge
|
|
5. **Trial Account** - Test features hands-on
|
|
|
|
## Summary for Developers
|
|
|
|
### TL;DR - What You Need to Know
|
|
|
|
1. **Requesty = OpenRouter Clone** - Almost identical API
|
|
2. **Base URL** - `https://router.requesty.ai/v1`
|
|
3. **Auth** - `Authorization: Bearer requesty-*`
|
|
4. **Format** - OpenAI `/chat/completions`
|
|
5. **Tools** - OpenAI function calling format
|
|
6. **Proxy Pattern** - Copy OpenRouter proxy, change URL/key
|
|
7. **Models** - 300+ models, use `<provider>/<model>` format
|
|
8. **Unique Features** - Auto-routing, caching, analytics
|
|
|
|
### Recommended Implementation Strategy
|
|
|
|
**Phase 1:** Clone OpenRouter proxy as starting point
|
|
**Phase 2:** Update base URL and auth header
|
|
**Phase 3:** Add Requesty-specific features (auto-routing, caching)
|
|
**Phase 4:** Test with multiple models
|
|
**Phase 5:** Add to model optimizer
|
|
|
|
### Estimated Compatibility
|
|
|
|
| Component | Compatibility | Effort |
|
|
|-----------|---------------|--------|
|
|
| API Format | 99% | Minimal |
|
|
| Tool Calling | 100% | None |
|
|
| Streaming | 95% | Minor testing |
|
|
| Error Handling | 90% | Add new error codes |
|
|
| Model Detection | 0% | New model IDs needed |
|
|
| Proxy Architecture | 100% | Copy OpenRouter |
|
|
|
|
**Total Estimated Effort:** 3-4 hours for core implementation
|