# Requesty.ai API Research

## API Overview

### Base Information
| Property | Value |
|---|---|
| Base URL | https://router.requesty.ai/v1 |
| API Version | v1 |
| Protocol | HTTPS REST |
| Authentication | Bearer token (API key) |
| Request Format | JSON |
| Response Format | JSON |
| Compatibility | OpenAI SDK drop-in replacement |
### API Endpoints

Based on documentation and OpenAI compatibility:

- Chat Completions - `/v1/chat/completions` (primary)
- Embeddings - `/v1/embeddings`
- Models - `/v1/models` (likely)
For agentic-flow integration, we only need Chat Completions.
## Authentication

### API Key Format

```
Authorization: Bearer requesty-<key>
```
### Key Generation

1. Visit https://app.requesty.ai/getting-started
2. Navigate to the API Keys section
3. Generate a new key
4. Copy the key, which starts with `requesty-`
### Environment Variable

```bash
export REQUESTY_API_KEY="requesty-xxxxxxxxxxxxx"
```
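Since Requesty advertises itself as an OpenAI SDK drop-in replacement, the quickest sanity check is to point the official `openai` package at the Requesty base URL. A minimal sketch, assuming the drop-in claim holds (the model ID is one of the test models suggested later in this doc):

```typescript
// Minimal chat completion through Requesty via the official OpenAI SDK.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.REQUESTY_API_KEY, // "requesty-..." key from the dashboard
  baseURL: "https://router.requesty.ai/v1",
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Hello, who are you?" }],
  });
  console.log(completion.choices[0].message.content);
}

main().catch(console.error);
```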
### Security Considerations
- API keys should be kept secret (never commit to git)
- Use environment variables or .env files
- Rotate keys periodically
- Monitor usage for unauthorized access
## Chat Completions Endpoint

### Request Schema

#### Endpoint

```
POST https://router.requesty.ai/v1/chat/completions
```

#### Headers

```
Content-Type: application/json
Authorization: Bearer requesty-xxxxxxxxxxxxx
HTTP-Referer: https://github.com/ruvnet/agentic-flow  # Optional
X-Title: Agentic Flow                                 # Optional
```

#### Request Body (OpenAI Format)
```json
{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, who are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 4096,
  "stream": false,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
```
### Available Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model identifier (e.g., `"openai/gpt-4o"`) |
| `messages` | array | Yes | Array of message objects |
| `temperature` | number | No | 0.0-2.0, controls randomness (default: 1.0) |
| `max_tokens` | number | No | Maximum tokens to generate |
| `stream` | boolean | No | Enable streaming (default: false) |
| `tools` | array | No | Function calling tools (OpenAI format) |
| `tool_choice` | string/object | No | Control tool usage |
| `top_p` | number | No | Nucleus sampling parameter |
| `frequency_penalty` | number | No | Reduce repetition (-2.0 to 2.0) |
| `presence_penalty` | number | No | Encourage new topics (-2.0 to 2.0) |
| `stop` | string/array | No | Stop sequences |
| `n` | number | No | Number of completions to generate |
| `user` | string | No | User identifier for tracking |
### Response Schema

#### Non-Streaming Response
```json
{
  "id": "chatcmpl-xxxxxxxxxxxxx",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I am an AI assistant created by OpenAI...",
        "tool_calls": [
          {
            "id": "call_xxxxx",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"San Francisco\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
```
#### Streaming Response (SSE)
```
data: {"id":"chatcmpl-xxxxx","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"I"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxxxx","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"content":" am"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxxxx","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
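If the stream really matches the OpenAI chunk format (an open question listed later in this doc), the SDK's async iterator should consume it unchanged. A sketch reusing the `client` from the Authentication section, inside an async context:

```typescript
// Stream a completion; the SDK parses the `data:` SSE lines above into
// chunk objects and stops at `data: [DONE]`.
const stream = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello, who are you?" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```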
#### Finish Reasons

| Reason | Description |
|---|---|
| `stop` | Natural completion (model decided to stop) |
| `length` | Max tokens reached |
| `tool_calls` | Model wants to call a function |
| `content_filter` | Content filtered by safety system |
## Tool/Function Calling

### Format

Requesty uses the OpenAI function-calling format (the same as OpenRouter).
### Request with Tools

```json
{
  "model": "openai/gpt-4o",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "Read",
        "description": "Read a file from the filesystem",
        "parameters": {
          "type": "object",
          "properties": {
            "file_path": {
              "type": "string",
              "description": "Absolute path to file"
            }
          },
          "required": ["file_path"]
        }
      }
    }
  ]
}
```
### Response with Tool Calls

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "Read",
              "arguments": "{\"file_path\": \"/workspace/file.txt\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
```
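If the format is truly OpenAI-compatible, the standard round trip applies: execute each requested tool locally, append the results as `tool` messages, and call the API again until the model answers in plain text. A sketch, where `runTool` is a hypothetical dispatcher into agentic-flow's own tool implementations:

```typescript
import OpenAI from "openai";

// Loop until the model stops requesting tools and returns text.
async function runWithTools(
  client: OpenAI,
  messages: OpenAI.Chat.ChatCompletionMessageParam[],
  tools: OpenAI.Chat.ChatCompletionTool[],
  runTool: (name: string, args: unknown) => Promise<string>,
): Promise<string> {
  for (;;) {
    const res = await client.chat.completions.create({
      model: "openai/gpt-4o",
      messages,
      tools,
    });
    const msg = res.choices[0].message;
    if (!msg.tool_calls?.length) return msg.content ?? "";

    messages.push(msg); // keep the assistant's tool request in history
    for (const call of msg.tool_calls) {
      const output = await runTool(
        call.function.name,
        JSON.parse(call.function.arguments),
      );
      messages.push({ role: "tool", tool_call_id: call.id, content: output });
    }
  }
}
```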
### Tool Calling Support by Model

Based on OpenAI compatibility:

**Native support (confirmed):**
- OpenAI models (GPT-4o, GPT-4-turbo, GPT-3.5-turbo)
- Anthropic models (Claude 3.5 Sonnet, Claude 3 Opus)
- Google models (Gemini 2.5 Pro, Gemini 2.5 Flash)
- DeepSeek models (DeepSeek-V3)
- Llama 3.3+ models
- Qwen 2.5+ models
- Mistral Large models
**Requires emulation (likely):**
- Older Llama 2 models
- Smaller models (<7B parameters)
- Non-instruct base models
## Model Naming Convention

### Format

```
<provider>/<model-name>
```

### Examples

```
openai/gpt-4o
anthropic/claude-3.5-sonnet
google/gemini-2.5-flash
deepseek/deepseek-chat
meta-llama/llama-3.3-70b-instruct
```
### Model Categories

| Provider | Example Models | Notes |
|---|---|---|
| OpenAI | `openai/gpt-4o`, `openai/gpt-4-turbo` | Premium, expensive |
| Anthropic | `anthropic/claude-3.5-sonnet` | High quality, medium cost |
| Google | `google/gemini-2.5-flash` | Fast, cost-effective |
| DeepSeek | `deepseek/deepseek-chat` | Cheap, good quality |
| Meta | `meta-llama/llama-3.3-70b-instruct` | Open source |
| Qwen | `qwen/qwen-2.5-coder-32b-instruct` | Coding-focused |
## Rate Limits

### Expected Limits (to be confirmed)
Based on typical AI gateway providers:
| Tier | Requests/min | Requests/day | Token Limit |
|---|---|---|---|
| Free | 20 | 1,000 | 100K tokens/day |
| Starter | 60 | 10,000 | 1M tokens/day |
| Pro | 300 | 100,000 | 10M tokens/day |
| Enterprise | Custom | Custom | Custom |
### Rate Limit Headers (expected)

```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 59
X-RateLimit-Reset: 1704067260
```
### Handling Rate Limits
When rate limited, expect:
```json
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```
**Implementation strategy** (see the sketch after this list):

- Detect the 429 status code
- Read the `Retry-After` header
- Implement exponential backoff
- Log rate limit events
- Optionally fall back to other providers
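A sketch of that strategy with plain `fetch` (the header name and backoff constants are the expected ones, not yet confirmed against Requesty):

```typescript
// POST with 429-aware retries: honor Retry-After when present, otherwise
// back off exponentially (1s, 2s, 4s, ...).
async function postWithBackoff(
  url: string,
  body: unknown,
  maxRetries = 5,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.REQUESTY_API_KEY}`,
      },
      body: JSON.stringify(body),
    });
    if (res.status !== 429 || attempt >= maxRetries) return res;

    const retryAfter = Number(res.headers.get("Retry-After"));
    const delayMs =
      Number.isFinite(retryAfter) && retryAfter > 0
        ? retryAfter * 1000
        : 2 ** attempt * 1000;
    console.warn(`Rate limited by Requesty; retrying in ${delayMs}ms`);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```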
## Pricing

### Cost Structure

Based on Requesty's documentation (which claims up to 80% savings vs. direct Claude usage):

**Estimated pricing (per million tokens):**
| Model Class | Input | Output | vs Claude Savings |
|---|---|---|---|
| GPT-4o | $0.50 | $1.50 | 70% |
| Claude 3.5 Sonnet | $0.60 | $1.80 | 80% |
| Gemini 2.5 Flash | FREE | FREE | 100% |
| DeepSeek Chat | $0.03 | $0.06 | 98% |
| Llama 3.3 70B | $0.10 | $0.20 | 95% |
### Cost Tracking Features
Requesty includes:
- Real-time cost monitoring
- Per-request cost attribution
- Monthly spending reports
- Budget alerts
- Cost optimization recommendations
## Unique Requesty Features

### 1. Auto-Routing
Requesty can automatically route requests to optimal models based on:
- Cost constraints
- Performance requirements
- Availability
- Load balancing
**API parameter (if available):**

```json
{
  "model": "auto",
  "routing_strategy": "cost_optimized"
}
```
### 2. Caching
Intelligent caching to reduce costs:
- Semantic similarity matching
- Configurable TTL
- Cache hit/miss reporting
### 3. Analytics
Built-in analytics dashboard:
- Request volume
- Token usage
- Cost breakdown
- Latency metrics
- Error rates
- Model performance comparison
### 4. Failover
Automatic failover if primary model is unavailable:
- Model-level failover
- Provider-level failover
- Custom fallback chains
## Error Handling

### Error Response Format

```json
{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
```
### Common Errors

| Status | Error Type | Description | Recovery |
|---|---|---|---|
| 401 | `authentication_error` | Invalid API key | Check `REQUESTY_API_KEY` |
| 429 | `rate_limit_error` | Too many requests | Retry with backoff |
| 400 | `invalid_request_error` | Malformed request | Fix request format |
| 500 | `server_error` | Requesty server issue | Retry or fall back |
| 503 | `service_unavailable` | Model overloaded | Retry or use a different model |
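One way to wire the table into the proxy is a small status-to-action map; a sketch (the recovery categories are ours, and the error `type` strings are assumed OpenAI-style values, not confirmed for Requesty):

```typescript
// Map HTTP status codes from the table above to recovery actions.
type Recovery = "fix_request" | "check_key" | "retry_with_backoff" | "fallback";

function classifyError(status: number): Recovery {
  switch (status) {
    case 400: return "fix_request";        // invalid_request_error
    case 401: return "check_key";          // authentication_error
    case 429: return "retry_with_backoff"; // rate_limit_error
    case 500: return "retry_with_backoff"; // server_error (retry, then fall back)
    case 503: return "fallback";           // service_unavailable: other model
    default:  return "fallback";
  }
}
```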
## API Comparison Matrix

### Requesty vs OpenRouter vs Direct Anthropic
| Feature | Requesty | OpenRouter | Anthropic Direct |
|---|---|---|---|
| API Format | OpenAI | OpenAI | Anthropic |
| Model Count | 300+ | 100+ | 5 |
| Tool Calling | OpenAI format | OpenAI format | Anthropic format |
| Streaming | SSE (OpenAI) | SSE (OpenAI) | SSE (Anthropic) |
| Base URL | `router.requesty.ai` | `openrouter.ai` | `api.anthropic.com` |
| Auth Header | `Bearer requesty-*` | `Bearer sk-or-*` | `x-api-key: sk-ant-*` |
| Cost Tracking | Built-in dashboard | Manual tracking | Manual tracking |
| Auto-Routing | Yes | No | N/A |
| Caching | Yes | No | No |
| Failover | Automatic | Manual | Manual |
## Integration Compatibility

### With Existing agentic-flow Architecture

**HIGH COMPATIBILITY** - Requesty is almost identical to OpenRouter:
- Same API Format - OpenAI `/chat/completions`
- Same Tool Format - OpenAI function calling
- Same Streaming - Server-Sent Events (SSE)
- Same Auth Pattern - Bearer token in header
**Required Changes:**

- New proxy file: `anthropic-to-requesty.ts`
- Provider detection: check for `REQUESTY_API_KEY` (see the sketch after this list)
- Base URL change: `router.requesty.ai` instead of `openrouter.ai`
- Model naming: use Requesty model IDs
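The provider-detection change could look like this sketch (the config shape and precedence are assumptions for illustration, not existing agentic-flow code):

```typescript
// Prefer Requesty when its key is set; otherwise fall back to OpenRouter.
interface ProviderConfig {
  name: "requesty" | "openrouter";
  baseURL: string;
  apiKey: string;
}

function detectProvider(): ProviderConfig {
  if (process.env.REQUESTY_API_KEY) {
    return {
      name: "requesty",
      baseURL: "https://router.requesty.ai/v1",
      apiKey: process.env.REQUESTY_API_KEY,
    };
  }
  if (process.env.OPENROUTER_API_KEY) {
    return {
      name: "openrouter",
      baseURL: "https://openrouter.ai/api/v1",
      apiKey: process.env.OPENROUTER_API_KEY,
    };
  }
  throw new Error("Set REQUESTY_API_KEY or OPENROUTER_API_KEY");
}
```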
**Reusable from OpenRouter:**
- Request/response conversion logic (~95% identical)
- Streaming handler
- Error handling patterns
- Tool calling conversion
- Model capability detection (with new model IDs)
## Testing Recommendations

### Critical Test Cases
- Basic Chat - Simple message without tools
- System Prompt - Test system message handling
- Tool Calling - Single tool, multiple tools
- Streaming - Verify SSE format compatibility
- Error Handling - Invalid key, rate limits
- Model Override - Test different model IDs
- Large Context - Test with long messages
- Concurrent Requests - Test rate limiting
### Suggested Test Models

Start with these well-supported models (a smoke-test sketch follows the list):

- `openai/gpt-4o-mini` - Fast, cheap, reliable
- `anthropic/claude-3.5-sonnet` - High quality
- `google/gemini-2.5-flash` - Free tier
- `deepseek/deepseek-chat` - Cost-optimized
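A smoke-test sketch covering these models, reusing the `client` from the Authentication section (run inside an async context):

```typescript
// Fire the same trivial prompt at each candidate model and report pass/fail.
const testModels = [
  "openai/gpt-4o-mini",
  "anthropic/claude-3.5-sonnet",
  "google/gemini-2.5-flash",
  "deepseek/deepseek-chat",
];

for (const model of testModels) {
  try {
    const res = await client.chat.completions.create({
      model,
      messages: [{ role: "user", content: "Reply with the single word: ok" }],
      max_tokens: 10,
    });
    console.log(`PASS ${model}: ${res.choices[0].message.content}`);
  } catch (err) {
    console.error(`FAIL ${model}:`, err);
  }
}
```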
## Security Considerations

### API Key Protection
- Never hardcode - Use environment variables
- Gitignore .env - Prevent accidental commits
- Rotate regularly - Change keys periodically
- Monitor usage - Detect unauthorized access
- Use separate keys - Dev vs production
### Data Privacy
- Request logging - Be careful with sensitive data
- Model selection - Some models may store data
- GDPR compliance - Check Requesty's policies
- Local vs cloud - Understand data flow
## Open Research Questions

### Questions to Answer During Implementation
- Streaming Format - Exact SSE event format (confirm matches OpenAI)
- Rate Limits - Actual limits per tier
- Model List API - Can we fetch available models programmatically?
- Auto-Routing API - How to control routing programmatically?
- Cache Control - Can we control caching per-request?
- Failover Config - Can we specify fallback chains?
- Analytics API - Programmatic access to usage data?
- Webhook Support - Async request notifications?
- Batch API - Batch processing support?
- Free Tier - Is there a free tier for testing?
## Documentation Gaps

### Information Not Found in Public Docs
- Exact rate limit values per tier
- Complete model list with pricing
- Streaming event format details
- Auto-routing API parameters
- Cache control headers
- Failover configuration
- Webhook integration
- Batch processing API
### Recommended Actions
- Email Requesty Support - Ask for technical docs
- Test in Sandbox - Create test account
- Monitor Network - Inspect actual API calls
- Join Discord - Community knowledge
- Trial Account - Test features hands-on
## Summary for Developers

### TL;DR - What You Need to Know
- Requesty = OpenRouter Clone - Almost identical API
- Base URL - `https://router.requesty.ai/v1`
- Auth - `Authorization: Bearer requesty-*`
- Format - OpenAI `/chat/completions`
- Tools - OpenAI function calling format
- Proxy Pattern - Copy the OpenRouter proxy, change URL/key
- Models - 300+ models, use the `<provider>/<model>` format
- Unique Features - Auto-routing, caching, analytics
### Recommended Implementation Strategy

1. **Phase 1:** Clone the OpenRouter proxy as a starting point
2. **Phase 2:** Update the base URL and auth header
3. **Phase 3:** Add Requesty-specific features (auto-routing, caching)
4. **Phase 4:** Test with multiple models
5. **Phase 5:** Add to the model optimizer
### Estimated Compatibility
| Component | Compatibility | Effort |
|---|---|---|
| API Format | 99% | Minimal |
| Tool Calling | 100% | None |
| Streaming | 95% | Minor testing |
| Error Handling | 90% | Add new error codes |
| Model Detection | 0% | New model IDs needed |
| Proxy Architecture | 100% | Copy OpenRouter |
**Total estimated effort:** 3-4 hours for core implementation