
Requesty.ai API Research

API Overview

Base Information

| Property | Value |
|---|---|
| Base URL | https://router.requesty.ai/v1 |
| API Version | v1 |
| Protocol | HTTPS REST |
| Authentication | Bearer token (API key) |
| Request Format | JSON |
| Response Format | JSON |
| Compatibility | OpenAI SDK drop-in replacement |

API Endpoints

Based on documentation and OpenAI compatibility:

  1. Chat Completions - /v1/chat/completions (PRIMARY)
  2. Embeddings - /v1/embeddings
  3. Models - /v1/models (likely)

For agentic-flow integration, we only need Chat Completions.

Authentication

API Key Format

Authorization: Bearer requesty-<key>

Key Generation

  1. Visit https://app.requesty.ai/getting-started
  2. Navigate to API Keys section
  3. Generate new key
  4. Copy key starting with requesty-

Environment Variable

export REQUESTY_API_KEY="requesty-xxxxxxxxxxxxx"

Security Considerations

  • API keys should be kept secret (never commit to git)
  • Use environment variables or .env files
  • Rotate keys periodically
  • Monitor usage for unauthorized access

Chat Completions Endpoint

Request Schema

Endpoint

POST https://router.requesty.ai/v1/chat/completions

Headers

Content-Type: application/json
Authorization: Bearer requesty-xxxxxxxxxxxxx
HTTP-Referer: https://github.com/ruvnet/agentic-flow  # Optional
X-Title: Agentic Flow                                  # Optional

Request Body (OpenAI Format)

{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, who are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 4096,
  "stream": false,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
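A request with this shape can be assembled with a small typed helper before sending. The sketch below is illustrative; `buildChatRequest` and its defaults are hypothetical and not part of any Requesty SDK.

```typescript
// Minimal types for the OpenAI-compatible request body shown above.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | null;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  max_tokens?: number;
  stream?: boolean;
}

// Hypothetical helper: fills in conservative defaults matching the example.
function buildChatRequest(model: string, messages: ChatMessage[]): ChatRequest {
  return { model, messages, temperature: 0.7, max_tokens: 4096, stream: false };
}

const req = buildChatRequest("openai/gpt-4o", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Hello, who are you?" },
]);
```

The resulting object serializes directly to the JSON body above with `JSON.stringify(req)`.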

Available Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier (e.g., "openai/gpt-4o") |
| messages | array | Yes | Array of message objects |
| temperature | number | No | 0.0-2.0, controls randomness (default: 1.0) |
| max_tokens | number | No | Maximum tokens to generate |
| stream | boolean | No | Enable streaming (default: false) |
| tools | array | No | Function calling tools (OpenAI format) |
| tool_choice | string/object | No | Control tool usage |
| top_p | number | No | Nucleus sampling parameter |
| frequency_penalty | number | No | Reduce repetition (-2.0 to 2.0) |
| presence_penalty | number | No | Encourage new topics (-2.0 to 2.0) |
| stop | string/array | No | Stop sequences |
| n | number | No | Number of completions to generate |
| user | string | No | User identifier for tracking |

Response Schema

Non-Streaming Response

{
  "id": "chatcmpl-xxxxxxxxxxxxx",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I am an AI assistant created by OpenAI...",
        "tool_calls": [
          {
            "id": "call_xxxxx",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"San Francisco\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}

Note: content is typically null when tool_calls are present; this example combines a text reply and a tool call only to illustrate both fields.

Streaming Response (SSE)

data: {"id":"chatcmpl-xxxxx","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"I"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxxxx","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"content":" am"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxxxx","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
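A streaming consumer accumulates the `delta.content` fragments until `[DONE]`. The sketch below assumes the stream matches OpenAI's SSE format, which (per the open questions later in this doc) should still be confirmed against live Requesty responses.

```typescript
// Accumulate assistant text from OpenAI-style SSE chunk lines.
// Assumes each event is a single "data: <json>" line, as in the sample above.
function accumulateSse(lines: string[]): string {
  let text = "";
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    const delta = chunk.choices?.[0]?.delta;
    if (delta?.content) text += delta.content;
  }
  return text;
}

const sample = [
  'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"I"},"finish_reason":null}]}',
  'data: {"choices":[{"index":0,"delta":{"content":" am"},"finish_reason":null}]}',
  'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
  "data: [DONE]",
];
```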

Finish Reasons

| Reason | Description |
|---|---|
| stop | Natural completion (model decided to stop) |
| length | Max tokens reached |
| tool_calls | Model wants to call a function |
| content_filter | Content filtered by safety system |
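In a proxy, each finish reason maps to a next action. This mapping is a sketch of one reasonable policy, not a documented Requesty behavior:

```typescript
// Decide the proxy's next step from the finish_reason values above.
type NextStep = "return_text" | "truncated" | "run_tools" | "blocked";

function nextStep(finishReason: string): NextStep {
  switch (finishReason) {
    case "stop":
      return "return_text"; // hand the completed text back to the caller
    case "length":
      return "truncated"; // caller may retry with a higher max_tokens
    case "tool_calls":
      return "run_tools"; // execute tools, then send results back
    case "content_filter":
      return "blocked"; // surface a safety error to the caller
    default:
      return "return_text"; // unknown reasons: treat as a normal stop
  }
}
```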

Tool/Function Calling

Format

Requesty uses OpenAI function calling format (same as OpenRouter).

Request with Tools

{
  "model": "openai/gpt-4o",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "Read",
        "description": "Read a file from the filesystem",
        "parameters": {
          "type": "object",
          "properties": {
            "file_path": {
              "type": "string",
              "description": "Absolute path to file"
            }
          },
          "required": ["file_path"]
        }
      }
    }
  ]
}

Response with Tool Calls

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "Read",
              "arguments": "{\"file_path\": \"/workspace/file.txt\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
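The `arguments` field arrives as a JSON string and must be parsed before dispatch. The handler below is a stand-in for illustration; real tool execution lives in the agentic-flow tool layer.

```typescript
// Parse a tool call from the response and dispatch to a local handler.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

type ToolHandler = (args: Record<string, unknown>) => string;

function dispatchToolCall(
  call: ToolCall,
  handlers: Record<string, ToolHandler>
): string {
  const handler = handlers[call.function.name];
  if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
  // arguments is a JSON-encoded string, not an object.
  return handler(JSON.parse(call.function.arguments));
}

const result = dispatchToolCall(
  {
    id: "call_abc123",
    type: "function",
    function: { name: "Read", arguments: '{"file_path": "/workspace/file.txt"}' },
  },
  { Read: (args) => `read:${args.file_path}` } // stand-in handler
);
```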

Tool Calling Support by Model

Based on OpenAI compatibility:

Native Support (confirmed):

  • OpenAI models (GPT-4o, GPT-4-turbo, GPT-3.5-turbo)
  • Anthropic models (Claude 3.5 Sonnet, Claude 3 Opus)
  • Google models (Gemini 2.5 Pro, Gemini 2.5 Flash)
  • DeepSeek models (DeepSeek-V3)
  • Llama 3.3+ models
  • Qwen 2.5+ models
  • Mistral Large models

Requires Emulation (likely):

  • Older Llama 2 models
  • Smaller models (<7B parameters)
  • Non-instruct base models

Model Naming Convention

Format

<provider>/<model-name>

Examples

openai/gpt-4o
anthropic/claude-3.5-sonnet
google/gemini-2.5-flash
deepseek/deepseek-chat
meta-llama/llama-3.3-70b-instruct
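Splitting an identifier into its parts is straightforward, with one caveat: only the first "/" separates provider from model name. A minimal sketch:

```typescript
// Split a <provider>/<model-name> identifier into its two parts.
function parseModelId(id: string): { provider: string; model: string } {
  const slash = id.indexOf("/");
  if (slash === -1) {
    throw new Error(`Expected <provider>/<model-name>, got: ${id}`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```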

Model Categories

| Provider | Example Models | Notes |
|---|---|---|
| OpenAI | openai/gpt-4o, openai/gpt-4-turbo | Premium, expensive |
| Anthropic | anthropic/claude-3.5-sonnet | High quality, medium cost |
| Google | google/gemini-2.5-flash | Fast, cost-effective |
| DeepSeek | deepseek/deepseek-chat | Cheap, good quality |
| Meta | meta-llama/llama-3.3-70b-instruct | Open source |
| Qwen | qwen/qwen-2.5-coder-32b-instruct | Coding-focused |

Rate Limits

Expected Limits (to be confirmed)

Based on typical AI gateway providers:

| Tier | Requests/min | Requests/day | Token Limit |
|---|---|---|---|
| Free | 20 | 1,000 | 100K tokens/day |
| Starter | 60 | 10,000 | 1M tokens/day |
| Pro | 300 | 100,000 | 10M tokens/day |
| Enterprise | Custom | Custom | Custom |

Rate Limit Headers (expected)

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 59
X-RateLimit-Reset: 1704067260

Handling Rate Limits

When rate limited, expect:

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}

Implementation Strategy:

  1. Detect 429 status code
  2. Read Retry-After header
  3. Implement exponential backoff
  4. Log rate limit events
  5. Optionally fall back to other providers
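The backoff step can be sketched as a pure delay calculation; the 1s base and 30s cap below are illustrative defaults, not documented Requesty values.

```typescript
// Exponential backoff with a Retry-After override, per the steps above.
// attempt is zero-based; retryAfterSeconds comes from the Retry-After header.
function backoffDelayMs(attempt: number, retryAfterSeconds?: number): number {
  if (retryAfterSeconds !== undefined) {
    return retryAfterSeconds * 1000; // server-specified delay wins
  }
  const base = 1000; // 1s initial delay (assumed default)
  const cap = 30000; // 30s ceiling (assumed default)
  return Math.min(cap, base * 2 ** attempt);
}
```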

Pricing

Cost Structure

Based on documentation claims of up to 80% savings vs direct Claude API usage:

Estimated Pricing (per million tokens):

| Model Class | Input ($/M tokens) | Output ($/M tokens) | Savings vs Claude |
|---|---|---|---|
| GPT-4o | $0.50 | $1.50 | 70% |
| Claude 3.5 Sonnet | $0.60 | $1.80 | 80% |
| Gemini 2.5 Flash | FREE | FREE | 100% |
| DeepSeek Chat | $0.03 | $0.06 | 98% |
| Llama 3.3 70B | $0.10 | $0.20 | 95% |
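Given the `usage` block returned by the API, per-request cost follows directly from per-million-token rates. The rates fed in are the *estimated* figures above, not authoritative prices:

```typescript
// Estimate request cost in USD from usage counts and per-million-token rates.
function estimateCostUsd(
  promptTokens: number,
  completionTokens: number,
  inputPerMillion: number,
  outputPerMillion: number
): number {
  return (
    (promptTokens * inputPerMillion + completionTokens * outputPerMillion) /
    1_000_000
  );
}

// Example: the usage block shown earlier (25 prompt / 150 completion tokens)
// at the estimated GPT-4o rates of $0.50 in / $1.50 out per million tokens.
const cost = estimateCostUsd(25, 150, 0.5, 1.5);
```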

Cost Tracking Features

Requesty includes:

  • Real-time cost monitoring
  • Per-request cost attribution
  • Monthly spending reports
  • Budget alerts
  • Cost optimization recommendations

Unique Requesty Features

1. Auto-Routing

Requesty can automatically route requests to optimal models based on:

  • Cost constraints
  • Performance requirements
  • Availability
  • Load balancing

API Parameter (if available):

{
  "model": "auto",
  "routing_strategy": "cost_optimized"
}

2. Caching

Intelligent caching to reduce costs:

  • Semantic similarity matching
  • Configurable TTL
  • Cache hit/miss reporting

3. Analytics

Built-in analytics dashboard:

  • Request volume
  • Token usage
  • Cost breakdown
  • Latency metrics
  • Error rates
  • Model performance comparison

4. Failover

Automatic failover if primary model is unavailable:

  • Model-level failover
  • Provider-level failover
  • Custom fallback chains

Error Handling

Error Response Format

{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}

Common Errors

| Status | Error Type | Description | Recovery |
|---|---|---|---|
| 401 | authentication_error | Invalid API key | Check REQUESTY_API_KEY |
| 429 | rate_limit_error | Too many requests | Retry with backoff |
| 400 | invalid_request_error | Malformed request | Fix request format |
| 500 | server_error | Requesty server issue | Retry or fall back |
| 503 | service_unavailable | Model overloaded | Retry or use a different model |
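The recovery column maps naturally to a small classifier in the proxy. The action names are this doc's own shorthand, not Requesty API values:

```typescript
// Map an HTTP status to a coarse recovery action per the table above.
type Recovery = "fix_auth" | "retry_backoff" | "fix_request" | "retry_or_fallback";

function classifyError(status: number): Recovery {
  if (status === 401) return "fix_auth"; // bad key: no point retrying
  if (status === 429) return "retry_backoff"; // rate limited: back off
  if (status >= 500) return "retry_or_fallback"; // 500/503: transient
  return "fix_request"; // 400 and other 4xx: caller error
}
```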

API Comparison Matrix

Requesty vs OpenRouter vs Direct Anthropic

| Feature | Requesty | OpenRouter | Anthropic Direct |
|---|---|---|---|
| API Format | OpenAI | OpenAI | Anthropic |
| Model Count | 300+ | 100+ | 5 |
| Tool Calling | OpenAI format | OpenAI format | Anthropic format |
| Streaming | SSE (OpenAI) | SSE (OpenAI) | SSE (Anthropic) |
| Base URL | router.requesty.ai | openrouter.ai | api.anthropic.com |
| Auth Header | Bearer requesty-* | Bearer sk-or-* | x-api-key: sk-ant-* |
| Cost Tracking | Built-in dashboard | Manual tracking | Manual tracking |
| Auto-Routing | Yes | No | N/A |
| Caching | Yes | No | No |
| Failover | Automatic | Manual | Manual |

Integration Compatibility

With Existing agentic-flow Architecture

HIGH COMPATIBILITY - Requesty is almost identical to OpenRouter:

  1. Same API Format - OpenAI /chat/completions
  2. Same Tool Format - OpenAI function calling
  3. Same Streaming - Server-Sent Events (SSE)
  4. Same Auth Pattern - Bearer token in header

Required Changes:

  • New proxy file: anthropic-to-requesty.ts
  • Provider detection: Check for REQUESTY_API_KEY
  • Base URL change: router.requesty.ai instead of openrouter.ai
  • Model naming: Use Requesty model IDs
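Provider detection can be a simple precedence check on environment variables. The env var names follow this doc; the precedence order (Requesty first) is an assumption, not existing agentic-flow behavior:

```typescript
// Pick a provider based on which API keys are configured.
// Taking env as a parameter (rather than reading process.env directly)
// keeps the function testable.
function detectProvider(env: Record<string, string | undefined>): string {
  if (env.REQUESTY_API_KEY) return "requesty";
  if (env.OPENROUTER_API_KEY) return "openrouter";
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  return "none";
}
```

In the proxy entry point this would be called as `detectProvider(process.env)`.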

Reusable from OpenRouter:

  • Request/response conversion logic (~95% identical)
  • Streaming handler
  • Error handling patterns
  • Tool calling conversion
  • Model capability detection (with new model IDs)

Testing Recommendations

Critical Test Cases

  1. Basic Chat - Simple message without tools
  2. System Prompt - Test system message handling
  3. Tool Calling - Single tool, multiple tools
  4. Streaming - Verify SSE format compatibility
  5. Error Handling - Invalid key, rate limits
  6. Model Override - Test different model IDs
  7. Large Context - Test with long messages
  8. Concurrent Requests - Test rate limiting

Suggested Test Models

Start with these well-supported models:

  1. openai/gpt-4o-mini - Fast, cheap, reliable
  2. anthropic/claude-3.5-sonnet - High quality
  3. google/gemini-2.5-flash - Free tier
  4. deepseek/deepseek-chat - Cost-optimized

Security Considerations

API Key Protection

  1. Never hardcode - Use environment variables
  2. Gitignore .env - Prevent accidental commits
  3. Rotate regularly - Change keys periodically
  4. Monitor usage - Detect unauthorized access
  5. Use separate keys - Dev vs production

Data Privacy

  1. Request logging - Be careful with sensitive data
  2. Model selection - Some models may store data
  3. GDPR compliance - Check Requesty's policies
  4. Local vs cloud - Understand data flow

Open Research Questions

Questions to Answer During Implementation

  1. Streaming Format - Exact SSE event format (confirm matches OpenAI)
  2. Rate Limits - Actual limits per tier
  3. Model List API - Can we fetch available models programmatically?
  4. Auto-Routing API - How to control routing programmatically?
  5. Cache Control - Can we control caching per-request?
  6. Failover Config - Can we specify fallback chains?
  7. Analytics API - Programmatic access to usage data?
  8. Webhook Support - Async request notifications?
  9. Batch API - Batch processing support?
  10. Free Tier - Is there a free tier for testing?

Documentation Gaps

Information Not Found in Public Docs

  • Exact rate limit values per tier
  • Complete model list with pricing
  • Streaming event format details
  • Auto-routing API parameters
  • Cache control headers
  • Failover configuration
  • Webhook integration
  • Batch processing API
How to Fill These Gaps

  1. Email Requesty Support - Ask for technical docs
  2. Test in Sandbox - Create a test account
  3. Monitor Network - Inspect actual API calls
  4. Join Discord - Community knowledge
  5. Trial Account - Test features hands-on

Summary for Developers

TL;DR - What You Need to Know

  1. Requesty = OpenRouter Clone - Almost identical API
  2. Base URL - https://router.requesty.ai/v1
  3. Auth - Authorization: Bearer requesty-*
  4. Format - OpenAI /chat/completions
  5. Tools - OpenAI function calling format
  6. Proxy Pattern - Copy OpenRouter proxy, change URL/key
  7. Models - 300+ models, use <provider>/<model> format
  8. Unique Features - Auto-routing, caching, analytics

Phase 1: Clone OpenRouter proxy as starting point
Phase 2: Update base URL and auth header
Phase 3: Add Requesty-specific features (auto-routing, caching)
Phase 4: Test with multiple models
Phase 5: Add to model optimizer

Estimated Compatibility

| Component | Compatibility | Effort |
|---|---|---|
| API Format | 99% | Minimal |
| Tool Calling | 100% | None |
| Streaming | 95% | Minor testing |
| Error Handling | 90% | Add new error codes |
| Model Detection | 0% | New model IDs needed |
| Proxy Architecture | 100% | Copy OpenRouter |

Total Estimated Effort: 3-4 hours for core implementation