
# Requesty.ai API Research

## API Overview

### Base Information

| Property | Value |
|----------|-------|
| **Base URL** | `https://router.requesty.ai/v1` |
| **API Version** | v1 |
| **Protocol** | HTTPS REST |
| **Authentication** | Bearer token (API key) |
| **Request Format** | JSON |
| **Response Format** | JSON |
| **Compatibility** | OpenAI SDK drop-in replacement |

### API Endpoints

Based on documentation and OpenAI compatibility:

1. **Chat Completions** - `/v1/chat/completions` (PRIMARY)
2. **Embeddings** - `/v1/embeddings`
3. **Models** - `/v1/models` (likely)

For agentic-flow integration, we only need **Chat Completions**.

## Authentication

### API Key Format

```
Authorization: Bearer requesty-<key>
```

### Key Generation

1. Visit https://app.requesty.ai/getting-started
2. Navigate to API Keys section
3. Generate new key
4. Copy key starting with `requesty-`

### Environment Variable

```bash
export REQUESTY_API_KEY="requesty-xxxxxxxxxxxxx"
```
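
A minimal sketch of loading the key in the proxy, assuming the `requesty-` prefix described above holds for every key. `loadRequestyKey` and `maskKey` are hypothetical helper names, not part of agentic-flow or Requesty.

```typescript
// Minimal shape of an environment map; process.env satisfies it in Node.
type EnvMap = Record<string, string | undefined>;

// Hypothetical helper: reads REQUESTY_API_KEY and checks the documented prefix.
function loadRequestyKey(env: EnvMap): string {
  const key = (env["REQUESTY_API_KEY"] ?? "").trim();
  if (key === "") {
    throw new Error("REQUESTY_API_KEY is not set");
  }
  if (!key.startsWith("requesty-")) {
    throw new Error("REQUESTY_API_KEY should start with 'requesty-'");
  }
  return key;
}

// Hypothetical helper: shortens a key for log output so the full secret
// never appears in logs.
function maskKey(key: string): string {
  return key.slice(0, 12) + "...";
}
```

In Node, this would be called as `loadRequestyKey(process.env)`.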

### Security Considerations

- API keys should be kept secret (never commit to git)
- Use environment variables or .env files
- Rotate keys periodically
- Monitor usage for unauthorized access

## Chat Completions Endpoint

### Request Schema

#### Endpoint
```
POST https://router.requesty.ai/v1/chat/completions
```

#### Headers
```http
Content-Type: application/json
Authorization: Bearer requesty-xxxxxxxxxxxxx
HTTP-Referer: https://github.com/ruvnet/agentic-flow
X-Title: Agentic Flow
```

`HTTP-Referer` and `X-Title` are optional.

#### Request Body (OpenAI Format)

```json
{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, who are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 4096,
  "stream": false,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
```

#### Available Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Model identifier (e.g., "openai/gpt-4o") |
| `messages` | array | Yes | Array of message objects |
| `temperature` | number | No | 0.0-2.0, controls randomness (default: 1.0) |
| `max_tokens` | number | No | Maximum tokens to generate |
| `stream` | boolean | No | Enable streaming (default: false) |
| `tools` | array | No | Function calling tools (OpenAI format) |
| `tool_choice` | string/object | No | Control tool usage |
| `top_p` | number | No | Nucleus sampling parameter |
| `frequency_penalty` | number | No | Reduce repetition (-2.0 to 2.0) |
| `presence_penalty` | number | No | Encourage new topics (-2.0 to 2.0) |
| `stop` | string/array | No | Stop sequences |
| `n` | number | No | Number of completions to generate |
| `user` | string | No | User identifier for tracking |
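
The endpoint, headers, and body above can be assembled as in the following sketch. It covers only a subset of the parameters, and `buildChatRequest`, `ChatRequestBody`, and `HttpRequestSpec` are hypothetical names introduced for illustration. The function builds the request but does not send it.

```typescript
// Subset of the request body documented above (OpenAI chat format).
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | null;
}

interface ChatRequestBody {
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  max_tokens?: number;
  stream?: boolean;
}

interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const REQUESTY_BASE_URL = "https://router.requesty.ai/v1";

// Hypothetical helper: validates the basics and returns what fetch() needs.
function buildChatRequest(apiKey: string, body: ChatRequestBody): HttpRequestSpec {
  if (body.messages.length === 0) {
    throw new Error("messages must not be empty");
  }
  const t = body.temperature;
  if (t !== undefined && (t < 0 || t > 2)) {
    throw new Error("temperature must be between 0.0 and 2.0");
  }
  return {
    url: `${REQUESTY_BASE_URL}/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  };
}
```

On Node 18+ the result can be passed to the built-in `fetch`, for example `fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body })`.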

### Response Schema

#### Non-Streaming Response

```json
{
  "id": "chatcmpl-xxxxxxxxxxxxx",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I am an AI assistant created by OpenAI...",
        "tool_calls": [
          {
            "id": "call_xxxxx",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"San Francisco\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
```

#### Streaming Response (SSE)

```
data: {"id":"chatcmpl-xxxxx","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"I"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxxxx","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"content":" am"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxxxx","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
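
A sketch of how the stream above could be consumed, assuming Requesty's events match the OpenAI chunk format shown (this is still listed as an open question). `collectSseText` is a hypothetical helper that parses an already-buffered payload.

```typescript
// Only the fields this sketch reads from each chunk.
interface StreamChunk {
  choices: Array<{
    delta: { role?: string; content?: string };
    finish_reason: string | null;
  }>;
}

interface StreamResult {
  text: string;
  finishReason: string | null;
  done: boolean;
}

// Parses a buffered SSE payload and concatenates the content deltas.
function collectSseText(payload: string): StreamResult {
  const result: StreamResult = { text: "", finishReason: null, done: false };
  for (const line of payload.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") {
      result.done = true; // sentinel that ends the stream
      break;
    }
    const chunk = JSON.parse(data) as StreamChunk;
    for (const choice of chunk.choices) {
      result.text += choice.delta.content ?? "";
      if (choice.finish_reason) {
        result.finishReason = choice.finish_reason;
      }
    }
  }
  return result;
}
```

A real stream arrives in arbitrary byte chunks, so a production handler would need to buffer partial lines before parsing them.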

### Finish Reasons

| Reason | Description |
|--------|-------------|
| `stop` | Natural completion (model decided to stop) |
| `length` | Max tokens reached |
| `tool_calls` | Model wants to call a function |
| `content_filter` | Content filtered by safety system |
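
Because the proxy translates responses back into Anthropic's format, these values need to be mapped to Anthropic `stop_reason` values. The mapping below is one plausible sketch, not taken from the existing OpenRouter proxy. In particular, folding `content_filter` into `end_turn` is an assumption, and `toAnthropicStopReason` is a hypothetical name.

```typescript
type AnthropicStopReason = "end_turn" | "max_tokens" | "tool_use";

// Maps an OpenAI-style finish_reason to an Anthropic stop_reason.
function toAnthropicStopReason(reason: string | null): AnthropicStopReason {
  switch (reason) {
    case "length":
      return "max_tokens";
    case "tool_calls":
      return "tool_use";
    case "stop":
    case "content_filter": // assumption: no direct Anthropic equivalent
    default:
      // Unknown or missing reasons are treated as a normal end of turn.
      return "end_turn";
  }
}
```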

## Tool/Function Calling

### Format

Requesty uses the **OpenAI function calling format** (the same format as OpenRouter).

#### Request with Tools

```json
{
  "model": "openai/gpt-4o",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "Read",
        "description": "Read a file from the filesystem",
        "parameters": {
          "type": "object",
          "properties": {
            "file_path": {
              "type": "string",
              "description": "Absolute path to file"
            }
          },
          "required": ["file_path"]
        }
      }
    }
  ]
}
```
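
Since agentic-flow receives tool definitions in Anthropic's format (`name`, `description`, `input_schema`), the proxy has to convert them into the shape above. A minimal sketch, with `toOpenAiTools` as a hypothetical name:

```typescript
// Anthropic-style tool definition, as received by the proxy.
interface AnthropicTool {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>;
}

// OpenAI-style tool definition, as sent to Requesty.
interface OpenAiTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

// Wraps each Anthropic tool in the OpenAI "function" envelope.
// The JSON Schema is passed through unchanged.
function toOpenAiTools(tools: AnthropicTool[]): OpenAiTool[] {
  return tools.map((tool) => ({
    type: "function" as const,
    function: {
      name: tool.name,
      description: tool.description ?? "",
      parameters: tool.input_schema,
    },
  }));
}
```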

#### Response with Tool Calls

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "Read",
              "arguments": "{\"file_path\": \"/workspace/file.txt\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
```
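
Note that `arguments` is a JSON-encoded string, not an object. In the reverse direction, the proxy would turn each entry into an Anthropic `tool_use` content block. The sketch below assumes malformed arguments should fall back to an empty input rather than fail the whole response. `toToolUseBlocks` is a hypothetical name.

```typescript
// OpenAI-style tool call, as returned in message.tool_calls.
interface OpenAiToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// Anthropic-style tool_use content block.
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}

function toToolUseBlocks(calls: OpenAiToolCall[]): ToolUseBlock[] {
  return calls.map((call) => {
    let input: Record<string, unknown> = {};
    try {
      const parsed: unknown = JSON.parse(call.function.arguments);
      // Only plain JSON objects are valid tool inputs.
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        input = parsed as Record<string, unknown>;
      }
    } catch (_err) {
      // Assumption: malformed JSON falls back to an empty input.
    }
    return { type: "tool_use" as const, id: call.id, name: call.function.name, input };
  });
}
```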

### Tool Calling Support by Model

Inferred from OpenAI compatibility; not yet verified against Requesty:

**Native Support** (expected):
- OpenAI models (GPT-4o, GPT-4-turbo, GPT-3.5-turbo)
- Anthropic models (Claude 3.5 Sonnet, Claude 3 Opus)
- Google models (Gemini 2.5 Pro, Gemini 2.5 Flash)
- DeepSeek models (DeepSeek-V3)
- Llama 3.3+ models
- Qwen 2.5+ models
- Mistral Large models

**Requires Emulation** (likely):
- Older Llama 2 models
- Smaller models (<7B parameters)
- Non-instruct base models

## Model Naming Convention

### Format
```
<provider>/<model-name>
```

### Examples
```
openai/gpt-4o
anthropic/claude-3.5-sonnet
google/gemini-2.5-flash
deepseek/deepseek-chat
meta-llama/llama-3.3-70b-instruct
```
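
A small sketch of splitting an ID in this format. `parseModelId` is a hypothetical helper. It splits on the first slash only, so a model name that itself contains a slash would be kept intact.

```typescript
interface ModelId {
  provider: string;
  model: string;
}

// Splits "<provider>/<model-name>" at the first slash.
function parseModelId(id: string): ModelId {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "<provider>/<model-name>", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```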

### Model Categories

| Provider | Example Models | Notes |
|----------|----------------|-------|
| OpenAI | `openai/gpt-4o`, `openai/gpt-4-turbo` | Premium, expensive |
| Anthropic | `anthropic/claude-3.5-sonnet` | High quality, medium cost |
| Google | `google/gemini-2.5-flash` | Fast, cost-effective |
| DeepSeek | `deepseek/deepseek-chat` | Cheap, good quality |
| Meta | `meta-llama/llama-3.3-70b-instruct` | Open source |
| Qwen | `qwen/qwen-2.5-coder-32b-instruct` | Coding-focused |

## Rate Limits

### Expected Limits (to be confirmed)

Based on typical AI gateway providers:

| Tier | Requests/min | Requests/day | Token Limit |
|------|--------------|--------------|-------------|
| Free | 20 | 1,000 | 100K tokens/day |
| Starter | 60 | 10,000 | 1M tokens/day |
| Pro | 300 | 100,000 | 10M tokens/day |
| Enterprise | Custom | Custom | Custom |

### Rate Limit Headers (expected)

```http
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 59
X-RateLimit-Reset: 1704067260
```
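
If Requesty does send these headers (not yet confirmed), they could be read as in the sketch below. It assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, as the example value suggests. `HeaderReader` is a hypothetical interface that the standard `Headers` object satisfies.

```typescript
// Anything with a Headers-like get(); response.headers from fetch() qualifies.
interface HeaderReader {
  get(name: string): string | null;
}

interface RateLimitInfo {
  limit: number | null;
  remaining: number | null;
  resetAt: Date | null;
}

// Reads one integer header; returns null when missing or not numeric.
function readIntHeader(headers: HeaderReader, name: string): number | null {
  const raw = headers.get(name);
  if (raw === null) return null;
  const value = parseInt(raw, 10);
  return isNaN(value) ? null : value;
}

function parseRateLimit(headers: HeaderReader): RateLimitInfo {
  const reset = readIntHeader(headers, "X-RateLimit-Reset");
  return {
    limit: readIntHeader(headers, "X-RateLimit-Limit"),
    remaining: readIntHeader(headers, "X-RateLimit-Remaining"),
    // Assumption: the reset value is in seconds since the Unix epoch.
    resetAt: reset === null ? null : new Date(reset * 1000),
  };
}
```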

### Handling Rate Limits

When rate limited, expect an HTTP 429 response with an error body like:
```json
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```

**Implementation Strategy:**
1. Detect 429 status code
2. Read `Retry-After` header
3. Implement exponential backoff
4. Log rate limit events
5. Optionally fall back to other providers
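
Steps 1 to 3 can be sketched as follows. The base delay, the cap, and the attempt count are illustrative assumptions, and `Retry-After` is handled only in its delta-seconds form (it can also be an HTTP date). `backoffDelayMs`, `withRetry`, and `AttemptResult` are hypothetical names. Logging and provider fallback are left out.

```typescript
// Wait in milliseconds before retry number `attempt` (0-based).
// A numeric Retry-After value takes priority over exponential backoff.
function backoffDelayMs(
  attempt: number,
  retryAfter: string | null,
  baseMs = 500,
  maxMs = 30000,
): number {
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, maxMs);
    }
  }
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

interface AttemptResult<T> {
  status: number;
  retryAfter: string | null;
  value?: T;
}

// Repeats attemptFn while it reports HTTP 429, up to maxAttempts calls.
// `sleep` is injected so the wait can be replaced in tests.
async function withRetry<T>(
  attemptFn: () => Promise<AttemptResult<T>>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 4,
): Promise<AttemptResult<T>> {
  let outcome = await attemptFn();
  for (let attempt = 0; outcome.status === 429 && attempt < maxAttempts - 1; attempt++) {
    await sleep(backoffDelayMs(attempt, outcome.retryAfter));
    outcome = await attemptFn();
  }
  return outcome;
}
```

In Node, `sleep` could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`.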

## Pricing

### Cost Structure

The documentation cites savings of roughly 80% versus Claude.

**Estimated Pricing** (USD per million tokens; unverified estimates):

| Model Class | Input | Output | Savings vs Claude |
|-------------|-------|--------|-------------------|
| GPT-4o | $0.50 | $1.50 | 70% |
| Claude 3.5 Sonnet | $0.60 | $1.80 | 80% |
| Gemini 2.5 Flash | FREE | FREE | 100% |
| DeepSeek Chat | $0.03 | $0.06 | 98% |
| Llama 3.3 70B | $0.10 | $0.20 | 95% |
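
A per-request cost can be estimated from the `usage` object in a response and a per-million-token price. The sketch below uses hypothetical names (`estimateCostUsd`, `PricePerMillion`). Any prices passed in are the unverified estimates from the table, not confirmed rates.

```typescript
// Token counts as reported in the response's usage object.
interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Prices in USD per one million tokens.
interface PricePerMillion {
  inputUsd: number;
  outputUsd: number;
}

// cost = (prompt * input price + completion * output price) / 1,000,000
function estimateCostUsd(usage: TokenUsage, price: PricePerMillion): number {
  const total =
    usage.prompt_tokens * price.inputUsd +
    usage.completion_tokens * price.outputUsd;
  return total / 1000000;
}
```

With 25 prompt tokens and 150 completion tokens at the GPT-4o row's estimated prices, this gives (25 × 0.50 + 150 × 1.50) / 1,000,000 = $0.0002375.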

### Cost Tracking Features

Requesty includes:
- Real-time cost monitoring
- Per-request cost attribution
- Monthly spending reports
- Budget alerts
- Cost optimization recommendations

## Unique Requesty Features

### 1. Auto-Routing

Requesty can automatically route requests to optimal models based on:
- Cost constraints
- Performance requirements
- Availability
- Load balancing

**API Parameter** (hypothetical; not confirmed in public docs):
```json
{
  "model": "auto",
  "routing_strategy": "cost_optimized"
}
```

### 2. Caching

Intelligent caching to reduce costs:
- Semantic similarity matching
- Configurable TTL
- Cache hit/miss reporting

### 3. Analytics

Built-in analytics dashboard:
- Request volume
- Token usage
- Cost breakdown
- Latency metrics
- Error rates
- Model performance comparison

### 4. Failover

Automatic failover if primary model is unavailable:
- Model-level failover
- Provider-level failover
- Custom fallback chains

## Error Handling

### Error Response Format

```json
{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
```

### Common Errors

| Status | Error Type | Description | Recovery |
|--------|------------|-------------|----------|
| 401 | `authentication_error` | Invalid API key | Check REQUESTY_API_KEY |
| 429 | `rate_limit_error` | Too many requests | Retry with backoff |
| 400 | `invalid_request_error` | Malformed request | Fix request format |
| 500 | `server_error` | Requesty server issue | Retry or fallback |
| 503 | `service_unavailable` | Model overloaded | Retry or use different model |
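
The recovery column can be encoded as a small lookup, sketched below. `recoveryFor` and the action names are hypothetical. Treating every other 5xx status as retryable is an assumption.

```typescript
type RecoveryAction =
  | "check_api_key"
  | "retry_with_backoff"
  | "fix_request"
  | "retry_or_fallback"
  | "fail";

// Maps an HTTP status code to the recovery strategy from the table above.
function recoveryFor(statusCode: number): RecoveryAction {
  switch (statusCode) {
    case 401:
      return "check_api_key";
    case 429:
      return "retry_with_backoff";
    case 400:
      return "fix_request";
    case 500:
    case 503:
      return "retry_or_fallback";
    default:
      // Assumption: other 5xx codes are transient; anything else is fatal.
      return statusCode >= 500 ? "retry_or_fallback" : "fail";
  }
}
```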

## API Comparison Matrix

### Requesty vs OpenRouter vs Direct Anthropic

| Feature | Requesty | OpenRouter | Anthropic Direct |
|---------|----------|------------|------------------|
| **API Format** | OpenAI | OpenAI | Anthropic |
| **Model Count** | 300+ | 100+ | 5 |
| **Tool Calling** | OpenAI format | OpenAI format | Anthropic format |
| **Streaming** | SSE (OpenAI) | SSE (OpenAI) | SSE (Anthropic) |
| **Base URL** | `router.requesty.ai` | `openrouter.ai` | `api.anthropic.com` |
| **Auth Header** | `Bearer requesty-*` | `Bearer sk-or-*` | `x-api-key: sk-ant-*` |
| **Cost Tracking** | Built-in dashboard | Manual tracking | Manual tracking |
| **Auto-Routing** | Yes | No | N/A |
| **Caching** | Yes | No | No |
| **Failover** | Automatic | Manual | Manual |

## Integration Compatibility

### With Existing agentic-flow Architecture

**HIGH COMPATIBILITY** - Requesty is almost identical to OpenRouter:

1. **Same API Format** - OpenAI `/chat/completions`
2. **Same Tool Format** - OpenAI function calling
3. **Same Streaming** - Server-Sent Events (SSE)
4. **Same Auth Pattern** - Bearer token in header

**Required Changes:**
- New proxy file: `anthropic-to-requesty.ts`
- Provider detection: Check for `REQUESTY_API_KEY`
- Base URL change: `router.requesty.ai` instead of `openrouter.ai`
- Model naming: Use Requesty model IDs

**Reusable from OpenRouter:**
- Request/response conversion logic (~95% identical)
- Streaming handler
- Error handling patterns
- Tool calling conversion
- Model capability detection (with new model IDs)
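
Provider detection could look like the sketch below. Only `REQUESTY_API_KEY` comes from this document. The `OPENROUTER_API_KEY` and `ANTHROPIC_API_KEY` variable names, the priority order, and the `detectProvider` helper are assumptions for illustration and may differ from what agentic-flow does today.

```typescript
type ProviderName = "requesty" | "openrouter" | "anthropic";

interface ProviderConfig {
  provider: ProviderName;
  baseUrl: string;
  apiKey: string;
}

// Candidate providers in priority order (the order is an assumption).
const PROVIDER_TABLE: Array<{ provider: ProviderName; envVar: string; baseUrl: string }> = [
  { provider: "requesty", envVar: "REQUESTY_API_KEY", baseUrl: "https://router.requesty.ai/v1" },
  { provider: "openrouter", envVar: "OPENROUTER_API_KEY", baseUrl: "https://openrouter.ai/api/v1" },
  { provider: "anthropic", envVar: "ANTHROPIC_API_KEY", baseUrl: "https://api.anthropic.com" },
];

// Returns the first provider whose API key is present, or null if none is.
function detectProvider(env: Record<string, string | undefined>): ProviderConfig | null {
  for (const entry of PROVIDER_TABLE) {
    const apiKey = env[entry.envVar];
    if (apiKey) {
      return { provider: entry.provider, baseUrl: entry.baseUrl, apiKey };
    }
  }
  return null;
}
```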

## Testing Recommendations

### Critical Test Cases

1. **Basic Chat** - Simple message without tools
2. **System Prompt** - Test system message handling
3. **Tool Calling** - Single tool, multiple tools
4. **Streaming** - Verify SSE format compatibility
5. **Error Handling** - Invalid key, rate limits
6. **Model Override** - Test different model IDs
7. **Large Context** - Test with long messages
8. **Concurrent Requests** - Test rate limiting

### Suggested Test Models

Start with these well-supported models:

1. `openai/gpt-4o-mini` - Fast, cheap, reliable
2. `anthropic/claude-3.5-sonnet` - High quality
3. `google/gemini-2.5-flash` - Free tier
4. `deepseek/deepseek-chat` - Cost-optimized

## Security Considerations

### API Key Protection

1. **Never hardcode** - Use environment variables
2. **Gitignore .env** - Prevent accidental commits
3. **Rotate regularly** - Change keys periodically
4. **Monitor usage** - Detect unauthorized access
5. **Use separate keys** - Dev vs production

### Data Privacy

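1. **Request logging** - Avoid logging prompts or responses that contain sensitive data
2. **Model selection** - Some upstream providers may retain request data; check their retention policies
3. **GDPR compliance** - Check Requesty's policies
4. **Local vs cloud** - Requests pass through Requesty's router before reaching the model provider, so understand the full data flow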

## Open Research Questions

### Questions to Answer During Implementation

1. **Streaming Format** - Exact SSE event format (confirm matches OpenAI)
2. **Rate Limits** - Actual limits per tier
3. **Model List API** - Can we fetch available models programmatically?
4. **Auto-Routing API** - How to control routing programmatically?
5. **Cache Control** - Can we control caching per-request?
6. **Failover Config** - Can we specify fallback chains?
7. **Analytics API** - Programmatic access to usage data?
8. **Webhook Support** - Async request notifications?
9. **Batch API** - Batch processing support?
10. **Free Tier** - Is there a free tier for testing?

## Documentation Gaps

### Information Not Found in Public Docs

- Exact rate limit values per tier
- Complete model list with pricing
- Streaming event format details
- Auto-routing API parameters
- Cache control headers
- Failover configuration
- Webhook integration
- Batch processing API

### Recommended Actions

1. **Email Requesty Support** - Ask for technical docs
2. **Test in Sandbox** - Create test account
3. **Monitor Network** - Inspect actual API calls
4. **Join Discord** - Community knowledge
5. **Trial Account** - Test features hands-on

## Summary for Developers

### TL;DR - What You Need to Know

1. **Requesty = OpenRouter Clone** - Almost identical API
2. **Base URL** - `https://router.requesty.ai/v1`
3. **Auth** - `Authorization: Bearer requesty-*`
4. **Format** - OpenAI `/chat/completions`
5. **Tools** - OpenAI function calling format
6. **Proxy Pattern** - Copy OpenRouter proxy, change URL/key
7. **Models** - 300+ models, use `<provider>/<model>` format
8. **Unique Features** - Auto-routing, caching, analytics

### Recommended Implementation Strategy

- **Phase 1:** Clone OpenRouter proxy as starting point
- **Phase 2:** Update base URL and auth header
- **Phase 3:** Add Requesty-specific features (auto-routing, caching)
- **Phase 4:** Test with multiple models
- **Phase 5:** Add to model optimizer

### Estimated Compatibility

| Component | Compatibility | Effort |
|-----------|---------------|--------|
| API Format | 99% | Minimal |
| Tool Calling | 100% | None |
| Streaming | 95% | Minor testing |
| Error Handling | 90% | Add new error codes |
| Model Detection | 0% | New model IDs needed |
| Proxy Architecture | 100% | Copy OpenRouter |

**Total Estimated Effort:** 3-4 hours for core implementation