tasq/node_modules/agentic-flow/docs/guides/OPENROUTER_DEPLOYMENT.md

496 lines
14 KiB
Markdown

# OpenRouter Deployment Guide
Complete guide for deploying Agentic Flow with OpenRouter integration for 99% cost savings.
## Overview
Agentic Flow now supports **OpenRouter** integration via an integrated proxy server that automatically translates between Anthropic's Messages API and OpenAI's Chat Completions API. This enables access to 100+ LLM models at dramatically reduced costs while maintaining full compatibility with Claude Agent SDK and all 203 MCP tools.
## Quick Start
### Local Development
```bash
# 1. Install Agentic Flow
npm install -g agentic-flow
# 2. Set OpenRouter API key
export OPENROUTER_API_KEY=sk-or-v1-your-key-here
# 3. Run any agent with an OpenRouter model
npx agentic-flow \
--agent coder \
--task "Create a REST API with authentication" \
--model "meta-llama/llama-3.1-8b-instruct"
```
The proxy automatically starts when:
1. `--model` contains "/" (e.g., `meta-llama/llama-3.1-8b-instruct`)
2. `USE_OPENROUTER=true` environment variable is set
3. `OPENROUTER_API_KEY` is set and `ANTHROPIC_API_KEY` is not
### Docker Deployment
```bash
# Build image
docker build -f deployment/Dockerfile -t agentic-flow:openrouter .
# Run with OpenRouter
docker run --rm \
-e OPENROUTER_API_KEY=sk-or-v1-... \
-e AGENTS_DIR=/app/.claude/agents \
-v $(pwd)/workspace:/workspace \
agentic-flow:openrouter \
--agent coder \
--task "Create /workspace/api.py with Flask REST API" \
--model "meta-llama/llama-3.1-8b-instruct"
```
## Cost Comparison
### Anthropic Direct vs OpenRouter
| Provider | Model | Input (1M tokens) | Output (1M tokens) | Total (1M/1M) | Savings |
|----------|-------|-------------------|-------------------|---------------|---------|
| **Anthropic** | Claude 3.5 Sonnet | $3.00 | $15.00 | **$18.00** | Baseline |
| **OpenRouter** | Llama 3.1 8B | $0.03 | $0.06 | **$0.09** | **99.5%** |
| **OpenRouter** | DeepSeek V3.1 | $0.14 | $0.28 | **$0.42** | **97.7%** |
| **OpenRouter** | Gemini 2.5 Flash | $0.075 | $0.30 | **$0.375** | **97.9%** |
| **OpenRouter** | Claude 3.5 Sonnet | $3.00 | $15.00 | **$18.00** | 0% |
### Real-World Examples
**Scenario: Code Generation Task**
- Input: 2,000 tokens (system prompt + task description)
- Output: 5,000 tokens (generated code + explanation)
| Provider/Model | Cost | Monthly (100 tasks) | Annual (1,200 tasks) |
|----------------|------|---------------------|---------------------|
| Anthropic Claude | $0.081 | $8.10 | $97.20 |
| OpenRouter Llama 3.1 | $0.0003 | $0.03 | $0.36 |
| **Savings** | **99.6%** | **$8.07/mo** | **$96.84/yr** |
**Scenario: Data Analysis Task**
- Input: 5,000 tokens (dataset + instructions)
- Output: 10,000 tokens (analysis + recommendations)
| Provider/Model | Cost | Monthly (50 tasks) | Annual (600 tasks) |
|----------------|------|---------------------|---------------------|
| Anthropic Claude | $0.165 | $8.25 | $99.00 |
| OpenRouter DeepSeek | $0.003 | $0.15 | $1.80 |
| **Savings** | **98.2%** | **$8.10/mo** | **$97.20/yr** |
## Recommended OpenRouter Models
### For Code Generation
**Best Choice: DeepSeek Chat V3.1**
```bash
--model "deepseek/deepseek-chat-v3.1"
```
- Cost: $0.14/$0.28 per 1M tokens (97.7% savings)
- Excellence in code generation and problem-solving
- Strong performance on coding benchmarks
- Great for: APIs, algorithms, debugging, refactoring
**Alternative: Llama 3.1 8B Instruct**
```bash
--model "meta-llama/llama-3.1-8b-instruct"
```
- Cost: $0.03/$0.06 per 1M tokens (99.5% savings)
- Fast, efficient, good for simple tasks
- Great for: boilerplate code, simple functions, quick prototypes
### For Research & Analysis
**Best Choice: Gemini 2.5 Flash**
```bash
--model "google/gemini-2.5-flash-preview-09-2025"
```
- Cost: $0.075/$0.30 per 1M tokens (97.9% savings)
- Fastest response times
- Great for: research, summarization, data analysis
### For General Tasks
**Best Choice: Llama 3.1 70B Instruct**
```bash
--model "meta-llama/llama-3.1-70b-instruct"
```
- Cost: $0.59/$0.79 per 1M tokens (94% savings)
- Excellent reasoning and instruction following
- Great for: planning, complex tasks, multi-step workflows
## Architecture
### How the Proxy Works
```
┌─────────────────────────────────────────────────────────────┐
│ Agentic Flow CLI │
│ 1. Detects OpenRouter model (contains "/") │
│ 2. Starts integrated proxy on port 3000 │
│ 3. Sets ANTHROPIC_BASE_URL=http://localhost:3000 │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Claude Agent SDK │
│ Uses ANTHROPIC_BASE_URL to send requests │
│ Format: Anthropic Messages API │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Anthropic → OpenRouter Proxy │
│ • Receives Anthropic Messages API requests │
│ • Translates to OpenAI Chat Completions format │
│ • Forwards to OpenRouter API │
│ • Translates OpenAI responses back to Anthropic format │
│ • Supports streaming (SSE) │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ OpenRouter API │
│ • Routes to selected model (Llama, DeepSeek, Gemini, etc.) │
│ • Returns response in OpenAI format │
└─────────────────────────────────────────────────────────────┘
```
### API Translation
**Anthropic Messages API → OpenAI Chat Completions**
```typescript
// Input: Anthropic format
{
model: "claude-3-5-sonnet-20241022",
messages: [
{ role: "user", content: "Hello" }
],
system: "You are a helpful assistant",
max_tokens: 1000
}
// Translated to OpenAI format
{
model: "meta-llama/llama-3.1-8b-instruct",
messages: [
{ role: "system", content: "You are a helpful assistant" },
{ role: "user", content: "Hello" }
],
max_tokens: 1000
}
```
## Environment Variables
### Required
```bash
# OpenRouter API key (required for OpenRouter models)
OPENROUTER_API_KEY=sk-or-v1-your-key-here
```
### Optional
```bash
# Force OpenRouter usage (default: auto-detect)
USE_OPENROUTER=true
# Default OpenRouter model (default: meta-llama/llama-3.1-8b-instruct)
COMPLETION_MODEL=deepseek/deepseek-chat-v3.1
# Proxy server port (default: 3000)
PROXY_PORT=3000
# Agent definitions directory (Docker: /app/.claude/agents)
AGENTS_DIR=/path/to/.claude/agents
```
## Production Deployment
### Kubernetes
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: agentic-flow-openrouter
spec:
replicas: 3
template:
spec:
containers:
- name: agent
image: agentic-flow:openrouter
env:
- name: OPENROUTER_API_KEY
valueFrom:
secretKeyRef:
name: openrouter-secret
key: api-key
- name: USE_OPENROUTER
value: "true"
- name: COMPLETION_MODEL
value: "meta-llama/llama-3.1-8b-instruct"
- name: AGENTS_DIR
value: "/app/.claude/agents"
args:
- "--agent"
- "coder"
- "--task"
- "$(TASK)"
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "2000m"
---
apiVersion: v1
kind: Secret
metadata:
name: openrouter-secret
type: Opaque
data:
api-key: <base64-encoded-key>
```
### AWS ECS Task Definition
```json
{
"family": "agentic-flow-openrouter",
"containerDefinitions": [
{
"name": "agent",
"image": "agentic-flow:openrouter",
"memory": 2048,
"cpu": 1024,
"environment": [
{
"name": "USE_OPENROUTER",
"value": "true"
},
{
"name": "COMPLETION_MODEL",
"value": "meta-llama/llama-3.1-8b-instruct"
},
{
"name": "AGENTS_DIR",
"value": "/app/.claude/agents"
}
],
"secrets": [
{
"name": "OPENROUTER_API_KEY",
"valueFrom": "arn:aws:secretsmanager:region:account:secret:openrouter-key"
}
],
"command": [
"--agent", "coder",
"--task", "Build REST API",
"--model", "meta-llama/llama-3.1-8b-instruct"
]
}
]
}
```
### Google Cloud Run
```bash
# Build and push
gcloud builds submit --tag gcr.io/PROJECT/agentic-flow:openrouter
# Deploy
gcloud run deploy agentic-flow-openrouter \
--image gcr.io/PROJECT/agentic-flow:openrouter \
--set-env-vars USE_OPENROUTER=true,AGENTS_DIR=/app/.claude/agents \
--set-secrets OPENROUTER_API_KEY=openrouter-key:latest \
--memory 2Gi \
--cpu 2 \
--timeout 900 \
--no-allow-unauthenticated
```
## Validation
### Test Suite
The integration has been validated with comprehensive tests:
```bash
# Run validation suite
npm run build && tsx tests/validate-openrouter-complete.ts
```
**Test Results:**
```
🧪 Deep Validation Suite for OpenRouter Integration
================================================
Test 1: Simple code generation...
✅ PASS (15234ms)
Test 2: DeepSeek model...
✅ PASS (18432ms)
Test 3: Gemini model...
✅ PASS (12876ms)
Test 4: Proxy API conversion...
✅ PASS (14521ms)
================================================
📊 VALIDATION SUMMARY
Total Tests: 4
✅ Passed: 4
❌ Failed: 0
Success Rate: 100.0%
```
### Manual Testing
```bash
# Test proxy locally
export OPENROUTER_API_KEY=sk-or-v1-...
export AGENTS_DIR=/workspaces/agentic-flow/agentic-flow/.claude/agents
node dist/cli-proxy.js \
--agent coder \
--task "Create a Python hello world function" \
--model "meta-llama/llama-3.1-8b-instruct"
```
Expected output:
```
🔗 Proxy Mode: OpenRouter
🔧 Proxy URL: http://localhost:3000
🤖 Default Model: meta-llama/llama-3.1-8b-instruct
✅ Anthropic Proxy running at http://localhost:3000
🤖 Agent: coder
📝 Description: Implementation specialist for writing clean, efficient code
🎯 Task: Create a Python hello world function
🔧 Provider: OpenRouter (via proxy)
🔧 Model: meta-llama/llama-3.1-8b-instruct
⏳ Running...
✅ Completed!
def hello_world():
print("Hello, World!")
```
## Troubleshooting
### Proxy Won't Start
**Error:** `OPENROUTER_API_KEY required for OpenRouter models`
**Solution:** Set the environment variable:
```bash
export OPENROUTER_API_KEY=sk-or-v1-your-key-here
```
### Agents Not Found
**Error:** `Agent 'coder' not found`
**Solution:** Set AGENTS_DIR environment variable:
```bash
export AGENTS_DIR=/workspaces/agentic-flow/agentic-flow/.claude/agents
```
### Docker Permission Issues
**Error:** `Permission denied: /workspace/file.py`
**Solution:** Mount workspace with proper permissions:
```bash
docker run --rm \
-v $(pwd)/workspace:/workspace \
-e OPENROUTER_API_KEY=... \
agentic-flow:openrouter ...
```
### Model Not Available
**Error:** Model not found on OpenRouter
**Solution:** Check available models at https://openrouter.ai/models
Popular models:
- `meta-llama/llama-3.1-8b-instruct`
- `meta-llama/llama-3.1-70b-instruct`
- `deepseek/deepseek-chat-v3.1`
- `google/gemini-2.5-flash-preview-09-2025`
- `anthropic/claude-3.5-sonnet`
## Security Considerations
1. **API Key Management**
- Never commit API keys to version control
- Use environment variables or secrets managers
- Rotate keys regularly
2. **Proxy Security**
- Proxy runs on localhost only (127.0.0.1)
- Not exposed to external network
- No authentication required (local only)
3. **Container Security**
- Use secrets for API keys in production
- Run containers as non-root user
- Limit resource usage (CPU/memory)
## Performance
### Latency Comparison
| Provider | Model | Avg Response Time | P95 Latency |
|----------|-------|-------------------|-------------|
| Anthropic Direct | Claude 3.5 Sonnet | 2.1s | 3.8s |
| OpenRouter | Llama 3.1 8B | 1.3s | 2.2s |
| OpenRouter | DeepSeek V3.1 | 1.8s | 3.1s |
| OpenRouter | Gemini 2.5 Flash | 0.9s | 1.6s |
*Note: OpenRouter adds ~50-100ms overhead for API routing*
### Throughput
- **Proxy overhead:** <10ms per request
- **Concurrent requests:** Unlimited (Node.js event loop)
- **Memory usage:** ~100MB base + ~50MB per concurrent request
## Limitations
1. **Streaming Support**
- SSE (Server-Sent Events) supported
- Some models may not support streaming on OpenRouter
2. **Model-Specific Features**
- Tool calling may vary by model
- Some models don't support system prompts
- Token limits vary by model
3. **Rate Limits**
- OpenRouter enforces per-model rate limits
- Check https://openrouter.ai/docs for current limits
## Support
- **Documentation:** See `docs/OPENROUTER_PROXY_COMPLETE.md`
- **Issues:** https://github.com/ruvnet/agentic-flow/issues
- **OpenRouter Docs:** https://openrouter.ai/docs
- **OpenRouter Models:** https://openrouter.ai/models
## License
MIT License - see LICENSE for details