Marc Rejohn Castillano 5cb6561924 added ruflo

2026-04-09 19:01:53 +08:00

14 KiB

Raw Blame History

OpenRouter Deployment Guide

Complete guide for deploying Agentic Flow with OpenRouter integration for 99% cost savings.

Overview

Agentic Flow now supports OpenRouter integration via an integrated proxy server that automatically translates between Anthropic's Messages API and OpenAI's Chat Completions API. This enables access to 100+ LLM models at dramatically reduced costs while maintaining full compatibility with Claude Agent SDK and all 203 MCP tools.

Quick Start

Local Development

# 1. Install Agentic Flow
npm install -g agentic-flow

# 2. Set OpenRouter API key
export OPENROUTER_API_KEY=sk-or-v1-your-key-here

# 3. Run any agent with an OpenRouter model
npx agentic-flow \
  --agent coder \
  --task "Create a REST API with authentication" \
  --model "meta-llama/llama-3.1-8b-instruct"

The proxy automatically starts when:

--model contains "/" (e.g., meta-llama/llama-3.1-8b-instruct)
USE_OPENROUTER=true environment variable is set
OPENROUTER_API_KEY is set and ANTHROPIC_API_KEY is not

Docker Deployment

# Build image
docker build -f deployment/Dockerfile -t agentic-flow:openrouter .

# Run with OpenRouter
docker run --rm \
  -e OPENROUTER_API_KEY=sk-or-v1-... \
  -e AGENTS_DIR=/app/.claude/agents \
  -v $(pwd)/workspace:/workspace \
  agentic-flow:openrouter \
  --agent coder \
  --task "Create /workspace/api.py with Flask REST API" \
  --model "meta-llama/llama-3.1-8b-instruct"

Cost Comparison

Anthropic Direct vs OpenRouter

Provider	Model	Input (1M tokens)	Output (1M tokens)	Total (1M/1M)	Savings
Anthropic	Claude 3.5 Sonnet	$3.00	$15.00	$18.00	Baseline
OpenRouter	Llama 3.1 8B	$0.03	$0.06	$0.09	99.5%
OpenRouter	DeepSeek V3.1	$0.14	$0.28	$0.42	97.7%
OpenRouter	Gemini 2.5 Flash	$0.075	$0.30	$0.375	97.9%
OpenRouter	Claude 3.5 Sonnet	$3.00	$15.00	$18.00	0%

Real-World Examples

Scenario: Code Generation Task

Input: 2,000 tokens (system prompt + task description)
Output: 5,000 tokens (generated code + explanation)

Provider/Model	Cost	Monthly (100 tasks)	Annual (1,200 tasks)
Anthropic Claude	$0.081	$8.10	$97.20
OpenRouter Llama 3.1	$0.0003	$0.03	$0.36
Savings	99.6%	$8.07/mo	$96.84/yr

Scenario: Data Analysis Task

Input: 5,000 tokens (dataset + instructions)
Output: 10,000 tokens (analysis + recommendations)

Provider/Model	Cost	Monthly (50 tasks)	Annual (600 tasks)
Anthropic Claude	$0.165	$8.25	$99.00
OpenRouter DeepSeek	$0.003	$0.15	$1.80
Savings	98.2%	$8.10/mo	$97.20/yr

Recommended OpenRouter Models

For Code Generation

Best Choice: DeepSeek Chat V3.1

--model "deepseek/deepseek-chat-v3.1"

Cost: $0.14/$0.28 per 1M tokens (97.7% savings)
Excellence in code generation and problem-solving
Strong performance on coding benchmarks
Great for: APIs, algorithms, debugging, refactoring

Alternative: Llama 3.1 8B Instruct

--model "meta-llama/llama-3.1-8b-instruct"

Cost: $0.03/$0.06 per 1M tokens (99.5% savings)
Fast, efficient, good for simple tasks
Great for: boilerplate code, simple functions, quick prototypes

For Research & Analysis

Best Choice: Gemini 2.5 Flash

--model "google/gemini-2.5-flash-preview-09-2025"

Cost: $0.075/$0.30 per 1M tokens (97.9% savings)
Fastest response times
Great for: research, summarization, data analysis

For General Tasks

Best Choice: Llama 3.1 70B Instruct

--model "meta-llama/llama-3.1-70b-instruct"

Cost: $0.59/$0.79 per 1M tokens (94% savings)
Excellent reasoning and instruction following
Great for: planning, complex tasks, multi-step workflows

Architecture

How the Proxy Works

┌─────────────────────────────────────────────────────────────┐
│                      Agentic Flow CLI                        │
│  1. Detects OpenRouter model (contains "/")                  │
│  2. Starts integrated proxy on port 3000                     │
│  3. Sets ANTHROPIC_BASE_URL=http://localhost:3000            │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                  Claude Agent SDK                            │
│  Uses ANTHROPIC_BASE_URL to send requests                   │
│  Format: Anthropic Messages API                              │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│              Anthropic → OpenRouter Proxy                    │
│  • Receives Anthropic Messages API requests                  │
│  • Translates to OpenAI Chat Completions format             │
│  • Forwards to OpenRouter API                                │
│  • Translates OpenAI responses back to Anthropic format     │
│  • Supports streaming (SSE)                                  │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                    OpenRouter API                            │
│  • Routes to selected model (Llama, DeepSeek, Gemini, etc.) │
│  • Returns response in OpenAI format                         │
└─────────────────────────────────────────────────────────────┘

API Translation

Anthropic Messages API → OpenAI Chat Completions

// Input: Anthropic format
{
  model: "claude-3-5-sonnet-20241022",
  messages: [
    { role: "user", content: "Hello" }
  ],
  system: "You are a helpful assistant",
  max_tokens: 1000
}

// Translated to OpenAI format
{
  model: "meta-llama/llama-3.1-8b-instruct",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Hello" }
  ],
  max_tokens: 1000
}

Environment Variables

Required

# OpenRouter API key (required for OpenRouter models)
OPENROUTER_API_KEY=sk-or-v1-your-key-here

Optional

# Force OpenRouter usage (default: auto-detect)
USE_OPENROUTER=true

# Default OpenRouter model (default: meta-llama/llama-3.1-8b-instruct)
COMPLETION_MODEL=deepseek/deepseek-chat-v3.1

# Proxy server port (default: 3000)
PROXY_PORT=3000

# Agent definitions directory (Docker: /app/.claude/agents)
AGENTS_DIR=/path/to/.claude/agents

Production Deployment

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: agentic-flow-openrouter
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: agent
        image: agentic-flow:openrouter
        env:
        - name: OPENROUTER_API_KEY
          valueFrom:
            secretKeyRef:
              name: openrouter-secret
              key: api-key
        - name: USE_OPENROUTER
          value: "true"
        - name: COMPLETION_MODEL
          value: "meta-llama/llama-3.1-8b-instruct"
        - name: AGENTS_DIR
          value: "/app/.claude/agents"
        args:
        - "--agent"
        - "coder"
        - "--task"
        - "$(TASK)"
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "2000m"
---
apiVersion: v1
kind: Secret
metadata:
  name: openrouter-secret
type: Opaque
data:
  api-key: <base64-encoded-key>

AWS ECS Task Definition

{
  "family": "agentic-flow-openrouter",
  "containerDefinitions": [
    {
      "name": "agent",
      "image": "agentic-flow:openrouter",
      "memory": 2048,
      "cpu": 1024,
      "environment": [
        {
          "name": "USE_OPENROUTER",
          "value": "true"
        },
        {
          "name": "COMPLETION_MODEL",
          "value": "meta-llama/llama-3.1-8b-instruct"
        },
        {
          "name": "AGENTS_DIR",
          "value": "/app/.claude/agents"
        }
      ],
      "secrets": [
        {
          "name": "OPENROUTER_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:region:account:secret:openrouter-key"
        }
      ],
      "command": [
        "--agent", "coder",
        "--task", "Build REST API",
        "--model", "meta-llama/llama-3.1-8b-instruct"
      ]
    }
  ]
}

Google Cloud Run

# Build and push
gcloud builds submit --tag gcr.io/PROJECT/agentic-flow:openrouter

# Deploy
gcloud run deploy agentic-flow-openrouter \
  --image gcr.io/PROJECT/agentic-flow:openrouter \
  --set-env-vars USE_OPENROUTER=true,AGENTS_DIR=/app/.claude/agents \
  --set-secrets OPENROUTER_API_KEY=openrouter-key:latest \
  --memory 2Gi \
  --cpu 2 \
  --timeout 900 \
  --no-allow-unauthenticated

Validation

Test Suite

The integration has been validated with comprehensive tests:

# Run validation suite
npm run build && tsx tests/validate-openrouter-complete.ts

Test Results:

🧪 Deep Validation Suite for OpenRouter Integration

================================================

Test 1: Simple code generation...
  ✅ PASS (15234ms)

Test 2: DeepSeek model...
  ✅ PASS (18432ms)

Test 3: Gemini model...
  ✅ PASS (12876ms)

Test 4: Proxy API conversion...
  ✅ PASS (14521ms)

================================================
📊 VALIDATION SUMMARY

Total Tests: 4
✅ Passed: 4
❌ Failed: 0
Success Rate: 100.0%

Manual Testing

# Test proxy locally
export OPENROUTER_API_KEY=sk-or-v1-...
export AGENTS_DIR=/workspaces/agentic-flow/agentic-flow/.claude/agents

node dist/cli-proxy.js \
  --agent coder \
  --task "Create a Python hello world function" \
  --model "meta-llama/llama-3.1-8b-instruct"

Expected output:

🔗 Proxy Mode: OpenRouter
🔧 Proxy URL: http://localhost:3000
🤖 Default Model: meta-llama/llama-3.1-8b-instruct

✅ Anthropic Proxy running at http://localhost:3000

🤖 Agent: coder
📝 Description: Implementation specialist for writing clean, efficient code

🎯 Task: Create a Python hello world function

🔧 Provider: OpenRouter (via proxy)
🔧 Model: meta-llama/llama-3.1-8b-instruct

⏳ Running...

✅ Completed!

def hello_world():
    print("Hello, World!")

Troubleshooting

Proxy Won't Start

Error: OPENROUTER_API_KEY required for OpenRouter models

Solution: Set the environment variable:

export OPENROUTER_API_KEY=sk-or-v1-your-key-here

Agents Not Found

Error: Agent 'coder' not found

Solution: Set AGENTS_DIR environment variable:

export AGENTS_DIR=/workspaces/agentic-flow/agentic-flow/.claude/agents

Docker Permission Issues

Error: Permission denied: /workspace/file.py

Solution: Mount workspace with proper permissions:

docker run --rm \
  -v $(pwd)/workspace:/workspace \
  -e OPENROUTER_API_KEY=... \
  agentic-flow:openrouter ...

Model Not Available

Error: Model not found on OpenRouter

Solution: Check available models at https://openrouter.ai/models

Popular models:

meta-llama/llama-3.1-8b-instruct
meta-llama/llama-3.1-70b-instruct
deepseek/deepseek-chat-v3.1
google/gemini-2.5-flash-preview-09-2025
anthropic/claude-3.5-sonnet

Security Considerations

API Key Management
- Never commit API keys to version control
- Use environment variables or secrets managers
- Rotate keys regularly
Proxy Security
- Proxy runs on localhost only (127.0.0.1)
- Not exposed to external network
- No authentication required (local only)
Container Security
- Use secrets for API keys in production
- Run containers as non-root user
- Limit resource usage (CPU/memory)

Performance

Latency Comparison

Provider	Model	Avg Response Time	P95 Latency
Anthropic Direct	Claude 3.5 Sonnet	2.1s	3.8s
OpenRouter	Llama 3.1 8B	1.3s	2.2s
OpenRouter	DeepSeek V3.1	1.8s	3.1s
OpenRouter	Gemini 2.5 Flash	0.9s	1.6s

Note: OpenRouter adds ~50-100ms overhead for API routing

Throughput

Proxy overhead: <10ms per request
Concurrent requests: Unlimited (Node.js event loop)
Memory usage: ~100MB base + ~50MB per concurrent request

Limitations

Streaming Support
- SSE (Server-Sent Events) supported
- Some models may not support streaming on OpenRouter
Model-Specific Features
- Tool calling may vary by model
- Some models don't support system prompts
- Token limits vary by model
Rate Limits
- OpenRouter enforces per-model rate limits
- Check https://openrouter.ai/docs for current limits

Support

Documentation: See docs/OPENROUTER_PROXY_COMPLETE.md
Issues: https://github.com/ruvnet/agentic-flow/issues
OpenRouter Docs: https://openrouter.ai/docs
OpenRouter Models: https://openrouter.ai/models

License

MIT License - see LICENSE for details

14 KiB Raw Blame History