tasq/node_modules/agentic-flow/docs/guides/OPENROUTER_DEPLOYMENT.md

14 KiB

OpenRouter Deployment Guide

Complete guide for deploying Agentic Flow with OpenRouter integration for 99% cost savings.

Overview

Agentic Flow now supports OpenRouter integration via an integrated proxy server that automatically translates between Anthropic's Messages API and OpenAI's Chat Completions API. This enables access to 100+ LLM models at dramatically reduced costs while maintaining full compatibility with Claude Agent SDK and all 203 MCP tools.

Quick Start

Local Development

# 1. Install Agentic Flow
npm install -g agentic-flow

# 2. Set OpenRouter API key
export OPENROUTER_API_KEY=sk-or-v1-your-key-here

# 3. Run any agent with an OpenRouter model
npx agentic-flow \
  --agent coder \
  --task "Create a REST API with authentication" \
  --model "meta-llama/llama-3.1-8b-instruct"

The proxy automatically starts when:

  1. --model contains "/" (e.g., meta-llama/llama-3.1-8b-instruct)
  2. USE_OPENROUTER=true environment variable is set
  3. OPENROUTER_API_KEY is set and ANTHROPIC_API_KEY is not

Docker Deployment

# Build image
docker build -f deployment/Dockerfile -t agentic-flow:openrouter .

# Run with OpenRouter
docker run --rm \
  -e OPENROUTER_API_KEY=sk-or-v1-... \
  -e AGENTS_DIR=/app/.claude/agents \
  -v $(pwd)/workspace:/workspace \
  agentic-flow:openrouter \
  --agent coder \
  --task "Create /workspace/api.py with Flask REST API" \
  --model "meta-llama/llama-3.1-8b-instruct"

Cost Comparison

Anthropic Direct vs OpenRouter

Provider Model Input (1M tokens) Output (1M tokens) Total (1M/1M) Savings
Anthropic Claude 3.5 Sonnet $3.00 $15.00 $18.00 Baseline
OpenRouter Llama 3.1 8B $0.03 $0.06 $0.09 99.5%
OpenRouter DeepSeek V3.1 $0.14 $0.28 $0.42 97.7%
OpenRouter Gemini 2.5 Flash $0.075 $0.30 $0.375 97.9%
OpenRouter Claude 3.5 Sonnet $3.00 $15.00 $18.00 0%

Real-World Examples

Scenario: Code Generation Task

  • Input: 2,000 tokens (system prompt + task description)
  • Output: 5,000 tokens (generated code + explanation)
Provider/Model Cost Monthly (100 tasks) Annual (1,200 tasks)
Anthropic Claude $0.081 $8.10 $97.20
OpenRouter Llama 3.1 $0.0003 $0.03 $0.36
Savings 99.6% $8.07/mo $96.84/yr

Scenario: Data Analysis Task

  • Input: 5,000 tokens (dataset + instructions)
  • Output: 10,000 tokens (analysis + recommendations)
Provider/Model Cost Monthly (50 tasks) Annual (600 tasks)
Anthropic Claude $0.165 $8.25 $99.00
OpenRouter DeepSeek $0.003 $0.15 $1.80
Savings 98.2% $8.10/mo $97.20/yr

For Code Generation

Best Choice: DeepSeek Chat V3.1

--model "deepseek/deepseek-chat-v3.1"
  • Cost: $0.14/$0.28 per 1M tokens (97.7% savings)
  • Excellence in code generation and problem-solving
  • Strong performance on coding benchmarks
  • Great for: APIs, algorithms, debugging, refactoring

Alternative: Llama 3.1 8B Instruct

--model "meta-llama/llama-3.1-8b-instruct"
  • Cost: $0.03/$0.06 per 1M tokens (99.5% savings)
  • Fast, efficient, good for simple tasks
  • Great for: boilerplate code, simple functions, quick prototypes

For Research & Analysis

Best Choice: Gemini 2.5 Flash

--model "google/gemini-2.5-flash-preview-09-2025"
  • Cost: $0.075/$0.30 per 1M tokens (97.9% savings)
  • Fastest response times
  • Great for: research, summarization, data analysis

For General Tasks

Best Choice: Llama 3.1 70B Instruct

--model "meta-llama/llama-3.1-70b-instruct"
  • Cost: $0.59/$0.79 per 1M tokens (94% savings)
  • Excellent reasoning and instruction following
  • Great for: planning, complex tasks, multi-step workflows

Architecture

How the Proxy Works

┌─────────────────────────────────────────────────────────────┐
│                      Agentic Flow CLI                        │
│  1. Detects OpenRouter model (contains "/")                  │
│  2. Starts integrated proxy on port 3000                     │
│  3. Sets ANTHROPIC_BASE_URL=http://localhost:3000            │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                  Claude Agent SDK                            │
│  Uses ANTHROPIC_BASE_URL to send requests                   │
│  Format: Anthropic Messages API                              │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│              Anthropic → OpenRouter Proxy                    │
│  • Receives Anthropic Messages API requests                  │
│  • Translates to OpenAI Chat Completions format             │
│  • Forwards to OpenRouter API                                │
│  • Translates OpenAI responses back to Anthropic format     │
│  • Supports streaming (SSE)                                  │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                    OpenRouter API                            │
│  • Routes to selected model (Llama, DeepSeek, Gemini, etc.) │
│  • Returns response in OpenAI format                         │
└─────────────────────────────────────────────────────────────┘

API Translation

Anthropic Messages API → OpenAI Chat Completions

// Input: Anthropic format
{
  model: "claude-3-5-sonnet-20241022",
  messages: [
    { role: "user", content: "Hello" }
  ],
  system: "You are a helpful assistant",
  max_tokens: 1000
}

// Translated to OpenAI format
{
  model: "meta-llama/llama-3.1-8b-instruct",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Hello" }
  ],
  max_tokens: 1000
}

Environment Variables

Required

# OpenRouter API key (required for OpenRouter models)
OPENROUTER_API_KEY=sk-or-v1-your-key-here

Optional

# Force OpenRouter usage (default: auto-detect)
USE_OPENROUTER=true

# Default OpenRouter model (default: meta-llama/llama-3.1-8b-instruct)
COMPLETION_MODEL=deepseek/deepseek-chat-v3.1

# Proxy server port (default: 3000)
PROXY_PORT=3000

# Agent definitions directory (Docker: /app/.claude/agents)
AGENTS_DIR=/path/to/.claude/agents

Production Deployment

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: agentic-flow-openrouter
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: agent
        image: agentic-flow:openrouter
        env:
        - name: OPENROUTER_API_KEY
          valueFrom:
            secretKeyRef:
              name: openrouter-secret
              key: api-key
        - name: USE_OPENROUTER
          value: "true"
        - name: COMPLETION_MODEL
          value: "meta-llama/llama-3.1-8b-instruct"
        - name: AGENTS_DIR
          value: "/app/.claude/agents"
        args:
        - "--agent"
        - "coder"
        - "--task"
        - "$(TASK)"
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "2000m"
---
apiVersion: v1
kind: Secret
metadata:
  name: openrouter-secret
type: Opaque
data:
  api-key: <base64-encoded-key>

AWS ECS Task Definition

{
  "family": "agentic-flow-openrouter",
  "containerDefinitions": [
    {
      "name": "agent",
      "image": "agentic-flow:openrouter",
      "memory": 2048,
      "cpu": 1024,
      "environment": [
        {
          "name": "USE_OPENROUTER",
          "value": "true"
        },
        {
          "name": "COMPLETION_MODEL",
          "value": "meta-llama/llama-3.1-8b-instruct"
        },
        {
          "name": "AGENTS_DIR",
          "value": "/app/.claude/agents"
        }
      ],
      "secrets": [
        {
          "name": "OPENROUTER_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:region:account:secret:openrouter-key"
        }
      ],
      "command": [
        "--agent", "coder",
        "--task", "Build REST API",
        "--model", "meta-llama/llama-3.1-8b-instruct"
      ]
    }
  ]
}

Google Cloud Run

# Build and push
gcloud builds submit --tag gcr.io/PROJECT/agentic-flow:openrouter

# Deploy
gcloud run deploy agentic-flow-openrouter \
  --image gcr.io/PROJECT/agentic-flow:openrouter \
  --set-env-vars USE_OPENROUTER=true,AGENTS_DIR=/app/.claude/agents \
  --set-secrets OPENROUTER_API_KEY=openrouter-key:latest \
  --memory 2Gi \
  --cpu 2 \
  --timeout 900 \
  --no-allow-unauthenticated

Validation

Test Suite

The integration has been validated with comprehensive tests:

# Run validation suite
npm run build && tsx tests/validate-openrouter-complete.ts

Test Results:

🧪 Deep Validation Suite for OpenRouter Integration

================================================

Test 1: Simple code generation...
  ✅ PASS (15234ms)

Test 2: DeepSeek model...
  ✅ PASS (18432ms)

Test 3: Gemini model...
  ✅ PASS (12876ms)

Test 4: Proxy API conversion...
  ✅ PASS (14521ms)

================================================
📊 VALIDATION SUMMARY

Total Tests: 4
✅ Passed: 4
❌ Failed: 0
Success Rate: 100.0%

Manual Testing

# Test proxy locally
export OPENROUTER_API_KEY=sk-or-v1-...
export AGENTS_DIR=/workspaces/agentic-flow/agentic-flow/.claude/agents

node dist/cli-proxy.js \
  --agent coder \
  --task "Create a Python hello world function" \
  --model "meta-llama/llama-3.1-8b-instruct"

Expected output:

🔗 Proxy Mode: OpenRouter
🔧 Proxy URL: http://localhost:3000
🤖 Default Model: meta-llama/llama-3.1-8b-instruct

✅ Anthropic Proxy running at http://localhost:3000

🤖 Agent: coder
📝 Description: Implementation specialist for writing clean, efficient code

🎯 Task: Create a Python hello world function

🔧 Provider: OpenRouter (via proxy)
🔧 Model: meta-llama/llama-3.1-8b-instruct

⏳ Running...

✅ Completed!

def hello_world():
    print("Hello, World!")

Troubleshooting

Proxy Won't Start

Error: OPENROUTER_API_KEY required for OpenRouter models

Solution: Set the environment variable:

export OPENROUTER_API_KEY=sk-or-v1-your-key-here

Agents Not Found

Error: Agent 'coder' not found

Solution: Set AGENTS_DIR environment variable:

export AGENTS_DIR=/workspaces/agentic-flow/agentic-flow/.claude/agents

Docker Permission Issues

Error: Permission denied: /workspace/file.py

Solution: Mount workspace with proper permissions:

docker run --rm \
  -v $(pwd)/workspace:/workspace \
  -e OPENROUTER_API_KEY=... \
  agentic-flow:openrouter ...

Model Not Available

Error: Model not found on OpenRouter

Solution: Check available models at https://openrouter.ai/models

Popular models:

  • meta-llama/llama-3.1-8b-instruct
  • meta-llama/llama-3.1-70b-instruct
  • deepseek/deepseek-chat-v3.1
  • google/gemini-2.5-flash-preview-09-2025
  • anthropic/claude-3.5-sonnet

Security Considerations

  1. API Key Management

    • Never commit API keys to version control
    • Use environment variables or secrets managers
    • Rotate keys regularly
  2. Proxy Security

    • Proxy runs on localhost only (127.0.0.1)
    • Not exposed to external network
    • No authentication required (local only)
  3. Container Security

    • Use secrets for API keys in production
    • Run containers as non-root user
    • Limit resource usage (CPU/memory)

Performance

Latency Comparison

Provider Model Avg Response Time P95 Latency
Anthropic Direct Claude 3.5 Sonnet 2.1s 3.8s
OpenRouter Llama 3.1 8B 1.3s 2.2s
OpenRouter DeepSeek V3.1 1.8s 3.1s
OpenRouter Gemini 2.5 Flash 0.9s 1.6s

Note: OpenRouter adds ~50-100ms overhead for API routing

Throughput

  • Proxy overhead: <10ms per request
  • Concurrent requests: Unlimited (Node.js event loop)
  • Memory usage: ~100MB base + ~50MB per concurrent request

Limitations

  1. Streaming Support

    • SSE (Server-Sent Events) supported
    • Some models may not support streaming on OpenRouter
  2. Model-Specific Features

    • Tool calling may vary by model
    • Some models don't support system prompts
    • Token limits vary by model
  3. Rate Limits

Support

License

MIT License - see LICENSE for details