tasq/node_modules/agentic-flow/docs/guides/PROXY-ARCHITECTURE-AND-EXTENSION.md

18 KiB

Proxy Architecture and Extension Guide

📖 Table of Contents


How the Proxy Works

The Problem

Claude Code and the Claude Agent SDK expect requests in Anthropic's Messages API format. When you want to use cheaper alternatives (OpenRouter, Gemini, local models), you need to:

  1. Translate Anthropic request format → Provider's format
  2. Forward request to the provider's API
  3. Translate provider's response → Anthropic response format
  4. Return to Claude Code/SDK (which thinks it's talking to Anthropic)

The Solution

A transparent HTTP proxy that sits between Claude Code and the LLM provider:

┌─────────────┐       ┌──────────────┐       ┌──────────────┐
│ Claude Code │──────▶│ Proxy Server │──────▶│   Provider   │
│   /SDK      │       │ (localhost)  │       │ (OpenRouter, │
│             │◀──────│              │◀──────│  Gemini, etc)│
└─────────────┘       └──────────────┘       └──────────────┘
   Anthropic API         Translates            Provider API

Key Benefits:

  • No code changes to Claude Code or Agent SDK
  • 99% cost savings with OpenRouter models
  • 100% free with Gemini free tier
  • All MCP tools work through the proxy
  • Streaming support
  • Function/tool calling support

Architecture Overview

File Structure

src/proxy/
├── anthropic-to-openrouter.ts    # OpenRouter proxy
├── anthropic-to-gemini.ts        # Gemini proxy
└── provider-instructions.ts      # Model-specific configs

Core Components

1. Express Server

  • Listens on port 3000 (configurable)
  • Handles /v1/messages endpoint (Anthropic's Messages API)
  • Health check at /health

2. Request Converter

Translates Anthropic → Provider format:

private convertAnthropicToOpenAI(anthropicReq: AnthropicRequest): OpenAIRequest {
  // 1. Extract system prompt
  // 2. Convert messages array
  // 3. Convert tools (if present)
  // 4. Map model names
  // 5. Apply provider-specific configs
}

3. Response Converter

Translates Provider → Anthropic format:

private convertOpenAIToAnthropic(openaiRes: any): any {
  // 1. Extract choice/candidate
  // 2. Convert tool_calls → tool_use blocks
  // 3. Extract text content
  // 4. Map finish reasons
  // 5. Convert usage stats
}

4. Streaming Handler

For real-time token-by-token output:

private convertOpenAIStreamToAnthropic(chunk: string): string {
  // Convert SSE format: OpenAI → Anthropic
}

Adding New Cloud Providers

Example: Adding Mistral AI

Step 1: Create proxy file

src/proxy/anthropic-to-mistral.ts:

import express, { Request, Response } from 'express';
import { logger } from '../utils/logger.js';

interface MistralMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface MistralRequest {
  model: string;
  messages: MistralMessage[];
  temperature?: number;
  max_tokens?: number;
  stream?: boolean;
}

export class AnthropicToMistralProxy {
  private app: express.Application;
  private mistralApiKey: string;
  private mistralBaseUrl: string;
  private defaultModel: string;

  constructor(config: {
    mistralApiKey: string;
    mistralBaseUrl?: string;
    defaultModel?: string;
  }) {
    this.app = express();
    this.mistralApiKey = config.mistralApiKey;
    this.mistralBaseUrl = config.mistralBaseUrl || 'https://api.mistral.ai/v1';
    this.defaultModel = config.defaultModel || 'mistral-large-latest';

    this.setupMiddleware();
    this.setupRoutes();
  }

  private setupMiddleware(): void {
    this.app.use(express.json({ limit: '50mb' }));
  }

  private setupRoutes(): void {
    // Health check
    this.app.get('/health', (req: Request, res: Response) => {
      res.json({ status: 'ok', service: 'anthropic-to-mistral-proxy' });
    });

    // Main conversion endpoint
    this.app.post('/v1/messages', async (req: Request, res: Response) => {
      try {
        const anthropicReq = req.body;

        // Convert Anthropic → Mistral
        const mistralReq = this.convertAnthropicToMistral(anthropicReq);

        // Forward to Mistral
        const response = await fetch(`${this.mistralBaseUrl}/chat/completions`, {
          method: 'POST',
          headers: {
            'Authorization': `Bearer ${this.mistralApiKey}`,
            'Content-Type': 'application/json'
          },
          body: JSON.stringify(mistralReq)
        });

        if (!response.ok) {
          const error = await response.text();
          logger.error('Mistral API error', { status: response.status, error });
          return res.status(response.status).json({
            error: { type: 'api_error', message: error }
          });
        }

        // Convert Mistral → Anthropic
        const mistralRes = await response.json();
        const anthropicRes = this.convertMistralToAnthropic(mistralRes);

        res.json(anthropicRes);
      } catch (error: any) {
        logger.error('Mistral proxy error', { error: error.message });
        res.status(500).json({
          error: { type: 'proxy_error', message: error.message }
        });
      }
    });
  }

  private convertAnthropicToMistral(anthropicReq: any): MistralRequest {
    const messages: MistralMessage[] = [];

    // Add system prompt if present
    if (anthropicReq.system) {
      messages.push({
        role: 'system',
        content: typeof anthropicReq.system === 'string'
          ? anthropicReq.system
          : anthropicReq.system.map((b: any) => b.text).join('\n')
      });
    }

    // Convert messages
    for (const msg of anthropicReq.messages) {
      messages.push({
        role: msg.role,
        content: typeof msg.content === 'string'
          ? msg.content
          : msg.content.filter((b: any) => b.type === 'text').map((b: any) => b.text).join('\n')
      });
    }

    return {
      model: this.defaultModel,
      messages,
      temperature: anthropicReq.temperature,
      max_tokens: anthropicReq.max_tokens || 4096,
      stream: anthropicReq.stream || false
    };
  }

  private convertMistralToAnthropic(mistralRes: any): any {
    const choice = mistralRes.choices?.[0];
    if (!choice) throw new Error('No choices in Mistral response');

    const content = choice.message?.content || '';

    return {
      id: mistralRes.id || `msg_${Date.now()}`,
      type: 'message',
      role: 'assistant',
      model: mistralRes.model,
      content: [{ type: 'text', text: content }],
      stop_reason: choice.finish_reason === 'stop' ? 'end_turn' : 'max_tokens',
      usage: {
        input_tokens: mistralRes.usage?.prompt_tokens || 0,
        output_tokens: mistralRes.usage?.completion_tokens || 0
      }
    };
  }

  public start(port: number): void {
    this.app.listen(port, () => {
      logger.info('Mistral proxy started', { port });
      console.log(`\n✅ Mistral Proxy running at http://localhost:${port}\n`);
    });
  }
}

// CLI entry point
if (import.meta.url === `file://${process.argv[1]}`) {
  const port = parseInt(process.env.PORT || '3000');
  const mistralApiKey = process.env.MISTRAL_API_KEY;

  if (!mistralApiKey) {
    console.error('❌ Error: MISTRAL_API_KEY environment variable required');
    process.exit(1);
  }

  const proxy = new AnthropicToMistralProxy({ mistralApiKey });
  proxy.start(port);
}

Step 2: Update TypeScript build

Add to config/tsconfig.json if needed (usually auto-detected).

Step 3: Test the proxy

# Terminal 1: Start proxy
export MISTRAL_API_KEY=your-key-here
npm run build
node dist/proxy/anthropic-to-mistral.js

# Terminal 2: Use with Claude Code
export ANTHROPIC_BASE_URL=http://localhost:3000
export ANTHROPIC_API_KEY=dummy-key
npx agentic-flow --agent coder --task "Write hello world"

Adding Local LLM Providers

Example: Adding Ollama Support

Step 1: Create proxy file

src/proxy/anthropic-to-ollama.ts:

import express, { Request, Response } from 'express';
import { logger } from '../utils/logger.js';

export class AnthropicToOllamaProxy {
  private app: express.Application;
  private ollamaBaseUrl: string;
  private defaultModel: string;

  constructor(config: {
    ollamaBaseUrl?: string;
    defaultModel?: string;
  }) {
    this.app = express();
    this.ollamaBaseUrl = config.ollamaBaseUrl || 'http://localhost:11434';
    this.defaultModel = config.defaultModel || 'llama3.3:70b';

    this.setupMiddleware();
    this.setupRoutes();
  }

  private setupMiddleware(): void {
    this.app.use(express.json({ limit: '50mb' }));
  }

  private setupRoutes(): void {
    this.app.get('/health', (req: Request, res: Response) => {
      res.json({ status: 'ok', service: 'anthropic-to-ollama-proxy' });
    });

    this.app.post('/v1/messages', async (req: Request, res: Response) => {
      try {
        const anthropicReq = req.body;

        // Build prompt from messages
        let prompt = '';
        if (anthropicReq.system) {
          prompt += `System: ${anthropicReq.system}\n\n`;
        }

        for (const msg of anthropicReq.messages) {
          const content = typeof msg.content === 'string'
            ? msg.content
            : msg.content.filter((b: any) => b.type === 'text').map((b: any) => b.text).join('\n');

          prompt += `${msg.role === 'user' ? 'Human' : 'Assistant'}: ${content}\n\n`;
        }

        prompt += 'Assistant: ';

        // Call Ollama API
        const response = await fetch(`${this.ollamaBaseUrl}/api/generate`, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            model: this.defaultModel,
            prompt,
            stream: false,
            options: {
              temperature: anthropicReq.temperature || 0.7,
              num_predict: anthropicReq.max_tokens || 4096
            }
          })
        });

        if (!response.ok) {
          const error = await response.text();
          logger.error('Ollama API error', { status: response.status, error });
          return res.status(response.status).json({
            error: { type: 'api_error', message: error }
          });
        }

        const ollamaRes = await response.json();

        // Convert to Anthropic format
        const anthropicRes = {
          id: `msg_${Date.now()}`,
          type: 'message',
          role: 'assistant',
          model: this.defaultModel,
          content: [{ type: 'text', text: ollamaRes.response }],
          stop_reason: ollamaRes.done ? 'end_turn' : 'max_tokens',
          usage: {
            input_tokens: ollamaRes.prompt_eval_count || 0,
            output_tokens: ollamaRes.eval_count || 0
          }
        };

        res.json(anthropicRes);
      } catch (error: any) {
        logger.error('Ollama proxy error', { error: error.message });
        res.status(500).json({
          error: { type: 'proxy_error', message: error.message }
        });
      }
    });
  }

  public start(port: number): void {
    this.app.listen(port, () => {
      logger.info('Ollama proxy started', { port, ollamaBaseUrl: this.ollamaBaseUrl });
      console.log(`\n✅ Ollama Proxy running at http://localhost:${port}`);
      console.log(`   Ollama Server: ${this.ollamaBaseUrl}`);
      console.log(`   Default Model: ${this.defaultModel}\n`);
    });
  }
}

// CLI entry point
if (import.meta.url === `file://${process.argv[1]}`) {
  const port = parseInt(process.env.PORT || '3000');

  const proxy = new AnthropicToOllamaProxy({
    ollamaBaseUrl: process.env.OLLAMA_BASE_URL,
    defaultModel: process.env.OLLAMA_MODEL || 'llama3.3:70b'
  });

  proxy.start(port);
}

Step 2: Start Ollama server

# Install Ollama (https://ollama.ai)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3.3:70b

# Server starts automatically on port 11434

Step 3: Use with Agentic Flow

# Terminal 1: Start proxy
npm run build
node dist/proxy/anthropic-to-ollama.js

# Terminal 2: Use with agents
export ANTHROPIC_BASE_URL=http://localhost:3000
export ANTHROPIC_API_KEY=dummy-key
npx agentic-flow --agent coder --task "Write hello world"

Message Format Conversion

Anthropic Messages API Format

{
  "model": "claude-3-5-sonnet-20241022",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "system": "You are a helpful assistant",
  "max_tokens": 1024,
  "temperature": 0.7
}

OpenAI Chat Completions Format

{
  "model": "gpt-4",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "max_tokens": 1024,
  "temperature": 0.7
}

Gemini generateContent Format

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "System: You are a helpful assistant\n\nHello!"
        }
      ]
    }
  ],
  "generationConfig": {
    "temperature": 0.7,
    "maxOutputTokens": 1024
  }
}

Key Differences

Feature Anthropic OpenAI Gemini
System Prompt Separate system field First message with role: "system" Prepended to first user message
Message Content String or array of blocks Always string Array of parts with text
Role Names user, assistant user, assistant, system user, model
Max Tokens max_tokens max_tokens generationConfig.maxOutputTokens
Response Format content array with typed blocks message.content string candidates[0].content.parts[0].text

Tool/Function Calling Support

Anthropic Tool Format

{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get weather for a location",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        },
        "required": ["location"]
      }
    }
  ]
}

OpenAI Tool Format

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ]
}

Conversion Logic

// Anthropic → OpenAI
if (anthropicReq.tools && anthropicReq.tools.length > 0) {
  openaiReq.tools = anthropicReq.tools.map(tool => ({
    type: 'function',
    function: {
      name: tool.name,
      description: tool.description || '',
      parameters: tool.input_schema || {
        type: 'object',
        properties: {},
        required: []
      }
    }
  }));
}

// OpenAI → Anthropic (tool_calls in response)
if (message.tool_calls && message.tool_calls.length > 0) {
  for (const toolCall of message.tool_calls) {
    contentBlocks.push({
      type: 'tool_use',
      id: toolCall.id,
      name: toolCall.function.name,
      input: JSON.parse(toolCall.function.arguments)
    });
  }
}

Testing Your Proxy

Unit Tests

Create tests/proxy-mistral.test.ts:

import { AnthropicToMistralProxy } from '../src/proxy/anthropic-to-mistral.js';
import fetch from 'node-fetch';

describe('Mistral Proxy', () => {
  let proxy: AnthropicToMistralProxy;
  const port = 3001;

  beforeAll(() => {
    proxy = new AnthropicToMistralProxy({
      mistralApiKey: process.env.MISTRAL_API_KEY || 'test-key'
    });
    proxy.start(port);
  });

  it('should convert Anthropic request to Mistral format', async () => {
    const response = await fetch(`http://localhost:${port}/v1/messages`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'claude-3-5-sonnet-20241022',
        messages: [{ role: 'user', content: 'Hello!' }],
        max_tokens: 100
      })
    });

    expect(response.ok).toBe(true);
    const data = await response.json();
    expect(data).toHaveProperty('content');
    expect(data.role).toBe('assistant');
  });
});

Manual Testing

# Test health check
curl http://localhost:3000/health

# Test message endpoint
curl -X POST http://localhost:3000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'

Examples

Complete Example: Adding Cohere

See full implementation: examples/proxy-cohere.ts

Integration with Agentic Flow

// src/cli-proxy.ts - Add new provider option
if (options.provider === 'mistral' || process.env.USE_MISTRAL) {
  // Start Mistral proxy
  const proxy = new AnthropicToMistralProxy({
    mistralApiKey: process.env.MISTRAL_API_KEY!
  });
  proxy.start(3000);

  // Set environment for SDK
  process.env.ANTHROPIC_BASE_URL = 'http://localhost:3000';
  process.env.ANTHROPIC_API_KEY = 'dummy-key';
}

Best Practices

  1. Error Handling: Always catch and log errors with context
  2. Streaming: Support both streaming and non-streaming modes
  3. Tool Calling: Handle MCP tools via native function calling when possible
  4. Logging: Use verbose logging during development, info in production
  5. API Keys: Never hardcode keys, use environment variables
  6. Health Checks: Always provide a /health endpoint
  7. Rate Limiting: Respect provider rate limits
  8. Timeouts: Set appropriate timeouts for API calls

Resources


Support

Need help adding a provider? Open an issue: GitHub Issues