# Proxy Architecture and Extension Guide

## 📖 Table of Contents

- [How the Proxy Works](#how-the-proxy-works)
- [Architecture Overview](#architecture-overview)
- [Adding New Cloud Providers](#adding-new-cloud-providers)
- [Adding Local LLM Providers](#adding-local-llm-providers)
- [Message Format Conversion](#message-format-conversion)
- [Tool/Function Calling Support](#toolfunction-calling-support)
- [Testing Your Proxy](#testing-your-proxy)
- [Examples](#examples)

---
## How the Proxy Works

### The Problem

Claude Code and the Claude Agent SDK expect requests in **Anthropic's Messages API format**. When you want to use cheaper alternatives (OpenRouter, Gemini, local models), you need to:

1. Translate the Anthropic request format → the provider's format
2. Forward the request to the provider's API
3. Translate the provider's response → the Anthropic response format
4. Return it to Claude Code/SDK (which thinks it is talking to Anthropic)

### The Solution

A transparent HTTP proxy that sits between Claude Code and the LLM provider:

```
┌─────────────┐       ┌──────────────┐       ┌──────────────┐
│ Claude Code │──────▶│ Proxy Server │──────▶│   Provider   │
│    /SDK     │       │ (localhost)  │       │ (OpenRouter, │
│             │◀──────│              │◀──────│  Gemini, etc)│
└─────────────┘       └──────────────┘       └──────────────┘
 Anthropic API           Translates            Provider API
```

**Key Benefits:**

- ✅ No code changes to Claude Code or the Agent SDK
- ✅ Up to 99% cost savings with OpenRouter models
- ✅ Free usage on the Gemini free tier
- ✅ All MCP tools work through the proxy
- ✅ Streaming support
- ✅ Function/tool calling support

---
## Architecture Overview

### File Structure

```
src/proxy/
├── anthropic-to-openrouter.ts    # OpenRouter proxy
├── anthropic-to-gemini.ts        # Gemini proxy
└── provider-instructions.ts      # Model-specific configs
```

### Core Components

#### 1. **Express Server**

- Listens on port 3000 (configurable)
- Handles the `/v1/messages` endpoint (Anthropic's Messages API)
- Health check at `/health`

#### 2. **Request Converter**

Translates Anthropic → provider format:

```typescript
private convertAnthropicToOpenAI(anthropicReq: AnthropicRequest): OpenAIRequest {
  // 1. Extract system prompt
  // 2. Convert messages array
  // 3. Convert tools (if present)
  // 4. Map model names
  // 5. Apply provider-specific configs
}
```
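To make those five steps concrete, here is a simplified sketch (illustrative names and types, not the exact code in `anthropic-to-openrouter.ts`; tool conversion is covered separately below):

```typescript
// Simplified sketch of the five steps above -- illustrative names only,
// not the exact implementation in anthropic-to-openrouter.ts.
function convertRequest(anthropicReq: any, modelMap: Record<string, string>): any {
  const messages: Array<{ role: string; content: string }> = [];

  // 1. Extract the system prompt into a leading "system" message
  if (anthropicReq.system) {
    const text = typeof anthropicReq.system === 'string'
      ? anthropicReq.system
      : anthropicReq.system.map((b: any) => b.text).join('\n');
    messages.push({ role: 'system', content: text });
  }

  // 2. Flatten Anthropic content blocks into plain strings
  for (const msg of anthropicReq.messages) {
    const content = typeof msg.content === 'string'
      ? msg.content
      : msg.content.filter((b: any) => b.type === 'text').map((b: any) => b.text).join('\n');
    messages.push({ role: msg.role, content });
  }

  return {
    // 4. Map the Claude model name onto a provider model
    model: modelMap[anthropicReq.model] ?? anthropicReq.model,
    messages,
    max_tokens: anthropicReq.max_tokens ?? 4096,
    temperature: anthropicReq.temperature,
    stream: anthropicReq.stream ?? false
    // 3. Tool conversion is covered in "Tool/Function Calling Support" below
    // 5. Provider-specific tweaks (provider-instructions.ts) would be applied here
  };
}
```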
#### 3. **Response Converter**

Translates provider → Anthropic format:

```typescript
private convertOpenAIToAnthropic(openaiRes: any): any {
  // 1. Extract choice/candidate
  // 2. Convert tool_calls → tool_use blocks
  // 3. Extract text content
  // 4. Map finish reasons
  // 5. Convert usage stats
}
```

#### 4. **Streaming Handler**

For real-time token-by-token output:

```typescript
private convertOpenAIStreamToAnthropic(chunk: string): string {
  // Convert SSE format: OpenAI → Anthropic
}
```
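The repository's streaming handlers do the real work here; as a rough illustration, a sketch that turns one OpenAI-style `data:` line into an Anthropic `content_block_delta` event could look like the following. A production handler must also emit `message_start`, `content_block_start`, and `message_delta` events and track block indices:

```typescript
// Minimal sketch: one OpenAI SSE line -> one Anthropic SSE event ('' if nothing to emit).
function convertStreamLine(line: string): string {
  if (!line.startsWith('data: ')) return '';
  const payload = line.slice('data: '.length).trim();

  if (payload === '[DONE]') {
    return 'event: message_stop\ndata: {"type":"message_stop"}\n\n';
  }

  const chunk = JSON.parse(payload);
  const text = chunk.choices?.[0]?.delta?.content;
  if (!text) return '';

  const event = {
    type: 'content_block_delta',
    index: 0,
    delta: { type: 'text_delta', text }
  };
  return `event: content_block_delta\ndata: ${JSON.stringify(event)}\n\n`;
}
```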
---
## Adding New Cloud Providers

### Example: Adding Mistral AI

**Step 1: Create proxy file**

`src/proxy/anthropic-to-mistral.ts`:

```typescript
import express, { Request, Response } from 'express';
import { logger } from '../utils/logger.js';

interface MistralMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface MistralRequest {
  model: string;
  messages: MistralMessage[];
  temperature?: number;
  max_tokens?: number;
  stream?: boolean;
}

export class AnthropicToMistralProxy {
  private app: express.Application;
  private mistralApiKey: string;
  private mistralBaseUrl: string;
  private defaultModel: string;

  constructor(config: {
    mistralApiKey: string;
    mistralBaseUrl?: string;
    defaultModel?: string;
  }) {
    this.app = express();
    this.mistralApiKey = config.mistralApiKey;
    this.mistralBaseUrl = config.mistralBaseUrl || 'https://api.mistral.ai/v1';
    this.defaultModel = config.defaultModel || 'mistral-large-latest';

    this.setupMiddleware();
    this.setupRoutes();
  }

  private setupMiddleware(): void {
    this.app.use(express.json({ limit: '50mb' }));
  }

  private setupRoutes(): void {
    // Health check
    this.app.get('/health', (req: Request, res: Response) => {
      res.json({ status: 'ok', service: 'anthropic-to-mistral-proxy' });
    });

    // Main conversion endpoint
    this.app.post('/v1/messages', async (req: Request, res: Response) => {
      try {
        const anthropicReq = req.body;

        // Convert Anthropic → Mistral
        const mistralReq = this.convertAnthropicToMistral(anthropicReq);

        // Forward to Mistral
        const response = await fetch(`${this.mistralBaseUrl}/chat/completions`, {
          method: 'POST',
          headers: {
            'Authorization': `Bearer ${this.mistralApiKey}`,
            'Content-Type': 'application/json'
          },
          body: JSON.stringify(mistralReq)
        });

        if (!response.ok) {
          const error = await response.text();
          logger.error('Mistral API error', { status: response.status, error });
          return res.status(response.status).json({
            error: { type: 'api_error', message: error }
          });
        }

        // Convert Mistral → Anthropic
        const mistralRes = await response.json();
        const anthropicRes = this.convertMistralToAnthropic(mistralRes);

        res.json(anthropicRes);
      } catch (error: any) {
        logger.error('Mistral proxy error', { error: error.message });
        res.status(500).json({
          error: { type: 'proxy_error', message: error.message }
        });
      }
    });
  }

  private convertAnthropicToMistral(anthropicReq: any): MistralRequest {
    const messages: MistralMessage[] = [];

    // Add system prompt if present
    if (anthropicReq.system) {
      messages.push({
        role: 'system',
        content: typeof anthropicReq.system === 'string'
          ? anthropicReq.system
          : anthropicReq.system.map((b: any) => b.text).join('\n')
      });
    }

    // Convert messages
    for (const msg of anthropicReq.messages) {
      messages.push({
        role: msg.role,
        content: typeof msg.content === 'string'
          ? msg.content
          : msg.content.filter((b: any) => b.type === 'text').map((b: any) => b.text).join('\n')
      });
    }

    return {
      model: this.defaultModel,
      messages,
      temperature: anthropicReq.temperature,
      max_tokens: anthropicReq.max_tokens || 4096,
      stream: anthropicReq.stream || false
    };
  }

  private convertMistralToAnthropic(mistralRes: any): any {
    const choice = mistralRes.choices?.[0];
    if (!choice) throw new Error('No choices in Mistral response');

    const content = choice.message?.content || '';

    return {
      id: mistralRes.id || `msg_${Date.now()}`,
      type: 'message',
      role: 'assistant',
      model: mistralRes.model,
      content: [{ type: 'text', text: content }],
      stop_reason: choice.finish_reason === 'stop' ? 'end_turn' : 'max_tokens',
      usage: {
        input_tokens: mistralRes.usage?.prompt_tokens || 0,
        output_tokens: mistralRes.usage?.completion_tokens || 0
      }
    };
  }

  public start(port: number): void {
    this.app.listen(port, () => {
      logger.info('Mistral proxy started', { port });
      console.log(`\n✅ Mistral Proxy running at http://localhost:${port}\n`);
    });
  }
}

// CLI entry point
if (import.meta.url === `file://${process.argv[1]}`) {
  const port = parseInt(process.env.PORT || '3000', 10);
  const mistralApiKey = process.env.MISTRAL_API_KEY;

  if (!mistralApiKey) {
    console.error('❌ Error: MISTRAL_API_KEY environment variable required');
    process.exit(1);
  }

  const proxy = new AnthropicToMistralProxy({ mistralApiKey });
  proxy.start(port);
}
```
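Note that this example always sends `defaultModel` and ignores the model name Claude Code asked for. If you want to honor the requested model, a small mapping helper can be added; the entries below are purely illustrative:

```typescript
// Hypothetical mapping from Anthropic model names to Mistral models.
// Adjust the entries to whatever models you actually want to route to.
const MODEL_MAP: Record<string, string> = {
  'claude-3-5-sonnet-20241022': 'mistral-large-latest',
  'claude-3-5-haiku-20241022': 'mistral-small-latest'
};

function mapModel(anthropicModel: string | undefined, fallback: string): string {
  if (!anthropicModel) return fallback;
  return MODEL_MAP[anthropicModel] ?? fallback;
}

// In convertAnthropicToMistral:
//   model: mapModel(anthropicReq.model, this.defaultModel),
```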
**Step 2: Update TypeScript build**

Add to `config/tsconfig.json` if needed (usually auto-detected).

**Step 3: Test the proxy**

```bash
# Terminal 1: Start proxy
export MISTRAL_API_KEY=your-key-here
npm run build
node dist/proxy/anthropic-to-mistral.js

# Terminal 2: Use with Claude Code
export ANTHROPIC_BASE_URL=http://localhost:3000
export ANTHROPIC_API_KEY=dummy-key
npx agentic-flow --agent coder --task "Write hello world"
```

---
## Adding Local LLM Providers

### Example: Adding Ollama Support

**Step 1: Create proxy file**

`src/proxy/anthropic-to-ollama.ts`:

```typescript
import express, { Request, Response } from 'express';
import { logger } from '../utils/logger.js';

export class AnthropicToOllamaProxy {
  private app: express.Application;
  private ollamaBaseUrl: string;
  private defaultModel: string;

  constructor(config: {
    ollamaBaseUrl?: string;
    defaultModel?: string;
  }) {
    this.app = express();
    this.ollamaBaseUrl = config.ollamaBaseUrl || 'http://localhost:11434';
    this.defaultModel = config.defaultModel || 'llama3.3:70b';

    this.setupMiddleware();
    this.setupRoutes();
  }

  private setupMiddleware(): void {
    this.app.use(express.json({ limit: '50mb' }));
  }

  private setupRoutes(): void {
    this.app.get('/health', (req: Request, res: Response) => {
      res.json({ status: 'ok', service: 'anthropic-to-ollama-proxy' });
    });

    this.app.post('/v1/messages', async (req: Request, res: Response) => {
      try {
        const anthropicReq = req.body;

        // Build prompt from messages
        let prompt = '';
        if (anthropicReq.system) {
          // System can be a string or an array of blocks
          const systemText = typeof anthropicReq.system === 'string'
            ? anthropicReq.system
            : anthropicReq.system.map((b: any) => b.text).join('\n');
          prompt += `System: ${systemText}\n\n`;
        }

        for (const msg of anthropicReq.messages) {
          const content = typeof msg.content === 'string'
            ? msg.content
            : msg.content.filter((b: any) => b.type === 'text').map((b: any) => b.text).join('\n');

          prompt += `${msg.role === 'user' ? 'Human' : 'Assistant'}: ${content}\n\n`;
        }

        prompt += 'Assistant: ';

        // Call Ollama API
        const response = await fetch(`${this.ollamaBaseUrl}/api/generate`, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            model: this.defaultModel,
            prompt,
            stream: false,
            options: {
              temperature: anthropicReq.temperature || 0.7,
              num_predict: anthropicReq.max_tokens || 4096
            }
          })
        });

        if (!response.ok) {
          const error = await response.text();
          logger.error('Ollama API error', { status: response.status, error });
          return res.status(response.status).json({
            error: { type: 'api_error', message: error }
          });
        }

        const ollamaRes = await response.json();

        // Convert to Anthropic format
        const anthropicRes = {
          id: `msg_${Date.now()}`,
          type: 'message',
          role: 'assistant',
          model: this.defaultModel,
          content: [{ type: 'text', text: ollamaRes.response }],
          stop_reason: ollamaRes.done ? 'end_turn' : 'max_tokens',
          usage: {
            input_tokens: ollamaRes.prompt_eval_count || 0,
            output_tokens: ollamaRes.eval_count || 0
          }
        };

        res.json(anthropicRes);
      } catch (error: any) {
        logger.error('Ollama proxy error', { error: error.message });
        res.status(500).json({
          error: { type: 'proxy_error', message: error.message }
        });
      }
    });
  }

  public start(port: number): void {
    this.app.listen(port, () => {
      logger.info('Ollama proxy started', { port, ollamaBaseUrl: this.ollamaBaseUrl });
      console.log(`\n✅ Ollama Proxy running at http://localhost:${port}`);
      console.log(`   Ollama Server: ${this.ollamaBaseUrl}`);
      console.log(`   Default Model: ${this.defaultModel}\n`);
    });
  }
}

// CLI entry point
if (import.meta.url === `file://${process.argv[1]}`) {
  const port = parseInt(process.env.PORT || '3000', 10);

  const proxy = new AnthropicToOllamaProxy({
    ollamaBaseUrl: process.env.OLLAMA_BASE_URL,
    defaultModel: process.env.OLLAMA_MODEL || 'llama3.3:70b'
  });

  proxy.start(port);
}
```
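Side note: newer Ollama versions also expose a `/api/chat` endpoint that accepts role-based messages directly, which avoids hand-building a prompt string. A rough sketch (verify the exact fields against your Ollama version's API docs):

```typescript
// Rough sketch of the /api/chat variant -- check your Ollama version's API docs.
async function chatWithOllama(baseUrl: string, model: string, anthropicReq: any) {
  const messages: Array<{ role: string; content: string }> = [];

  if (anthropicReq.system) {
    const systemText = typeof anthropicReq.system === 'string'
      ? anthropicReq.system
      : anthropicReq.system.map((b: any) => b.text).join('\n');
    messages.push({ role: 'system', content: systemText });
  }

  for (const msg of anthropicReq.messages) {
    const content = typeof msg.content === 'string'
      ? msg.content
      : msg.content.filter((b: any) => b.type === 'text').map((b: any) => b.text).join('\n');
    messages.push({ role: msg.role, content });
  }

  const response = await fetch(`${baseUrl}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, messages, stream: false })
  });

  // data.message.content holds the assistant reply;
  // prompt_eval_count / eval_count map to input/output tokens as above.
  return response.json();
}
```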
**Step 2: Start Ollama server**

```bash
# Install Ollama (https://ollama.ai)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3.3:70b

# Server starts automatically on port 11434
```

**Step 3: Use with Agentic Flow**

```bash
# Terminal 1: Start proxy
npm run build
node dist/proxy/anthropic-to-ollama.js

# Terminal 2: Use with agents
export ANTHROPIC_BASE_URL=http://localhost:3000
export ANTHROPIC_API_KEY=dummy-key
npx agentic-flow --agent coder --task "Write hello world"
```

---
## Message Format Conversion

### Anthropic Messages API Format

```json
{
  "model": "claude-3-5-sonnet-20241022",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "system": "You are a helpful assistant",
  "max_tokens": 1024,
  "temperature": 0.7
}
```

### OpenAI Chat Completions Format

```json
{
  "model": "gpt-4",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "max_tokens": 1024,
  "temperature": 0.7
}
```

### Gemini generateContent Format

```json
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "System: You are a helpful assistant\n\nHello!"
        }
      ]
    }
  ],
  "generationConfig": {
    "temperature": 0.7,
    "maxOutputTokens": 1024
  }
}
```
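A recurring consequence of these format differences is where the system prompt ends up. A small sketch of the two non-Anthropic placements, using the same flattening approach as the proxies above (names are illustrative):

```typescript
// Sketch: route an Anthropic-style system prompt into each provider's shape.
// Anthropic itself keeps the system prompt in a separate "system" field.
type Target = 'openai' | 'gemini';

function placeSystemPrompt(system: string, userText: string, target: Target) {
  if (target === 'openai') {
    // OpenAI: a leading message with role "system"
    return [
      { role: 'system', content: system },
      { role: 'user', content: userText }
    ];
  }
  // Gemini: prepend to the first user turn, since roles are only "user" and "model"
  return [
    { role: 'user', parts: [{ text: `System: ${system}\n\n${userText}` }] }
  ];
}
```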
### Key Differences

| Feature | Anthropic | OpenAI | Gemini |
|---------|-----------|--------|--------|
| **System Prompt** | Separate `system` field | First message with `role: "system"` | Prepended to first user message |
| **Message Content** | String or array of blocks | String (arrays allowed for multimodal) | Array of `parts` with `text` |
| **Role Names** | `user`, `assistant` | `user`, `assistant`, `system` | `user`, `model` |
| **Max Tokens** | `max_tokens` | `max_tokens` | `generationConfig.maxOutputTokens` |
| **Response Format** | `content` array with typed blocks | `message.content` string | `candidates[0].content.parts[0].text` |

---
## Tool/Function Calling Support

### Anthropic Tool Format

```json
{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get weather for a location",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        },
        "required": ["location"]
      }
    }
  ]
}
```

### OpenAI Tool Format

```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
```

### Conversion Logic

```typescript
// Anthropic → OpenAI
if (anthropicReq.tools && anthropicReq.tools.length > 0) {
  openaiReq.tools = anthropicReq.tools.map(tool => ({
    type: 'function',
    function: {
      name: tool.name,
      description: tool.description || '',
      parameters: tool.input_schema || {
        type: 'object',
        properties: {},
        required: []
      }
    }
  }));
}

// OpenAI → Anthropic (tool_calls in response)
if (message.tool_calls && message.tool_calls.length > 0) {
  for (const toolCall of message.tool_calls) {
    contentBlocks.push({
      type: 'tool_use',
      id: toolCall.id,
      name: toolCall.function.name,
      input: JSON.parse(toolCall.function.arguments)
    });
  }
}
```
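The other half of the round trip is sending tool results back to the provider: Claude Code returns them as `tool_result` blocks inside a `user` message, and OpenAI-style providers expect them as `role: "tool"` messages. A sketch of that direction (not taken verbatim from the repo):

```typescript
// Anthropic tool_result blocks (in a user message) → OpenAI "tool" messages.
// Sketch only: a real proxy may handle images and multiple result blocks differently.
function convertToolResults(msg: any): Array<{ role: 'tool'; tool_call_id: string; content: string }> {
  const blocks = Array.isArray(msg.content) ? msg.content : [];
  return blocks
    .filter((b: any) => b.type === 'tool_result')
    .map((b: any) => ({
      role: 'tool' as const,
      tool_call_id: b.tool_use_id,
      content: typeof b.content === 'string' ? b.content : JSON.stringify(b.content)
    }));
}
```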
---
## Testing Your Proxy

### Unit Tests

Create `tests/proxy-mistral.test.ts`:

```typescript
import { AnthropicToMistralProxy } from '../src/proxy/anthropic-to-mistral.js';
import fetch from 'node-fetch';

describe('Mistral Proxy', () => {
  let proxy: AnthropicToMistralProxy;
  const port = 3001;

  beforeAll(() => {
    proxy = new AnthropicToMistralProxy({
      mistralApiKey: process.env.MISTRAL_API_KEY || 'test-key'
    });
    proxy.start(port);
  });

  it('should convert Anthropic request to Mistral format', async () => {
    const response = await fetch(`http://localhost:${port}/v1/messages`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'claude-3-5-sonnet-20241022',
        messages: [{ role: 'user', content: 'Hello!' }],
        max_tokens: 100
      })
    });

    expect(response.ok).toBe(true);
    const data = await response.json();
    expect(data).toHaveProperty('content');
    expect(data.role).toBe('assistant');
  });
});
```
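It is also worth asserting that the proxy starts cleanly; a small extra case that can go inside the same `describe` block:

```typescript
  // Checks the /health route defined in the proxy above.
  it('should respond on the health endpoint', async () => {
    const response = await fetch(`http://localhost:${port}/health`);
    expect(response.ok).toBe(true);

    const data = await response.json();
    expect(data.status).toBe('ok');
    expect(data.service).toBe('anthropic-to-mistral-proxy');
  });
```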
### Manual Testing

```bash
# Test health check
curl http://localhost:3000/health

# Test message endpoint
curl -X POST http://localhost:3000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```

---
## Examples

### Complete Example: Adding Cohere

See full implementation: [examples/proxy-cohere.ts](../examples/proxy-cohere.ts)

### Integration with Agentic Flow

```typescript
// src/cli-proxy.ts - Add new provider option
if (options.provider === 'mistral' || process.env.USE_MISTRAL) {
  // Start Mistral proxy
  const proxy = new AnthropicToMistralProxy({
    mistralApiKey: process.env.MISTRAL_API_KEY!
  });
  proxy.start(3000);

  // Set environment for SDK
  process.env.ANTHROPIC_BASE_URL = 'http://localhost:3000';
  process.env.ANTHROPIC_API_KEY = 'dummy-key';
}
```

---
## Best Practices

1. **Error Handling**: Always catch and log errors with context
2. **Streaming**: Support both streaming and non-streaming modes
3. **Tool Calling**: Handle MCP tools via native function calling when possible
4. **Logging**: Use verbose logging during development and info level in production
5. **API Keys**: Never hardcode keys; use environment variables
6. **Health Checks**: Always provide a `/health` endpoint
7. **Rate Limiting**: Respect provider rate limits
8. **Timeouts**: Set appropriate timeouts for API calls (see the sketch below)
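On the last point: the built-in `fetch` has no request timeout, so a hung provider call will hang the proxy. One way to bound provider calls, sketched with `AbortController` (the 60-second default is an arbitrary example):

```typescript
// Sketch: wrap provider calls with a hard timeout via AbortController.
async function fetchWithTimeout(url: string, init: RequestInit, timeoutMs = 60_000): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}

// Usage inside a proxy route handler:
// const response = await fetchWithTimeout(`${this.mistralBaseUrl}/chat/completions`, { ... });
```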
---
## Resources

- [Anthropic Messages API](https://docs.anthropic.com/en/api/messages)
- [OpenAI Chat Completions](https://platform.openai.com/docs/api-reference/chat)
- [Google Gemini API](https://ai.google.dev/gemini-api/docs)
- [OpenRouter API](https://openrouter.ai/docs)
- [Ollama API](https://github.com/ollama/ollama/blob/main/docs/api.md)

---

## Support

Need help adding a provider? Open an issue: [GitHub Issues](https://github.com/ruvnet/agentic-flow/issues)