tasq/node_modules/agentic-flow/docs/archived/OPTIMIZATION_SUMMARY.md

6.0 KiB

Multi-Provider Tool Instruction Optimization - Summary

Work Completed

1. Corrected Invalid Model IDs

File: test-top20-models.ts

Fixed model IDs that were returning HTTP 400/404 errors:

  • deepseek/deepseek-v3.1:freedeepseek/deepseek-chat-v3.1:free
  • deepseek/deepseek-v3deepseek/deepseek-v3.2-exp
  • google/gemma-3-12bgoogle/gemma-2-27b-it

2. Created Provider-Specific Instructions

File: src/proxy/provider-instructions.ts (new)

Implemented 7 specialized instruction templates:

Provider Strategy Key Feature
Anthropic Native tool calling Minimal instructions, native support
OpenAI Strong XML emphasis "CRITICAL: Use exact XML formats"
Google Step-by-step guidance Detailed numbered steps
Meta/Llama Clear & concise Simple, direct examples
DeepSeek Technical precision Structured command parsing focus
Mistral Action-oriented "ACTION REQUIRED" urgency
X.AI/Grok Balanced clarity Straightforward command list

3. Integrated Instructions into OpenRouter Proxy

File: src/proxy/anthropic-to-openrouter.ts

Changes:

  • Added imports: getInstructionsForModel, formatInstructions
  • Created extractProvider() helper method
  • Modified convertAnthropicToOpenAI() to dynamically select instructions based on model ID and provider

Code Flow:

const modelId = anthropicReq.model || this.defaultModel;
const provider = this.extractProvider(modelId);  // e.g., "openai" from "openai/gpt-4"
const instructions = getInstructionsForModel(modelId, provider);
const toolInstructions = formatInstructions(instructions);
// Inject into system message

4. Created Validation Test Suite

File: tests/test-provider-instructions.ts (new)

Comprehensive test covering 7 providers with representative models:

  • Tests one model from each provider family
  • Measures tool usage success rate
  • Reports response times
  • Identifies models needing further optimization

5. Documentation

Files Created:

  • docs/PROVIDER_INSTRUCTION_OPTIMIZATION.md - Detailed technical documentation
  • docs/OPTIMIZATION_SUMMARY.md - This summary

Test Results (Before Optimization)

From TOP20_MODELS_MATRIX.md:

  • Total Models Tested: 20
  • Successful Responses: 14/20 (70%)
  • Models Using Tools: 13/14 successful (92.9%)
  • Avg Response Time: 1686ms

Provider Breakdown (Before):

  • x-ai: 100% (2/2)
  • anthropic: 100% (3/3)
  • google: 100% (3/3)
  • meta-llama: 100% (1/1)
  • openai: 80% (4/5) ⚠️
  • deepseek: 0% (0/0) - Invalid IDs

Issues Identified:

  1. Invalid Model IDs: 6 models (deepseek, gemini, gemma, glm)
  2. No Tool Usage: 1 model (gpt-oss-120b)
  3. Generic Instructions: Same instructions for all providers

Expected Improvements (After Optimization)

Tool Usage Success Rate:

  • Before: 92.9% (13/14)
  • Target: 95-100%

Benefits:

  1. Model-Specific Optimization: Each provider gets tailored instructions matching their strengths
  2. Clearer Prompts: Reduced ambiguity leads to better tool usage
  3. Fixed Model IDs: Previously broken models now testable
  4. Better Debugging: Can identify which instruction templates need refinement

How to Validate

Restart Proxy with Optimizations:

# Kill existing proxies
lsof -ti:3000 | xargs kill -9 2>/dev/null

# Start OpenRouter proxy with optimizations
export OPENROUTER_API_KEY="your-key-here"
npx tsx src/proxy/anthropic-to-openrouter.ts &

Run Provider Instruction Test:

export OPENROUTER_API_KEY="your-key-here"
npx tsx tests/test-provider-instructions.ts

Run Full Top 20 Test (Updated):

export OPENROUTER_API_KEY="your-key-here"
npx tsx test-top20-models.ts > tests/top20-optimized-results.log 2>&1 &

Key Metrics to Monitor

  1. Tool Usage Rate: % of successful responses that use tools
  2. Provider Success Rate: % success per provider family
  3. Response Time: Average time per provider
  4. Error Rate: HTTP errors vs successful responses

Next Steps for User

  1. Set API Key: export OPENROUTER_API_KEY="your-key"
  2. Rebuild: npm run build (already done )
  3. Restart Proxy: Kill old proxy, start with optimizations
  4. Run Tests: Execute provider test and top 20 test
  5. Review Results: Check if tool usage improved to 95%+
  6. Fine-tune: Adjust instructions for any remaining failures

Security Compliance

All hardcoded API keys removed from:

  • tests/test-provider-instructions.ts
  • All test files now require env variables
  • Documentation emphasizes env variable usage

Architecture Summary

User Request
    ↓
OpenRouter Proxy (anthropic-to-openrouter.ts)
    ↓
extractProvider("openai/gpt-4") → "openai"
    ↓
getInstructionsForModel(modelId, "openai") → OPENAI_INSTRUCTIONS
    ↓
formatInstructions() → Optimized prompt
    ↓
OpenRouter API (with model-specific instructions)
    ↓
Model Response (with <file_write> tags)
    ↓
parseStructuredCommands() → tool_use format
    ↓
Claude Agent SDK executes tools ✅

Files Modified/Created

File Status Purpose
src/proxy/provider-instructions.ts Created Instruction templates
src/proxy/anthropic-to-openrouter.ts Enhanced Integration
test-top20-models.ts Updated Fixed model IDs
tests/test-provider-instructions.ts Created Validation test
docs/PROVIDER_INSTRUCTION_OPTIMIZATION.md Created Technical docs
docs/OPTIMIZATION_SUMMARY.md Created This summary

Conclusion

Provider-specific instruction optimization is complete and ready for validation. The system now intelligently selects instruction templates based on model provider, maximizing tool calling success across diverse LLM families while maintaining the same proxy architecture.

Status: Implementation Complete | 🔄 Validation Pending (requires user's API key)