
OpenRouter Proxy Issues and Required Fixes

Status: 🔴 CRITICAL - v1.1.13 release claims do not match actual behavior

Date: 2025-10-05


Summary

The v1.1.13 release claimed a "100% success rate" for OpenRouter providers, but actual testing shows that all three models tested through OpenRouter (GPT-4o-mini, DeepSeek, Llama 3.3) have critical issues:

  1. Models inappropriately trying to use tools for simple code generation
  2. Truncated responses ending with <function= or similar
  3. Malformed tool call syntax in model outputs
  4. Context detection not working properly

Root Causes Identified

1. Context Detection Broken

File: src/proxy/provider-instructions.ts:215-226

Problem:

```typescript
const fileKeywords = [
  'create file', 'write file', 'save to', 'create a file',
  // ...
];
```

Task: "Create a Python file at /tmp/test.py" → taskRequiresFileOps() returns false (should be true)

Reason: Keyword matching is too strict. "Create a Python file" contains neither "create file" nor "create a file".

Fix Required:

```typescript
const fileKeywords = [
  /create.*file/i,
  /write.*file/i,
  /save.*to/i,
  /save.*file/i,
  /write.*to disk/i,
  /create.*script/i,
  /make.*file/i
];

// `combined` is the task text being checked, built earlier in the function
return fileKeywords.some(pattern => pattern.test(combined));
```
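A self-contained sketch of the regex-based check, reusing the patterns proposed above. It assumes the detector receives the system prompt and the user message as two strings and concatenates them; the real `taskRequiresFileOps()` signature in `provider-instructions.ts` may differ.

```typescript
// Sketch only: the two-argument signature is an assumption, not the
// actual taskRequiresFileOps() signature from provider-instructions.ts.
const FILE_OP_PATTERNS: RegExp[] = [
  /create.*file/i,
  /write.*file/i,
  /save.*to/i,
  /save.*file/i,
  /write.*to disk/i,
  /create.*script/i,
  /make.*file/i,
];

function taskRequiresFileOps(systemPrompt: string, userMessage: string): boolean {
  // Combine both sources so a file request in either one is detected.
  const combined = `${systemPrompt}\n${userMessage}`;
  // No /g flag, so RegExp.test() keeps no lastIndex state between calls.
  return FILE_OP_PATTERNS.some((pattern) => pattern.test(combined));
}

// The task that the old substring check missed:
console.log(taskRequiresFileOps("", "Create a Python file at /tmp/test.py")); // true
console.log(taskRequiresFileOps("", "Write Python function to add numbers. Just show code.")); // false
```

Running the matrix tasks through this sketch shows a gap: case 6 ("Create /tmp/sub.py with subtract") returns false, because it names a path but never the word "file". The `.*` patterns also over-match in the other direction (`/save.*to/i` matches "save tokens"), so negative cases are worth testing as well.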

2. XML Instructions Still Being Injected

File: src/proxy/anthropic-to-openrouter.ts:204-211

Problem: Even when taskRequiresFileOps() returns false, models still receive tool instructions; the path that injects them has not been identified yet.

Debug needed:

  • Add logging to see what instructions are actually being sent
  • Verify formatInstructions() is being called with correct includeXmlInstructions parameter
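One low-effort way to get that logging, assuming `formatInstructions()` can be wrapped at its call site: a generic wrapper that records each call's arguments and result. `stubFormatInstructions` below is a hypothetical stand-in, not the real function.

```typescript
// Wrap any function so each call logs its arguments and return value.
function withLogging<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => R,
  log: (line: string) => void = console.error,
): (...args: A) => R {
  return (...args: A): R => {
    const result = fn(...args);
    log(`[${name}] args=${JSON.stringify(args)} result=${JSON.stringify(result)}`);
    return result;
  };
}

// Hypothetical stand-in for formatInstructions(); the real one lives in
// src/proxy/provider-instructions.ts and its signature may differ.
function stubFormatInstructions(base: string, includeXmlInstructions: boolean): string {
  return includeXmlInstructions ? `${base}\n(XML tool instructions appended here)` : base;
}

// Collect log lines in memory so they can be inspected.
const logged: string[] = [];
const formatInstructionsLogged = withLogging(
  "formatInstructions",
  stubFormatInstructions,
  (line) => logged.push(line),
);

formatInstructionsLogged("Respond with code.", false);
console.log(logged[0]);
// [formatInstructions] args=["Respond with code.",false] result="Respond with code."
```

Logging the flag and the final string together shows directly whether XML text reaches the prompt even when the flag is false.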

3. Models Returning Malformed Tool Calls

Observed Output:

```
[Executing: python -c "..."]<function=
```

Problem: Models are trying to use tools incorrectly:

  • Mixing text output with tool calls
  • Incomplete tool call syntax
  • Wrong tool call format

Possible Causes:

  1. Models confused by XML instruction format
  2. Max tokens too low (response truncated mid-tool-call)
  3. Models not understanding when to/not to use tools
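Cause 2 can be separated from causes 1 and 3 by inspecting each response: in the OpenAI chat-completions format, `finish_reason: "length"` means the token limit cut the output short. A minimal diagnostic sketch, assuming OpenRouter passes `finish_reason` through in that format; the `<function=` check targets the fragment seen in the observed output. The `ChoiceLike` type is an assumed minimal shape.

```typescript
// Minimal subset of an OpenAI-style chat completion choice (assumed shape).
interface ChoiceLike {
  finish_reason: string | null;
  message: { content: string | null; tool_calls?: unknown[] };
}

type Diagnosis = "truncated" | "tool_syntax_in_text" | "native_tool_call" | "text";

function diagnoseChoice(choice: ChoiceLike): Diagnosis {
  const content = choice.message.content ?? "";
  // Token limit reached: the reply (and any tool call in it) is incomplete.
  if (choice.finish_reason === "length") return "truncated";
  // Tool-call syntax leaked into plain text instead of message.tool_calls.
  if (content.includes("<function=")) return "tool_syntax_in_text";
  // Tool use arrived through the native OpenAI field.
  if ((choice.message.tool_calls ?? []).length > 0) return "native_tool_call";
  return "text";
}

console.log(diagnoseChoice({
  finish_reason: "length",
  message: { content: '[Executing: python -c "..."]<function=' },
}));
// truncated
```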

Fix Required:

  • REMOVE XML instructions entirely for OpenRouter models
  • Use ONLY native OpenAI function calling format
  • Let OpenRouter handle tool calling natively

4. parseStructuredCommands() Inappropriate

File: src/proxy/anthropic-to-openrouter.ts:286-338

Problem: This function tries to parse XML tags from model output, but:

  1. Models aren't reliably producing valid XML
  2. Models are mixing text with XML
  3. Truncated responses create malformed XML

Fix Required: OpenRouter models should use native OpenAI tool calling:

```typescript
// DON'T parse XML from text
// DO use message.tool_calls from OpenAI response format

const tool_calls = message.tool_calls || [];
// These are already in correct format from OpenRouter
```

Proposed Solution

Phase 1: Stop Injecting XML Instructions for OpenRouter

```typescript
// In convertAnthropicToOpenAI():

// NEVER inject XML instructions for OpenRouter
const toolInstructions = config.provider === 'openrouter'
  ? 'Respond with clean, well-formatted code.'
  : formatInstructions(instructions, needsFileOps);
```

Phase 2: Use Native OpenAI Tool Calling

```typescript
// OpenRouter models should use tools ONLY via OpenAI function calling
// NOT via XML tags in text

if (anthropicReq.tools && anthropicReq.tools.length > 0) {
  // Convert MCP tools to OpenAI format (already done)
  openaiReq.tools = anthropicReq.tools.map(tool => ({
    type: 'function',
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.input_schema
    }
  }));
}
```
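For reference, a self-contained, typed version of that mapping. The interfaces cover only the fields the conversion touches and are assumptions rather than the proxy's own types; the shapes follow Anthropic's tool definitions (`name`, `description`, `input_schema`) and OpenAI's `tools` entries (`type: "function"` wrapping `name`, `description`, `parameters`). The `Write` tool schema in the usage example is hypothetical.

```typescript
// Assumed minimal shapes; the proxy's own types may carry more fields.
interface AnthropicTool {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>; // JSON Schema for the tool input
}

interface OpenAITool {
  type: "function";
  function: { name: string; description?: string; parameters: Record<string, unknown> };
}

function toOpenAITools(tools: AnthropicTool[] | undefined): OpenAITool[] | undefined {
  // Omit the field entirely when there are no tools, rather than sending [].
  if (!tools || tools.length === 0) return undefined;
  return tools.map((tool) => ({
    type: "function" as const,
    function: {
      name: tool.name,
      description: tool.description,
      // The JSON Schema carries over unchanged: input_schema -> parameters.
      parameters: tool.input_schema,
    },
  }));
}

// Hypothetical Write tool schema, for illustration only.
const converted = toOpenAITools([
  {
    name: "Write",
    description: "Write a file to disk",
    input_schema: {
      type: "object",
      properties: { file_path: { type: "string" }, content: { type: "string" } },
      required: ["file_path", "content"],
    },
  },
]);
console.log(converted?.[0].function.name); // Write
```

Because the schema is passed through untouched, any schema features a given OpenRouter model does not support will surface per model, which the test matrix should catch.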

Phase 3: Handle Tool Calls Correctly in Response

```typescript
// In convertOpenAIToAnthropic():

// ONLY look at message.tool_calls from OpenAI
// DON'T try to parse XML from message.content

const toolCalls = message.tool_calls || [];

if (toolCalls.length > 0) {
  // Model wants to use tools - convert to Anthropic format
  contentBlocks = toolCalls.map(tc => ({
    type: 'tool_use',
    id: tc.id,
    name: tc.function.name,
    // arguments is a JSON string; JSON.parse throws if it was truncated
    input: JSON.parse(tc.function.arguments)
  }));
} else {
  // Pure text response - no tool use (OpenAI content may be null)
  contentBlocks = [{
    type: 'text',
    text: message.content ?? ''
  }];
}
```
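A runnable, typed sketch of the Phase 3 conversion. Two choices here go beyond the snippet above and are assumptions: text that arrives alongside `tool_calls` is kept as a leading `text` block (Anthropic messages can mix `text` and `tool_use` blocks), and `function.arguments` that fail to parse, as truncated output can produce, raise a descriptive error instead of a bare `SyntaxError`. The types are minimal stand-ins, not the proxy's own.

```typescript
// Minimal stand-in types (assumed; not the proxy's actual definitions).
interface OpenAIToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}
interface OpenAIMessage {
  content: string | null;
  tool_calls?: OpenAIToolCall[];
}
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

function toAnthropicContent(message: OpenAIMessage): AnthropicBlock[] {
  const blocks: AnthropicBlock[] = [];
  // Keep any prose the model emitted; never scan it for XML tool syntax.
  if (message.content) blocks.push({ type: "text", text: message.content });
  for (const tc of message.tool_calls ?? []) {
    let input: unknown;
    try {
      input = JSON.parse(tc.function.arguments);
    } catch {
      // Malformed or truncated arguments: fail loudly with context.
      throw new Error(`Invalid JSON arguments for tool call ${tc.id} (${tc.function.name})`);
    }
    blocks.push({ type: "tool_use", id: tc.id, name: tc.function.name, input });
  }
  // Mirror the original snippet: an empty text block if nothing came back.
  if (blocks.length === 0) blocks.push({ type: "text", text: "" });
  return blocks;
}

const blocks = toAnthropicContent({
  content: null,
  tool_calls: [
    { id: "call_1", type: "function", function: { name: "Write", arguments: '{"file_path":"/tmp/test.py"}' } },
  ],
});
console.log(blocks.map((b) => b.type).join(",")); // tool_use
```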

Testing Required

Test Matrix

| Test Case | Provider | Model | Task | Expected |
|---|---|---|---|---|
| 1 | openrouter | gpt-4o-mini | "Write Python function to add numbers. Just show code." | Clean Python code, NO tool calls |
| 2 | openrouter | gpt-4o-mini | "Create file /tmp/test.py with add function" | Use Write tool, create file |
| 3 | openrouter | deepseek-chat | "Write multiply function. Just code." | Complete code, no truncation |
| 4 | openrouter | deepseek-chat | "Save multiply function to /tmp/mult.py" | Use Write tool |
| 5 | openrouter | llama-3.3-70b | "Write subtract function. Show code." | Code without prompt repetition |
| 6 | openrouter | llama-3.3-70b | "Create /tmp/sub.py with subtract" | Use Write tool |
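The matrix can also be encoded as data so each case is checked mechanically. This is a hypothetical sketch, not the contents of the project's validation script: it assumes each response has already been reduced to the names of native tool calls and its `finish_reason`, and it only covers the tool/no-tool and truncation expectations (prompt repetition in case 5 needs a separate check). The `Bash` tool name in the usage line is illustrative.

```typescript
// One row of the test matrix; expectTool is null for "no tool calls".
interface MatrixCase {
  id: number;
  model: string;
  task: string;
  expectTool: string | null;
}

// Reduced view of a model response (assumed to be extracted upstream).
interface Observed {
  toolNames: string[];
  finishReason: string | null;
}

const MATRIX: MatrixCase[] = [
  { id: 1, model: "gpt-4o-mini", task: "Write Python function to add numbers. Just show code.", expectTool: null },
  { id: 2, model: "gpt-4o-mini", task: "Create file /tmp/test.py with add function", expectTool: "Write" },
  { id: 3, model: "deepseek-chat", task: "Write multiply function. Just code.", expectTool: null },
  { id: 4, model: "deepseek-chat", task: "Save multiply function to /tmp/mult.py", expectTool: "Write" },
  { id: 5, model: "llama-3.3-70b", task: "Write subtract function. Show code.", expectTool: null },
  { id: 6, model: "llama-3.3-70b", task: "Create /tmp/sub.py with subtract", expectTool: "Write" },
];

// Returns failure reasons; an empty list means the case passed.
function checkCase(c: MatrixCase, o: Observed): string[] {
  const failures: string[] = [];
  if (o.finishReason === "length") failures.push("response truncated (finish_reason=length)");
  if (c.expectTool === null && o.toolNames.length > 0) {
    failures.push(`unexpected tool calls: ${o.toolNames.join(", ")}`);
  }
  if (c.expectTool !== null && !o.toolNames.includes(c.expectTool)) {
    failures.push(`expected a ${c.expectTool} tool call`);
  }
  return failures;
}

console.log(checkCase(MATRIX[0], { toolNames: ["Bash"], finishReason: "stop" }).join("; "));
// unexpected tool calls: Bash
```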

Validation Command

```bash
npm run validate:openrouter
```

Must pass ALL 6 tests before claiming fixes work.


Immediate Actions

  1. [ ] Fix taskRequiresFileOps() - Use regex patterns instead of literal substring matching
  2. [ ] Remove XML instructions for OpenRouter - Never inject XML for OpenRouter models
  3. [ ] Fix convertOpenAIToAnthropic() - Don't parse XML, use tool_calls only
  4. [ ] Add comprehensive logging - See what's actually being sent/received
  5. [ ] Run full test matrix - Validate ALL cases before release
  6. [ ] Update VALIDATION-RESULTS.md - With REAL test results
  7. [ ] Update CHANGELOG - Acknowledge issues, document fixes

Package Issues to Fix

  1. Missing validation scripts in npm package

    • Add validation/ to package.json files array (done)
    • Add scripts/ to package.json files array (done)
  2. Broken validate:openrouter script

    • Points to non-existent file
    • Needs to point to working validation script
  3. Documentation inconsistency

    • Release notes claim 100% success
    • Actual behavior: 0% success for OpenRouter
    • Need honest documentation

Timeline

URGENT: These are critical bugs affecting core functionality

  • Immediate (today): Implement fixes 1-3
  • Before next release: Complete testing and validation
  • Update release notes: Be honest about what works and what doesn't

Success Criteria

Definition of Done:

  1. All 6 test cases in matrix pass
  2. npm run validate:openrouter passes with 100% success
  3. Real-world usage confirmed (not just unit tests)
  4. Documentation accurately reflects capabilities
  5. No false claims in release notes

Current Status by Provider

| Provider | Code Gen | File Ops | Tool Calling | Status |
|---|---|---|---|---|
| Anthropic | Perfect | Perfect | Perfect | Production Ready |
| Gemini | Perfect | Perfect | Perfect | Production Ready |
| OpenRouter GPT-4o-mini | Tool calls inappropriately | Malformed | Truncated | 🔴 BROKEN |
| OpenRouter DeepSeek | Malformed output | Wrong format | Incomplete | 🔴 BROKEN |
| OpenRouter Llama 3.3 | Truncated | Fails | Broken | 🔴 BROKEN |

Suggested status update (honesty is the best policy):

```markdown
## v1.1.13 Status Update

**Working Providers:**
- ✅ Anthropic (direct) - Fully tested, production ready
- ✅ Google Gemini - Fully tested, FREE tier available

**Known Issues:**
- ⚠️ OpenRouter proxy has tool calling format issues
- ⚠️ Working on fixes, will release v1.1.14 when validated
- ⚠️ Use Anthropic or Gemini for production until fixed

**If you need OpenRouter:**
- Use agentic-flow CLI directly (works)
- Don't use proxy mode until v1.1.14
```

Author: Validation testing

Last Updated: 2025-10-05