Marc Rejohn Castillano 5cb6561924 added ruflo

2026-04-09 19:01:53 +08:00


OpenRouter Proxy Issues and Required Fixes

Status: 🔴 CRITICAL - v1.1.13 release claims do not match actual behavior

Date: 2025-10-05

Summary

The v1.1.13 release claimed a "100% success rate" for OpenRouter providers, but actual testing shows that all three OpenRouter models tested (GPT-4o-mini, DeepSeek, Llama 3.3) have critical issues:

- Models try to use tools for simple code generation, where no tool is needed
- Responses are truncated, ending with <function= or similar
- Model outputs contain malformed tool-call syntax
- File-operation context detection (taskRequiresFileOps()) misclassifies tasks

Root Causes Identified

1. Context Detection Broken

File: src/proxy/provider-instructions.ts:215-226

Problem:

```typescript
const fileKeywords = [
  'create file', 'write file', 'save to', 'create a file',
  // ...
];
```

Task: "Create a Python file at /tmp/test.py" → taskRequiresFileOps() returns false (should be true)

Reason: Keyword matching is too strict. "Create a Python file" matches neither "create file" nor "create a file", because the word "Python" sits between "a" and "file".

Fix Required:

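```typescript
// Sketch: `combined` stands for the task text that taskRequiresFileOps()
// already assembles from the request (the exact signature is assumed here).
function taskRequiresFileOps(combined: string): boolean {
  // Regexes tolerate words in between, e.g. "Create a Python file".
  const fileKeywords: RegExp[] = [
    /create.*file/i,
    /write.*file/i,
    /save.*to/i,
    /save.*file/i,
    /write.*to disk/i,
    /create.*script/i,
    /make.*file/i
  ];

  return fileKeywords.some(pattern => pattern.test(combined));
}
```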
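Before relying on these patterns, they can be checked against the task strings from the test matrix below. This TypeScript sketch is illustrative only: the helper names (matchesKeywords, requiresFileOps) are hypothetical, and the file-path pattern is an assumption that goes beyond the proposed fix. Run as written, the keyword patterns alone miss test case 6 ("Create /tmp/sub.py with subtract"), because that task contains neither "file" nor "script".

```typescript
// The proposed patterns, unchanged.
const fileKeywords: RegExp[] = [
  /create.*file/i,
  /write.*file/i,
  /save.*to/i,
  /save.*file/i,
  /write.*to disk/i,
  /create.*script/i,
  /make.*file/i,
];

// Assumption (not in the proposed fix): an explicit path with an extension,
// such as /tmp/sub.py or ./out.txt, also signals a file operation.
const filePathPattern = /(?:^|\s)(?:~|\.)?\/[\w.\/-]*\.\w+/;

// Hypothetical helpers for this check only.
function matchesKeywords(task: string): boolean {
  return fileKeywords.some((pattern) => pattern.test(task));
}

function requiresFileOps(task: string): boolean {
  return matchesKeywords(task) || filePathPattern.test(task);
}

// Task strings and expected file-op classification from the test matrix.
const matrixTasks: Array<[string, boolean]> = [
  ['Write Python function to add numbers. Just show code.', false],
  ['Create file /tmp/test.py with add function', true],
  ['Write multiply function. Just code.', false],
  ['Save multiply function to /tmp/mult.py', true],
  ['Write subtract function. Show code.', false],
  ['Create /tmp/sub.py with subtract', true],
];

for (const [task, expected] of matrixTasks) {
  const status = matchesKeywords(task) === expected ? 'ok  ' : 'MISS';
  console.log(`${status} keywords only: ${task}`);
}
```

Note also that /create.*file/i matches "create a user profile", since "profile" contains "file"; adding a word boundary (/create.*\bfile/i) avoids that false positive.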

2. XML Instructions Still Being Injected

File: src/proxy/anthropic-to-openrouter.ts:204-211

Problem: Even when taskRequiresFileOps() returns false, models still receive tool instructions, through a code path that has not yet been identified.

Debug needed:

- Add logging to see which instructions are actually being sent
- Verify that formatInstructions() is called with the correct includeXmlInstructions parameter
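One way to add that logging is sketched below in TypeScript, under assumptions: the request shape (OutgoingRequest) and the helper summarizeOutgoingRequest() are hypothetical, and the tag regex is only a heuristic for "XML instructions leaked into the system prompt" (it would also flag generic types such as List<String>). Logging a short summary instead of the full body keeps logs readable while still showing whether tag-style instructions went out.

```typescript
// Minimal, assumed shape of the OpenAI-style request the proxy sends.
interface ChatMessage { role: string; content: string | null; }
interface OutgoingRequest { model: string; messages: ChatMessage[]; tools?: unknown[]; }

// Hypothetical debug helper: summarizes what instructions actually leave the proxy.
function summarizeOutgoingRequest(req: OutgoingRequest) {
  const system = req.messages
    .filter((m) => m.role === 'system')
    .map((m) => m.content ?? '')
    .join('\n');
  return {
    model: req.model,
    systemChars: system.length,
    // Heuristic: any <tag>-style markup suggests XML instructions were injected.
    containsXmlTags: /<\/?[a-z_][\w-]*[^>]*>/i.test(system),
    // Tools sent through native function calling, if any.
    nativeToolCount: req.tools?.length ?? 0,
  };
}

const example: OutgoingRequest = {
  model: 'openai/gpt-4o-mini',
  messages: [
    { role: 'system', content: 'Respond with clean, well-formatted code.' },
    { role: 'user', content: 'Write Python function to add numbers. Just show code.' },
  ],
};
console.log('[proxy] outgoing request', JSON.stringify(summarizeOutgoingRequest(example)));
```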

3. Models Returning Malformed Tool Calls

Observed Output:

```text
[Executing: python -c "..."]<function=
```

Problem: Models are trying to use tools incorrectly:

- Mixing text output with tool calls
- Incomplete tool call syntax
- Wrong tool call format

Possible Causes:

- Models confused by the XML instruction format
- Max tokens too low (response truncated mid-tool-call)
- Models not understanding when to use tools and when not to
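The first two possible causes can be told apart from the response itself: in the OpenAI chat-completions format that OpenRouter returns, a finish_reason of "length" means generation stopped at the token limit. The sketch below assumes that format; diagnoseChoice() and its labels are hypothetical, and the finish_reason of "stop" in the example is assumed, since the observed output above does not record it.

```typescript
// Assumed subset of an OpenAI-format chat completion choice.
interface Choice {
  finish_reason: string | null;
  message: { content: string | null; tool_calls?: unknown[] };
}

type Diagnosis = 'truncated' | 'tool-syntax-in-text' | 'native-tool-calls' | 'text';

// Hypothetical diagnostic separating truncation from tool syntax emitted as text.
function diagnoseChoice(choice: Choice): Diagnosis {
  // "length" means the model hit max_tokens, so any tool call may be cut off.
  if (choice.finish_reason === 'length') return 'truncated';
  const text = choice.message.content ?? '';
  // Markers taken from the observed output: tool-call syntax inside plain text.
  if (/<function=|\[Executing:/.test(text)) return 'tool-syntax-in-text';
  if ((choice.message.tool_calls ?? []).length > 0) return 'native-tool-calls';
  return 'text';
}

const observed: Choice = {
  finish_reason: 'stop', // assumed; not recorded in the observed output
  message: { content: '[Executing: python -c "..."]<function=' },
};
console.log(diagnoseChoice(observed));
```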

Fix Required:

- REMOVE XML instructions entirely for OpenRouter models
- Use ONLY the native OpenAI function calling format
- Let OpenRouter handle tool calling natively

4. parseStructuredCommands() Inappropriate

File: src/proxy/anthropic-to-openrouter.ts:286-338

Problem: This function tries to parse XML tags from model output, but:

- Models aren't reliably producing valid XML
- Models are mixing text with XML
- Truncated responses create malformed XML

Fix Required: OpenRouter models should use native OpenAI tool calling:

```typescript
// DON'T parse XML from text
// DO use message.tool_calls from the OpenAI response format

const tool_calls = message.tool_calls || [];
// These arrive already structured (OpenAI format); no text parsing is needed
```

Proposed Solution

Phase 1: Stop Injecting XML Instructions for OpenRouter

```typescript
// In convertAnthropicToOpenAI():

// NEVER inject XML instructions for OpenRouter
const toolInstructions = config.provider === 'openrouter'
  ? 'Respond with clean, well-formatted code.'
  : formatInstructions(instructions, needsFileOps);
```

Phase 2: Use Native OpenAI Tool Calling

```typescript
// OpenRouter models should use tools ONLY via OpenAI function calling
// NOT via XML tags in text

if (anthropicReq.tools && anthropicReq.tools.length > 0) {
  // Convert MCP tools to OpenAI format (already done)
  openaiReq.tools = anthropicReq.tools.map(tool => ({
    type: 'function',
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.input_schema
    }
  }));
}
```
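A self-contained version of the Phase 2 mapping, sketched with assumed minimal types and a hypothetical buildToolFields() helper. It also sets tool_choice, which goes beyond the plan above: tool_choice ("auto" / "none") is a parameter of the OpenAI chat-completions API, but whether every OpenRouter model honors it is an assumption to verify. Sending "none" when taskRequiresFileOps() is false targets the "tools used for simple code generation" symptom without dropping the tool definitions.

```typescript
// Assumed minimal Anthropic/MCP tool shape, as used in the Phase 2 mapping.
interface AnthropicTool {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>;
}

// OpenAI function-tool shape.
interface OpenAITool {
  type: 'function';
  function: { name: string; description?: string; parameters: Record<string, unknown> };
}

interface ToolFields {
  tools?: OpenAITool[];
  tool_choice?: 'auto' | 'none';
}

// Hypothetical helper: native tool conversion plus an (assumed) tool_choice hint.
function buildToolFields(tools: AnthropicTool[] | undefined, needsFileOps: boolean): ToolFields {
  // No tools requested: send no tool fields at all.
  if (!tools || tools.length === 0) return {};
  return {
    tools: tools.map((tool) => ({
      type: 'function' as const,
      // input_schema is already JSON Schema, so it maps directly to parameters.
      function: { name: tool.name, description: tool.description, parameters: tool.input_schema },
    })),
    // Code-only tasks: ask the model not to call tools.
    tool_choice: needsFileOps ? 'auto' : 'none',
  };
}
```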

Phase 3: Handle Tool Calls Correctly in Response

```typescript
// In convertOpenAIToAnthropic():

// ONLY look at message.tool_calls from OpenAI
// DON'T try to parse XML from message.content

const toolCalls = message.tool_calls || [];
let contentBlocks;

if (toolCalls.length > 0) {
  // Model wants to use tools - convert to Anthropic tool_use blocks
  contentBlocks = toolCalls.map(tc => ({
    type: 'tool_use',
    id: tc.id,
    name: tc.function.name,
    // arguments is a JSON string; JSON.parse throws if it is malformed or truncated
    input: JSON.parse(tc.function.arguments)
  }));
} else {
  // Pure text response - no tool use (content may be null in OpenAI responses)
  contentBlocks = [{
    type: 'text',
    text: message.content ?? ''
  }];
}
```
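JSON.parse in Phase 3 throws if a model's arguments string is malformed or cut off, which is exactly the failure mode reported here. A more defensive mapping is sketched below with hypothetical types and a hypothetical function name (toAnthropicBlocks): unparseable arguments become a text block instead of crashing the proxy, and any text the model sends alongside its tool calls is kept rather than dropped.

```typescript
// Assumed OpenAI tool_call shape (arguments is a JSON-encoded string).
interface OpenAIToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

type AnthropicBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown };

// Hypothetical defensive variant of the Phase 3 conversion.
function toAnthropicBlocks(content: string | null, toolCalls: OpenAIToolCall[] = []): AnthropicBlock[] {
  if (toolCalls.length === 0) {
    // Pure text response, as in the original else-branch.
    return [{ type: 'text', text: content ?? '' }];
  }
  const blocks: AnthropicBlock[] = [];
  // Keep any text sent alongside the tool calls.
  if (content) blocks.push({ type: 'text', text: content });
  for (const tc of toolCalls) {
    try {
      const input: unknown = JSON.parse(tc.function.arguments);
      blocks.push({ type: 'tool_use', id: tc.id, name: tc.function.name, input });
    } catch {
      // Arguments cut off mid-JSON: surface as text instead of throwing.
      blocks.push({ type: 'text', text: `[unparseable arguments for tool ${tc.function.name}]` });
    }
  }
  return blocks;
}
```

Turning a broken tool call into text rather than an error is a design choice: text keeps the conversation going, while returning an error to the client makes the failure more visible during validation.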

Testing Required

Test Matrix

| Test Case | Provider | Model | Task | Expected |
|---|---|---|---|---|
| 1 | openrouter | gpt-4o-mini | "Write Python function to add numbers. Just show code." | Clean Python code, NO tool calls |
| 2 | openrouter | gpt-4o-mini | "Create file /tmp/test.py with add function" | Use Write tool, create file |
| 3 | openrouter | deepseek-chat | "Write multiply function. Just code." | Complete code, no truncation |
| 4 | openrouter | deepseek-chat | "Save multiply function to /tmp/mult.py" | Use Write tool |
| 5 | openrouter | llama-3.3-70b | "Write subtract function. Show code." | Code without prompt repetition |
| 6 | openrouter | llama-3.3-70b | "Create /tmp/sub.py with subtract" | Use Write tool |

Validation Command

```bash
npm run validate:openrouter
```

Must pass ALL 6 tests before claiming fixes work.
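What "pass" means for each row can be made explicit in code. This sketch encodes the matrix and a hypothetical checkCase() rule over simplified Anthropic-style content blocks (shape assumed): file tasks must yield a Write tool_use block, code tasks must yield non-empty text with no tool use and no leaked <function= syntax. It does not cover the truncation and prompt-repetition expectations of cases 3 and 5, which need extra checks (for example on finish_reason).

```typescript
interface MatrixCase { id: number; model: string; task: string; expectTool: boolean; }

// Rows copied from the test matrix; expectTool mirrors the Expected column.
const matrix: MatrixCase[] = [
  { id: 1, model: 'gpt-4o-mini', task: 'Write Python function to add numbers. Just show code.', expectTool: false },
  { id: 2, model: 'gpt-4o-mini', task: 'Create file /tmp/test.py with add function', expectTool: true },
  { id: 3, model: 'deepseek-chat', task: 'Write multiply function. Just code.', expectTool: false },
  { id: 4, model: 'deepseek-chat', task: 'Save multiply function to /tmp/mult.py', expectTool: true },
  { id: 5, model: 'llama-3.3-70b', task: 'Write subtract function. Show code.', expectTool: false },
  { id: 6, model: 'llama-3.3-70b', task: 'Create /tmp/sub.py with subtract', expectTool: true },
];

// Simplified content blocks as produced by convertOpenAIToAnthropic() (assumed shape).
type Block = { type: 'text'; text: string } | { type: 'tool_use'; name: string };

// Hypothetical pass/fail rule: returns null on pass, otherwise a failure reason.
function checkCase(c: MatrixCase, blocks: Block[]): string | null {
  const text = blocks.map((b) => (b.type === 'text' ? b.text : '')).join('');
  if (/<function=/.test(text)) return `case ${c.id}: tool-call syntax leaked into text`;
  const usedWrite = blocks.some((b) => b.type === 'tool_use' && b.name === 'Write');
  const usedAnyTool = blocks.some((b) => b.type === 'tool_use');
  if (c.expectTool) return usedWrite ? null : `case ${c.id}: expected a Write tool call`;
  if (usedAnyTool) return `case ${c.id}: unexpected tool call`;
  return text.trim().length > 0 ? null : `case ${c.id}: empty response`;
}

// Example: a provider that only ever returns text fails every file task.
const failures = matrix
  .map((c) => checkCase(c, [{ type: 'text', text: 'def f(): pass' }]))
  .filter((r): r is string => r !== null);
console.log(failures);
```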

Immediate Actions

- [ ] Fix taskRequiresFileOps() - Use regex patterns instead of exact string matching
- [ ] Remove XML instructions for OpenRouter - Never inject XML for OR models
- [ ] Fix convertOpenAIToAnthropic() - Don't parse XML, use tool_calls only
- [ ] Add comprehensive logging - See what's actually being sent/received
- [ ] Run full test matrix - Validate ALL cases before release
- [ ] Update VALIDATION-RESULTS.md - With REAL test results
- [ ] Update CHANGELOG - Acknowledge issues, document fixes

Package Issues to Fix

1. Missing validation scripts in npm package
   - Add validation/ to package.json files array ✅ (done)
   - Add scripts/ to package.json files array ✅ (done)
2. Broken validate:openrouter script
   - Points to a non-existent file
   - Needs to point to a working validation script
3. Documentation inconsistency
   - Release notes claim 100% success
   - Actual behavior: 0% success for OpenRouter
   - Need honest documentation

Timeline

URGENT: These are critical bugs affecting core functionality

- Immediate (today): Implement fixes 1-3
- Before next release: Complete testing and validation
- Update release notes: Be honest about what works and what doesn't

Success Criteria

✅ Definition of Done:

- All 6 test cases in the matrix pass
- npm run validate:openrouter passes with 100% success
- Real-world usage confirmed (not just unit tests)
- Documentation accurately reflects capabilities
- No false claims in release notes

Current Status by Provider

| Provider | Code Gen | File Ops | Tool Calling | Status |
|---|---|---|---|---|
| Anthropic | ✅ Perfect | ✅ Perfect | ✅ Perfect | ✅ Production Ready |
| Gemini | ✅ Perfect | ✅ Perfect | ✅ Perfect | ✅ Production Ready |
| OpenRouter GPT-4o-mini | ❌ Inappropriate tool calls | ❌ Malformed | ❌ Truncated | 🔴 BROKEN |
| OpenRouter DeepSeek | ❌ Malformed output | ❌ Wrong format | ❌ Incomplete | 🔴 BROKEN |
| OpenRouter Llama 3.3 | ❌ Truncated | ❌ Fails | ❌ Broken | 🔴 BROKEN |

Recommended User Communication

Honesty is the best policy:

```markdown
## v1.1.13 Status Update

**Working Providers:**
- ✅ Anthropic (direct) - Fully tested, production ready
- ✅ Google Gemini - Fully tested, FREE tier available

**Known Issues:**
- ⚠️ OpenRouter proxy has tool calling format issues
- ⚠️ Working on fixes, will release v1.1.14 when validated
- ⚠️ Use Anthropic or Gemini for production until fixed

**If you need OpenRouter:**
- Use agentic-flow CLI directly (works)
- Don't use proxy mode until v1.1.14
```

Author: Validation testing
Last Updated: 2025-10-05
