Best OpenRouter Models for Claude Code Tool Use

Research Date: October 6, 2025
Research Focus: Models supporting tool/function calling that are cheap, fast, and high-quality


Executive Summary

This research identifies the top 5 OpenRouter models optimized for Claude Code's tool calling requirements, balancing cost-effectiveness, speed, and quality. Mistral Small 3.1 24B emerges as the best overall value at $0.02/$0.04 per million tokens, while several FREE options are available, including DeepSeek V3 0324 and Gemini 2.0 Flash.


🥇 1. Mistral Small 3.1 24B

Model ID: mistralai/mistral-small-3.1-24b

  • Cost: $0.02/M input tokens | $0.04/M output tokens
  • Tool Support: Excellent (optimized for function calling)
  • Speed: Fast (low-latency)
  • Context: 128K tokens
  • Quality: High

Why Choose This:

  • Specifically optimized for function calling APIs and JSON-structured outputs
  • Best cost-to-performance ratio for tool use
  • Low-latency responses ideal for interactive Claude Code workflows
  • Excellent at structured outputs and tool implementation

Best For: Production Claude Code deployments requiring reliable, fast tool calling at minimal cost.


🥈 2. Cohere Command R7B (12-2024)

Model ID: cohere/command-r7b-12-2024

  • Cost: $0.038/M input tokens | $0.15/M output tokens
  • Tool Support: Excellent
  • Speed: Very Fast
  • Context: 128K tokens
  • Quality: High

Why Choose This:

  • Cheapest overall option among premium tool-calling models
  • Excels at RAG, tool use, agents, and complex reasoning
  • 7B parameter model - very efficient and fast
  • Updated December 2024 with latest improvements

Best For: Budget-conscious deployments needing excellent tool calling and agent capabilities.


🥉 3. Qwen Turbo

Model ID: qwen/qwen-turbo

  • Cost: $0.05/M input tokens | $0.20/M output tokens
  • Tool Support: Good
  • Speed: Very Fast (turbo-optimized)
  • Context: 1M tokens (!)
  • Quality: Good

Why Choose This:

  • Massive 1M context window at budget pricing
  • Very fast response times
  • Good tool calling support
  • Cached tokens at $0.02/M for repeated queries

Notes:

  • Model is deprecated (Alibaba recommends Qwen-Flash)
  • Still available and functional on OpenRouter
  • Consider qwen/qwen-flash as alternative

Best For: Projects needing large context windows with tool calling at low cost.


🏆 4. DeepSeek Chat

Model ID: deepseek/deepseek-chat

  • Cost: $0.23/M input tokens | $0.90/M output tokens
  • Tool Support: Good
  • Speed: Fast
  • Context: 131K tokens
  • Quality: Very High

Special Note: DeepSeek V3 0324 is available COMPLETELY FREE on OpenRouter!

  • Model ID: deepseek/deepseek-chat-v3-0324:free
  • Zero cost for input and output tokens
  • Unprecedented free tier offering

Why Choose This:

  • Strong reasoning capabilities
  • Automatic prompt caching (no config needed)
  • Good agentic workflow support
  • Built by a Chinese lab, with excellent multilingual support

Best For:

  • Free tier: Experimentation and development
  • Paid tier: Production deployments needing strong reasoning

5. Google Gemini 2.0 Flash Experimental (FREE)

Model ID: google/gemini-2.0-flash-exp:free

  • Cost: $0.00 (FREE tier)
  • Tool Support: Excellent (enhanced function calling)
  • Speed: Very Fast
  • Context: 1M tokens
  • Quality: Very High

Free Tier Limits:

  • 20 requests per minute
  • 50 requests per day (if account has <$10 credits)
  • No daily limit if account has $10+ credits

Why Choose This:

  • Completely free with generous limits
  • Enhanced function calling in 2.0 version
  • Multimodal understanding capabilities
  • Strong coding performance
  • Most popular model on OpenRouter for tool calling (5M+ requests/week)

Paid Alternatives:

  • google/gemini-2.0-flash-001: $0.125/M input | $0.5/M output
  • google/gemini-2.0-flash-lite-001: $0.075/M input | $0.3/M output

Best For: Development, testing, and low-volume production use cases.


Honorable Mentions

Meta Llama 3.3 70B Instruct (FREE)

Model ID: meta-llama/llama-3.3-70b-instruct:free

  • Cost: $0.00 (FREE)
  • Tool Support: Good
  • Speed: Moderate
  • Context: 128K tokens
  • Quality: Very High

Notes:

  • Completely free for training/development
  • 70B parameters - strong capabilities
  • Your requests may be used for training
  • Also available: meta-llama/llama-3.3-8b-instruct:free

Microsoft Phi-4

Model ID: microsoft/phi-4

  • Cost: $0.07/M input | $0.14/M output
  • Tool Support: Good
  • Speed: Fast
  • Context: 16K tokens
  • Quality: Good for size

Alternative: microsoft/phi-4-reasoning-plus at $0.07/M input | $0.35/M output for enhanced reasoning.


Tool Calling Accuracy Rankings

Based on OpenRouter's official benchmarks:

| Rank | Model | Accuracy | Notes |
|------|-------|----------|-------|
| 🥇 1 | GPT-5 | 99.7% | Highest accuracy (expensive) |
| 🥈 2 | Claude 4.1 Opus | 99.5% | Near-perfect (expensive) |
| 🏆 — | Gemini 2.5 Flash | — | Most popular (5M+ requests/week) |

Key Insight: While GPT-5 and Claude 4.1 Opus lead in accuracy, Gemini 2.5 Flash's popularity suggests excellent real-world performance at much lower cost.


Cost Comparison Table

| Model | Input $/M | Output $/M | Total $/M (50/50) | Free Tier |
|-------|-----------|------------|-------------------|-----------|
| Mistral Small 3.1 | $0.02 | $0.04 | $0.03 | |
| Command R7B | $0.038 | $0.15 | $0.094 | |
| Qwen Turbo | $0.05 | $0.20 | $0.125 | |
| DeepSeek V3 0324 | $0.00 | $0.00 | $0.00 | FREE |
| Gemini 2.0 Flash | $0.00 | $0.00 | $0.00 | FREE |
| Llama 3.3 70B | $0.00 | $0.00 | $0.00 | FREE |
| DeepSeek Chat (paid) | $0.23 | $0.90 | $0.565 | |
| Phi-4 | $0.07 | $0.14 | $0.105 | |

Note: "Total $/M (50/50)" assumes equal input/output token usage
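As a sanity check, the blended figures in the table can be recomputed in a few lines of Python, assuming the same 50/50 input/output token mix stated in the footnote:

```python
def blended_price(input_per_m: float, output_per_m: float, input_share: float = 0.5) -> float:
    """Blended $/M tokens for a given input/output mix."""
    return input_per_m * input_share + output_per_m * (1 - input_share)

# List prices from the table above
models = {
    "Mistral Small 3.1": (0.02, 0.04),
    "Command R7B": (0.038, 0.15),
    "Qwen Turbo": (0.05, 0.20),
    "DeepSeek Chat (paid)": (0.23, 0.90),
    "Phi-4": (0.07, 0.14),
}
for name, (inp, out) in models.items():
    print(f"{name}: ${blended_price(inp, out):.3f}/M")
```

Adjusting `input_share` toward your real prompt-heavy or completion-heavy mix will shift the rankings slightly.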


OpenRouter-Specific Tips

1. Use Model Suffixes for Optimization

:free suffix - Access free tier versions:

google/gemini-2.0-flash-exp:free
meta-llama/llama-3.3-70b-instruct:free
deepseek/deepseek-chat-v3-0324:free

:floor suffix - Get cheapest provider:

deepseek/deepseek-chat:floor

This automatically routes to the cheapest available provider for that model.

:nitro suffix - Get fastest throughput:

anthropic/claude-3.5-sonnet:nitro

2. Filter for Tool Support

Visit: https://openrouter.ai/models?supported_parameters=tools

This shows only models with verified tool/function calling support.
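The same filter can be applied programmatically to the JSON returned by OpenRouter's models endpoint. This sketch assumes the response shape `{"data": [{"id": ..., "supported_parameters": [...]}]}` and filters a sample payload locally rather than hitting the network:

```python
def tool_capable(models_payload: dict) -> list[str]:
    """Return ids of models whose supported_parameters include 'tools'.

    Assumes the shape of GET https://openrouter.ai/api/v1/models:
    {"data": [{"id": ..., "supported_parameters": [...]}, ...]}
    """
    return [
        m["id"]
        for m in models_payload.get("data", [])
        if "tools" in m.get("supported_parameters", [])
    ]

# Sample payload in the assumed shape
sample = {"data": [
    {"id": "mistralai/mistral-small-3.1-24b", "supported_parameters": ["tools", "temperature"]},
    {"id": "some/chat-only-model", "supported_parameters": ["temperature"]},
]}
print(tool_capable(sample))  # ['mistralai/mistral-small-3.1-24b']
```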

3. No Extra Charges for Tool Calling

OpenRouter charges based on token usage only. Tool calling doesn't incur additional fees - you only pay for:

  • Input tokens (your prompts + tool definitions)
  • Output tokens (model responses + tool calls)
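Putting that billing model in code: a rough per-call estimator in which tool definitions are billed as ordinary input tokens. The token counts below are made-up illustration values:

```python
def request_cost(prompt_tokens: int, tool_def_tokens: int, completion_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one call: tool definitions count as ordinary input tokens."""
    input_tokens = prompt_tokens + tool_def_tokens
    return input_tokens / 1e6 * input_per_m + completion_tokens / 1e6 * output_per_m

# e.g. Mistral Small 3.1 pricing ($0.02/$0.04), with illustrative token counts
cost = request_cost(prompt_tokens=1200, tool_def_tokens=800, completion_tokens=500,
                    input_per_m=0.02, output_per_m=0.04)
```

Note that large tool schemas are re-sent (and re-billed) on every request unless the provider caches them.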

4. Automatic Prompt Caching

Some models (like DeepSeek) have automatic prompt caching:

  • No configuration needed
  • Reduces costs for repeated queries
  • Speeds up responses

5. Free Tier Rate Limits

For models with :free suffix:

  • 20 requests per minute (all free models)
  • 50 requests per day if account balance < $10
  • Unlimited daily requests if account balance ≥ $10
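To stay under the per-minute cap client-side, a small sliding-window limiter is enough. This is an illustrative sketch, not an official SDK feature; the daily cap can be handled the same way with a larger window:

```python
import collections
import time

class FreeTierLimiter:
    """Client-side sliding-window limiter for :free models (20 requests/minute)."""

    def __init__(self, max_per_minute: int = 20, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock  # injectable for testing
        self.stamps = collections.deque()  # timestamps of recent requests

    def try_acquire(self) -> bool:
        """Record a request if under the limit; return False if it must wait."""
        now = self.clock()
        # Drop timestamps older than the 60-second window
        while self.stamps and now - self.stamps[0] >= 60:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_per_minute:
            return False
        self.stamps.append(now)
        return True
```

Call `try_acquire()` before each request and back off (or fall back to a paid model) when it returns False.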

6. OpenRouter Fees

  • 5.5% fee ($0.80 minimum) when purchasing credits
  • No markup on model provider pricing
  • Pay-as-you-go credit system

Use Case Recommendations

For Development & Testing

Recommendation: google/gemini-2.0-flash-exp:free

  • Free tier with generous limits
  • Excellent tool calling
  • Fast responses
  • No cost during development

For Budget Production Deployments

Recommendation: mistralai/mistral-small-3.1-24b

  • Best cost/performance ratio ($0.02/$0.04)
  • Optimized for tool calling
  • Low latency
  • Reliable quality

For Maximum Savings

Recommendation: cohere/command-r7b-12-2024

  • Cheapest paid option ($0.038/$0.15)
  • Excellent agent capabilities
  • Very fast (7B params)
  • Strong tool use support

For Large Context Needs

Recommendation: qwen/qwen-turbo

  • 1M context window
  • Low cost ($0.05/$0.20)
  • Fast responses
  • Good tool support

For High-Quality Reasoning

Recommendation: deepseek/deepseek-chat

  • FREE option available (v3-0324)
  • Strong reasoning capabilities
  • Good for complex workflows
  • Automatic caching

For Multilingual Projects

Recommendation: deepseek/deepseek-chat or qwen/qwen-turbo

  • Chinese models with excellent multilingual support
  • Good tool calling in multiple languages
  • Cost-effective

Implementation Example

Here's how to use these models with agentic-flow:

# Using Mistral Small 3.1 (Best Value)
agentic-flow --agent coder \
  --task "Create a REST API with authentication" \
  --provider openrouter \
  --model "mistralai/mistral-small-3.1-24b"

# Using free Gemini (Development)
agentic-flow --agent researcher \
  --task "Analyze this codebase structure" \
  --provider openrouter \
  --model "google/gemini-2.0-flash-exp:free"

# Using DeepSeek (Free Tier)
agentic-flow --agent analyst \
  --task "Review code quality" \
  --provider openrouter \
  --model "deepseek/deepseek-chat-v3-0324:free"

# Using floor routing (Cheapest)
agentic-flow --agent optimizer \
  --task "Optimize database queries" \
  --provider openrouter \
  --model "deepseek/deepseek-chat:floor"

Key Research Findings

  1. No Extra Tool Calling Fees: OpenRouter charges only for tokens, not for tool usage
  2. Free Tier Available: Multiple high-quality FREE models with tool support
  3. Cost Range: From $0 (free) to $0.90/M output tokens
  4. Quality Trade-offs: Even cheapest models (Mistral Small 3.1) offer excellent tool calling
  5. Speed Leaders: Qwen Turbo, Gemini 2.0 Flash, Command R7B are fastest
  6. Popularity != Accuracy: Gemini 2.5 Flash most used despite GPT-5/Claude leading accuracy
  7. Chinese Models Competitive: DeepSeek and Qwen offer excellent value and capabilities
  8. Free Options Viable: Free tier models are production-ready for many use cases

Migration Path

From Anthropic Claude

  1. Development: Switch to google/gemini-2.0-flash-exp:free
  2. Production: Switch to mistralai/mistral-small-3.1-24b
  3. Savings: ~99% cost reduction (Claude Sonnet: $3/$15 vs Mistral: $0.02/$0.04)

From OpenAI GPT-4

  1. Development: Switch to deepseek/deepseek-chat-v3-0324:free
  2. Production: Switch to cohere/command-r7b-12-2024
  3. Savings: ~99% cost reduction (GPT-4: $30/$60 vs Command R7B: $0.038/$0.15)
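The savings figures can be checked with a quick blended-price comparison, assuming a 50/50 input/output mix and the list prices quoted above:

```python
def savings_pct(old_in: float, old_out: float, new_in: float, new_out: float,
                input_share: float = 0.5) -> float:
    """Percent cost reduction between two models' blended $/M prices."""
    old = old_in * input_share + old_out * (1 - input_share)
    new = new_in * input_share + new_out * (1 - input_share)
    return (1 - new / old) * 100

print(f"Claude Sonnet -> Mistral Small: {savings_pct(3, 15, 0.02, 0.04):.1f}%")
print(f"GPT-4 -> Command R7B: {savings_pct(30, 60, 0.038, 0.15):.1f}%")
```

A prompt-heavy or completion-heavy workload will shift these numbers slightly, but both migrations stay above 99% under any realistic mix.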

Monitoring & Optimization

Track Your Usage

OpenRouter provides detailed analytics:

  • Token usage per model
  • Cost breakdown
  • Response times
  • Error rates

Test these models with your actual workload:

  1. Start with free tier (Gemini/DeepSeek)
  2. Compare with Mistral Small 3.1
  3. Measure: accuracy, speed, cost
  4. Choose based on your requirements

Cost Optimization Tips

  1. Use :floor suffix for automatic cheapest routing
  2. Enable prompt caching where available
  3. Batch requests when possible
  4. Use free tier for non-critical workloads
  5. Monitor and adjust based on actual usage patterns

Conclusion

For Claude Code tool use on OpenRouter, the clear winners are:

🏆 Best Overall Value: mistralai/mistral-small-3.1-24b

  • Optimized for tool calling at unbeatable pricing

🆓 Best Free Option: google/gemini-2.0-flash-exp:free

  • Production-ready free tier with excellent capabilities

💰 Maximum Savings: cohere/command-r7b-12-2024

  • Cheapest paid option with strong performance

All three models offer excellent tool calling support, fast responses, and high-quality outputs suitable for production Claude Code deployments.


Research Conducted By: Claude Code Research Agent
Last Updated: October 6, 2025
Methodology: Web research, documentation review, pricing analysis, benchmark comparison