412 lines
12 KiB
Markdown
412 lines
12 KiB
Markdown
# Best OpenRouter Models for Claude Code Tool Use
|
|
|
|
**Research Date:** October 6, 2025
|
|
**Research Focus:** Models supporting tool/function calling that are cheap, fast, and high-quality
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
This research identifies the top 5 OpenRouter models optimized for Claude Code's tool calling requirements, balancing cost-effectiveness, speed, and quality. **Mistral Small 3.1 24B** emerges as the best overall value at $0.02/$0.04 per million tokens, while several FREE options are available including DeepSeek V3 0324 and Gemini 2.0 Flash.
|
|
|
|
---
|
|
|
|
## Top 5 Recommended Models
|
|
|
|
### 🥇 1. Mistral Small 3.1 24B
|
|
**Model ID:** `mistralai/mistral-small-3.1-24b`
|
|
|
|
- **Cost:** $0.02/M input tokens | $0.04/M output tokens
|
|
- **Tool Support:** ⭐⭐⭐⭐⭐ Excellent (optimized for function calling)
|
|
- **Speed:** ⚡⚡⚡⚡ Fast (low-latency)
|
|
- **Context:** 128K tokens
|
|
- **Quality:** High
|
|
|
|
**Why Choose This:**
|
|
- Specifically optimized for function calling APIs and JSON-structured outputs
|
|
- Best cost-to-performance ratio for tool use
|
|
- Low-latency responses ideal for interactive Claude Code workflows
|
|
- Excellent at structured outputs and tool implementation
|
|
|
|
**Best For:** Production Claude Code deployments requiring reliable, fast tool calling at minimal cost.
|
|
|
|
---
|
|
|
|
### 🥈 2. Cohere Command R7B (12-2024)
|
|
**Model ID:** `cohere/command-r7b-12-2024`
|
|
|
|
- **Cost:** $0.038/M input tokens | $0.15/M output tokens
|
|
- **Tool Support:** ⭐⭐⭐⭐⭐ Excellent
|
|
- **Speed:** ⚡⚡⚡⚡⚡ Very Fast
|
|
- **Context:** 128K tokens
|
|
- **Quality:** High
|
|
|
|
**Why Choose This:**
|
|
- Cheapest overall option among premium tool-calling models
|
|
- Excels at RAG, tool use, agents, and complex reasoning
|
|
- 7B parameter model - very efficient and fast
|
|
- Updated December 2024 with latest improvements
|
|
|
|
**Best For:** Budget-conscious deployments needing excellent tool calling and agent capabilities.
|
|
|
|
---
|
|
|
|
### 🥉 3. Qwen Turbo
|
|
**Model ID:** `qwen/qwen-turbo`
|
|
|
|
- **Cost:** $0.05/M input tokens | $0.20/M output tokens
|
|
- **Tool Support:** ⭐⭐⭐⭐ Good
|
|
- **Speed:** ⚡⚡⚡⚡⚡ Very Fast (turbo-optimized)
|
|
- **Context:** 1M tokens (!)
|
|
- **Quality:** Good
|
|
|
|
**Why Choose This:**
|
|
- Massive 1M context window at budget pricing
|
|
- Very fast response times
|
|
- Good tool calling support
|
|
- Cached tokens at $0.02/M for repeated queries
|
|
|
|
**Notes:**
|
|
- Model is deprecated (Alibaba recommends Qwen-Flash)
|
|
- Still available and functional on OpenRouter
|
|
- Consider `qwen/qwen-flash` as alternative
|
|
|
|
**Best For:** Projects needing large context windows with tool calling at low cost.
|
|
|
|
---
|
|
|
|
### 🏆 4. DeepSeek Chat
|
|
**Model ID:** `deepseek/deepseek-chat`
|
|
|
|
- **Cost:** $0.23/M input tokens | $0.90/M output tokens
|
|
- **Tool Support:** ⭐⭐⭐⭐ Good
|
|
- **Speed:** ⚡⚡⚡⚡ Fast
|
|
- **Context:** 131K tokens
|
|
- **Quality:** Very High
|
|
|
|
**Special Note:**
|
|
**DeepSeek V3 0324 is available COMPLETELY FREE on OpenRouter!**
|
|
- Model ID: `deepseek/deepseek-chat-v3-0324:free`
|
|
- Zero cost for input and output tokens
|
|
- Unprecedented free tier offering
|
|
|
|
**Why Choose This:**
|
|
- Strong reasoning capabilities
|
|
- Automatic prompt caching (no config needed)
|
|
- Good agentic workflow support
|
|
- Chinese company - excellent multilingual support
|
|
|
|
**Best For:**
|
|
- Free tier: Experimentation and development
|
|
- Paid tier: Production deployments needing strong reasoning
|
|
|
|
---
|
|
|
|
### ⭐ 5. Google Gemini 2.0 Flash Experimental (FREE)
|
|
**Model ID:** `google/gemini-2.0-flash-exp:free`
|
|
|
|
- **Cost:** $0.00 (FREE tier)
|
|
- **Tool Support:** ⭐⭐⭐⭐⭐ Excellent (enhanced function calling)
|
|
- **Speed:** ⚡⚡⚡⚡⚡ Very Fast
|
|
- **Context:** 1M tokens
|
|
- **Quality:** Very High
|
|
|
|
**Free Tier Limits:**
|
|
- 20 requests per minute
|
|
- 50 requests per day (if account has <$10 credits)
|
|
- No daily limit if account has $10+ credits
|
|
|
|
**Why Choose This:**
|
|
- Completely free with generous limits
|
|
- Enhanced function calling in 2.0 version
|
|
- Multimodal understanding capabilities
|
|
- Strong coding performance
|
|
- Most popular model on OpenRouter for tool calling (5M+ requests/week)
|
|
|
|
**Paid Alternative:**
|
|
- `google/gemini-2.0-flash-001`: $0.125/M input | $0.5/M output
|
|
- `google/gemini-2.0-flash-lite-001`: $0.075/M input | $0.3/M output
|
|
|
|
**Best For:** Development, testing, and low-volume production use cases.
|
|
|
|
---
|
|
|
|
## Honorable Mentions
|
|
|
|
### Meta Llama 3.3 70B Instruct (FREE)
|
|
**Model ID:** `meta-llama/llama-3.3-70b-instruct:free`
|
|
|
|
- **Cost:** $0.00 (FREE)
|
|
- **Tool Support:** ⭐⭐⭐⭐ Good
|
|
- **Speed:** ⚡⚡⚡ Moderate
|
|
- **Context:** 128K tokens
|
|
- **Quality:** Very High
|
|
|
|
**Notes:**
|
|
- Completely free for training/development
|
|
- 70B parameters - strong capabilities
|
|
- Your requests may be used for training
|
|
- Also available: `meta-llama/llama-3.3-8b-instruct:free`
|
|
|
|
---
|
|
|
|
### Microsoft Phi-4
|
|
**Model ID:** `microsoft/phi-4`
|
|
|
|
- **Cost:** $0.07/M input | $0.14/M output
|
|
- **Tool Support:** ⭐⭐⭐ Good
|
|
- **Speed:** ⚡⚡⚡⚡ Fast
|
|
- **Context:** 16K tokens
|
|
- **Quality:** Good for size
|
|
|
|
**Alternative:** `microsoft/phi-4-reasoning-plus` at $0.07/M input | $0.35/M output for enhanced reasoning.
|
|
|
|
---
|
|
|
|
## Tool Calling Accuracy Rankings
|
|
|
|
Based on OpenRouter's official benchmarks:
|
|
|
|
| Rank | Model | Accuracy | Notes |
|
|
|------|-------|----------|-------|
|
|
| 🥇 1 | GPT-5 | 99.7% | Highest accuracy (expensive) |
|
|
| 🥈 2 | Claude 4.1 Opus | 99.5% | Near-perfect (expensive) |
|
|
| 🏆 | Gemini 2.5 Flash | - | Most popular (5M+ requests/week) |
|
|
|
|
**Key Insight:** While GPT-5 and Claude 4.1 Opus lead in accuracy, Gemini 2.5 Flash's popularity suggests excellent real-world performance at much lower cost.
|
|
|
|
---
|
|
|
|
## Cost Comparison Table
|
|
|
|
| Model | Input $/M | Output $/M | Total $/M (50/50) | Free Tier |
|
|
|-------|-----------|------------|-------------------|-----------|
|
|
| Mistral Small 3.1 | $0.02 | $0.04 | $0.03 | ❌ |
|
|
| Command R7B | $0.038 | $0.15 | $0.094 | ❌ |
|
|
| Qwen Turbo | $0.05 | $0.20 | $0.125 | ❌ |
|
|
| DeepSeek V3 0324 | $0.00 | $0.00 | $0.00 | ✅ FREE |
|
|
| Gemini 2.0 Flash | $0.00 | $0.00 | $0.00 | ✅ FREE |
|
|
| Llama 3.3 70B | $0.00 | $0.00 | $0.00 | ✅ FREE |
|
|
| DeepSeek Chat (paid) | $0.23 | $0.90 | $0.565 | ❌ |
|
|
| Phi-4 | $0.07 | $0.14 | $0.105 | ❌ |
|
|
|
|
*Note: "Total $/M (50/50)" assumes equal input/output token usage*
|
|
|
|
---
|
|
|
|
## OpenRouter-Specific Tips
|
|
|
|
### 1. Use Model Suffixes for Optimization
|
|
|
|
**`:free` suffix** - Access free tier versions:
|
|
```
|
|
google/gemini-2.0-flash-exp:free
|
|
meta-llama/llama-3.3-70b-instruct:free
|
|
deepseek/deepseek-chat-v3-0324:free
|
|
```
|
|
|
|
**`:floor` suffix** - Get cheapest provider:
|
|
```
|
|
deepseek/deepseek-chat:floor
|
|
```
|
|
This automatically routes to the cheapest available provider for that model.
|
|
|
|
**`:nitro` suffix** - Get fastest throughput:
|
|
```
|
|
anthropic/claude-3.5-sonnet:nitro
|
|
```
|
|
|
|
### 2. Filter for Tool Support
|
|
|
|
Visit: `https://openrouter.ai/models?supported_parameters=tools`
|
|
|
|
This shows only models with verified tool/function calling support.
|
|
|
|
### 3. No Extra Charges for Tool Calling
|
|
|
|
OpenRouter charges based on token usage only. Tool calling doesn't incur additional fees - you only pay for:
|
|
- Input tokens (your prompts + tool definitions)
|
|
- Output tokens (model responses + tool calls)
|
|
|
|
### 4. Automatic Prompt Caching
|
|
|
|
Some models (like DeepSeek) have automatic prompt caching:
|
|
- No configuration needed
|
|
- Reduces costs for repeated queries
|
|
- Speeds up responses
|
|
|
|
### 5. Free Tier Rate Limits
|
|
|
|
For models with `:free` suffix:
|
|
- **20 requests per minute** (all free models)
|
|
- **50 requests per day** if account balance < $10
|
|
- **Unlimited daily requests** if account balance ≥ $10
|
|
|
|
### 6. OpenRouter Fees
|
|
|
|
- **5.5% fee** ($0.80 minimum) when purchasing credits
|
|
- **No markup** on model provider pricing
|
|
- Pay-as-you-go credit system
|
|
|
|
---
|
|
|
|
## Use Case Recommendations
|
|
|
|
### For Development & Testing
|
|
**Recommendation:** `google/gemini-2.0-flash-exp:free`
|
|
- Free tier with generous limits
|
|
- Excellent tool calling
|
|
- Fast responses
|
|
- No cost during development
|
|
|
|
### For Budget Production Deployments
|
|
**Recommendation:** `mistralai/mistral-small-3.1-24b`
|
|
- Best cost/performance ratio ($0.02/$0.04)
|
|
- Optimized for tool calling
|
|
- Low latency
|
|
- Reliable quality
|
|
|
|
### For Maximum Savings
|
|
**Recommendation:** `cohere/command-r7b-12-2024`
|
|
- Cheapest paid option ($0.038/$0.15)
|
|
- Excellent agent capabilities
|
|
- Very fast (7B params)
|
|
- Strong tool use support
|
|
|
|
### For Large Context Needs
|
|
**Recommendation:** `qwen/qwen-turbo`
|
|
- 1M context window
|
|
- Low cost ($0.05/$0.20)
|
|
- Fast responses
|
|
- Good tool support
|
|
|
|
### For High-Quality Reasoning
|
|
**Recommendation:** `deepseek/deepseek-chat`
|
|
- FREE option available (v3-0324)
|
|
- Strong reasoning capabilities
|
|
- Good for complex workflows
|
|
- Automatic caching
|
|
|
|
### For Multilingual Projects
|
|
**Recommendation:** `deepseek/deepseek-chat` or `qwen/qwen-turbo`
|
|
- Chinese models with excellent multilingual support
|
|
- Good tool calling in multiple languages
|
|
- Cost-effective
|
|
|
|
---
|
|
|
|
## Implementation Example
|
|
|
|
Here's how to use these models with agentic-flow:
|
|
|
|
```bash
|
|
# Using Mistral Small 3.1 (Best Value)
|
|
agentic-flow --agent coder \
|
|
--task "Create a REST API with authentication" \
|
|
--provider openrouter \
|
|
--model "mistralai/mistral-small-3.1-24b"
|
|
|
|
# Using free Gemini (Development)
|
|
agentic-flow --agent researcher \
|
|
--task "Analyze this codebase structure" \
|
|
--provider openrouter \
|
|
--model "google/gemini-2.0-flash-exp:free"
|
|
|
|
# Using DeepSeek (Free Tier)
|
|
agentic-flow --agent analyst \
|
|
--task "Review code quality" \
|
|
--provider openrouter \
|
|
--model "deepseek/deepseek-chat-v3-0324:free"
|
|
|
|
# Using floor routing (Cheapest)
|
|
agentic-flow --agent optimizer \
|
|
--task "Optimize database queries" \
|
|
--provider openrouter \
|
|
--model "deepseek/deepseek-chat:floor"
|
|
```
|
|
|
|
---
|
|
|
|
## Key Research Findings
|
|
|
|
1. **No Extra Tool Calling Fees:** OpenRouter charges only for tokens, not for tool usage
|
|
2. **Free Tier Available:** Multiple high-quality FREE models with tool support
|
|
3. **Cost Range:** From $0 (free) to $0.90/M output tokens
|
|
4. **Quality Trade-offs:** Even cheapest models (Mistral Small 3.1) offer excellent tool calling
|
|
5. **Speed Leaders:** Qwen Turbo, Gemini 2.0 Flash, Command R7B are fastest
|
|
6. **Popularity != Accuracy:** Gemini 2.5 Flash most used despite GPT-5/Claude leading accuracy
|
|
7. **Chinese Models Competitive:** DeepSeek and Qwen offer excellent value and capabilities
|
|
8. **Free Options Viable:** Free tier models are production-ready for many use cases
|
|
|
|
---
|
|
|
|
## Migration Path
|
|
|
|
### From Anthropic Claude
|
|
1. **Development:** Switch to `google/gemini-2.0-flash-exp:free`
|
|
2. **Production:** Switch to `mistralai/mistral-small-3.1-24b`
|
|
3. **Savings:** ~97% cost reduction (Claude Sonnet: $3/$15 vs Mistral: $0.02/$0.04)
|
|
|
|
### From OpenAI GPT-4
|
|
1. **Development:** Switch to `deepseek/deepseek-chat-v3-0324:free`
|
|
2. **Production:** Switch to `cohere/command-r7b-12-2024`
|
|
3. **Savings:** ~99% cost reduction (GPT-4: $30/$60 vs Command R7B: $0.038/$0.15)
|
|
|
|
---
|
|
|
|
## Monitoring & Optimization
|
|
|
|
### Track Your Usage
|
|
OpenRouter provides detailed analytics:
|
|
- Token usage per model
|
|
- Cost breakdown
|
|
- Response times
|
|
- Error rates
|
|
|
|
### A/B Testing Recommended
|
|
Test these models with your actual workload:
|
|
1. Start with free tier (Gemini/DeepSeek)
|
|
2. Compare with Mistral Small 3.1
|
|
3. Measure: accuracy, speed, cost
|
|
4. Choose based on your requirements
|
|
|
|
### Cost Optimization Tips
|
|
1. Use `:floor` suffix for automatic cheapest routing
|
|
2. Enable prompt caching where available
|
|
3. Batch requests when possible
|
|
4. Use free tier for non-critical workloads
|
|
5. Monitor and adjust based on actual usage patterns
|
|
|
|
---
|
|
|
|
## Conclusion
|
|
|
|
For **Claude Code tool use** on OpenRouter, the clear winners are:
|
|
|
|
**🏆 Best Overall Value:** `mistralai/mistral-small-3.1-24b`
|
|
- Optimized for tool calling at unbeatable pricing
|
|
|
|
**🆓 Best Free Option:** `google/gemini-2.0-flash-exp:free`
|
|
- Production-ready free tier with excellent capabilities
|
|
|
|
**💰 Maximum Savings:** `cohere/command-r7b-12-2024`
|
|
- Cheapest paid option with strong performance
|
|
|
|
All three models offer excellent tool calling support, fast responses, and high-quality outputs suitable for production Claude Code deployments.
|
|
|
|
---
|
|
|
|
## Additional Resources
|
|
|
|
- **OpenRouter Models Page:** https://openrouter.ai/models
|
|
- **Tool Calling Docs:** https://openrouter.ai/docs/features/tool-calling
|
|
- **Filter by Tools:** https://openrouter.ai/models?supported_parameters=tools
|
|
- **OpenRouter Discord:** For community support and updates
|
|
- **Model Rankings:** https://openrouter.ai/rankings
|
|
|
|
---
|
|
|
|
**Research Conducted By:** Claude Code Research Agent
|
|
**Last Updated:** October 6, 2025
|
|
**Methodology:** Web research, documentation review, pricing analysis, benchmark comparison
|