Marc Rejohn Castillano 5cb6561924 added ruflo

2026-04-09 19:01:53 +08:00

Requesty.ai Integration - Overview

Executive Summary

This document outlines the plan to integrate Requesty.ai as a new provider in the agentic-flow project, following the same architectural pattern as the existing OpenRouter integration.

What is Requesty.ai?

Requesty.ai is a unified AI gateway that provides:

300+ AI Models - Access to models from OpenAI, Anthropic, Google, and other providers
OpenAI-Compatible API - Drop-in replacement for the OpenAI SDK via a base_url override
Cost Optimization - Roughly 80% cost savings versus calling Claude directly, through intelligent routing and caching
Built-in Analytics - Real-time usage tracking and performance monitoring
Enterprise Features - Zero downtime guarantee, automatic failover, load balancing
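
As a rough illustration of what "OpenAI-compatible with a base URL override" means in practice, the sketch below builds a chat completion request by hand. Only the base URL, the `/chat/completions` path, and the Bearer header come from this document; the function name `buildChatRequest`, the types, and the model ID are hypothetical placeholders.

```typescript
// Minimal sketch: prepare an OpenAI-style chat completion request for Requesty.
// buildChatRequest and the types below are illustrative, not project code.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const REQUESTY_BASE_URL = "https://router.requesty.ai/v1";

function buildChatRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  baseUrl: string = REQUESTY_BASE_URL
): PreparedRequest {
  return {
    // Same path as the OpenAI API; only the base URL differs.
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Placeholder key and model ID; send with fetch(prepared.url, prepared.init).
const prepared = buildChatRequest("requesty-example-key", "example/model-id", [
  { role: "user", content: "Hello" },
]);
```

Because the base URL is a parameter, the same builder could in principle target any other OpenAI-compatible gateway.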

Integration Goals

Provider Parity - Requesty should work alongside Anthropic, OpenRouter, Gemini, and ONNX
Minimal Code Changes - Leverage existing proxy pattern from OpenRouter
Model Flexibility - Support 300+ models through Requesty's router
Tool Support - Maintain full MCP tool calling compatibility
Cost Optimization - Let users switch to cheaper models of comparable quality

Key Differentiators from OpenRouter

Feature	OpenRouter	Requesty.ai
Model Count	100+	300+
API Format	OpenAI `/chat/completions`	OpenAI `/chat/completions`
Base URL	`https://openrouter.ai/api/v1`	`https://router.requesty.ai/v1`
Authentication	`Authorization: Bearer sk-or-...`	`Authorization: Bearer requesty-...`
Cost Savings	~90% vs Claude	~80% vs Claude
Tool Calling	Native OpenAI format	Native OpenAI format
Unique Features	Leaderboard tracking	Auto-routing, caching, analytics

Strategic Benefits

Provider Diversity - Reduces dependency on a single gateway
Model Access - 300+ models vs OpenRouter's 100+
Cost Flexibility - Users can choose based on price/performance
Redundancy - Fallback option if OpenRouter has issues
Enterprise Features - Built-in analytics and monitoring

Integration Approach

Architecture Pattern

Requesty will follow the same proxy architecture as OpenRouter:

User Request (Anthropic Format)
    ↓
Agentic Flow CLI
    ↓
Provider Detection (--provider requesty)
    ↓
Anthropic-to-Requesty Proxy Server
    ↓
Format Conversion (Anthropic → OpenAI)
    ↓
Requesty Router (https://router.requesty.ai/v1)
    ↓
Model Execution (300+ models)
    ↓
Response Conversion (OpenAI → Anthropic)
    ↓
Return to User
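
The two conversion steps in the flow above can be sketched as a pair of pure functions. This is a simplified, text-only illustration (tool calls and streaming are omitted), and every type and function name here is hypothetical rather than taken from the existing OpenRouter proxy.

```typescript
// Simplified sketch of Anthropic -> OpenAI request conversion and
// OpenAI -> Anthropic response conversion. Text content only.

type AnthropicTextBlock = { type: "text"; text: string };

interface AnthropicRequest {
  model: string;
  max_tokens: number;
  system?: string;
  messages: { role: "user" | "assistant"; content: string | AnthropicTextBlock[] }[];
}

interface OpenAIMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface OpenAIRequest {
  model: string;
  max_tokens: number;
  messages: OpenAIMessage[];
}

interface OpenAIResponse {
  id: string;
  choices: { message: { content: string | null }; finish_reason: string }[];
  usage: { prompt_tokens: number; completion_tokens: number };
}

function toOpenAIRequest(req: AnthropicRequest, targetModel: string): OpenAIRequest {
  const messages: OpenAIMessage[] = [];
  // Anthropic carries the system prompt as a top-level field;
  // the OpenAI format expects it as the first message.
  if (req.system) messages.push({ role: "system", content: req.system });
  for (const m of req.messages) {
    const text =
      typeof m.content === "string" ? m.content : m.content.map((b) => b.text).join("");
    messages.push({ role: m.role, content: text });
  }
  return { model: targetModel, max_tokens: req.max_tokens, messages };
}

function toAnthropicResponse(res: OpenAIResponse, model: string) {
  const choice = res.choices[0];
  if (!choice) throw new Error("OpenAI response contained no choices");
  // Map OpenAI finish reasons onto Anthropic stop reasons.
  const stopReasons: Record<string, string> = {
    stop: "end_turn",
    length: "max_tokens",
    tool_calls: "tool_use",
  };
  return {
    id: res.id,
    type: "message" as const,
    role: "assistant" as const,
    model,
    content: [{ type: "text" as const, text: choice.message.content ?? "" }],
    stop_reason: stopReasons[choice.finish_reason] ?? "end_turn",
    usage: {
      input_tokens: res.usage.prompt_tokens,
      output_tokens: res.usage.completion_tokens,
    },
  };
}
```

Keeping both directions free of I/O makes them easy to unit-test before any proxy server is wired up.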

File Structure

New files to create:

agentic-flow/
├── src/
│   └── proxy/
│       └── anthropic-to-requesty.ts    # NEW - Requesty proxy (clone from OpenRouter)
├── docs/
│   └── plans/
│       └── requesty/
│           ├── 00-overview.md          # This file
│           ├── 01-api-research.md
│           ├── 02-architecture.md
│           ├── 03-implementation-phases.md
│           ├── 04-testing-strategy.md
│           └── 05-migration-guide.md

Existing files to modify:

agentic-flow/
├── src/
│   ├── cli-proxy.ts                    # Add Requesty provider detection
│   ├── agents/claudeAgent.ts           # Add Requesty to provider list
│   └── utils/
│       ├── modelCapabilities.ts        # Add Requesty model mappings
│       └── modelOptimizer.ts           # Include Requesty models in optimizer

Success Criteria

Must Have (MVP)

Users can select Requesty with the --provider requesty flag
Requesty API key is read from the REQUESTY_API_KEY environment variable
Chat completions work with at least 10 tested models
Native tool calling support (MCP tools work)
Streaming responses supported
Error handling and logging
Model override via --model flag
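
The first, second, and last MVP items could look roughly like the sketch below. It is not the existing logic in cli-proxy.ts: the function names `flagValue` and `resolveProvider`, the `anthropic` default, and the error message are all assumptions. Only `--provider`, `--model`, and `REQUESTY_API_KEY` come from this plan.

```typescript
// Hypothetical sketch of provider selection from CLI flags and environment.

interface ProviderSelection {
  provider: string;
  model: string | undefined;
  apiKey: string | undefined;
}

// Return the value that follows a flag such as "--provider", if any.
function flagValue(argv: string[], flag: string): string | undefined {
  const i = argv.indexOf(flag);
  return i >= 0 && i + 1 < argv.length ? argv[i + 1] : undefined;
}

function resolveProvider(
  argv: string[],
  env: Record<string, string | undefined>
): ProviderSelection {
  // Defaulting to "anthropic" is an assumption made for this sketch.
  const provider = flagValue(argv, "--provider") ?? "anthropic";
  const model = flagValue(argv, "--model");
  if (provider !== "requesty") return { provider, model, apiKey: undefined };

  const apiKey = env["REQUESTY_API_KEY"];
  if (!apiKey) {
    throw new Error("REQUESTY_API_KEY must be set when --provider requesty is used");
  }
  return { provider, model, apiKey };
}

// Typical call site in a Node CLI:
//   resolveProvider(process.argv.slice(2), process.env)
```

Failing fast on a missing key keeps the error close to its cause instead of surfacing later as a 401 from the gateway.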

Should Have (V1)

Tool emulation for models without native support
Model capability detection for Requesty models
Integration with model optimizer (--optimize)
Analytics and usage tracking
Proxy mode for Claude Code/Cursor
Cost estimation and reporting

Nice to Have (Future)

Auto-routing based on cost/performance
Caching integration
Fallback to other providers on error
Model benchmarking and comparison

Timeline Estimate

Phase	Tasks	Estimated Time
Phase 1	Research & Planning	2 hours (DONE)
Phase 2	Core Proxy Implementation	4 hours
Phase 3	CLI Integration	2 hours
Phase 4	Testing & Validation	3 hours
Phase 5	Documentation	2 hours
Total		13 hours

Risk Assessment

Technical Risks

Risk	Probability	Impact	Mitigation
API format differences from OpenRouter	Medium	Medium	Thorough testing, fallback handling
Model compatibility issues	Low	Medium	Model capability detection system
Tool calling format incompatibility	Low	High	Test with multiple models early
Rate limiting differences	Medium	Low	Document limits, add retry logic
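
The "add retry logic" mitigation could be sketched as exponential backoff around the outgoing request. Requesty's actual rate limits are still an open question in this plan, so the retryable status codes, the delays, and the retry count below are assumptions, and all names are hypothetical.

```typescript
// Sketch of retry with exponential backoff for rate-limited requests.

// Treat 429 (rate limited) and 5xx (server error) as retryable. Assumption.
function shouldRetry(status: number): boolean {
  return status === 429 || status >= 500;
}

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function withRetry<T extends { status: number }>(
  send: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    // Return on success, on a non-retryable error, or once retries are exhausted.
    if (!shouldRetry(res.status) || attempt >= maxRetries) return res;
    await sleep(backoffDelayMs(attempt));
  }
}
```

Passing `sleep` as a parameter lets tests substitute an instant no-op instead of waiting on real timers.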

Business Risks

Risk	Probability	Impact	Mitigation
Requesty API changes	Medium	Medium	Version pinning, changelog monitoring
Service availability issues	Low	High	Multi-provider support already in place
Cost model changes	Low	Medium	Document pricing, update optimizer

Next Steps

Read 01-api-research.md for detailed API analysis
Review 02-architecture.md for technical design
Follow 03-implementation-phases.md for step-by-step implementation
Use 04-testing-strategy.md for comprehensive testing
Reference 05-migration-guide.md for user documentation

Open Questions

Does Requesty support streaming for all models or only specific ones?
Are there any model-specific quirks in tool calling format?
What are the exact rate limits per tier?
Does Requesty offer a free tier for testing?
How does auto-routing work - can we control it programmatically?

References

Requesty.ai Documentation: https://docs.requesty.ai
Requesty.ai Base URL: https://router.requesty.ai/v1
OpenRouter Integration (reference): src/proxy/anthropic-to-openrouter.ts
Gemini Integration (reference): src/proxy/anthropic-to-gemini.ts
