# Provider Fallback & Dynamic Switching Guide

**Production-grade LLM provider fallback for long-running agents**

## Overview

The `ProviderManager` and `LongRunningAgent` classes provide enterprise-grade provider fallback, health monitoring, cost optimization, and automatic recovery for long-running AI agents.

### Key Features

- ✅ **Automatic Fallback** - Seamless switching between providers on failure
- ✅ **Circuit Breaker** - Prevents cascading failures with automatic recovery
- ✅ **Health Monitoring** - Real-time provider health tracking
- ✅ **Cost Optimization** - Intelligent provider selection based on cost/performance
- ✅ **Retry Logic** - Exponential/linear backoff for transient errors
- ✅ **Checkpointing** - Save/restore agent state for crash recovery
- ✅ **Budget Control** - Hard limits on spending and runtime
- ✅ **Performance Tracking** - Latency, success rate, and token usage metrics

## Quick Start

### Basic Provider Fallback

```typescript
import { ProviderManager, ProviderConfig } from 'agentic-flow/core/provider-manager';

// Configure providers
const providers: ProviderConfig[] = [
  {
    name: 'gemini',
    apiKey: process.env.GOOGLE_GEMINI_API_KEY,
    priority: 1, // Try first
    maxRetries: 3,
    timeout: 30000,
    costPerToken: 0.00015,
    enabled: true
  },
  {
    name: 'anthropic',
    apiKey: process.env.ANTHROPIC_API_KEY,
    priority: 2, // Fallback
    maxRetries: 3,
    timeout: 60000,
    costPerToken: 0.003,
    enabled: true
  },
  {
    name: 'onnx',
    priority: 3, // Last resort (free, local)
    maxRetries: 2,
    timeout: 120000,
    costPerToken: 0,
    enabled: true
  }
];

// Initialize manager
const manager = new ProviderManager(providers, {
  type: 'priority', // or 'cost-optimized', 'performance-optimized', 'round-robin'
  maxFailures: 3,
  recoveryTime: 60000,
  retryBackoff: 'exponential'
});

// Execute with automatic fallback
const { result, provider, attempts } = await manager.executeWithFallback(
  async (providerName) => {
    // Your LLM API call here
    return await callLLM(providerName, prompt);
  }
);

console.log(`Success with ${provider} after ${attempts} attempts`);
```

### Long-Running Agent

```typescript
import { LongRunningAgent } from 'agentic-flow/core/long-running-agent';

// Create agent
const agent = new LongRunningAgent({
  agentName: 'research-agent',
  providers,
  fallbackStrategy: {
    type: 'cost-optimized',
    maxFailures: 3,
    recoveryTime: 60000,
    retryBackoff: 'exponential',
    costThreshold: 0.50, // Max $0.50 per request
    latencyThreshold: 30000 // Max 30s per request
  },
  checkpointInterval: 30000, // Save state every 30s
  maxRuntime: 3600000, // Max 1 hour
  costBudget: 5.00 // Max $5 total
});

await agent.start();

// Execute tasks with automatic provider selection
const result = await agent.executeTask({
  name: 'analyze-code',
  complexity: 'complex', // 'simple' | 'medium' | 'complex'
  estimatedTokens: 5000,
  execute: async (provider) => {
    return await analyzeCode(provider, code);
  }
});

// Get status
const status = agent.getStatus();
console.log(`Completed: ${status.completedTasks}, Cost: $${status.totalCost}`);

await agent.stop();
```

## Fallback Strategies

### 1. Priority-Based (Default)

Tries providers in priority order (1 = highest).

```typescript
{
  type: 'priority',
  maxFailures: 3,
  recoveryTime: 60000,
  retryBackoff: 'exponential'
}
```

**Use Case:** Prefer a specific provider (e.g., Claude for quality)
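
As a rough illustration of priority ordering, the sketch below sorts candidates by `priority` and skips any that are disabled or have an open circuit. It is not the `ProviderManager` source; `PriorityCandidate` and `orderByPriority` are hypothetical names, and the skip rules are an assumption.

```typescript
// Hypothetical sketch of priority ordering; not the library's implementation.
interface PriorityCandidate {
  name: string;
  priority: number; // 1 = highest
  enabled: boolean;
  circuitBreakerOpen: boolean;
}

// Returns provider names in the order they would be tried.
function orderByPriority(candidates: PriorityCandidate[]): string[] {
  return candidates
    .filter((c) => c.enabled && !c.circuitBreakerOpen)
    .sort((a, b) => a.priority - b.priority)
    .map((c) => c.name);
}

const order = orderByPriority([
  { name: 'onnx', priority: 3, enabled: true, circuitBreakerOpen: false },
  { name: 'gemini', priority: 1, enabled: true, circuitBreakerOpen: true },
  { name: 'anthropic', priority: 2, enabled: true, circuitBreakerOpen: false },
]);
console.log(order.join(', ')); // gemini is skipped while its circuit is open
```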

### 2. Cost-Optimized

Selects the cheapest provider for the estimated token count.

```typescript
{
  type: 'cost-optimized',
  maxFailures: 3,
  recoveryTime: 60000,
  retryBackoff: 'exponential',
  costThreshold: 0.50 // Max $0.50 per request
}
```

**Use Case:** High-volume applications, budget constraints
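
One way to picture cost-based selection: estimate the cost of the request for each provider, drop any that exceed `costThreshold`, and take the cheapest. The sketch below is an assumption about the mechanism, not the library's code. `CostCandidate`, `pricePerMillionTokens`, and `pickCheapest` are hypothetical names, and prices are expressed per 1M tokens for readability.

```typescript
// Hypothetical sketch; prices are assumed to be quoted per 1M tokens.
interface CostCandidate {
  name: string;
  pricePerMillionTokens: number;
}

function pickCheapest(
  candidates: CostCandidate[],
  estimatedTokens: number,
  costThreshold: number
): string | undefined {
  const affordable = candidates
    .map((c) => ({
      name: c.name,
      cost: (c.pricePerMillionTokens / 1e6) * estimatedTokens,
    }))
    .filter((c) => c.cost <= costThreshold) // enforce the per-request cap
    .sort((a, b) => a.cost - b.cost);
  return affordable[0]?.name; // undefined when nothing fits the cap
}

const cheapest = pickCheapest(
  [
    { name: 'anthropic', pricePerMillionTokens: 3 },
    { name: 'gemini', pricePerMillionTokens: 0.15 },
  ],
  200000,
  0.5
);
console.log(cheapest); // gemini
```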

### 3. Performance-Optimized

Selects the provider with the best latency and success rate.

```typescript
{
  type: 'performance-optimized',
  maxFailures: 3,
  recoveryTime: 60000,
  retryBackoff: 'exponential',
  latencyThreshold: 30000 // Max 30s
}
```

**Use Case:** Real-time applications, user-facing services
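
The guide does not say how latency and success rate are combined, so the following is only one plausible scoring rule: discard providers slower than `latencyThreshold`, then rank the rest by success rate discounted by relative latency. `PerfCandidate`, `pickFastest`, and the formula itself are assumptions for illustration.

```typescript
// Hypothetical scoring sketch; the real weighting may differ.
interface PerfCandidate {
  name: string;
  averageLatency: number; // milliseconds
  successRate: number; // 0-1
}

function pickFastest(
  candidates: PerfCandidate[],
  latencyThreshold: number
): string | undefined {
  let best: PerfCandidate | undefined;
  let bestScore = -Infinity;
  for (const c of candidates) {
    if (c.averageLatency > latencyThreshold) continue; // too slow, skip
    // Higher success rate and lower latency both raise the score.
    const score = c.successRate * (1 - c.averageLatency / latencyThreshold);
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best?.name;
}

const fastest = pickFastest(
  [
    { name: 'gemini', averageLatency: 800, successRate: 0.9 },
    { name: 'anthropic', averageLatency: 2000, successRate: 0.99 },
    { name: 'onnx', averageLatency: 45000, successRate: 1 },
  ],
  30000
);
console.log(fastest); // anthropic
```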

### 4. Round-Robin

Distributes load evenly across providers.

```typescript
{
  type: 'round-robin',
  maxFailures: 3,
  recoveryTime: 60000,
  retryBackoff: 'exponential'
}
```

**Use Case:** Load balancing, testing multiple providers
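
Round-robin selection can be sketched with a rotating index that skips unavailable providers. `RoundRobinSketch` is a hypothetical class written for this guide, not part of `agentic-flow`.

```typescript
// Hypothetical round-robin sketch; not the library's implementation.
class RoundRobinSketch {
  private nextIndex = 0;
  private readonly names: string[];

  constructor(names: string[]) {
    this.names = names;
  }

  // Returns the next available provider, or undefined if none is available.
  pick(isAvailable: (name: string) => boolean = () => true): string | undefined {
    const count = this.names.length;
    for (let offset = 0; offset < count; offset++) {
      const index = (this.nextIndex + offset) % count;
      const candidate = this.names[index];
      if (candidate !== undefined && isAvailable(candidate)) {
        this.nextIndex = (index + 1) % count; // resume after the chosen one
        return candidate;
      }
    }
    return undefined;
  }
}

const rotation = new RoundRobinSketch(['gemini', 'anthropic', 'onnx']);
const picks = [rotation.pick(), rotation.pick(), rotation.pick(), rotation.pick()];
console.log(picks.join(', ')); // gemini, anthropic, onnx, gemini
```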

## Task Complexity Heuristics

The system applies intelligent heuristics based on task complexity:

### Simple Tasks → Prefer Gemini/ONNX
```typescript
await agent.executeTask({
  name: 'format-code',
  complexity: 'simple', // Fast, cheap providers preferred
  estimatedTokens: 200,
  execute: async (provider) => formatCode(code)
});
```

**Rationale:** Simple tasks don't need Claude's reasoning power

### Medium Tasks → Auto-Optimized
```typescript
await agent.executeTask({
  name: 'refactor-function',
  complexity: 'medium', // Balance cost/quality
  estimatedTokens: 1500,
  execute: async (provider) => refactorFunction(code)
});
```

**Rationale:** Uses fallback strategy (cost/performance)

### Complex Tasks → Prefer Claude
```typescript
await agent.executeTask({
  name: 'design-architecture',
  complexity: 'complex', // Quality matters most
  estimatedTokens: 5000,
  execute: async (provider) => designArchitecture(requirements)
});
```

**Rationale:** Complex reasoning benefits from Claude's capabilities
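
The three cases above amount to a preference table layered on top of the configured strategy. A minimal sketch, assuming the preferred providers are simply moved to the front of whatever order the strategy produced (`preferredProviders` and the table are hypothetical, inferred from the headings above):

```typescript
type Complexity = 'simple' | 'medium' | 'complex';

// Hypothetical preference table based on the headings above.
function preferredProviders(
  complexity: Complexity,
  strategyOrder: string[]
): string[] {
  const preferred: Record<Complexity, string[]> = {
    simple: ['gemini', 'onnx'], // fast, cheap
    medium: [], // defer entirely to the fallback strategy
    complex: ['anthropic'], // quality first
  };
  const front = preferred[complexity].filter((n) => strategyOrder.includes(n));
  const rest = strategyOrder.filter((n) => !front.includes(n));
  return [...front, ...rest];
}

const complexOrder = preferredProviders('complex', ['gemini', 'anthropic', 'onnx']);
console.log(complexOrder.join(', ')); // anthropic, gemini, onnx
```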

## Circuit Breaker

Prevents cascading failures by temporarily disabling failing providers.

### How It Works

1. **Failure Tracking:** Count consecutive failures per provider
2. **Threshold:** Open circuit after N failures (configurable)
3. **Recovery:** Automatically recover after timeout
4. **Fallback:** Use next available provider
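
The steps above can be sketched as a small state machine. This is a simplified illustration, not the library's implementation: `CircuitBreakerSketch` is a hypothetical class, and timestamps are passed in explicitly so the behavior is easy to follow.

```typescript
// Hypothetical circuit breaker sketch; not the library's implementation.
class CircuitBreakerSketch {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly maxFailures: number;
  private readonly recoveryTime: number;

  constructor(maxFailures: number, recoveryTime: number) {
    this.maxFailures = maxFailures;
    this.recoveryTime = recoveryTime;
  }

  // A success closes the circuit and resets the failure count.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  // Reaching the threshold opens (or re-opens) the circuit at time `now`.
  recordFailure(now: number): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.maxFailures) {
      this.openedAt = now;
    }
  }

  // Usable when the circuit is closed or the recovery time has elapsed.
  isAvailable(now: number): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.recoveryTime;
  }
}

const breaker = new CircuitBreakerSketch(3, 60000);
breaker.recordFailure(0);
breaker.recordFailure(1);
breaker.recordFailure(2); // third consecutive failure opens the circuit
console.log(breaker.isAvailable(1000)); // false
console.log(breaker.isAvailable(60002)); // true
```

While `isAvailable` returns `false`, the caller would move on to the next provider (step 4).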

### Configuration

```typescript
{
  maxFailures: 3, // Open circuit after 3 consecutive failures
  recoveryTime: 60000, // Try recovery after 60 seconds
  retryBackoff: 'exponential' // 1s, 2s, 4s, 8s, 16s...
}
```

### Monitoring

```typescript
const health = manager.getHealth();

health.forEach(h => {
  console.log(`${h.provider}:`);
  console.log(`  Circuit Breaker: ${h.circuitBreakerOpen ? 'OPEN' : 'CLOSED'}`);
  console.log(`  Consecutive Failures: ${h.consecutiveFailures}`);
  console.log(`  Success Rate: ${(h.successRate * 100).toFixed(1)}%`);
});
```

## Cost Tracking & Optimization

### Real-Time Cost Monitoring

```typescript
const costs = manager.getCostSummary();

console.log(`Total: $${costs.total.toFixed(4)}`);
console.log(`Tokens: ${costs.totalTokens.toLocaleString()}`);

for (const [provider, cost] of Object.entries(costs.byProvider)) {
  console.log(`  ${provider}: $${cost.toFixed(4)}`);
}
```

### Budget Constraints

```typescript
const agent = new LongRunningAgent({
  agentName: 'budget-agent',
  providers,
  costBudget: 10.00, // Hard limit: $10
  // ... other config
});

// The agent automatically stops when the budget is exceeded
```
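
To show what a hard limit means in practice, here is a minimal sketch of a budget check that a task loop could consult after each request. `BudgetGuard` is a hypothetical helper, not the agent's internal logic.

```typescript
// Hypothetical budget guard sketch; not the agent's internal logic.
class BudgetGuard {
  private spent = 0;
  private readonly costBudget: number;

  constructor(costBudget: number) {
    this.costBudget = costBudget;
  }

  // Records a cost and reports whether the agent may keep running.
  record(cost: number): boolean {
    this.spent += cost;
    return this.spent <= this.costBudget;
  }

  // Budget left, never negative.
  remaining(): number {
    return Math.max(0, this.costBudget - this.spent);
  }
}

const guard = new BudgetGuard(10);
const firstOk = guard.record(4); // true: $4 of $10 spent
const secondOk = guard.record(7); // false: $11 exceeds the $10 budget
console.log(firstOk, secondOk, guard.remaining());
```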

### Cost-Per-Provider Configuration

```typescript
const providers: ProviderConfig[] = [
  {
    name: 'gemini',
    costPerToken: 0.00015, // $0.15 per 1M tokens (assuming the value is priced per 1K tokens)
    // ...
  },
  {
    name: 'anthropic',
    costPerToken: 0.003, // $3 per 1M tokens (Sonnet), same per-1K assumption
    // ...
  },
  {
    name: 'onnx',
    costPerToken: 0, // FREE (local)
    // ...
  }
];
```
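
Unit mistakes are easy to make with per-token prices, so it can help to write the arithmetic out. The sketch below takes a price quoted per 1M tokens, as in the comments above, and computes the cost of one request. `estimateCost` is a hypothetical helper; how the library actually scales `costPerToken` should be confirmed against its source.

```typescript
// Hypothetical helper: converts a price quoted per 1M tokens
// into the cost of a single request.
function estimateCost(pricePerMillionTokens: number, tokens: number): number {
  return (pricePerMillionTokens / 1e6) * tokens;
}

// A 5,000-token request at the prices quoted in the comments above.
const geminiCost = estimateCost(0.15, 5000); // about $0.00075
const sonnetCost = estimateCost(3, 5000); // about $0.015
const localCost = estimateCost(0, 5000); // free

console.log(geminiCost.toFixed(5), sonnetCost.toFixed(3), localCost);
```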

## Health Monitoring

### Automatic Health Checks

```typescript
const providers: ProviderConfig[] = [
  {
    name: 'gemini',
    healthCheckInterval: 60000, // Check every minute
    // ...
  }
];
```

### Manual Health Check

```typescript
const health = manager.getHealth();

health.forEach(h => {
  console.log(`${h.provider}:`);
  console.log(`  Healthy: ${h.isHealthy}`);
  console.log(`  Success Rate: ${(h.successRate * 100).toFixed(1)}%`);
  console.log(`  Avg Latency: ${h.averageLatency.toFixed(0)}ms`);
  console.log(`  Error Rate: ${(h.errorRate * 100).toFixed(1)}%`);
});
```

### Metrics Collection

```typescript
const metrics = manager.getMetrics();

metrics.forEach(m => {
  console.log(`${m.provider}:`);
  console.log(`  Total Requests: ${m.totalRequests}`);
  console.log(`  Successful: ${m.successfulRequests}`);
  console.log(`  Failed: ${m.failedRequests}`);
  console.log(`  Avg Latency: ${m.averageLatency.toFixed(0)}ms`);
  console.log(`  Total Cost: $${m.totalCost.toFixed(4)}`);
});
```
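
For intuition about where these numbers come from, the sketch below derives the same kinds of fields from a list of per-request samples. `RequestSample`, `MetricsSummary`, and `summarize` are hypothetical; the library may compute its metrics differently (for example, over a sliding window).

```typescript
// Hypothetical aggregation sketch; not the library's implementation.
interface RequestSample {
  success: boolean;
  latencyMs: number;
}

interface MetricsSummary {
  totalRequests: number;
  successfulRequests: number;
  failedRequests: number;
  successRate: number; // 0-1
  averageLatency: number; // milliseconds
}

function summarize(samples: RequestSample[]): MetricsSummary {
  const totalRequests = samples.length;
  const successfulRequests = samples.filter((s) => s.success).length;
  const totalLatency = samples.reduce((sum, s) => sum + s.latencyMs, 0);
  return {
    totalRequests,
    successfulRequests,
    failedRequests: totalRequests - successfulRequests,
    // With no traffic yet, treat the provider as healthy (an assumption).
    successRate: totalRequests === 0 ? 1 : successfulRequests / totalRequests,
    averageLatency: totalRequests === 0 ? 0 : totalLatency / totalRequests,
  };
}

const summary = summarize([
  { success: true, latencyMs: 100 },
  { success: true, latencyMs: 300 },
  { success: false, latencyMs: 200 },
]);
console.log(`${(summary.successRate * 100).toFixed(1)}%`); // 66.7%
console.log(`${summary.averageLatency.toFixed(0)}ms`); // 200ms
```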

## Checkpointing & Recovery

### Automatic Checkpoints

```typescript
const agent = new LongRunningAgent({
  agentName: 'checkpoint-agent',
  providers,
  checkpointInterval: 30000, // Save every 30 seconds
  // ...
});

await agent.start();

// Agent automatically saves checkpoints every 30s
// On crash, restore from last checkpoint
```

### Manual Checkpoint Management

```typescript
// Get all checkpoints
const metrics = agent.getMetrics();
const checkpoints = metrics.checkpoints;

// Restore from specific checkpoint
const lastCheckpoint = checkpoints[checkpoints.length - 1];
agent.restoreFromCheckpoint(lastCheckpoint);
```

### Checkpoint Data

```typescript
interface AgentCheckpoint {
  timestamp: Date;
  taskProgress: number; // 0-1
  currentProvider: string;
  totalCost: number;
  totalTokens: number;
  completedTasks: number;
  failedTasks: number;
  state: Record<string, any>; // Custom state
}
```
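
If checkpoints are persisted outside the process (the guide does not say where they are stored), note that `timestamp` is a `Date` and will not survive a plain JSON round trip as a `Date`. The sketch below shows one way to handle that; `serializeCheckpoint` and `deserializeCheckpoint` are hypothetical helpers, and the interface is re-declared locally (with `unknown` in place of `any`) so the block is self-contained.

```typescript
// Local copy of the checkpoint shape so this sketch is self-contained.
interface AgentCheckpoint {
  timestamp: Date;
  taskProgress: number; // 0-1
  currentProvider: string;
  totalCost: number;
  totalTokens: number;
  completedTasks: number;
  failedTasks: number;
  state: Record<string, unknown>;
}

// Hypothetical helper: Date becomes an ISO string via Date.prototype.toJSON.
function serializeCheckpoint(checkpoint: AgentCheckpoint): string {
  return JSON.stringify(checkpoint);
}

// Hypothetical helper: revive the ISO string back into a Date.
function deserializeCheckpoint(json: string): AgentCheckpoint {
  const raw = JSON.parse(json) as Omit<AgentCheckpoint, 'timestamp'> & {
    timestamp: string;
  };
  return { ...raw, timestamp: new Date(raw.timestamp) };
}

const saved = serializeCheckpoint({
  timestamp: new Date('2025-01-01T00:00:00.000Z'),
  taskProgress: 0.5,
  currentProvider: 'gemini',
  totalCost: 0.25,
  totalTokens: 1200,
  completedTasks: 3,
  failedTasks: 0,
  state: { cursor: 42 },
});
const restored = deserializeCheckpoint(saved);
console.log(restored.timestamp.toISOString()); // 2025-01-01T00:00:00.000Z
```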

## Retry Logic

### Exponential Backoff (Recommended)

```typescript
{
  retryBackoff: 'exponential'
}
```

**Delays:** 1s, 2s, 4s, 8s, 16s, 30s (max)

**Use Case:** Rate limits, transient errors

### Linear Backoff

```typescript
{
  retryBackoff: 'linear'
}
```

**Delays:** 1s, 2s, 3s, 4s, 5s, 10s (max)

**Use Case:** Predictable retry patterns
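
The two schedules above can be reproduced with a small delay function. This sketch reads "30s (max)" and "10s (max)" as caps on the delay, which is an interpretation; `backoffDelay` is a hypothetical helper and the 1s base is taken from the listed delays.

```typescript
type Backoff = 'exponential' | 'linear';

// Hypothetical helper. `attempt` is 1-based; returns the delay in
// milliseconds to wait before that retry.
function backoffDelay(kind: Backoff, attempt: number): number {
  if (kind === 'exponential') {
    // 1s, 2s, 4s, 8s, 16s, then capped at 30s
    return Math.min(1000 * 2 ** (attempt - 1), 30000);
  }
  // 1s, 2s, 3s, ... capped at 10s
  return Math.min(1000 * attempt, 10000);
}

const delays = [1, 2, 3, 4, 5, 6].map((n) => backoffDelay('exponential', n));
console.log(delays.join(', ')); // 1000, 2000, 4000, 8000, 16000, 30000
```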

### Retryable Errors

Automatically retried:
- `rate limit`
- `timeout`
- `connection`
- `network`
- HTTP 503, 502, 429

Non-retryable errors fail immediately:
- Authentication errors
- Invalid requests
- HTTP 4xx (except 429)
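
A classifier for the two lists above might look like the following. It is a sketch under assumptions: the error shape (`message` plus optional `status`) and the name `isRetryable` are hypothetical, and matching on message substrings is only one way the listed keywords could be applied.

```typescript
// Hypothetical classifier; the error shape is an assumption.
const RETRYABLE_PATTERNS = ['rate limit', 'timeout', 'connection', 'network'];
const RETRYABLE_STATUS = [429, 502, 503];

function isRetryable(error: { message: string; status?: number }): boolean {
  if (error.status !== undefined) {
    if (RETRYABLE_STATUS.includes(error.status)) return true;
    // Any other 4xx (auth failures, invalid requests) fails immediately.
    if (error.status >= 400 && error.status < 500) return false;
  }
  const message = error.message.toLowerCase();
  return RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern));
}

console.log(isRetryable({ message: 'Rate limit exceeded', status: 429 })); // true
console.log(isRetryable({ message: 'Unauthorized', status: 401 })); // false
```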

## Production Best Practices

### 1. Multi-Provider Strategy

```typescript
// Remaining ProviderConfig fields are omitted for brevity
const providers: ProviderConfig[] = [
  // Primary: Fast & cheap for simple tasks
  { name: 'gemini', priority: 1, costPerToken: 0.00015 },

  // Fallback: High quality for complex tasks
  { name: 'anthropic', priority: 2, costPerToken: 0.003 },

  // Emergency: Free local inference
  { name: 'onnx', priority: 3, costPerToken: 0 }
];
```

### 2. Cost Optimization

```typescript
// Use cost-optimized strategy for high-volume
const agent = new LongRunningAgent({
  agentName: 'production-agent',
  providers,
  fallbackStrategy: {
    type: 'cost-optimized',
    costThreshold: 0.50
  },
  costBudget: 100.00 // Total spend cap, e.g. sized to a daily budget
});
```

### 3. Health Monitoring

```typescript
// Monitor provider health every minute
const providers: ProviderConfig[] = [
  {
    name: 'gemini',
    healthCheckInterval: 60000,
    enabled: true
  }
];

// Check health before critical operations
const health = manager.getHealth();
const unhealthy = health.filter(h => !h.isHealthy);

if (unhealthy.length > 0) {
  console.warn('Unhealthy providers:', unhealthy.map(h => h.provider));
}
```

### 4. Graceful Degradation

```typescript
// Prefer quality, fallback to cost
const providers: ProviderConfig[] = [
  { name: 'anthropic', priority: 1 }, // Best quality
  { name: 'gemini', priority: 2 }, // Cheaper fallback
  { name: 'onnx', priority: 3 } // Always available
];
```

### 5. Circuit Breaker Tuning

```typescript
{
  maxFailures: 5, // More tolerant in production
  recoveryTime: 300000, // 5 minutes before retry
  retryBackoff: 'exponential'
}
```

## Docker Validation

### Build Image

```bash
docker build -f Dockerfile.provider-fallback -t agentic-flow-provider-fallback .
```

### Run Tests

```bash
# With Gemini API key
docker run --rm \
  -e GOOGLE_GEMINI_API_KEY=your_key_here \
  agentic-flow-provider-fallback

# ONNX only (no API key needed)
docker run --rm agentic-flow-provider-fallback
```

### Expected Output

```
✅ Provider Fallback Validation Test
====================================

📋 Testing Provider Manager...

1️⃣ Building TypeScript...
✅ Build complete

2️⃣ Running provider fallback example...
Using Gemini API key: AIza...
🚀 Starting Long-Running Agent with Provider Fallback

📋 Task 1: Simple Code Generation (Gemini optimal)
Using provider: gemini
✅ Result: { code: 'console.log("Hello World");', provider: 'gemini' }

📋 Task 2: Complex Architecture Design (Claude optimal)
Using provider: anthropic
✅ Result: { architecture: 'Event-driven microservices', provider: 'anthropic' }

📈 Provider Health:
gemini:
  Healthy: true
  Success Rate: 100.0%
  Circuit Breaker: CLOSED

✅ All provider fallback tests passed!
```

## API Reference

### ProviderManager

```typescript
class ProviderManager {
  constructor(providers: ProviderConfig[], strategy: FallbackStrategy);

  selectProvider(
    taskComplexity?: 'simple' | 'medium' | 'complex',
    estimatedTokens?: number
  ): Promise<ProviderType>;

  executeWithFallback<T>(
    requestFn: (provider: ProviderType) => Promise<T>,
    taskComplexity?: 'simple' | 'medium' | 'complex',
    estimatedTokens?: number
  ): Promise<{ result: T; provider: ProviderType; attempts: number }>;

  getMetrics(): ProviderMetrics[];
  getHealth(): ProviderHealth[];
  getCostSummary(): { total: number; byProvider: Record<ProviderType, number>; totalTokens: number };
  destroy(): void;
}
```

### LongRunningAgent

```typescript
class LongRunningAgent {
  constructor(config: LongRunningAgentConfig);

  start(): Promise<void>;
  stop(): Promise<void>;

  executeTask<T>(task: {
    name: string;
    complexity: 'simple' | 'medium' | 'complex';
    estimatedTokens?: number;
    execute: (provider: string) => Promise<T>;
  }): Promise<T>;

  getStatus(): AgentStatus;
  getMetrics(): AgentMetrics;
  restoreFromCheckpoint(checkpoint: AgentCheckpoint): void;
}
```

## Examples

See `src/examples/use-provider-fallback.ts` for complete working examples.

## Support

- **GitHub Issues:** https://github.com/ruvnet/agentic-flow/issues
- **Documentation:** https://github.com/ruvnet/agentic-flow#readme
- **Discord:** Coming soon

## License

MIT - See LICENSE file for details