Provider Fallback & Dynamic Switching Guide

Production-grade LLM provider fallback for long-running agents

Overview

The ProviderManager and LongRunningAgent classes provide enterprise-grade provider fallback, health monitoring, cost optimization, and automatic recovery for long-running AI agents.

Key Features

  • Automatic Fallback - Seamless switching between providers on failure
  • Circuit Breaker - Prevents cascading failures with automatic recovery
  • Health Monitoring - Real-time provider health tracking
  • Cost Optimization - Intelligent provider selection based on cost/performance
  • Retry Logic - Exponential/linear backoff for transient errors
  • Checkpointing - Save/restore agent state for crash recovery
  • Budget Control - Hard limits on spending and runtime
  • Performance Tracking - Latency, success rate, and token usage metrics

Quick Start

Basic Provider Fallback

import { ProviderManager, ProviderConfig } from 'agentic-flow/core/provider-manager';

// Configure providers
const providers: ProviderConfig[] = [
  {
    name: 'gemini',
    apiKey: process.env.GOOGLE_GEMINI_API_KEY,
    priority: 1, // Try first
    maxRetries: 3,
    timeout: 30000,
    costPerToken: 0.00015,
    enabled: true
  },
  {
    name: 'anthropic',
    apiKey: process.env.ANTHROPIC_API_KEY,
    priority: 2, // Fallback
    maxRetries: 3,
    timeout: 60000,
    costPerToken: 0.003,
    enabled: true
  },
  {
    name: 'onnx',
    priority: 3, // Last resort (free, local)
    maxRetries: 2,
    timeout: 120000,
    costPerToken: 0,
    enabled: true
  }
];

// Initialize manager
const manager = new ProviderManager(providers, {
  type: 'priority', // or 'cost-optimized', 'performance-optimized', 'round-robin'
  maxFailures: 3,
  recoveryTime: 60000,
  retryBackoff: 'exponential'
});

// Execute with automatic fallback
const { result, provider, attempts } = await manager.executeWithFallback(
  async (providerName) => {
    // Your LLM API call here
    return await callLLM(providerName, prompt);
  }
);

console.log(`Success with ${provider} after ${attempts} attempts`);

Long-Running Agent

import { LongRunningAgent } from 'agentic-flow/core/long-running-agent';

// Create agent
const agent = new LongRunningAgent({
  agentName: 'research-agent',
  providers,
  fallbackStrategy: {
    type: 'cost-optimized',
    maxFailures: 3,
    recoveryTime: 60000,
    retryBackoff: 'exponential',
    costThreshold: 0.50, // Max $0.50 per request
    latencyThreshold: 30000 // Max 30s per request
  },
  checkpointInterval: 30000, // Save state every 30s
  maxRuntime: 3600000, // Max 1 hour
  costBudget: 5.00 // Max $5 total
});

await agent.start();

// Execute tasks with automatic provider selection
const result = await agent.executeTask({
  name: 'analyze-code',
  complexity: 'complex', // 'simple' | 'medium' | 'complex'
  estimatedTokens: 5000,
  execute: async (provider) => {
    return await analyzeCode(provider, code);
  }
});

// Get status
const status = agent.getStatus();
console.log(`Completed: ${status.completedTasks}, Cost: $${status.totalCost}`);

await agent.stop();

Fallback Strategies

1. Priority-Based (Default)

Tries providers in priority order (1 = highest).

{
  type: 'priority',
  maxFailures: 3,
  recoveryTime: 60000,
  retryBackoff: 'exponential'
}

Use Case: Prefer a specific provider (e.g., Claude for quality)
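
As a rough illustration of priority ordering, the sketch below sorts enabled providers by their priority value. The `PrioritizedProvider` shape and `orderByPriority` helper are hypothetical names used only here; they are not part of the library's API, and the real ProviderManager may also weigh health and circuit state.

```typescript
// Hypothetical minimal shape; the library's ProviderConfig has more fields.
interface PrioritizedProvider {
  name: string;
  priority: number; // 1 = highest
  enabled: boolean;
}

// Return the names of enabled providers in the order they would be tried.
function orderByPriority(providers: PrioritizedProvider[]): string[] {
  return providers
    .filter((p) => p.enabled)
    .sort((a, b) => a.priority - b.priority)
    .map((p) => p.name);
}

const tryOrder = orderByPriority([
  { name: 'onnx', priority: 3, enabled: true },
  { name: 'anthropic', priority: 2, enabled: false },
  { name: 'gemini', priority: 1, enabled: true },
]);
// gemini first, then onnx; the disabled anthropic entry is skipped
console.log(tryOrder.join(', '));
```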

2. Cost-Optimized

Selects the cheapest provider for the estimated token count.

{
  type: 'cost-optimized',
  maxFailures: 3,
  recoveryTime: 60000,
  retryBackoff: 'exponential',
  costThreshold: 0.50 // Max $0.50 per request
}

Use Case: High-volume applications, budget constraints
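
One way to picture cost-based selection is sketched below. It assumes the per-request estimate is simply estimatedTokens × costPerToken and that providers whose estimate exceeds costThreshold are skipped; the library's actual formula and units may differ. `PricedProvider` and `pickCheapest` are hypothetical names.

```typescript
// Hypothetical minimal shape for this sketch.
interface PricedProvider {
  name: string;
  costPerToken: number;
}

// Pick the lowest-rate provider whose estimated request cost fits the threshold.
function pickCheapest(
  providers: PricedProvider[],
  estimatedTokens: number,
  costThreshold: number
): string | null {
  let best: PricedProvider | null = null;
  for (const p of providers) {
    const estimate = estimatedTokens * p.costPerToken; // assumed formula
    if (estimate > costThreshold) continue; // too expensive for one request
    if (best === null || p.costPerToken < best.costPerToken) best = p;
  }
  return best === null ? null : best.name;
}

const paidProviders: PricedProvider[] = [
  { name: 'anthropic', costPerToken: 0.003 },
  { name: 'gemini', costPerToken: 0.00015 },
];
// gemini: anthropic's estimate for 1000 tokens is over the 0.50 threshold
console.log(pickCheapest(paidProviders, 1000, 0.5));
```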

3. Performance-Optimized

Selects the provider with the best latency and success rate.

{
  type: 'performance-optimized',
  maxFailures: 3,
  recoveryTime: 60000,
  retryBackoff: 'exponential',
  latencyThreshold: 30000 // Max 30s
}

Use Case: Real-time applications, user-facing services
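
The guide does not say how latency and success rate are combined, so the following is only one plausible ranking: drop providers slower than latencyThreshold, then prefer the higher success rate and break ties on lower latency. The field names mirror the health output shown later in this guide; `ProviderStats` and `pickFastestReliable` are hypothetical.

```typescript
// Hypothetical stats record using the field names from the health output.
interface ProviderStats {
  provider: string;
  successRate: number; // 0-1
  averageLatency: number; // milliseconds
}

function pickFastestReliable(
  stats: ProviderStats[],
  latencyThreshold: number
): string | null {
  const eligible = stats.filter((s) => s.averageLatency <= latencyThreshold);
  // Highest success rate first; lower latency breaks ties.
  eligible.sort(
    (a, b) => b.successRate - a.successRate || a.averageLatency - b.averageLatency
  );
  const best = eligible[0];
  return best ? best.provider : null;
}

const sampleStats: ProviderStats[] = [
  { provider: 'gemini', successRate: 0.98, averageLatency: 900 },
  { provider: 'anthropic', successRate: 0.99, averageLatency: 2500 },
  { provider: 'onnx', successRate: 0.99, averageLatency: 45000 },
];
// anthropic: onnx is over the 30s threshold, and anthropic beats gemini on success rate
console.log(pickFastestReliable(sampleStats, 30000));
```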

4. Round-Robin

Distributes load evenly across providers.

{
  type: 'round-robin',
  maxFailures: 3,
  recoveryTime: 60000,
  retryBackoff: 'exponential'
}

Use Case: Load balancing, testing multiple providers
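
A minimal round-robin rotation could look like the sketch below. It assumes unavailable providers (for example, ones with an open circuit) are skipped in the rotation; the `RoundRobin` class and its `isAvailable` callback are hypothetical and not taken from the library.

```typescript
class RoundRobin {
  private nextIndex = 0;
  private readonly names: string[];

  constructor(names: string[]) {
    this.names = names;
  }

  // Return the next available provider, or null if none is available.
  pick(isAvailable: (providerName: string) => boolean): string | null {
    for (let offset = 0; offset < this.names.length; offset++) {
      const index = (this.nextIndex + offset) % this.names.length;
      const candidate = this.names[index];
      if (candidate !== undefined && isAvailable(candidate)) {
        this.nextIndex = (index + 1) % this.names.length; // resume after the pick
        return candidate;
      }
    }
    return null;
  }
}

const rotation = new RoundRobin(['gemini', 'anthropic', 'onnx']);
const alwaysUp = () => true;
const picks = [
  rotation.pick(alwaysUp),
  rotation.pick(alwaysUp),
  rotation.pick(alwaysUp),
  rotation.pick(alwaysUp),
];
console.log(picks.join(', ')); // gemini, anthropic, onnx, gemini
```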

Task Complexity Heuristics

The agent biases provider selection according to each task's declared complexity:

Simple Tasks → Prefer Gemini/ONNX

await agent.executeTask({
  name: 'format-code',
  complexity: 'simple', // Fast, cheap providers preferred
  estimatedTokens: 200,
  execute: async (provider) => formatCode(provider, code)
});

Rationale: Simple tasks don't need Claude's reasoning power

Medium Tasks → Auto-Optimized

await agent.executeTask({
  name: 'refactor-function',
  complexity: 'medium', // Balance cost/quality
  estimatedTokens: 1500,
  execute: async (provider) => refactorFunction(provider, code)
});

Rationale: Defers to the configured fallback strategy (cost- or performance-optimized)

Complex Tasks → Prefer Claude

await agent.executeTask({
  name: 'design-architecture',
  complexity: 'complex', // Quality matters most
  estimatedTokens: 5000,
  execute: async (provider) => designArchitecture(provider, requirements)
});

Rationale: Complex reasoning benefits from Claude's capabilities
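
The three cases above can be read as a preference list that is tried before the regular strategy order. The sketch below illustrates that reading only; `preferredFor` and `chooseProvider` are hypothetical helpers, and the library's real selection logic may weigh additional factors.

```typescript
type Complexity = 'simple' | 'medium' | 'complex';

// Preferred providers per complexity level, as described in this section.
function preferredFor(complexity: Complexity): string[] {
  if (complexity === 'simple') return ['gemini', 'onnx'];
  if (complexity === 'complex') return ['anthropic'];
  return []; // medium: no preference, defer to the fallback strategy
}

// Try preferred providers first, then fall back to the strategy's own order.
function chooseProvider(
  complexity: Complexity,
  healthy: string[],
  strategyOrder: string[]
): string | null {
  const candidates = preferredFor(complexity).concat(strategyOrder);
  for (const candidate of candidates) {
    if (healthy.indexOf(candidate) !== -1) return candidate;
  }
  return null;
}

const strategyOrder = ['gemini', 'anthropic', 'onnx'];
// anthropic: preferred for complex tasks and currently healthy
console.log(chooseProvider('complex', ['gemini', 'anthropic', 'onnx'], strategyOrder));
// gemini: anthropic is unhealthy, so the strategy order applies
console.log(chooseProvider('complex', ['gemini', 'onnx'], strategyOrder));
```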

Circuit Breaker

Prevents cascading failures by temporarily disabling failing providers.

How It Works

  1. Failure Tracking: Count consecutive failures per provider
  2. Threshold: Open circuit after N failures (configurable)
  3. Recovery: Automatically retry the provider once recoveryTime has elapsed
  4. Fallback: Use next available provider
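
The first three steps can be sketched as a small state machine. This is a simplified illustration, not the library's implementation: the `CircuitBreaker` class is hypothetical, the clock is passed in explicitly so the behavior is easy to test, and the failure count is assumed to reset fully once the recovery time has elapsed.

```typescript
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly maxFailures: number;
  private readonly recoveryTime: number;

  constructor(maxFailures: number, recoveryTime: number) {
    this.maxFailures = maxFailures;
    this.recoveryTime = recoveryTime;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.maxFailures) this.openedAt = now;
  }

  // True while the provider should be skipped.
  isOpen(now: number): boolean {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.recoveryTime) {
      // Recovery window elapsed: close the circuit and allow requests again.
      this.openedAt = null;
      this.consecutiveFailures = 0;
      return false;
    }
    return true;
  }
}

const breaker = new CircuitBreaker(3, 60000);
breaker.recordFailure(0);
breaker.recordFailure(0);
breaker.recordFailure(0);
console.log(breaker.isOpen(1000)); // true: still inside the recovery window
console.log(breaker.isOpen(60000)); // false: recovery window elapsed
```

While isOpen returns true, the caller would move on to the next available provider (step 4).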

Configuration

{
  maxFailures: 3, // Open circuit after 3 consecutive failures
  recoveryTime: 60000, // Try recovery after 60 seconds
  retryBackoff: 'exponential' // 1s, 2s, 4s, 8s, 16s...
}

Monitoring

const health = manager.getHealth();

health.forEach(h => {
  console.log(`${h.provider}:`);
  console.log(`  Circuit Breaker: ${h.circuitBreakerOpen ? 'OPEN' : 'CLOSED'}`);
  console.log(`  Consecutive Failures: ${h.consecutiveFailures}`);
  console.log(`  Success Rate: ${(h.successRate * 100).toFixed(1)}%`);
});

Cost Tracking & Optimization

Real-Time Cost Monitoring

const costs = manager.getCostSummary();

console.log(`Total: $${costs.total.toFixed(4)}`);
console.log(`Tokens: ${costs.totalTokens.toLocaleString()}`);

for (const [provider, cost] of Object.entries(costs.byProvider)) {
  console.log(`  ${provider}: $${cost.toFixed(4)}`);
}

Budget Constraints

const agent = new LongRunningAgent({
  agentName: 'budget-agent',
  providers,
  costBudget: 10.00, // Hard limit: $10
  // ... other config
});

// Agent automatically stops when the budget is exceeded
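
To make the hard limits concrete, here is a sketch of the kind of check an agent could run between tasks. It assumes the agent stops once accumulated spend reaches costBudget or elapsed time reaches maxRuntime; `BudgetGuard` is a hypothetical class, and the library's exact stopping condition is not documented here.

```typescript
class BudgetGuard {
  private spent = 0;
  private readonly costBudget: number;
  private readonly maxRuntime: number;
  private readonly startedAt: number;

  constructor(costBudget: number, maxRuntime: number, startedAt: number) {
    this.costBudget = costBudget;
    this.maxRuntime = maxRuntime;
    this.startedAt = startedAt;
  }

  recordCost(cost: number): void {
    this.spent += cost;
  }

  // Returns why the agent should stop, or null if it may continue.
  stopReason(now: number): 'budget' | 'runtime' | null {
    if (this.spent >= this.costBudget) return 'budget';
    if (now - this.startedAt >= this.maxRuntime) return 'runtime';
    return null;
  }
}

const guard = new BudgetGuard(10.0, 3600000, 0);
guard.recordCost(4);
console.log(guard.stopReason(1000)); // null: $4 spent of $10
guard.recordCost(6);
console.log(guard.stopReason(2000)); // budget
```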

Cost-Per-Provider Configuration

const providers: ProviderConfig[] = [
  {
    name: 'gemini',
    costPerToken: 0.00015, // $0.15 per 1M tokens, reading the rate as per 1K tokens
    // ...
  },
  {
    name: 'anthropic',
    costPerToken: 0.003, // $3 per 1M tokens (Sonnet), same per-1K reading
    // ...
  },
  {
    name: 'onnx',
    costPerToken: 0, // FREE (local)
    // ...
  }
];
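
The per-million figures in the comments above line up with the configured values only if each rate is read as dollars per 1,000 tokens; read literally per token, 0.00015 would be $150 per 1M tokens. The arithmetic is sketched below so the unit can be checked against the library source before relying on cost totals. `dollarsPerMillionTokens` and `estimateCost` are hypothetical helpers.

```typescript
// tokensPerUnit = 1 reads the rate as dollars per token;
// tokensPerUnit = 1000 reads it as dollars per 1K tokens.
function dollarsPerMillionTokens(rate: number, tokensPerUnit: number): number {
  return (rate / tokensPerUnit) * 1000000;
}

// Estimated dollars for a request of `tokens` tokens under the chosen unit.
function estimateCost(tokens: number, rate: number, tokensPerUnit: number): number {
  return (tokens / tokensPerUnit) * rate;
}

console.log(dollarsPerMillionTokens(0.00015, 1000).toFixed(2)); // 0.15
console.log(dollarsPerMillionTokens(0.00015, 1).toFixed(2)); // 150.00
console.log(estimateCost(5000, 0.003, 1000).toFixed(4)); // 0.0150
```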

Health Monitoring

Automatic Health Checks

const providers: ProviderConfig[] = [
  {
    name: 'gemini',
    healthCheckInterval: 60000, // Check every minute
    // ...
  }
];

Manual Health Check

const health = manager.getHealth();

health.forEach(h => {
  console.log(`${h.provider}:`);
  console.log(`  Healthy: ${h.isHealthy}`);
  console.log(`  Success Rate: ${(h.successRate * 100).toFixed(1)}%`);
  console.log(`  Avg Latency: ${h.averageLatency.toFixed(0)}ms`);
  console.log(`  Error Rate: ${(h.errorRate * 100).toFixed(1)}%`);
});

Metrics Collection

const metrics = manager.getMetrics();

metrics.forEach(m => {
  console.log(`${m.provider}:`);
  console.log(`  Total Requests: ${m.totalRequests}`);
  console.log(`  Successful: ${m.successfulRequests}`);
  console.log(`  Failed: ${m.failedRequests}`);
  console.log(`  Avg Latency: ${m.averageLatency.toFixed(0)}ms`);
  console.log(`  Total Cost: $${m.totalCost.toFixed(4)}`);
});

Checkpointing & Recovery

Automatic Checkpoints

const agent = new LongRunningAgent({
  agentName: 'checkpoint-agent',
  providers,
  checkpointInterval: 30000, // Save every 30 seconds
  // ...
});

await agent.start();

// Agent automatically saves checkpoints every 30s
// On crash, restore from last checkpoint

Manual Checkpoint Management

// Get all checkpoints
const metrics = agent.getMetrics();
const checkpoints = metrics.checkpoints;

// Restore from the most recent checkpoint, if one exists
const lastCheckpoint = checkpoints[checkpoints.length - 1];
if (lastCheckpoint) {
  agent.restoreFromCheckpoint(lastCheckpoint);
}

Checkpoint Data

interface AgentCheckpoint {
  timestamp: Date;
  taskProgress: number; // 0-1
  currentProvider: string;
  totalCost: number;
  totalTokens: number;
  completedTasks: number;
  failedTasks: number;
  state: Record<string, any>; // Custom state
}
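
This guide does not describe how checkpoints are persisted. If they are written out as JSON (an assumption), the Date field needs to be revived on load, as in the sketch below. `serializeCheckpoint` and `parseCheckpoint` are hypothetical helpers, and the sketch performs no validation of the parsed data.

```typescript
interface AgentCheckpoint {
  timestamp: Date;
  taskProgress: number; // 0-1
  currentProvider: string;
  totalCost: number;
  totalTokens: number;
  completedTasks: number;
  failedTasks: number;
  state: Record<string, unknown>;
}

function serializeCheckpoint(checkpoint: AgentCheckpoint): string {
  return JSON.stringify(checkpoint);
}

function parseCheckpoint(json: string): AgentCheckpoint {
  const raw = JSON.parse(json);
  // JSON has no Date type, so the timestamp arrives as an ISO string.
  return { ...raw, timestamp: new Date(raw.timestamp) };
}

const saved = serializeCheckpoint({
  timestamp: new Date(0),
  taskProgress: 0.4,
  currentProvider: 'gemini',
  totalCost: 0.12,
  totalTokens: 800,
  completedTasks: 2,
  failedTasks: 0,
  state: { cursor: 17 },
});
const restored = parseCheckpoint(saved);
console.log(restored.timestamp instanceof Date, restored.completedTasks); // true 2
```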

Retry Logic

Exponential Backoff

{
  retryBackoff: 'exponential'
}

Delays: 1s, 2s, 4s, 8s, 16s, 30s (max)

Use Case: Rate limits, transient errors

Linear Backoff

{
  retryBackoff: 'linear'
}

Delays: 1s, 2s, 3s, 4s, 5s, 10s (max)

Use Case: Predictable retry patterns
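
Both delay schedules can be expressed as a small function of the attempt number. The sketch assumes a 1s base delay, doubling capped at 30s for exponential backoff, and 1s steps capped at 10s for linear backoff, which is one reading of the lists above; `retryDelayMs` is a hypothetical helper, not the library's function.

```typescript
type Backoff = 'exponential' | 'linear';

// attempt is 1-based: the delay before the first retry is retryDelayMs(1, ...).
function retryDelayMs(attempt: number, backoff: Backoff): number {
  if (backoff === 'exponential') {
    return Math.min(1000 * Math.pow(2, attempt - 1), 30000);
  }
  return Math.min(1000 * attempt, 10000);
}

const exponentialDelays = [1, 2, 3, 4, 5, 6].map((n) => retryDelayMs(n, 'exponential'));
const linearDelays = [1, 2, 3, 4, 5, 12].map((n) => retryDelayMs(n, 'linear'));
console.log(exponentialDelays.join(', ')); // 1000, 2000, 4000, 8000, 16000, 30000
console.log(linearDelays.join(', ')); // 1000, 2000, 3000, 4000, 5000, 10000
```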

Retryable Errors

Automatically retried:

  • rate limit
  • timeout
  • connection
  • network
  • HTTP 503, 502, 429

Non-retryable errors fail immediately:

  • Authentication errors
  • Invalid requests
  • HTTP 4xx (except 429)
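
The two lists above amount to a classification rule. The sketch below assumes errors are classified by HTTP status when one is available and otherwise by a case-insensitive substring match on the message; that matching approach is an assumption, and `isRetryable` is a hypothetical helper.

```typescript
const RETRYABLE_KEYWORDS = ['rate limit', 'timeout', 'connection', 'network'];
const RETRYABLE_STATUS = [429, 502, 503];

function isRetryable(message: string, httpStatus?: number): boolean {
  if (httpStatus !== undefined) {
    if (RETRYABLE_STATUS.indexOf(httpStatus) !== -1) return true;
    // Any other 4xx (authentication, invalid request, ...) fails immediately.
    if (httpStatus >= 400 && httpStatus < 500) return false;
  }
  const lower = message.toLowerCase();
  return RETRYABLE_KEYWORDS.some((keyword) => lower.indexOf(keyword) !== -1);
}

console.log(isRetryable('Rate limit exceeded', 429)); // true
console.log(isRetryable('Invalid API key', 401)); // false
console.log(isRetryable('Network unreachable')); // true
```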

Production Best Practices

1. Multi-Provider Strategy

const providers: ProviderConfig[] = [
  // Primary: Fast & cheap for simple tasks
  { name: 'gemini', priority: 1, costPerToken: 0.00015 },

  // Fallback: High quality for complex tasks
  { name: 'anthropic', priority: 2, costPerToken: 0.003 },

  // Emergency: Free local inference
  { name: 'onnx', priority: 3, costPerToken: 0 }
];

2. Cost Optimization

// Use cost-optimized strategy for high-volume
const agent = new LongRunningAgent({
  agentName: 'production-agent',
  providers,
  fallbackStrategy: {
    type: 'cost-optimized',
    costThreshold: 0.50
  },
  costBudget: 100.00 // Total budget for this agent run (e.g. a daily allowance if the agent restarts daily)
});

3. Health Monitoring

// Monitor provider health every minute
const providers: ProviderConfig[] = [
  {
    name: 'gemini',
    healthCheckInterval: 60000,
    enabled: true
  }
];

// Check health before critical operations
const health = manager.getHealth();
const unhealthy = health.filter(h => !h.isHealthy);

if (unhealthy.length > 0) {
  console.warn('Unhealthy providers:', unhealthy.map(h => h.provider));
}

4. Graceful Degradation

// Prefer quality, fallback to cost
const providers: ProviderConfig[] = [
  { name: 'anthropic', priority: 1 }, // Best quality
  { name: 'gemini', priority: 2 },     // Cheaper fallback
  { name: 'onnx', priority: 3 }        // Always available
];

5. Circuit Breaker Tuning

{
  maxFailures: 5, // More tolerant in production
  recoveryTime: 300000, // 5 minutes before retry
  retryBackoff: 'exponential'
}

Docker Validation

Build Image

docker build -f Dockerfile.provider-fallback -t agentic-flow-provider-fallback .

Run Tests

# With Gemini API key
docker run --rm \
  -e GOOGLE_GEMINI_API_KEY=your_key_here \
  agentic-flow-provider-fallback

# ONNX only (no API key needed)
docker run --rm agentic-flow-provider-fallback

Expected Output

✅ Provider Fallback Validation Test
====================================

📋 Testing Provider Manager...

1⃣  Building TypeScript...
✅ Build complete

2⃣  Running provider fallback example...
   Using Gemini API key: AIza...
🚀 Starting Long-Running Agent with Provider Fallback

📋 Task 1: Simple Code Generation (Gemini optimal)
  Using provider: gemini
  ✅ Result: { code: 'console.log("Hello World");', provider: 'gemini' }

📋 Task 2: Complex Architecture Design (Claude optimal)
  Using provider: anthropic
  ✅ Result: { architecture: 'Event-driven microservices', provider: 'anthropic' }

📈 Provider Health:
gemini:
  Healthy: true
  Success Rate: 100.0%
  Circuit Breaker: CLOSED

✅ All provider fallback tests passed!

API Reference

ProviderManager

class ProviderManager {
  constructor(providers: ProviderConfig[], strategy: FallbackStrategy);

  selectProvider(
    taskComplexity?: 'simple' | 'medium' | 'complex',
    estimatedTokens?: number
  ): Promise<ProviderType>;

  executeWithFallback<T>(
    requestFn: (provider: ProviderType) => Promise<T>,
    taskComplexity?: 'simple' | 'medium' | 'complex',
    estimatedTokens?: number
  ): Promise<{ result: T; provider: ProviderType; attempts: number }>;

  getMetrics(): ProviderMetrics[];
  getHealth(): ProviderHealth[];
  getCostSummary(): { total: number; byProvider: Record<ProviderType, number>; totalTokens: number };
  destroy(): void;
}

LongRunningAgent

class LongRunningAgent {
  constructor(config: LongRunningAgentConfig);

  start(): Promise<void>;
  stop(): Promise<void>;

  executeTask<T>(task: {
    name: string;
    complexity: 'simple' | 'medium' | 'complex';
    estimatedTokens?: number;
    execute: (provider: string) => Promise<T>;
  }): Promise<T>;

  getStatus(): AgentStatus;
  getMetrics(): AgentMetrics;
  restoreFromCheckpoint(checkpoint: AgentCheckpoint): void;
}

Examples

See src/examples/use-provider-fallback.ts for complete working examples.

License

MIT - See LICENSE file for details