tasq/node_modules/agentic-flow/docs/architecture/QUIC-SWARM-INTEGRATION.md

594 lines
18 KiB
Markdown

# QUIC Transport Integration for Multi-Agent Swarm Coordination
## Architecture Overview
This document describes the QUIC transport integration for agentic-flow's multi-agent swarm coordination system. The architecture enables high-performance agent-to-agent communication with transparent fallback to HTTP/2.
### Key Components
```
┌─────────────────────────────────────────────────────────────┐
│ Swarm Coordination Layer │
├─────────────────────────────────────────────────────────────┤
│ ┌──────────────────┐ ┌────────────────────────┐ │
│ │ QuicCoordinator │◄────►│ TransportRouter │ │
│ │ │ │ (Protocol Selection) │ │
│ │ - Agent registry │ │ - QUIC / HTTP/2 │ │
│ │ - Message routing│ │ - Auto fallback │ │
│ │ - State sync │ │ - Health checks │ │
│ └──────────────────┘ └────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Transport Layer (QUIC) │
├─────────────────────────────────────────────────────────────┤
│ ┌──────────────────┐ ┌────────────────────────┐ │
│ │ QuicClient │ │ QuicConnectionPool │ │
│ │ │ │ │ │
│ │ - 0-RTT support │◄────►│ - Pool management │ │
│ │ - Stream mux │ │ - LRU eviction │ │
│ │ - WASM bindings │ │ - Health monitoring │ │
│ └──────────────────┘ └────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ WASM QUIC Implementation │
├─────────────────────────────────────────────────────────────┤
│ - UDP transport │
│ - Stream multiplexing (100+ concurrent streams) │
│ - Connection migration (network changes) │
│ - QPACK header compression │
│ - 0-RTT connection establishment │
└─────────────────────────────────────────────────────────────┘
```
## System Architecture
### 1. QuicCoordinator
**Purpose**: Manages agent-to-agent communication in multi-agent swarms
**Features**:
- **Topology Support**: Mesh, Hierarchical, Ring, Star
- **Message Routing**: Topology-aware message forwarding
- **State Synchronization**: Real-time state sync across agents
- **Statistics Tracking**: Per-agent message and latency metrics
- **Heartbeat Monitoring**: Periodic agent health checks
**API**:
```typescript
const coordinator = new QuicCoordinator({
swarmId: 'production-swarm',
topology: 'mesh',
maxAgents: 20,
quicClient,
connectionPool,
heartbeatInterval: 10000,
statesSyncInterval: 5000
});
await coordinator.start();
await coordinator.registerAgent({
id: 'agent-1',
role: 'worker',
host: 'agent-1.example.com',
port: 4433,
capabilities: ['compute', 'analyze']
});
```
### 2. TransportRouter
**Purpose**: Intelligent transport layer with automatic protocol selection
**Features**:
- **Protocol Selection**: QUIC, HTTP/2, or automatic
- **Transparent Fallback**: HTTP/2 fallback on QUIC failure
- **Connection Pooling**: Efficient resource management
- **Health Checking**: Automatic availability detection
- **Statistics**: Per-protocol metrics tracking
**API**:
```typescript
const router = new TransportRouter({
protocol: 'auto',
enableFallback: true,
quicConfig: {
host: 'localhost',
port: 4433,
maxConnections: 100
},
http2Config: {
host: 'localhost',
port: 8443,
maxConnections: 100,
secure: true
}
});
await router.initialize();
// Route message through best available transport
const result = await router.route(message, targetAgent);
```
### 3. Swarm Integration
**Purpose**: High-level API for swarm initialization
**Features**:
- **Simple API**: Single function call to initialize swarms
- **Transport Abstraction**: Hide transport complexity
- **Topology Configuration**: Easy topology selection
- **Agent Management**: Register/unregister agents
- **Statistics**: Unified stats across transport layers
**API**:
```typescript
import { initSwarm } from './swarm/index.js';
const swarm = await initSwarm({
swarmId: 'my-swarm',
topology: 'mesh',
transport: 'quic',
maxAgents: 10,
quicPort: 4433
});
await swarm.registerAgent({
id: 'agent-1',
role: 'worker',
host: 'localhost',
port: 4434,
capabilities: ['compute']
});
const stats = swarm.getStats();
await swarm.shutdown();
```
## Supported Topologies
### Mesh Topology
- **Description**: Peer-to-peer, all agents connect to all others
- **Use Case**: Maximum redundancy, distributed consensus
- **Routing**: Direct agent-to-agent communication
- **Scalability**: O(n²) connections, best for <20 agents
### Hierarchical Topology
- **Description**: Coordinator-worker architecture
- **Use Case**: Centralized task distribution
- **Routing**: Workers Coordinator Workers
- **Scalability**: O(n) connections, scales to 100+ agents
### Ring Topology
- **Description**: Circular agent connections
- **Use Case**: Token-ring protocols, ordered processing
- **Routing**: Forward to next agent in ring
- **Scalability**: O(n) connections, predictable latency
### Star Topology
- **Description**: Central hub with spoke agents
- **Use Case**: Simple coordination, fan-out/fan-in
- **Routing**: All messages through central coordinator
- **Scalability**: O(n) connections, single point of coordination
## Transport Selection Strategy
### QUIC Transport (Recommended)
**Advantages**:
- **0-RTT Connection**: Near-instant connection establishment
- **Stream Multiplexing**: 100+ concurrent streams per connection
- **Connection Migration**: Survives network changes (WiFi Cellular)
- **No Head-of-Line Blocking**: Independent stream processing
- **QPACK Compression**: Efficient header compression
**Performance**:
- Latency: 10-50ms (0-RTT enabled)
- Throughput: 1-10 Gbps (network dependent)
- Concurrent Streams: 100+ per connection
- Connection Overhead: Minimal with pooling
**Use Cases**:
- Real-time agent coordination
- High-frequency message passing
- Distributed computation
- Mobile/unstable networks
### HTTP/2 Transport (Fallback)
**Advantages**:
- **Wide Compatibility**: Universal support
- **Proven Technology**: Battle-tested in production
- **TLS Security**: Standard encryption
**Performance**:
- Latency: 50-200ms (1-RTT handshake)
- Throughput: 1-10 Gbps (network dependent)
- Concurrent Streams: 100 per connection
- Connection Overhead: Higher due to TCP
**Use Cases**:
- Fallback when QUIC unavailable
- Firewall/proxy traversal
- Legacy infrastructure
### Auto Mode (Default)
**Strategy**:
1. Attempt QUIC connection
2. Fallback to HTTP/2 on failure
3. Continuous health checking
4. Automatic protocol switching
**Configuration**:
```typescript
const router = new TransportRouter({
protocol: 'auto',
enableFallback: true
});
```
## Message Flow
### Mesh Topology Message Flow
```
Agent-1 ──QUIC Stream──► Agent-2
──QUIC Stream──► Agent-3
──QUIC Stream──► Agent-4
```
### Hierarchical Topology Message Flow
```
Worker-1 ──QUIC Stream──► Coordinator
Worker-2 ──QUIC Stream──► Coordinator
Coordinator ──QUIC Stream──► Worker-3
Coordinator ──QUIC Stream──► Worker-4
```
### Ring Topology Message Flow
```
Agent-1 ──QUIC Stream──► Agent-2 ──QUIC Stream──► Agent-3
▲ │
└────────────────── QUIC Stream ◄──────────────────┘
```
### Star Topology Message Flow
```
┌─── Central Coordinator ───┐
│ │
QUIC Stream│ QUIC Stream │QUIC Stream
│ │
Agent-1 Agent-2 Agent-3 Agent-4 Agent-5
```
## State Synchronization
### Automatic State Sync
- **Interval**: Configurable (default: 5 seconds)
- **Mechanism**: Broadcast state updates via QUIC streams
- **Payload**: Swarm topology, agent list, statistics
- **Reliability**: At-least-once delivery
### Heartbeat Mechanism
- **Interval**: Configurable (default: 10 seconds)
- **Purpose**: Agent liveness detection
- **Failure Handling**: Automatic agent unregistration
- **Recovery**: Auto-reconnection on availability
## Statistics & Monitoring
### Per-Agent Statistics
```typescript
const stats = coordinator.getAgentStats('agent-1');
// {
// sent: 1234,
// received: 5678,
// avgLatency: 23.4
// }
```
### Transport Statistics
```typescript
const quicStats = router.getStats('quic');
// {
// protocol: 'quic',
// messagesSent: 10000,
// messagesReceived: 9500,
// bytesTransferred: 1234567,
// averageLatency: 15.2,
// errorRate: 0.001
// }
```
### Swarm Statistics
```typescript
const swarmStats = swarm.getStats();
// {
// swarmId: 'my-swarm',
// topology: 'mesh',
// transport: 'quic',
// coordinatorStats: { ... },
// transportStats: { ... },
// quicAvailable: true
// }
```
## Performance Characteristics
### QUIC vs HTTP/2 Comparison
| Metric | QUIC | HTTP/2 |
|--------|------|--------|
| Connection Establishment | 0-RTT (0ms) | 1-RTT (~50ms) |
| Head-of-Line Blocking | No | Yes |
| Stream Multiplexing | Yes (100+) | Yes (100) |
| Connection Migration | Yes | No |
| Packet Loss Recovery | Stream-level | Connection-level |
| Header Compression | QPACK | HPACK |
| Use Case | Real-time, mobile | General purpose |
### Scalability Benchmarks
**Mesh Topology**:
- 5 agents: ~10ms avg latency, 1000 msg/s
- 10 agents: ~20ms avg latency, 800 msg/s
- 20 agents: ~40ms avg latency, 500 msg/s
**Hierarchical Topology**:
- 10 workers + 1 coordinator: ~15ms avg latency, 2000 msg/s
- 50 workers + 5 coordinators: ~25ms avg latency, 8000 msg/s
- 100 workers + 10 coordinators: ~35ms avg latency, 15000 msg/s
## Security Considerations
### TLS 1.3
- **Encryption**: All QUIC connections use TLS 1.3
- **Certificates**: Configurable certificate paths
- **Peer Verification**: Optional peer certificate verification
### Configuration
```typescript
const config = {
certPath: './certs/cert.pem',
keyPath: './certs/key.pem',
verifyPeer: true
};
```
## Error Handling & Resilience
### Automatic Fallback
- QUIC connection failure HTTP/2 fallback
- Transparent to application layer
- Configurable fallback behavior
### Connection Recovery
- Automatic reconnection on failure
- Exponential backoff strategy
- Connection pool management
### Health Monitoring
- Periodic QUIC health checks
- Automatic protocol switching
- Statistics-based quality monitoring
## Usage Examples
### Example 1: Simple Mesh Swarm
```typescript
import { initSwarm } from './swarm/index.js';
const swarm = await initSwarm({
swarmId: 'compute-swarm',
topology: 'mesh',
transport: 'quic',
maxAgents: 5,
quicPort: 4433
});
// Register compute agents
for (let i = 1; i <= 5; i++) {
await swarm.registerAgent({
id: `compute-${i}`,
role: 'worker',
host: `compute-${i}.local`,
port: 4433 + i,
capabilities: ['compute', 'analyze']
});
}
console.log('Swarm initialized:', swarm.getStats());
```
### Example 2: Hierarchical Task Distribution
```typescript
const swarm = await initSwarm({
swarmId: 'task-swarm',
topology: 'hierarchical',
transport: 'auto',
maxAgents: 20
});
// Register coordinator
await swarm.registerAgent({
id: 'coordinator',
role: 'coordinator',
host: 'coordinator.local',
port: 4433,
capabilities: ['orchestrate', 'aggregate']
});
// Register workers
for (let i = 1; i <= 10; i++) {
await swarm.registerAgent({
id: `worker-${i}`,
role: 'worker',
host: `worker-${i}.local`,
port: 4434 + i,
capabilities: ['compute']
});
}
```
### Example 3: Ring-Based Processing
```typescript
const swarm = await initSwarm({
swarmId: 'pipeline-swarm',
topology: 'ring',
transport: 'quic',
maxAgents: 8
});
// Register processing stages
const stages = ['ingest', 'transform', 'enrich', 'validate', 'store'];
for (let i = 0; i < stages.length; i++) {
await swarm.registerAgent({
id: `stage-${stages[i]}`,
role: 'worker',
host: `stage-${i}.local`,
port: 4433 + i,
capabilities: [stages[i]]
});
}
```
## Configuration Reference
### QuicCoordinator Options
```typescript
interface QuicCoordinatorConfig {
swarmId: string; // Unique swarm identifier
topology: SwarmTopology; // mesh | hierarchical | ring | star
maxAgents: number; // Maximum agents in swarm
quicClient: QuicClient; // QUIC client instance
connectionPool: QuicConnectionPool; // Connection pool
heartbeatInterval?: number; // Heartbeat interval (ms)
statesSyncInterval?: number; // State sync interval (ms)
enableCompression?: boolean; // Enable message compression
}
```
### TransportRouter Options
```typescript
interface TransportConfig {
protocol: TransportProtocol; // quic | http2 | auto
enableFallback: boolean; // Enable HTTP/2 fallback
quicConfig?: {
host: string;
port: number;
maxConnections: number;
certPath?: string;
keyPath?: string;
};
http2Config?: {
host: string;
port: number;
maxConnections: number;
secure: boolean;
};
}
```
### Swarm Init Options
```typescript
interface SwarmInitOptions {
swarmId: string; // Unique swarm identifier
topology: SwarmTopology; // Swarm topology type
transport?: TransportProtocol; // Transport protocol (default: auto)
maxAgents?: number; // Maximum agents (default: 10)
quicPort?: number; // QUIC port (default: 4433)
quicHost?: string; // QUIC host (default: localhost)
enableFallback?: boolean; // Enable fallback (default: true)
}
```
## Migration Guide
### From HTTP-only to QUIC-enabled Swarms
**Before**:
```typescript
// Old HTTP-only swarm initialization
const swarm = await initHttpSwarm({
topology: 'mesh',
maxAgents: 10
});
```
**After**:
```typescript
// New QUIC-enabled swarm initialization
const swarm = await initSwarm({
swarmId: 'my-swarm',
topology: 'mesh',
transport: 'quic', // or 'auto' for automatic
maxAgents: 10,
quicPort: 4433
});
```
**Benefits**:
- 10-50x faster connection establishment (0-RTT)
- No head-of-line blocking
- Better mobile network support
- Connection migration support
- Transparent HTTP/2 fallback
## Troubleshooting
### QUIC Connection Failures
**Symptom**: "QUIC not available" errors
**Solutions**:
1. Check WASM module is properly loaded
2. Verify TLS certificates exist
3. Ensure firewall allows UDP traffic on QUIC port
4. Enable fallback to HTTP/2: `enableFallback: true`
### High Latency
**Symptom**: Messages taking >100ms
**Solutions**:
1. Check network conditions
2. Verify QUIC is being used (not HTTP/2 fallback)
3. Reduce state sync interval
4. Enable compression
5. Check for packet loss in QUIC stats
### Connection Pool Exhaustion
**Symptom**: "Maximum connections reached" errors
**Solutions**:
1. Increase `maxConnections` in config
2. Implement connection reuse
3. Close unused connections
4. Monitor connection stats
## Future Enhancements
### Planned Features
- [ ] Dynamic topology reconfiguration
- [ ] Multi-datacenter support
- [ ] Advanced routing algorithms
- [ ] Message priority queues
- [ ] Encryption at rest for state
- [ ] WebTransport support
- [ ] gRPC-over-QUIC integration
### Performance Optimizations
- [ ] Zero-copy message passing
- [ ] Custom QPACK dictionaries
- [ ] Adaptive congestion control
- [ ] Connection bonding
- [ ] Stream prioritization
## References
- [QUIC Protocol RFC 9000](https://www.rfc-editor.org/rfc/rfc9000.html)
- [HTTP/3 RFC 9114](https://www.rfc-editor.org/rfc/rfc9114.html)
- [QPACK RFC 9204](https://www.rfc-editor.org/rfc/rfc9204.html)
- [agentic-flow Documentation](../README.md)
## License
MIT License - See LICENSE file for details