tasq/node_modules/agentic-flow/docs/architecture/QUIC-SWARM-INTEGRATION.md

18 KiB

QUIC Transport Integration for Multi-Agent Swarm Coordination

Architecture Overview

This document describes the QUIC transport integration for agentic-flow's multi-agent swarm coordination system. The architecture enables high-performance agent-to-agent communication with transparent fallback to HTTP/2.

Key Components

┌─────────────────────────────────────────────────────────────┐
│                     Swarm Coordination Layer                 │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐      ┌────────────────────────┐       │
│  │ QuicCoordinator  │◄────►│  TransportRouter       │       │
│  │                  │      │  (Protocol Selection)   │       │
│  │ - Agent registry │      │  - QUIC / HTTP/2        │       │
│  │ - Message routing│      │  - Auto fallback        │       │
│  │ - State sync     │      │  - Health checks        │       │
│  └──────────────────┘      └────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Transport Layer (QUIC)                    │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐      ┌────────────────────────┐       │
│  │   QuicClient     │      │  QuicConnectionPool    │       │
│  │                  │      │                        │       │
│  │ - 0-RTT support  │◄────►│  - Pool management     │       │
│  │ - Stream mux     │      │  - LRU eviction        │       │
│  │ - WASM bindings  │      │  - Health monitoring   │       │
│  └──────────────────┘      └────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                 WASM QUIC Implementation                     │
├─────────────────────────────────────────────────────────────┤
│  - UDP transport                                             │
│  - Stream multiplexing (100+ concurrent streams)             │
│  - Connection migration (network changes)                    │
│  - QPACK header compression                                  │
│  - 0-RTT connection establishment                            │
└─────────────────────────────────────────────────────────────┘

System Architecture

1. QuicCoordinator

Purpose: Manages agent-to-agent communication in multi-agent swarms

Features:

  • Topology Support: Mesh, Hierarchical, Ring, Star
  • Message Routing: Topology-aware message forwarding
  • State Synchronization: Real-time state sync across agents
  • Statistics Tracking: Per-agent message and latency metrics
  • Heartbeat Monitoring: Periodic agent health checks

API:

const coordinator = new QuicCoordinator({
  swarmId: 'production-swarm',
  topology: 'mesh',
  maxAgents: 20,
  quicClient,
  connectionPool,
  heartbeatInterval: 10000,
  statesSyncInterval: 5000
});

await coordinator.start();
await coordinator.registerAgent({
  id: 'agent-1',
  role: 'worker',
  host: 'agent-1.example.com',
  port: 4433,
  capabilities: ['compute', 'analyze']
});

2. TransportRouter

Purpose: Intelligent transport layer with automatic protocol selection

Features:

  • Protocol Selection: QUIC, HTTP/2, or automatic
  • Transparent Fallback: HTTP/2 fallback on QUIC failure
  • Connection Pooling: Efficient resource management
  • Health Checking: Automatic availability detection
  • Statistics: Per-protocol metrics tracking

API:

const router = new TransportRouter({
  protocol: 'auto',
  enableFallback: true,
  quicConfig: {
    host: 'localhost',
    port: 4433,
    maxConnections: 100
  },
  http2Config: {
    host: 'localhost',
    port: 8443,
    maxConnections: 100,
    secure: true
  }
});

await router.initialize();

// Route message through best available transport
const result = await router.route(message, targetAgent);

3. Swarm Integration

Purpose: High-level API for swarm initialization

Features:

  • Simple API: Single function call to initialize swarms
  • Transport Abstraction: Hide transport complexity
  • Topology Configuration: Easy topology selection
  • Agent Management: Register/unregister agents
  • Statistics: Unified stats across transport layers

API:

import { initSwarm } from './swarm/index.js';

const swarm = await initSwarm({
  swarmId: 'my-swarm',
  topology: 'mesh',
  transport: 'quic',
  maxAgents: 10,
  quicPort: 4433
});

await swarm.registerAgent({
  id: 'agent-1',
  role: 'worker',
  host: 'localhost',
  port: 4434,
  capabilities: ['compute']
});

const stats = swarm.getStats();
await swarm.shutdown();

Supported Topologies

Mesh Topology

  • Description: Peer-to-peer, all agents connect to all others
  • Use Case: Maximum redundancy, distributed consensus
  • Routing: Direct agent-to-agent communication
  • Scalability: O(n²) connections, best for <20 agents

Hierarchical Topology

  • Description: Coordinator-worker architecture
  • Use Case: Centralized task distribution
  • Routing: Workers → Coordinator → Workers
  • Scalability: O(n) connections, scales to 100+ agents

Ring Topology

  • Description: Circular agent connections
  • Use Case: Token-ring protocols, ordered processing
  • Routing: Forward to next agent in ring
  • Scalability: O(n) connections, predictable latency

Star Topology

  • Description: Central hub with spoke agents
  • Use Case: Simple coordination, fan-out/fan-in
  • Routing: All messages through central coordinator
  • Scalability: O(n) connections, single point of coordination

Transport Selection Strategy

Advantages:

  • 0-RTT Connection: Near-instant connection establishment
  • Stream Multiplexing: 100+ concurrent streams per connection
  • Connection Migration: Survives network changes (WiFi → Cellular)
  • No Head-of-Line Blocking: Independent stream processing
  • QPACK Compression: Efficient header compression

Performance:

  • Latency: 10-50ms (0-RTT enabled)
  • Throughput: 1-10 Gbps (network dependent)
  • Concurrent Streams: 100+ per connection
  • Connection Overhead: Minimal with pooling

Use Cases:

  • Real-time agent coordination
  • High-frequency message passing
  • Distributed computation
  • Mobile/unstable networks

HTTP/2 Transport (Fallback)

Advantages:

  • Wide Compatibility: Universal support
  • Proven Technology: Battle-tested in production
  • TLS Security: Standard encryption

Performance:

  • Latency: 50-200ms (1-RTT handshake)
  • Throughput: 1-10 Gbps (network dependent)
  • Concurrent Streams: 100 per connection
  • Connection Overhead: Higher due to TCP

Use Cases:

  • Fallback when QUIC unavailable
  • Firewall/proxy traversal
  • Legacy infrastructure

Auto Mode (Default)

Strategy:

  1. Attempt QUIC connection
  2. Fallback to HTTP/2 on failure
  3. Continuous health checking
  4. Automatic protocol switching

Configuration:

const router = new TransportRouter({
  protocol: 'auto',
  enableFallback: true
});

Message Flow

Mesh Topology Message Flow

Agent-1 ──QUIC Stream──► Agent-2
        ──QUIC Stream──► Agent-3
        ──QUIC Stream──► Agent-4

Hierarchical Topology Message Flow

Worker-1 ──QUIC Stream──► Coordinator
Worker-2 ──QUIC Stream──► Coordinator
                         Coordinator ──QUIC Stream──► Worker-3
                         Coordinator ──QUIC Stream──► Worker-4

Ring Topology Message Flow

Agent-1 ──QUIC Stream──► Agent-2 ──QUIC Stream──► Agent-3
   ▲                                                   │
   └────────────────── QUIC Stream ◄──────────────────┘

Star Topology Message Flow

                    ┌─── Central Coordinator ───┐
                    │                            │
         QUIC Stream│    QUIC Stream             │QUIC Stream
                    │                            │
            Agent-1 Agent-2 Agent-3 Agent-4 Agent-5

State Synchronization

Automatic State Sync

  • Interval: Configurable (default: 5 seconds)
  • Mechanism: Broadcast state updates via QUIC streams
  • Payload: Swarm topology, agent list, statistics
  • Reliability: At-least-once delivery

Heartbeat Mechanism

  • Interval: Configurable (default: 10 seconds)
  • Purpose: Agent liveness detection
  • Failure Handling: Automatic agent unregistration
  • Recovery: Auto-reconnection on availability

Statistics & Monitoring

Per-Agent Statistics

const stats = coordinator.getAgentStats('agent-1');
// {
//   sent: 1234,
//   received: 5678,
//   avgLatency: 23.4
// }

Transport Statistics

const quicStats = router.getStats('quic');
// {
//   protocol: 'quic',
//   messagesSent: 10000,
//   messagesReceived: 9500,
//   bytesTransferred: 1234567,
//   averageLatency: 15.2,
//   errorRate: 0.001
// }

Swarm Statistics

const swarmStats = swarm.getStats();
// {
//   swarmId: 'my-swarm',
//   topology: 'mesh',
//   transport: 'quic',
//   coordinatorStats: { ... },
//   transportStats: { ... },
//   quicAvailable: true
// }

Performance Characteristics

QUIC vs HTTP/2 Comparison

Metric QUIC HTTP/2
Connection Establishment 0-RTT (0ms) 1-RTT (~50ms)
Head-of-Line Blocking No Yes
Stream Multiplexing Yes (100+) Yes (100)
Connection Migration Yes No
Packet Loss Recovery Stream-level Connection-level
Header Compression QPACK HPACK
Use Case Real-time, mobile General purpose

Scalability Benchmarks

Mesh Topology:

  • 5 agents: ~10ms avg latency, 1000 msg/s
  • 10 agents: ~20ms avg latency, 800 msg/s
  • 20 agents: ~40ms avg latency, 500 msg/s

Hierarchical Topology:

  • 10 workers + 1 coordinator: ~15ms avg latency, 2000 msg/s
  • 50 workers + 5 coordinators: ~25ms avg latency, 8000 msg/s
  • 100 workers + 10 coordinators: ~35ms avg latency, 15000 msg/s

Security Considerations

TLS 1.3

  • Encryption: All QUIC connections use TLS 1.3
  • Certificates: Configurable certificate paths
  • Peer Verification: Optional peer certificate verification

Configuration

const config = {
  certPath: './certs/cert.pem',
  keyPath: './certs/key.pem',
  verifyPeer: true
};

Error Handling & Resilience

Automatic Fallback

  • QUIC connection failure → HTTP/2 fallback
  • Transparent to application layer
  • Configurable fallback behavior

Connection Recovery

  • Automatic reconnection on failure
  • Exponential backoff strategy
  • Connection pool management

Health Monitoring

  • Periodic QUIC health checks
  • Automatic protocol switching
  • Statistics-based quality monitoring

Usage Examples

Example 1: Simple Mesh Swarm

import { initSwarm } from './swarm/index.js';

const swarm = await initSwarm({
  swarmId: 'compute-swarm',
  topology: 'mesh',
  transport: 'quic',
  maxAgents: 5,
  quicPort: 4433
});

// Register compute agents
for (let i = 1; i <= 5; i++) {
  await swarm.registerAgent({
    id: `compute-${i}`,
    role: 'worker',
    host: `compute-${i}.local`,
    port: 4433 + i,
    capabilities: ['compute', 'analyze']
  });
}

console.log('Swarm initialized:', swarm.getStats());

Example 2: Hierarchical Task Distribution

const swarm = await initSwarm({
  swarmId: 'task-swarm',
  topology: 'hierarchical',
  transport: 'auto',
  maxAgents: 20
});

// Register coordinator
await swarm.registerAgent({
  id: 'coordinator',
  role: 'coordinator',
  host: 'coordinator.local',
  port: 4433,
  capabilities: ['orchestrate', 'aggregate']
});

// Register workers
for (let i = 1; i <= 10; i++) {
  await swarm.registerAgent({
    id: `worker-${i}`,
    role: 'worker',
    host: `worker-${i}.local`,
    port: 4434 + i,
    capabilities: ['compute']
  });
}

Example 3: Ring-Based Processing

const swarm = await initSwarm({
  swarmId: 'pipeline-swarm',
  topology: 'ring',
  transport: 'quic',
  maxAgents: 8
});

// Register processing stages
const stages = ['ingest', 'transform', 'enrich', 'validate', 'store'];
for (let i = 0; i < stages.length; i++) {
  await swarm.registerAgent({
    id: `stage-${stages[i]}`,
    role: 'worker',
    host: `stage-${i}.local`,
    port: 4433 + i,
    capabilities: [stages[i]]
  });
}

Configuration Reference

QuicCoordinator Options

interface QuicCoordinatorConfig {
  swarmId: string;              // Unique swarm identifier
  topology: SwarmTopology;      // mesh | hierarchical | ring | star
  maxAgents: number;            // Maximum agents in swarm
  quicClient: QuicClient;       // QUIC client instance
  connectionPool: QuicConnectionPool; // Connection pool
  heartbeatInterval?: number;   // Heartbeat interval (ms)
  statesSyncInterval?: number;  // State sync interval (ms)
  enableCompression?: boolean;  // Enable message compression
}

TransportRouter Options

interface TransportConfig {
  protocol: TransportProtocol;  // quic | http2 | auto
  enableFallback: boolean;      // Enable HTTP/2 fallback
  quicConfig?: {
    host: string;
    port: number;
    maxConnections: number;
    certPath?: string;
    keyPath?: string;
  };
  http2Config?: {
    host: string;
    port: number;
    maxConnections: number;
    secure: boolean;
  };
}

Swarm Init Options

interface SwarmInitOptions {
  swarmId: string;              // Unique swarm identifier
  topology: SwarmTopology;      // Swarm topology type
  transport?: TransportProtocol; // Transport protocol (default: auto)
  maxAgents?: number;           // Maximum agents (default: 10)
  quicPort?: number;            // QUIC port (default: 4433)
  quicHost?: string;            // QUIC host (default: localhost)
  enableFallback?: boolean;     // Enable fallback (default: true)
}

Migration Guide

From HTTP-only to QUIC-enabled Swarms

Before:

// Old HTTP-only swarm initialization
const swarm = await initHttpSwarm({
  topology: 'mesh',
  maxAgents: 10
});

After:

// New QUIC-enabled swarm initialization
const swarm = await initSwarm({
  swarmId: 'my-swarm',
  topology: 'mesh',
  transport: 'quic',  // or 'auto' for automatic
  maxAgents: 10,
  quicPort: 4433
});

Benefits:

  • 10-50x faster connection establishment (0-RTT)
  • No head-of-line blocking
  • Better mobile network support
  • Connection migration support
  • Transparent HTTP/2 fallback

Troubleshooting

QUIC Connection Failures

Symptom: "QUIC not available" errors

Solutions:

  1. Check WASM module is properly loaded
  2. Verify TLS certificates exist
  3. Ensure firewall allows UDP traffic on QUIC port
  4. Enable fallback to HTTP/2: enableFallback: true

High Latency

Symptom: Messages taking >100ms

Solutions:

  1. Check network conditions
  2. Verify QUIC is being used (not HTTP/2 fallback)
  3. Reduce state sync interval
  4. Enable compression
  5. Check for packet loss in QUIC stats

Connection Pool Exhaustion

Symptom: "Maximum connections reached" errors

Solutions:

  1. Increase maxConnections in config
  2. Implement connection reuse
  3. Close unused connections
  4. Monitor connection stats

Future Enhancements

Planned Features

  • Dynamic topology reconfiguration
  • Multi-datacenter support
  • Advanced routing algorithms
  • Message priority queues
  • Encryption at rest for state
  • WebTransport support
  • gRPC-over-QUIC integration

Performance Optimizations

  • Zero-copy message passing
  • Custom QPACK dictionaries
  • Adaptive congestion control
  • Connection bonding
  • Stream prioritization

References

License

MIT License - See LICENSE file for details