# @ruvector/ruvllm

**Build AI that learns and improves from every interaction.**

RuvLLM is a self-learning language model toolkit that gets smarter over time. Unlike traditional LLMs, which remain static after training, RuvLLM continuously adapts to your use case while remembering what it learned before.

## What Makes RuvLLM Different?

Traditional LLMs forget old knowledge when learning new things (a failure mode called "catastrophic forgetting"). RuvLLM addresses this with three key innovations:

1. **It Learns Without Forgetting** - Uses tiny parameter updates (LoRA) and memory protection (EWC++) to learn new patterns while preserving existing knowledge

2. **It Remembers Context** - Built-in vector memory stores and retrieves relevant information instantly using similarity search

3. **It Routes Intelligently** - Automatically selects the right model size and parameters based on query complexity, saving resources on simple tasks
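The LoRA idea behind item 1 can be sketched in a few lines: instead of updating the full weight matrix `W`, train two small matrices `A` (rank × input dim) and `B` (output dim × rank) and add their scaled product to the frozen base output. This is an illustrative sketch of the math only, not RuvLLM's internal implementation:

```typescript
// Illustrative LoRA forward pass: y = W·x + (alpha / r) · B·(A·x).
// W stays frozen; only the small matrices A and B are trained.
type Matrix = number[][];

function mulMatVec(m: Matrix, v: number[]): number[] {
  return m.map(row => row.reduce((sum, w, j) => sum + w * v[j], 0));
}

function loraForward(
  W: Matrix,       // frozen base weights (dOut x dIn)
  A: Matrix,       // trainable down-projection (r x dIn)
  B: Matrix,       // trainable up-projection (dOut x r)
  x: number[],
  alpha: number,
): number[] {
  const scale = alpha / A.length;               // alpha / rank
  const delta = mulMatVec(B, mulMatVec(A, x));  // low-rank update B(Ax)
  return mulMatVec(W, x).map((y, i) => y + scale * delta[i]);
}

// Rank-1 example: base output [2, 3] plus a scaled low-rank correction.
const y = loraForward(
  [[1, 0], [0, 1]],  // W = identity
  [[1, 1]],          // A: 1 x 2
  [[1], [0]],        // B: 2 x 1
  [2, 3],
  2,                 // alpha
);
console.log(y); // y = [12, 3]
```

Because only `A` and `B` are trained (here 4 numbers instead of 4 full-matrix entries per layer at realistic sizes), adaptation is fast and cheap to store, which is what makes per-interaction learning practical.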
## Key Features

| Feature | What It Does | Why It Matters |
|---------|-------------|----------------|
| **Adaptive Learning** | Learns from user feedback in real time | Improves accuracy over time without retraining |
| **Memory System** | Stores context with instant similarity search | Finds relevant information in microseconds |
| **Smart Routing** | Picks optimal model/settings per query | Reduces costs, improves response quality |
| **SIMD Acceleration** | Uses CPU vector instructions (AVX2/NEON) | 10-50x faster vector operations |
| **Federated Learning** | Train across devices without sharing data | Privacy-preserving distributed learning |
| **LoRA Adapters** | Parameter-efficient fine-tuning with low-rank matrices | Fast adaptation with minimal memory |
| **EWC++ Protection** | Elastic Weight Consolidation prevents forgetting | Learn new tasks without losing old knowledge |
| **SafeTensors Export** | HuggingFace-compatible model serialization | Share models with the ML ecosystem |
| **Training Pipeline** | Full training infrastructure with schedulers | Production-ready model training |
| **Session Management** | Stateful conversations with streaming | Build chat applications easily |

## Installation

```bash
npm install @ruvector/ruvllm
```

Or run directly:

```bash
npx @ruvector/ruvllm info
```
## Quick Start Tutorial

### 1. Basic Query

```typescript
import { RuvLLM } from '@ruvector/ruvllm';

const llm = new RuvLLM();

// Ask a question - routing happens automatically
const response = llm.query('Explain neural networks simply');
console.log(response.text);
// Output: "Neural networks are computing systems inspired by..."

console.log(`Used model: ${response.model}`);
console.log(`Confidence: ${(response.confidence * 100).toFixed(1)}%`);
```
### 2. Teaching the System

```typescript
// Query and get a response
const response = llm.query('What is the capital of France?');

// Provide feedback - the system learns from this
llm.feedback({
  requestId: response.requestId,
  rating: 5, // 1-5 scale
  correction: 'Paris is the capital and largest city of France'
});

// Future similar queries will be more accurate
```
### 3. Using Memory

```typescript
// Store important context
llm.addMemory('Company policy: All returns accepted within 30 days', {
  category: 'policy',
  department: 'customer-service'
});

llm.addMemory('Product X launched in March 2024 with features A, B, C', {
  category: 'product',
  name: 'Product X'
});

// Search memory for relevant context
const results = llm.searchMemory('return policy', 5);
console.log(results[0].content);
// Output: "Company policy: All returns accepted within 30 days"
console.log(`Relevance: ${(results[0].score * 100).toFixed(1)}%`);
```
### 4. Computing Similarity

```typescript
import { SimdOps } from '@ruvector/ruvllm';

const simd = new SimdOps();

// Compare two texts (llm is the RuvLLM instance from step 1)
const score = llm.similarity(
  'How do I reset my password?',
  'I forgot my login credentials'
);
console.log(`Similarity: ${(score * 100).toFixed(1)}%`);
// Output: "Similarity: 78.3%"

// Fast vector operations
const embedding1 = llm.embed('machine learning');
const embedding2 = llm.embed('deep learning');
const similarity = simd.cosineSimilarity(embedding1, embedding2);
```
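For intuition, cosine similarity is simply the dot product of the two vectors divided by the product of their magnitudes. A plain TypeScript reference version is shown below; the `SimdOps` path computes the same quantity, just accelerated with vector instructions:

```typescript
// Reference cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 1 for same-direction vectors, 0 for orthogonal ones.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosine([1, 0], [0, 1])); // 0 (orthogonal)
console.log(cosine([1, 2], [2, 4])); // ≈ 1 (same direction)
```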
### 5. Batch Processing

```typescript
// Process multiple queries efficiently
const batch = llm.batchQuery({
  queries: [
    'What is AI?',
    'Explain machine learning',
    'How do neural networks work?'
  ],
  config: { temperature: 0.7 }
});

batch.responses.forEach((r, i) => {
  console.log(`Query ${i + 1}: ${r.text.slice(0, 50)}...`);
});
console.log(`Total time: ${batch.totalLatencyMs}ms`);
```
## CLI Commands

```bash
# Get system information
ruvllm info

# Query the model
ruvllm query "What is quantum computing?"

# Generate text with custom settings
ruvllm generate "Write a product description for:" --temperature 0.8 --max-tokens 200

# Memory operations
ruvllm memory add "Important fact to remember"
ruvllm memory search "fact" --k 10

# Compare texts
ruvllm similarity "hello world" "hi there"

# Get embeddings
ruvllm embed "your text here"

# Run performance benchmark
ruvllm benchmark --dims 768 --iterations 5000

# View statistics
ruvllm stats --json
```
## Benchmarks

*Benchmarked in Docker (node:20-alpine, x64) - December 2024*

### Core Operations

| Operation | Time | Throughput |
|-----------|------|------------|
| Query (short) | 1.49μs | **670K ops/s** |
| Query (long) | 874ns | **1.14M ops/s** |
| Generate | 88ns | **11.4M ops/s** |
| Route | 92ns | **10.9M ops/s** |
| Embed (256d) | 10.6μs | **94K ops/s** |
| Embed (768d) | 7.1μs | **140K ops/s** |

### SIMD Vector Operations

| Operation | 128d | 256d | 512d | 768d |
|-----------|------|------|------|------|
| Dot Product | 214ns / **4.67M ops/s** | 318ns / **3.15M ops/s** | 609ns / **1.64M ops/s** | 908ns / **1.10M ops/s** |
| Cosine Similarity | 233ns / **4.30M ops/s** | 335ns / **2.99M ops/s** | 652ns / **1.53M ops/s** | 972ns / **1.03M ops/s** |
| L2 Distance | 195ns / **5.14M ops/s** | 315ns / **3.18M ops/s** | 612ns / **1.63M ops/s** | 929ns / **1.08M ops/s** |

### LoRA Adapter Performance

| Operation | 64d | 128d | 256d |
|-----------|-----|------|------|
| Forward (r=4) | 6.09μs / **164K ops/s** | 2.74μs / **365K ops/s** | 4.83μs / **207K ops/s** |
| Forward (r=8) | 2.17μs / **462K ops/s** | 4.30μs / **233K ops/s** | 8.99μs / **111K ops/s** |
| Forward (r=16) | 4.85μs / **206K ops/s** | 9.05μs / **111K ops/s** | 18.3μs / **55K ops/s** |
| Backward (r=8) | - | 110μs / **9.1K ops/s** | - |
| Batch (100) | - | 467μs / **2.1K ops/s** | - |
### Memory Operations

| Operation | Time | Throughput |
|-----------|------|------------|
| Add Memory | 5.3μs | **189K ops/s** |
| Search (k=5) | 45.6μs | **21.9K ops/s** |
| Search (k=10) | 28.3μs | **35.3K ops/s** |
| Search (k=20) | 33.1μs | **30.2K ops/s** |

### SONA Learning System

| Operation | Time | Throughput |
|-----------|------|------------|
| Pattern Store | 14.4μs | **69.5K ops/s** |
| Pattern Find Similar | 224μs | **4.5K ops/s** |
| EWC Register Task | 6.5μs | **154K ops/s** |
| EWC Compute Penalty | 501μs | **2.0K ops/s** |
| Trajectory Build | 1.24μs | **807K ops/s** |

### Federated Learning

| Operation | Time | Throughput |
|-----------|------|------------|
| Agent Create | 7.8μs | **128K ops/s** |
| Process Task | 7.9μs | **126K ops/s** |
| Apply LoRA | 12.6μs | **79.6K ops/s** |
| Export State | 48.9μs | **20.4K ops/s** |
| Aggregate | 5.26ms | **190 ops/s** |

### Session & Streaming

| Operation | Time | Throughput |
|-----------|------|------------|
| Session Create | 1.45μs | **690K ops/s** |
| Session Chat | 3.28μs | **305K ops/s** |
| Session Export | 3.91ms | **255 ops/s** |
| Session Import | 1.60ms | **625 ops/s** |
### Training Pipeline

| Operation | Time |
|-----------|------|
| Pipeline Create | 70.6μs |
| Add Data (100 samples) | 70.6μs |
| Train (32 samples, 3 epochs) | 1.33s |

### Export/Import

| Operation | Time | Throughput |
|-----------|------|------------|
| SafeTensors Write | 67.3μs | **14.9K ops/s** |
| SafeTensors Read | 102μs | **9.8K ops/s** |
| LoRA to JSON | 87.9μs | **11.4K ops/s** |
| LoRA from JSON | 86.0μs | **11.6K ops/s** |

### Performance Highlights

- **Fastest**: Generate at **11.4M ops/s**, Route at **10.9M ops/s**
- **Vector Ops**: Up to **5.14M ops/s** for L2 distance (128d)
- **LoRA Forward**: Up to **462K ops/s** (64d, rank-8)
- **Memory Search**: **35K ops/s** (k=10)
- **Session Create**: **690K ops/s**
## Configuration

```typescript
const llm = new RuvLLM({
  // Embedding settings
  embeddingDim: 768,        // Vector dimensions (384, 768, 1024)

  // Memory settings
  hnswM: 16,                // Graph connectivity (higher = better recall, more memory)
  hnswEfConstruction: 100,  // Build quality (higher = better index, slower build)
  hnswEfSearch: 64,         // Search quality (higher = better recall, slower search)

  // Learning settings
  learningEnabled: true,    // Enable adaptive learning
  qualityThreshold: 0.7,    // Min confidence to skip learning
  ewcLambda: 2000,          // Memory protection strength

  // Router settings
  routerHiddenDim: 128,     // Router network size
});
```
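To make `ewcLambda` concrete: EWC-style regularization penalizes moving parameters that mattered for earlier tasks, typically as `λ/2 · Σᵢ Fᵢ·(θᵢ − θᵢ*)²`, where `Fᵢ` is a per-parameter importance estimate (a diagonal Fisher approximation) and `θᵢ*` are the values after the previous task. A minimal illustrative sketch of that penalty — not RuvLLM's internals:

```typescript
// EWC penalty: (lambda / 2) * sum_i F[i] * (theta[i] - thetaStar[i])^2
// F         - per-parameter importance (diagonal Fisher estimate)
// thetaStar - parameter values frozen after the previous task
function ewcPenalty(
  theta: number[],
  thetaStar: number[],
  F: number[],
  lambda: number,
): number {
  let sum = 0;
  for (let i = 0; i < theta.length; i++) {
    const d = theta[i] - thetaStar[i];
    sum += F[i] * d * d;
  }
  return (lambda / 2) * sum;
}

// Moving an "important" parameter (F = 1.0) is penalized far more than
// an unimportant one (F = 0.1), which is how old knowledge is protected.
console.log(ewcPenalty([1, 2], [0, 0], [1.0, 0.1], 2)); // ≈ 1.4
```

A larger `ewcLambda` makes the penalty dominate the learning signal, so new feedback changes the model less but old behavior is preserved more strongly.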
## Platform Support

Native acceleration is available on:

| Platform | Architecture | SIMD Support |
|----------|-------------|--------------|
| macOS | Apple Silicon (M1/M2/M3) | NEON |
| macOS | Intel x64 | AVX2, SSE4.1 |
| Linux | x64 | AVX2, AVX-512, SSE4.1 |
| Linux | ARM64 | NEON |
| Windows | x64 | AVX2, SSE4.1 |

On unsupported platforms, RuvLLM falls back to an optimized JavaScript implementation.
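The fallback follows the usual native-addon loading pattern: attempt to load the platform-specific binary, and use the pure-JS path if that fails. The sketch below illustrates the pattern with a hypothetical `loadNative` callback; the package's actual loader internals may differ:

```typescript
// Illustrative native-or-fallback selection (not the package's actual loader).
function selectBackend(loadNative: () => unknown): "native" | "js-fallback" {
  try {
    // e.g. require(`@ruvector/ruvllm-${process.platform}-${process.arch}`)
    loadNative();
    return "native";
  } catch {
    return "js-fallback"; // optimized pure-JS path
  }
}

console.log(selectBackend(() => { throw new Error("no prebuilt binary"); }));
// prints "js-fallback"
```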
## Real-World Use Cases

### Customer Support Bot

```typescript
// Store FAQ and policies
faqs.forEach(faq => llm.addMemory(faq.answer, { question: faq.question }));

// Answer questions with context
function answerQuestion(question: string) {
  const context = llm.searchMemory(question, 3);
  const prompt = `Context:\n${context.map(c => c.content).join('\n')}\n\nQuestion: ${question}`;
  return llm.query(prompt);
}
```

### Document Search

```typescript
// Index documents
documents.forEach(doc => {
  llm.addMemory(doc.content, {
    title: doc.title,
    path: doc.path
  });
});

// Semantic search
const results = llm.searchMemory('quarterly revenue growth', 10);
```

### Personalized Recommendations

```typescript
// Learn from user interactions
function recordInteraction(userId: string, itemId: string, rating: number) {
  const response = llm.query(`User ${userId} rated ${itemId}`);
  llm.feedback({ requestId: response.requestId, rating });
}

// Get recommendations
function recommend(userId: string) {
  return llm.searchMemory(`preferences for user ${userId}`, 10);
}
```
## API Reference

### RuvLLM Class

| Method | Description |
|--------|-------------|
| `query(text, config?)` | Query with automatic model routing |
| `generate(prompt, config?)` | Generate text from a given prompt |
| `route(text)` | Get routing decision without executing |
| `addMemory(content, metadata?)` | Store content in vector memory |
| `searchMemory(text, k?)` | Find similar content (default k=10) |
| `feedback(fb)` | Submit feedback for learning |
| `embed(text)` | Get embedding vector for text |
| `similarity(t1, t2)` | Compute similarity between texts |
| `stats()` | Get engine statistics |
| `forceLearn()` | Trigger immediate learning cycle |

### SimdOps Class

| Method | Description |
|--------|-------------|
| `dotProduct(a, b)` | Vector dot product |
| `cosineSimilarity(a, b)` | Cosine similarity (0-1) |
| `l2Distance(a, b)` | Euclidean distance |
| `normalize(v)` | Normalize to unit length |
| `softmax(v)` | Softmax activation |
| `relu(v)` | ReLU activation |
| `gelu(v)` | GELU activation |
| `layerNorm(v, eps?)` | Layer normalization |
| `matvec(m, v)` | Matrix-vector multiply |
## Troubleshooting

**Q: Native module not loading?**

```bash
ruvllm info # Check whether the native module is loaded
```

If the output shows "Native: Fallback", install the platform-specific package manually:

```bash
npm install @ruvector/ruvllm-darwin-arm64 # For Apple Silicon
```

**Q: Memory usage too high?**

Reduce the HNSW parameters:

```typescript
const llm = new RuvLLM({ hnswM: 8, hnswEfConstruction: 50 });
```

**Q: Learning not improving results?**

Check that feedback is being processed:

```typescript
const stats = llm.stats();
console.log(`Patterns learned: ${stats.patternsLearned}`);
```
## License

MIT OR Apache-2.0

## Links

- [GitHub Repository](https://github.com/ruvnet/ruvector)
- [Documentation](https://github.com/ruvnet/ruvector/tree/main/examples/ruvLLM)
- [Issue Tracker](https://github.com/ruvnet/ruvector/issues)