# @ruvector/ruvllm

**Build AI that learns and improves from every interaction.**

RuvLLM is a self-learning language model toolkit that gets smarter over time. Unlike traditional LLMs, which remain static after training, RuvLLM continuously adapts to your use case while remembering what it learned before.

## What Makes RuvLLM Different?

Traditional LLMs forget old knowledge when learning new things (a failure mode called "catastrophic forgetting"). RuvLLM addresses this with three key innovations:

1. **It Learns Without Forgetting** - Uses tiny parameter updates (LoRA) and memory protection (EWC++) to learn new patterns while preserving existing knowledge

2. **It Remembers Context** - Built-in vector memory stores and retrieves relevant information instantly using similarity search

3. **It Routes Intelligently** - Automatically selects the right model size and parameters based on query complexity, saving resources on simple tasks
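The LoRA idea behind item 1 can be sketched in a few lines: instead of updating the full weight matrix `W`, train two small matrices `A` (rank × input dim) and `B` (output dim × rank) and add their scaled product to the frozen base output. This is an illustrative sketch of the math only, not RuvLLM's internal implementation:

```typescript
// Illustrative LoRA forward pass: y = W·x + (alpha / r) · B·(A·x).
// W stays frozen; only the small matrices A and B are trained.
type Matrix = number[][];

function mulMatVec(m: Matrix, v: number[]): number[] {
  return m.map(row => row.reduce((sum, w, j) => sum + w * v[j], 0));
}

function loraForward(
  W: Matrix,       // frozen base weights (dOut x dIn)
  A: Matrix,       // trainable down-projection (r x dIn)
  B: Matrix,       // trainable up-projection (dOut x r)
  x: number[],
  alpha: number,
): number[] {
  const scale = alpha / A.length;               // alpha / rank
  const delta = mulMatVec(B, mulMatVec(A, x));  // low-rank update B(Ax)
  return mulMatVec(W, x).map((y, i) => y + scale * delta[i]);
}

// Rank-1 example: base output [2, 3] plus a scaled low-rank correction.
const y = loraForward(
  [[1, 0], [0, 1]],  // W = identity
  [[1, 1]],          // A: 1 x 2
  [[1], [0]],        // B: 2 x 1
  [2, 3],
  2,                 // alpha
);
console.log(y); // y = [12, 3]
```

Because only `A` and `B` are trained (here 4 numbers instead of 4 full-matrix entries per layer at realistic sizes), adaptation is fast and cheap to store, which is what makes per-interaction learning practical.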
## Key Features

| Feature | What It Does | Why It Matters |
|---------|-------------|----------------|
| **Adaptive Learning** | Learns from user feedback in real time | Improves accuracy over time without retraining |
| **Memory System** | Stores context with instant similarity search | Finds relevant information in microseconds |
| **Smart Routing** | Picks optimal model/settings per query | Reduces costs, improves response quality |
| **SIMD Acceleration** | Uses CPU vector instructions (AVX2/NEON) | 10-50x faster vector operations |
| **Federated Learning** | Train across devices without sharing data | Privacy-preserving distributed learning |
| **LoRA Adapters** | Parameter-efficient fine-tuning with low-rank matrices | Fast adaptation with minimal memory |
| **EWC++ Protection** | Elastic Weight Consolidation prevents forgetting | Learn new tasks without losing old knowledge |
| **SafeTensors Export** | HuggingFace-compatible model serialization | Share models with the ML ecosystem |
| **Training Pipeline** | Full training infrastructure with schedulers | Production-ready model training |
| **Session Management** | Stateful conversations with streaming | Build chat applications easily |

## Installation

```bash
npm install @ruvector/ruvllm
```

Or run directly:

```bash
npx @ruvector/ruvllm info
```
## Quick Start Tutorial

### 1. Basic Query

```typescript
import { RuvLLM } from '@ruvector/ruvllm';

const llm = new RuvLLM();

// Ask a question - routing happens automatically
const response = llm.query('Explain neural networks simply');
console.log(response.text);
// Output: "Neural networks are computing systems inspired by..."

console.log(`Used model: ${response.model}`);
console.log(`Confidence: ${(response.confidence * 100).toFixed(1)}%`);
```
### 2. Teaching the System

```typescript
// Query and get a response
const response = llm.query('What is the capital of France?');

// Provide feedback - the system learns from this
llm.feedback({
  requestId: response.requestId,
  rating: 5, // 1-5 scale
  correction: 'Paris is the capital and largest city of France'
});

// Future similar queries will be more accurate
```
### 3. Using Memory

```typescript
// Store important context
llm.addMemory('Company policy: All returns accepted within 30 days', {
  category: 'policy',
  department: 'customer-service'
});

llm.addMemory('Product X launched in March 2024 with features A, B, C', {
  category: 'product',
  name: 'Product X'
});

// Search memory for relevant context
const results = llm.searchMemory('return policy', 5);
console.log(results[0].content);
// Output: "Company policy: All returns accepted within 30 days"
console.log(`Relevance: ${(results[0].score * 100).toFixed(1)}%`);
```
### 4. Computing Similarity

```typescript
import { SimdOps } from '@ruvector/ruvllm';

const simd = new SimdOps();

// Compare two texts (llm is the RuvLLM instance from step 1)
const score = llm.similarity(
  'How do I reset my password?',
  'I forgot my login credentials'
);
console.log(`Similarity: ${(score * 100).toFixed(1)}%`);
// Output: "Similarity: 78.3%"

// Fast vector operations
const embedding1 = llm.embed('machine learning');
const embedding2 = llm.embed('deep learning');
const similarity = simd.cosineSimilarity(embedding1, embedding2);
```
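For intuition, cosine similarity is simply the dot product of the two vectors divided by the product of their magnitudes. A plain TypeScript reference version is shown below; the `SimdOps` path computes the same quantity, just accelerated with vector instructions:

```typescript
// Reference cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 1 for same-direction vectors, 0 for orthogonal ones.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosine([1, 0], [0, 1])); // 0 (orthogonal)
console.log(cosine([1, 2], [2, 4])); // ≈ 1 (same direction)
```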
### 5. Batch Processing

```typescript
// Process multiple queries efficiently
const batch = llm.batchQuery({
  queries: [
    'What is AI?',
    'Explain machine learning',
    'How do neural networks work?'
  ],
  config: { temperature: 0.7 }
});

batch.responses.forEach((r, i) => {
  console.log(`Query ${i + 1}: ${r.text.slice(0, 50)}...`);
});
console.log(`Total time: ${batch.totalLatencyMs}ms`);
```
## CLI Commands

```bash
# Get system information
ruvllm info

# Query the model
ruvllm query "What is quantum computing?"

# Generate text with custom settings
ruvllm generate "Write a product description for:" --temperature 0.8 --max-tokens 200

# Memory operations
ruvllm memory add "Important fact to remember"
ruvllm memory search "fact" --k 10

# Compare texts
ruvllm similarity "hello world" "hi there"

# Get embeddings
ruvllm embed "your text here"

# Run performance benchmark
ruvllm benchmark --dims 768 --iterations 5000

# View statistics
ruvllm stats --json
```
## Benchmarks

*Benchmarked in Docker (node:20-alpine, x64) - December 2024*

### Core Operations

| Operation | Time | Throughput |
|-----------|------|------------|
| Query (short) | 1.49μs | **670K ops/s** |
| Query (long) | 874ns | **1.14M ops/s** |
| Generate | 88ns | **11.4M ops/s** |
| Route | 92ns | **10.9M ops/s** |
| Embed (256d) | 10.6μs | **94K ops/s** |
| Embed (768d) | 7.1μs | **140K ops/s** |

### SIMD Vector Operations

| Operation | 128d | 256d | 512d | 768d |
|-----------|------|------|------|------|
| Dot Product | 214ns / **4.67M ops/s** | 318ns / **3.15M ops/s** | 609ns / **1.64M ops/s** | 908ns / **1.10M ops/s** |
| Cosine Similarity | 233ns / **4.30M ops/s** | 335ns / **2.99M ops/s** | 652ns / **1.53M ops/s** | 972ns / **1.03M ops/s** |
| L2 Distance | 195ns / **5.14M ops/s** | 315ns / **3.18M ops/s** | 612ns / **1.63M ops/s** | 929ns / **1.08M ops/s** |

### LoRA Adapter Performance

| Operation | 64d | 128d | 256d |
|-----------|-----|------|------|
| Forward (r=4) | 6.09μs / **164K ops/s** | 2.74μs / **365K ops/s** | 4.83μs / **207K ops/s** |
| Forward (r=8) | 2.17μs / **462K ops/s** | 4.30μs / **233K ops/s** | 8.99μs / **111K ops/s** |
| Forward (r=16) | 4.85μs / **206K ops/s** | 9.05μs / **111K ops/s** | 18.3μs / **55K ops/s** |
| Backward (r=8) | - | 110μs / **9.1K ops/s** | - |
| Batch (100) | - | 467μs / **2.1K ops/s** | - |
### Memory Operations

| Operation | Time | Throughput |
|-----------|------|------------|
| Add Memory | 5.3μs | **189K ops/s** |
| Search (k=5) | 45.6μs | **21.9K ops/s** |
| Search (k=10) | 28.3μs | **35.3K ops/s** |
| Search (k=20) | 33.1μs | **30.2K ops/s** |

### SONA Learning System

| Operation | Time | Throughput |
|-----------|------|------------|
| Pattern Store | 14.4μs | **69.5K ops/s** |
| Pattern Find Similar | 224μs | **4.5K ops/s** |
| EWC Register Task | 6.5μs | **154K ops/s** |
| EWC Compute Penalty | 501μs | **2.0K ops/s** |
| Trajectory Build | 1.24μs | **807K ops/s** |

### Federated Learning

| Operation | Time | Throughput |
|-----------|------|------------|
| Agent Create | 7.8μs | **128K ops/s** |
| Process Task | 7.9μs | **126K ops/s** |
| Apply LoRA | 12.6μs | **79.6K ops/s** |
| Export State | 48.9μs | **20.4K ops/s** |
| Aggregate | 5.26ms | **190 ops/s** |

### Session & Streaming

| Operation | Time | Throughput |
|-----------|------|------------|
| Session Create | 1.45μs | **690K ops/s** |
| Session Chat | 3.28μs | **305K ops/s** |
| Session Export | 3.91ms | **255 ops/s** |
| Session Import | 1.60ms | **625 ops/s** |
### Training Pipeline

| Operation | Time |
|-----------|------|
| Pipeline Create | 70.6μs |
| Add Data (100 samples) | 70.6μs |
| Train (32 samples, 3 epochs) | 1.33s |

### Export/Import

| Operation | Time | Throughput |
|-----------|------|------------|
| SafeTensors Write | 67.3μs | **14.9K ops/s** |
| SafeTensors Read | 102μs | **9.8K ops/s** |
| LoRA to JSON | 87.9μs | **11.4K ops/s** |
| LoRA from JSON | 86.0μs | **11.6K ops/s** |

### Performance Highlights

- **Fastest**: Generate at **11.4M ops/s**, Route at **10.9M ops/s**
- **Vector Ops**: Up to **5.14M ops/s** for L2 distance (128d)
- **LoRA Forward**: Up to **462K ops/s** (64d, rank-8)
- **Memory Search**: **35K ops/s** (k=10)
- **Session Create**: **690K ops/s**
## Configuration

```typescript
const llm = new RuvLLM({
  // Embedding settings
  embeddingDim: 768,        // Vector dimensions (384, 768, 1024)

  // Memory settings
  hnswM: 16,                // Graph connectivity (higher = better recall, more memory)
  hnswEfConstruction: 100,  // Build quality (higher = better index, slower build)
  hnswEfSearch: 64,         // Search quality (higher = better recall, slower search)

  // Learning settings
  learningEnabled: true,    // Enable adaptive learning
  qualityThreshold: 0.7,    // Min confidence to skip learning
  ewcLambda: 2000,          // Memory protection strength

  // Router settings
  routerHiddenDim: 128,     // Router network size
});
```
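To make `ewcLambda` concrete: EWC-style regularization penalizes moving parameters that mattered for earlier tasks, typically as `λ/2 · Σᵢ Fᵢ·(θᵢ − θᵢ*)²`, where `Fᵢ` is a per-parameter importance estimate (a diagonal Fisher approximation) and `θᵢ*` are the values after the previous task. A minimal illustrative sketch of that penalty — not RuvLLM's internals:

```typescript
// EWC penalty: (lambda / 2) * sum_i F[i] * (theta[i] - thetaStar[i])^2
// F         - per-parameter importance (diagonal Fisher estimate)
// thetaStar - parameter values frozen after the previous task
function ewcPenalty(
  theta: number[],
  thetaStar: number[],
  F: number[],
  lambda: number,
): number {
  let sum = 0;
  for (let i = 0; i < theta.length; i++) {
    const d = theta[i] - thetaStar[i];
    sum += F[i] * d * d;
  }
  return (lambda / 2) * sum;
}

// Moving an "important" parameter (F = 1.0) is penalized far more than
// an unimportant one (F = 0.1), which is how old knowledge is protected.
console.log(ewcPenalty([1, 2], [0, 0], [1.0, 0.1], 2)); // ≈ 1.4
```

A larger `ewcLambda` makes the penalty dominate the learning signal, so new feedback changes the model less but old behavior is preserved more strongly.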
## Platform Support

Native acceleration is available on:

| Platform | Architecture | SIMD Support |
|----------|-------------|--------------|
| macOS | Apple Silicon (M1/M2/M3) | NEON |
| macOS | Intel x64 | AVX2, SSE4.1 |
| Linux | x64 | AVX2, AVX-512, SSE4.1 |
| Linux | ARM64 | NEON |
| Windows | x64 | AVX2, SSE4.1 |

On unsupported platforms, RuvLLM falls back to an optimized JavaScript implementation.
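The fallback follows the usual native-addon loading pattern: attempt to load the platform-specific binary, and use the pure-JS path if that fails. The sketch below illustrates the pattern with a hypothetical `loadNative` callback; the package's actual loader internals may differ:

```typescript
// Illustrative native-or-fallback selection (not the package's actual loader).
function selectBackend(loadNative: () => unknown): "native" | "js-fallback" {
  try {
    // e.g. require(`@ruvector/ruvllm-${process.platform}-${process.arch}`)
    loadNative();
    return "native";
  } catch {
    return "js-fallback"; // optimized pure-JS path
  }
}

console.log(selectBackend(() => { throw new Error("no prebuilt binary"); }));
// prints "js-fallback"
```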
## Real-World Use Cases

### Customer Support Bot

```typescript
// Store FAQ and policies
faqs.forEach(faq => llm.addMemory(faq.answer, { question: faq.question }));

// Answer questions with context
function answerQuestion(question: string) {
  const context = llm.searchMemory(question, 3);
  const prompt = `Context:\n${context.map(c => c.content).join('\n')}\n\nQuestion: ${question}`;
  return llm.query(prompt);
}
```

### Document Search

```typescript
// Index documents
documents.forEach(doc => {
  llm.addMemory(doc.content, {
    title: doc.title,
    path: doc.path
  });
});

// Semantic search
const results = llm.searchMemory('quarterly revenue growth', 10);
```

### Personalized Recommendations

```typescript
// Learn from user interactions
function recordInteraction(userId: string, itemId: string, rating: number) {
  const response = llm.query(`User ${userId} rated ${itemId}`);
  llm.feedback({ requestId: response.requestId, rating });
}

// Get recommendations
function recommend(userId: string) {
  return llm.searchMemory(`preferences for user ${userId}`, 10);
}
```
## API Reference

### RuvLLM Class

| Method | Description |
|--------|-------------|
| `query(text, config?)` | Query with automatic model routing |
| `generate(prompt, config?)` | Generate text from a given prompt |
| `route(text)` | Get routing decision without executing |
| `addMemory(content, metadata?)` | Store content in vector memory |
| `searchMemory(text, k?)` | Find similar content (default k=10) |
| `feedback(fb)` | Submit feedback for learning |
| `embed(text)` | Get embedding vector for text |
| `similarity(t1, t2)` | Compute similarity between texts |
| `stats()` | Get engine statistics |
| `forceLearn()` | Trigger immediate learning cycle |

### SimdOps Class

| Method | Description |
|--------|-------------|
| `dotProduct(a, b)` | Vector dot product |
| `cosineSimilarity(a, b)` | Cosine similarity (0-1) |
| `l2Distance(a, b)` | Euclidean distance |
| `normalize(v)` | Normalize to unit length |
| `softmax(v)` | Softmax activation |
| `relu(v)` | ReLU activation |
| `gelu(v)` | GELU activation |
| `layerNorm(v, eps?)` | Layer normalization |
| `matvec(m, v)` | Matrix-vector multiply |
## Troubleshooting

**Q: Native module not loading?**

```bash
ruvllm info # Check whether the native module is loaded
```

If the output shows "Native: Fallback", install the platform-specific package manually:

```bash
npm install @ruvector/ruvllm-darwin-arm64 # For Apple Silicon
```

**Q: Memory usage too high?**

Reduce the HNSW parameters:

```typescript
const llm = new RuvLLM({ hnswM: 8, hnswEfConstruction: 50 });
```

**Q: Learning not improving results?**

Check that feedback is being processed:

```typescript
const stats = llm.stats();
console.log(`Patterns learned: ${stats.patternsLearned}`);
```
## License

MIT OR Apache-2.0

## Links

- [GitHub Repository](https://github.com/ruvnet/ruvector)
- [Documentation](https://github.com/ruvnet/ruvector/tree/main/examples/ruvLLM)
- [Issue Tracker](https://github.com/ruvnet/ruvector/issues)