# @ruvector/ruvllm

> Build AI that learns and improves from every interaction.
RuvLLM is a self-learning language model toolkit that gets smarter over time. Unlike traditional LLMs that remain static after training, RuvLLM continuously adapts to your use case while remembering what it learned before.
## What Makes RuvLLM Different?

Traditional LLMs forget old knowledge when learning new things (called "catastrophic forgetting"). RuvLLM solves this with three key innovations:

- **It Learns Without Forgetting** - Uses tiny parameter updates (LoRA) and memory protection (EWC++) to learn new patterns while preserving existing knowledge
- **It Remembers Context** - Built-in vector memory stores and retrieves relevant information instantly using similarity search
- **It Routes Intelligently** - Automatically selects the right model size and parameters based on query complexity, saving resources on simple tasks
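To make the memory-protection idea concrete: EWC adds a quadratic penalty that anchors weights important to earlier tasks, so updates for a new task avoid moving them. The sketch below is the textbook EWC penalty, not necessarily the exact EWC++ variant RuvLLM implements:

```typescript
// Textbook EWC penalty: (lambda / 2) * sum_i F_i * (theta_i - thetaStar_i)^2
// where F_i is the Fisher information (importance) of parameter i.
function ewcPenalty(
  theta: number[],      // current parameters
  thetaStar: number[],  // parameters learned on the previous task
  fisher: number[],     // per-parameter importance weights
  lambda: number        // protection strength (cf. the ewcLambda setting)
): number {
  let sum = 0;
  for (let i = 0; i < theta.length; i++) {
    const d = theta[i] - thetaStar[i];
    sum += fisher[i] * d * d;
  }
  return (lambda / 2) * sum;
}

// Moving an important weight (high Fisher value) costs far more
// than moving an unimportant one by the same amount.
const important = ewcPenalty([1.5, 1.0], [1.0, 1.0], [10, 0.1], 2000);   // 2500
const unimportant = ewcPenalty([1.5, 1.0], [1.0, 1.0], [0.1, 10], 2000); // ≈ 25
```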
## Key Features
| Feature | What It Does | Why It Matters |
|---|---|---|
| Adaptive Learning | Learns from user feedback in real-time | Improves accuracy over time without retraining |
| Memory System | Stores context with instant similarity search | Finds relevant information in microseconds |
| Smart Routing | Picks optimal model/settings per query | Reduces costs, improves response quality |
| SIMD Acceleration | Uses CPU vector instructions (AVX2/NEON) | 10-50x faster vector operations |
| Federated Learning | Train across devices without sharing data | Privacy-preserving distributed learning |
| LoRA Adapters | Parameter-efficient fine-tuning with low-rank matrices | Fast adaptation with minimal memory |
| EWC++ Protection | Elastic Weight Consolidation prevents forgetting | Learn new tasks without losing old knowledge |
| SafeTensors Export | HuggingFace-compatible model serialization | Share models with the ML ecosystem |
| Training Pipeline | Full training infrastructure with schedulers | Production-ready model training |
| Session Management | Stateful conversations with streaming | Build chat applications easily |
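The savings behind the LoRA Adapters row come down to arithmetic: instead of training a full `d_out × d_in` weight update, LoRA trains two low-rank factors, so only `r * (d_in + d_out)` parameters change. A quick sketch (the 768/rank-8 numbers are illustrative, not RuvLLM defaults):

```typescript
// Full update for a d_out x d_in weight matrix trains every entry.
function fullUpdateParams(dIn: number, dOut: number): number {
  return dIn * dOut;
}

// LoRA trains B (d_out x r) and A (r x d_in) instead.
function loraParams(dIn: number, dOut: number, rank: number): number {
  return rank * (dIn + dOut);
}

const full = fullUpdateParams(768, 768); // 589824 trainable parameters
const lora = loraParams(768, 768, 8);    // 12288 trainable parameters
const savings = full / lora;             // 48x fewer parameters to train
```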
## Installation

```bash
npm install @ruvector/ruvllm
```

Or run directly:

```bash
npx @ruvector/ruvllm info
```
## Quick Start Tutorial

### 1. Basic Query

```typescript
import { RuvLLM } from '@ruvector/ruvllm';

const llm = new RuvLLM();

// Ask a question - routing happens automatically
const response = llm.query('Explain neural networks simply');
console.log(response.text);
// Output: "Neural networks are computing systems inspired by..."

console.log(`Used model: ${response.model}`);
console.log(`Confidence: ${(response.confidence * 100).toFixed(1)}%`);
```
### 2. Teaching the System

```typescript
// Query and get a response
const response = llm.query('What is the capital of France?');

// Provide feedback - the system learns from this
llm.feedback({
  requestId: response.requestId,
  rating: 5, // 1-5 scale
  correction: 'Paris is the capital and largest city of France'
});
// Future similar queries will be more accurate
```
### 3. Using Memory

```typescript
// Store important context
llm.addMemory('Company policy: All returns accepted within 30 days', {
  category: 'policy',
  department: 'customer-service'
});

llm.addMemory('Product X launched in March 2024 with features A, B, C', {
  category: 'product',
  name: 'Product X'
});

// Search memory for relevant context
const results = llm.searchMemory('return policy', 5);
console.log(results[0].content);
// Output: "Company policy: All returns accepted within 30 days"
console.log(`Relevance: ${(results[0].score * 100).toFixed(1)}%`);
```
### 4. Computing Similarity

```typescript
import { SimdOps } from '@ruvector/ruvllm';

const simd = new SimdOps();

// Compare two texts (using the llm instance from step 1)
const score = llm.similarity(
  'How do I reset my password?',
  'I forgot my login credentials'
);
console.log(`Similarity: ${(score * 100).toFixed(1)}%`);
// Output: "Similarity: 78.3%"

// Fast vector operations on raw embeddings
const embedding1 = llm.embed('machine learning');
const embedding2 = llm.embed('deep learning');
const similarity = simd.cosineSimilarity(embedding1, embedding2);
```
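For intuition, here is what cosine similarity computes, reduced to plain arrays (a sketch of the standard definition; `simd.cosineSimilarity` computes the same quantity with vectorized CPU instructions):

```typescript
// Cosine similarity = dot(a, b) / (|a| * |b|): 1 for vectors pointing
// the same way, 0 for orthogonal (unrelated) vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosine([1, 0], [1, 0]); // 1 (identical direction)
cosine([1, 0], [0, 1]); // 0 (orthogonal)
```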
### 5. Batch Processing

```typescript
// Process multiple queries efficiently
const batch = llm.batchQuery({
  queries: [
    'What is AI?',
    'Explain machine learning',
    'How do neural networks work?'
  ],
  config: { temperature: 0.7 }
});

batch.responses.forEach((r, i) => {
  console.log(`Query ${i + 1}: ${r.text.slice(0, 50)}...`);
});
console.log(`Total time: ${batch.totalLatencyMs}ms`);
```
## CLI Commands

```bash
# Get system information
ruvllm info

# Query the model
ruvllm query "What is quantum computing?"

# Generate text with custom settings
ruvllm generate "Write a product description for:" --temperature 0.8 --max-tokens 200

# Memory operations
ruvllm memory add "Important fact to remember"
ruvllm memory search "fact" --k 10

# Compare texts
ruvllm similarity "hello world" "hi there"

# Get embeddings
ruvllm embed "your text here"

# Run performance benchmark
ruvllm benchmark --dims 768 --iterations 5000

# View statistics
ruvllm stats --json
```
## Benchmarks

Benchmarked in Docker (node:20-alpine, x64) - December 2024.

### Core Operations
| Operation | Time | Throughput |
|---|---|---|
| Query (short) | 1.49μs | 670K ops/s |
| Query (long) | 874ns | 1.14M ops/s |
| Generate | 88ns | 11.4M ops/s |
| Route | 92ns | 10.9M ops/s |
| Embed (256d) | 10.6μs | 94K ops/s |
| Embed (768d) | 7.1μs | 140K ops/s |
### SIMD Vector Operations
| Operation | 128d | 256d | 512d | 768d |
|---|---|---|---|---|
| Dot Product | 214ns / 4.67M ops/s | 318ns / 3.15M ops/s | 609ns / 1.64M ops/s | 908ns / 1.10M ops/s |
| Cosine Similarity | 233ns / 4.30M ops/s | 335ns / 2.99M ops/s | 652ns / 1.53M ops/s | 972ns / 1.03M ops/s |
| L2 Distance | 195ns / 5.14M ops/s | 315ns / 3.18M ops/s | 612ns / 1.63M ops/s | 929ns / 1.08M ops/s |
### LoRA Adapter Performance
| Operation | 64d | 128d | 256d |
|---|---|---|---|
| Forward (r=4) | 6.09μs / 164K ops/s | 2.74μs / 365K ops/s | 4.83μs / 207K ops/s |
| Forward (r=8) | 2.17μs / 462K ops/s | 4.30μs / 233K ops/s | 8.99μs / 111K ops/s |
| Forward (r=16) | 4.85μs / 206K ops/s | 9.05μs / 111K ops/s | 18.3μs / 55K ops/s |
| Backward (r=8) | - | 110μs / 9.1K ops/s | - |
| Batch (100) | - | 467μs / 2.1K ops/s | - |
### Memory Operations
| Operation | Time | Throughput |
|---|---|---|
| Add Memory | 5.3μs | 189K ops/s |
| Search (k=5) | 45.6μs | 21.9K ops/s |
| Search (k=10) | 28.3μs | 35.3K ops/s |
| Search (k=20) | 33.1μs | 30.2K ops/s |
### SONA Learning System
| Operation | Time | Throughput |
|---|---|---|
| Pattern Store | 14.4μs | 69.5K ops/s |
| Pattern Find Similar | 224μs | 4.5K ops/s |
| EWC Register Task | 6.5μs | 154K ops/s |
| EWC Compute Penalty | 501μs | 2.0K ops/s |
| Trajectory Build | 1.24μs | 807K ops/s |
### Federated Learning
| Operation | Time | Throughput |
|---|---|---|
| Agent Create | 7.8μs | 128K ops/s |
| Process Task | 7.9μs | 126K ops/s |
| Apply LoRA | 12.6μs | 79.6K ops/s |
| Export State | 48.9μs | 20.4K ops/s |
| Aggregate | 5.26ms | 190 ops/s |
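Aggregate is by far the slowest step above, and conceptually it is a weighted parameter average across agents. The sketch below shows the standard FedAvg-style scheme; whether RuvLLM's aggregation matches it exactly is an assumption:

```typescript
// FedAvg-style aggregation: average each parameter across agents,
// weighted by how many samples each agent trained on.
function aggregate(states: number[][], sampleCounts: number[]): number[] {
  const total = sampleCounts.reduce((sum, n) => sum + n, 0);
  const out: number[] = new Array(states[0].length).fill(0);
  for (let a = 0; a < states.length; a++) {
    const weight = sampleCounts[a] / total;
    for (let i = 0; i < out.length; i++) {
      out[i] += weight * states[a][i];
    }
  }
  return out;
}

// Two agents; the second trained on twice the data, so it counts double.
aggregate([[0, 3], [3, 0]], [1, 2]); // ≈ [2, 1]
```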
### Session & Streaming
| Operation | Time | Throughput |
|---|---|---|
| Session Create | 1.45μs | 690K ops/s |
| Session Chat | 3.28μs | 305K ops/s |
| Session Export | 3.91ms | 255 ops/s |
| Session Import | 1.60ms | 625 ops/s |
### Training Pipeline
| Operation | Time |
|---|---|
| Pipeline Create | 70.6μs |
| Add Data (100 samples) | 70.6μs |
| Train (32 samples, 3 epochs) | 1.33s |
### Export/Import
| Operation | Time | Throughput |
|---|---|---|
| SafeTensors Write | 67.3μs | 14.9K ops/s |
| SafeTensors Read | 102μs | 9.8K ops/s |
| LoRA to JSON | 87.9μs | 11.4K ops/s |
| LoRA from JSON | 86.0μs | 11.6K ops/s |
### Performance Highlights
- Fastest: Generate at 11.4M ops/s, Route at 10.9M ops/s
- Vector Ops: Up to 5.14M ops/s for L2 distance (128d)
- LoRA Forward: Up to 462K ops/s (64d, rank-8)
- Memory Search: 35K ops/s (k=10)
- Session Create: 690K ops/s
## Configuration

```typescript
const llm = new RuvLLM({
  // Embedding settings
  embeddingDim: 768,        // Vector dimensions (384, 768, 1024)

  // Memory settings
  hnswM: 16,                // Graph connectivity (higher = better recall, more memory)
  hnswEfConstruction: 100,  // Build quality (higher = better index, slower build)
  hnswEfSearch: 64,         // Search quality (higher = better recall, slower search)

  // Learning settings
  learningEnabled: true,    // Enable adaptive learning
  qualityThreshold: 0.7,    // Min confidence to skip learning
  ewcLambda: 2000,          // Memory protection strength

  // Router settings
  routerHiddenDim: 128,     // Router network size
});
```
## Platform Support
Native acceleration available on:
| Platform | Architecture | SIMD Support |
|---|---|---|
| macOS | Apple Silicon (M1/M2/M3) | NEON |
| macOS | Intel x64 | AVX2, SSE4.1 |
| Linux | x64 | AVX2, AVX-512, SSE4.1 |
| Linux | ARM64 | NEON |
| Windows | x64 | AVX2, SSE4.1 |
Falls back to optimized JavaScript on unsupported platforms.
## Real-World Use Cases

### Customer Support Bot

```typescript
// Store FAQ and policies
faqs.forEach(faq => llm.addMemory(faq.answer, { question: faq.question }));

// Answer questions with context
function answerQuestion(question: string) {
  const context = llm.searchMemory(question, 3);
  const prompt = `Context:\n${context.map(c => c.content).join('\n')}\n\nQuestion: ${question}`;
  return llm.query(prompt);
}
```
### Document Search

```typescript
// Index documents
documents.forEach(doc => {
  llm.addMemory(doc.content, {
    title: doc.title,
    path: doc.path
  });
});

// Semantic search
const results = llm.searchMemory('quarterly revenue growth', 10);
```
### Personalized Recommendations

```typescript
// Learn from user interactions
function recordInteraction(userId: string, itemId: string, rating: number) {
  const response = llm.query(`User ${userId} rated ${itemId}`);
  llm.feedback({ requestId: response.requestId, rating });
}

// Get recommendations
function recommend(userId: string) {
  return llm.searchMemory(`preferences for user ${userId}`, 10);
}
```
## API Reference

### RuvLLM Class

| Method | Description |
|---|---|
| `query(text, config?)` | Query with automatic model routing |
| `generate(prompt, config?)` | Generate text with given prompt |
| `route(text)` | Get routing decision without executing |
| `addMemory(content, metadata?)` | Store content in vector memory |
| `searchMemory(text, k?)` | Find similar content (default k=10) |
| `feedback(fb)` | Submit feedback for learning |
| `embed(text)` | Get embedding vector for text |
| `similarity(t1, t2)` | Compute similarity between texts |
| `stats()` | Get engine statistics |
| `forceLearn()` | Trigger immediate learning cycle |
### SimdOps Class

| Method | Description |
|---|---|
| `dotProduct(a, b)` | Vector dot product |
| `cosineSimilarity(a, b)` | Cosine similarity (0-1) |
| `l2Distance(a, b)` | Euclidean distance |
| `normalize(v)` | Normalize to unit length |
| `softmax(v)` | Softmax activation |
| `relu(v)` | ReLU activation |
| `gelu(v)` | GELU activation |
| `layerNorm(v, eps?)` | Layer normalization |
| `matvec(m, v)` | Matrix-vector multiply |
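For reference, two of these methods reduce to short plain-JS definitions (a sketch of the standard formulas, not RuvLLM's SIMD-accelerated implementation):

```typescript
// Softmax: exponentiate and normalize so the outputs sum to 1.
// Subtracting the max first is the standard numerical-stability trick.
function softmax(v: number[]): number[] {
  const max = Math.max(...v);
  const exps = v.map(x => Math.exp(x - max));
  const sum = exps.reduce((s, x) => s + x, 0);
  return exps.map(x => x / sum);
}

// Layer normalization: shift to zero mean, scale to unit variance,
// with eps guarding against division by zero.
function layerNorm(v: number[], eps = 1e-5): number[] {
  const mean = v.reduce((s, x) => s + x, 0) / v.length;
  const variance = v.reduce((s, x) => s + (x - mean) ** 2, 0) / v.length;
  const denom = Math.sqrt(variance + eps);
  return v.map(x => (x - mean) / denom);
}

softmax([1, 1, 1]);   // each output ≈ 0.333; outputs sum to 1
layerNorm([1, 2, 3]); // zero mean, approximately unit variance
```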
## Troubleshooting

**Q: Native module not loading?**

```bash
ruvllm info  # Check whether the native module is loaded
```

If it reports "Native: Fallback", install the platform-specific package manually:

```bash
npm install @ruvector/ruvllm-darwin-arm64  # For Apple Silicon
```

**Q: Memory usage too high?** Reduce the HNSW parameters:

```typescript
const llm = new RuvLLM({ hnswM: 8, hnswEfConstruction: 50 });
```

**Q: Learning not improving results?** Check that feedback is being processed:

```typescript
const stats = llm.stats();
console.log(`Patterns learned: ${stats.patternsLearned}`);
```
## License

MIT OR Apache-2.0