@ruvector/ruvllm

Build AI that learns and improves from every interaction.

RuvLLM is a self-learning language model toolkit that gets smarter over time. Unlike traditional LLMs that remain static after training, RuvLLM continuously adapts to your use case while remembering what it learned before.

What Makes RuvLLM Different?

Traditional LLMs forget old knowledge when they learn new things, a problem known as "catastrophic forgetting." RuvLLM addresses this with three key innovations:

  1. It Learns Without Forgetting - Uses tiny parameter updates (LoRA) and memory protection (EWC++) to learn new patterns while preserving existing knowledge

  2. It Remembers Context - Built-in vector memory stores and retrieves relevant information instantly using similarity search

  3. It Routes Intelligently - Automatically selects the right model size and parameters based on query complexity, saving resources on simple tasks
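The low-rank idea behind point 1 can be sketched in a few lines. This is an illustration of the general LoRA technique, not RuvLLM's internal code: the frozen base weights `W` are left untouched, and a small trainable update `B·A` (rank `r` much smaller than the weight dimensions) is added on top, scaled by `alpha / r`.

```typescript
// Hypothetical illustration of a LoRA forward pass: y = W·x + (alpha/r)·B·(A·x).
// W stays frozen during adaptation; only the small A and B matrices are trained.
type Matrix = number[][];

function matVec(m: Matrix, v: number[]): number[] {
  return m.map(row => row.reduce((sum, w, i) => sum + w * v[i], 0));
}

function loraForward(
  W: Matrix,      // frozen d_out x d_in base weights
  A: Matrix,      // trainable r x d_in down-projection
  B: Matrix,      // trainable d_out x r up-projection
  x: number[],
  alpha: number,
  r: number
): number[] {
  const base = matVec(W, x);                 // frozen path
  const delta = matVec(B, matVec(A, x));     // low-rank update path
  return base.map((y, i) => y + (alpha / r) * delta[i]);
}
```

Because `B` starts at zero, a freshly created adapter leaves the model's behavior unchanged; learning then only touches the tiny `A`/`B` matrices, which is why adaptation is fast and memory-cheap.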

Key Features

| Feature | What It Does | Why It Matters |
|---|---|---|
| Adaptive Learning | Learns from user feedback in real-time | Improves accuracy over time without retraining |
| Memory System | Stores context with instant similarity search | Finds relevant information in microseconds |
| Smart Routing | Picks optimal model/settings per query | Reduces costs, improves response quality |
| SIMD Acceleration | Uses CPU vector instructions (AVX2/NEON) | 10-50x faster vector operations |
| Federated Learning | Train across devices without sharing data | Privacy-preserving distributed learning |
| LoRA Adapters | Parameter-efficient fine-tuning with low-rank matrices | Fast adaptation with minimal memory |
| EWC++ Protection | Elastic Weight Consolidation prevents forgetting | Learn new tasks without losing old knowledge |
| SafeTensors Export | HuggingFace-compatible model serialization | Share models with the ML ecosystem |
| Training Pipeline | Full training infrastructure with schedulers | Production-ready model training |
| Session Management | Stateful conversations with streaming | Build chat applications easily |

Installation

npm install @ruvector/ruvllm

Or run directly:

npx @ruvector/ruvllm info

Quick Start Tutorial

1. Basic Query

import { RuvLLM } from '@ruvector/ruvllm';

const llm = new RuvLLM();

// Ask a question - routing happens automatically
const response = llm.query('Explain neural networks simply');
console.log(response.text);
// Output: "Neural networks are computing systems inspired by..."

console.log(`Used model: ${response.model}`);
console.log(`Confidence: ${(response.confidence * 100).toFixed(1)}%`);

2. Teaching the System

// Query and get a response
const response = llm.query('What is the capital of France?');

// Provide feedback - the system learns from this
llm.feedback({
  requestId: response.requestId,
  rating: 5,  // 1-5 scale
  correction: 'Paris is the capital and largest city of France'
});

// Future similar queries will be more accurate

3. Using Memory

// Store important context
llm.addMemory('Company policy: All returns accepted within 30 days', {
  category: 'policy',
  department: 'customer-service'
});

llm.addMemory('Product X launched in March 2024 with features A, B, C', {
  category: 'product',
  name: 'Product X'
});

// Search memory for relevant context
const results = llm.searchMemory('return policy', 5);
console.log(results[0].content);
// Output: "Company policy: All returns accepted within 30 days"
console.log(`Relevance: ${(results[0].score * 100).toFixed(1)}%`);
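To make the retrieval step less magical: searchMemory embeds the query and returns the stored entries whose embeddings are most similar to it. A naive exact-search equivalent looks like the sketch below (hypothetical helper names, not the library API; RuvLLM replaces the linear scan with an HNSW index):

```typescript
// Minimal exact-search vector store: holds (embedding, content) pairs and
// returns the k entries whose embeddings are most cosine-similar to the query.
interface Entry { content: string; vec: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1); // guard against zero vectors
}

function searchTopK(store: Entry[], query: number[], k: number) {
  return store
    .map(e => ({ content: e.content, score: cosine(e.vec, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The linear scan is O(n) per query; HNSW gets the same top-k answers (approximately) in roughly logarithmic time, which is what makes microsecond-scale search over large stores possible.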

4. Computing Similarity

import { SimdOps } from '@ruvector/ruvllm';

const simd = new SimdOps();

// Compare two texts
const score = llm.similarity(
  'How do I reset my password?',
  'I forgot my login credentials'
);
console.log(`Similarity: ${(score * 100).toFixed(1)}%`);
// Output: "Similarity: 78.3%"

// Fast vector operations
const embedding1 = llm.embed('machine learning');
const embedding2 = llm.embed('deep learning');
const similarity = simd.cosineSimilarity(embedding1, embedding2);

5. Batch Processing

// Process multiple queries efficiently
const batch = llm.batchQuery({
  queries: [
    'What is AI?',
    'Explain machine learning',
    'How do neural networks work?'
  ],
  config: { temperature: 0.7 }
});

batch.responses.forEach((r, i) => {
  console.log(`Query ${i + 1}: ${r.text.slice(0, 50)}...`);
});
console.log(`Total time: ${batch.totalLatencyMs}ms`);

CLI Commands

# Get system information
ruvllm info

# Query the model
ruvllm query "What is quantum computing?"

# Generate text with custom settings
ruvllm generate "Write a product description for:" --temperature 0.8 --max-tokens 200

# Memory operations
ruvllm memory add "Important fact to remember"
ruvllm memory search "fact" --k 10

# Compare texts
ruvllm similarity "hello world" "hi there"

# Get embeddings
ruvllm embed "your text here"

# Run performance benchmark
ruvllm benchmark --dims 768 --iterations 5000

# View statistics
ruvllm stats --json

Benchmarks

Benchmarked in Docker (node:20-alpine, x64) - December 2024

Core Operations

| Operation | Time | Throughput |
|---|---|---|
| Query (short) | 1.49μs | 670K ops/s |
| Query (long) | 874ns | 1.14M ops/s |
| Generate | 88ns | 11.4M ops/s |
| Route | 92ns | 10.9M ops/s |
| Embed (256d) | 10.6μs | 94K ops/s |
| Embed (768d) | 7.1μs | 140K ops/s |

SIMD Vector Operations

| Operation | 128d | 256d | 512d | 768d |
|---|---|---|---|---|
| Dot Product | 214ns / 4.67M ops/s | 318ns / 3.15M ops/s | 609ns / 1.64M ops/s | 908ns / 1.10M ops/s |
| Cosine Similarity | 233ns / 4.30M ops/s | 335ns / 2.99M ops/s | 652ns / 1.53M ops/s | 972ns / 1.03M ops/s |
| L2 Distance | 195ns / 5.14M ops/s | 315ns / 3.18M ops/s | 612ns / 1.63M ops/s | 929ns / 1.08M ops/s |

LoRA Adapter Performance

| Operation | 64d | 128d | 256d |
|---|---|---|---|
| Forward (r=4) | 6.09μs / 164K ops/s | 2.74μs / 365K ops/s | 4.83μs / 207K ops/s |
| Forward (r=8) | 2.17μs / 462K ops/s | 4.30μs / 233K ops/s | 8.99μs / 111K ops/s |
| Forward (r=16) | 4.85μs / 206K ops/s | 9.05μs / 111K ops/s | 18.3μs / 55K ops/s |
| Backward (r=8) | - | 110μs / 9.1K ops/s | - |
| Batch (100) | - | 467μs / 2.1K ops/s | - |

Memory Operations

| Operation | Time | Throughput |
|---|---|---|
| Add Memory | 5.3μs | 189K ops/s |
| Search (k=5) | 45.6μs | 21.9K ops/s |
| Search (k=10) | 28.3μs | 35.3K ops/s |
| Search (k=20) | 33.1μs | 30.2K ops/s |

SONA Learning System

| Operation | Time | Throughput |
|---|---|---|
| Pattern Store | 14.4μs | 69.5K ops/s |
| Pattern Find Similar | 224μs | 4.5K ops/s |
| EWC Register Task | 6.5μs | 154K ops/s |
| EWC Compute Penalty | 501μs | 2.0K ops/s |
| Trajectory Build | 1.24μs | 807K ops/s |

Federated Learning

| Operation | Time | Throughput |
|---|---|---|
| Agent Create | 7.8μs | 128K ops/s |
| Process Task | 7.9μs | 126K ops/s |
| Apply LoRA | 12.6μs | 79.6K ops/s |
| Export State | 48.9μs | 20.4K ops/s |
| Aggregate | 5.26ms | 190 ops/s |

Session & Streaming

| Operation | Time | Throughput |
|---|---|---|
| Session Create | 1.45μs | 690K ops/s |
| Session Chat | 3.28μs | 305K ops/s |
| Session Export | 3.91ms | 255 ops/s |
| Session Import | 1.60ms | 625 ops/s |

Training Pipeline

| Operation | Time |
|---|---|
| Pipeline Create | 70.6μs |
| Add Data (100 samples) | 70.6μs |
| Train (32 samples, 3 epochs) | 1.33s |

Export/Import

| Operation | Time | Throughput |
|---|---|---|
| SafeTensors Write | 67.3μs | 14.9K ops/s |
| SafeTensors Read | 102μs | 9.8K ops/s |
| LoRA to JSON | 87.9μs | 11.4K ops/s |
| LoRA from JSON | 86.0μs | 11.6K ops/s |

Performance Highlights

  • Fastest: Generate at 11.4M ops/s, Route at 10.9M ops/s
  • Vector Ops: Up to 5.14M ops/s for L2 distance (128d)
  • LoRA Forward: Up to 462K ops/s (64d, rank-8)
  • Memory Search: 35K ops/s (k=10)
  • Session Create: 690K ops/s

Configuration

const llm = new RuvLLM({
  // Embedding settings
  embeddingDim: 768,        // Vector dimensions (384, 768, 1024)

  // Memory settings
  hnswM: 16,                // Graph connectivity (higher = better recall, more memory)
  hnswEfConstruction: 100,  // Build quality (higher = better index, slower build)
  hnswEfSearch: 64,         // Search quality (higher = better recall, slower search)

  // Learning settings
  learningEnabled: true,    // Enable adaptive learning
  qualityThreshold: 0.7,    // Min confidence to skip learning
  ewcLambda: 2000,          // Memory protection strength

  // Router settings
  routerHiddenDim: 128,     // Router network size
});

Platform Support

Native acceleration available on:

| Platform | Architecture | SIMD Support |
|---|---|---|
| macOS | Apple Silicon (M1/M2/M3) | NEON |
| macOS | Intel x64 | AVX2, SSE4.1 |
| Linux | x64 | AVX2, AVX-512, SSE4.1 |
| Linux | ARM64 | NEON |
| Windows | x64 | AVX2, SSE4.1 |

Falls back to optimized JavaScript on unsupported platforms.

Real-World Use Cases

Customer Support Bot

// Store FAQ and policies
faqs.forEach(faq => llm.addMemory(faq.answer, { question: faq.question }));

// Answer questions with context
function answerQuestion(question: string) {
  const context = llm.searchMemory(question, 3);
  const prompt = `Context:\n${context.map(c => c.content).join('\n')}\n\nQuestion: ${question}`;
  return llm.query(prompt);
}

Document Search

// Index documents
documents.forEach(doc => {
  llm.addMemory(doc.content, {
    title: doc.title,
    path: doc.path
  });
});

// Semantic search
const results = llm.searchMemory('quarterly revenue growth', 10);

Personalized Recommendations

// Learn from user interactions
function recordInteraction(userId: string, itemId: string, rating: number) {
  const response = llm.query(`User ${userId} rated ${itemId}`);
  llm.feedback({ requestId: response.requestId, rating });
}

// Get recommendations
function recommend(userId: string) {
  return llm.searchMemory(`preferences for user ${userId}`, 10);
}

API Reference

RuvLLM Class

| Method | Description |
|---|---|
| `query(text, config?)` | Query with automatic model routing |
| `generate(prompt, config?)` | Generate text with given prompt |
| `route(text)` | Get routing decision without executing |
| `addMemory(content, metadata?)` | Store content in vector memory |
| `searchMemory(text, k?)` | Find similar content (default k=10) |
| `feedback(fb)` | Submit feedback for learning |
| `embed(text)` | Get embedding vector for text |
| `similarity(t1, t2)` | Compute similarity between texts |
| `stats()` | Get engine statistics |
| `forceLearn()` | Trigger immediate learning cycle |

SimdOps Class

| Method | Description |
|---|---|
| `dotProduct(a, b)` | Vector dot product |
| `cosineSimilarity(a, b)` | Cosine similarity (0-1) |
| `l2Distance(a, b)` | Euclidean distance |
| `normalize(v)` | Normalize to unit length |
| `softmax(v)` | Softmax activation |
| `relu(v)` | ReLU activation |
| `gelu(v)` | GELU activation |
| `layerNorm(v, eps?)` | Layer normalization |
| `matvec(m, v)` | Matrix-vector multiply |
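For reference, here is what two of these methods compute, sketched in plain TypeScript. This shows the semantics only, not the accelerated implementation; the max-subtraction in softmax is the standard numerical-stability trick:

```typescript
// Reference semantics for SimdOps.softmax and SimdOps.layerNorm (no SIMD).
// softmax: exponentiate (shifted by the max to avoid overflow) and normalize.
function softmax(v: number[]): number[] {
  const m = Math.max(...v);
  const exps = v.map(x => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

// layerNorm: center by the mean, scale by the standard deviation (plus epsilon).
function layerNorm(v: number[], eps = 1e-5): number[] {
  const mean = v.reduce((a, b) => a + b, 0) / v.length;
  const variance = v.reduce((a, x) => a + (x - mean) ** 2, 0) / v.length;
  return v.map(x => (x - mean) / Math.sqrt(variance + eps));
}
```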

Troubleshooting

Q: Native module not loading?

ruvllm info  # Check if native is loaded

If "Native: Fallback", install platform-specific package manually:

npm install @ruvector/ruvllm-darwin-arm64  # For Apple Silicon

Q: Memory usage too high? Reduce HNSW parameters:

const llm = new RuvLLM({ hnswM: 8, hnswEfConstruction: 50 });

Q: Learning not improving results? Check that feedback is being processed:

const stats = llm.stats();
console.log(`Patterns learned: ${stats.patternsLearned}`);

License

MIT OR Apache-2.0