@ruvector/ruvllm

Build AI that learns and improves from every interaction.

RuvLLM is a self-learning language model toolkit that gets smarter over time. Unlike traditional LLMs that remain static after training, RuvLLM continuously adapts to your use case while remembering what it learned before.

What Makes RuvLLM Different?

Traditional LLMs forget old knowledge when they learn new things, a problem known as "catastrophic forgetting." RuvLLM addresses this with three key innovations:

  1. It Learns Without Forgetting - Uses tiny parameter updates (LoRA) and memory protection (EWC++) to learn new patterns while preserving existing knowledge

  2. It Remembers Context - Built-in vector memory stores and retrieves relevant information instantly using similarity search

  3. It Routes Intelligently - Automatically selects the right model size and parameters based on query complexity, saving resources on simple tasks
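The low-rank idea behind point 1 can be sketched in a few lines. This is an illustration of the general LoRA technique, not RuvLLM's internal code: the frozen base weights `W` are left untouched, and a small trainable update `B·A` (rank `r` much smaller than the weight dimensions) is added on top, scaled by `alpha / r`.

```typescript
// Hypothetical illustration of a LoRA forward pass: y = W·x + (alpha/r)·B·(A·x).
// W stays frozen during adaptation; only the small A and B matrices are trained.
type Matrix = number[][];

function matVec(m: Matrix, v: number[]): number[] {
  return m.map(row => row.reduce((sum, w, i) => sum + w * v[i], 0));
}

function loraForward(
  W: Matrix,      // frozen d_out x d_in base weights
  A: Matrix,      // trainable r x d_in down-projection
  B: Matrix,      // trainable d_out x r up-projection
  x: number[],
  alpha: number,
  r: number
): number[] {
  const base = matVec(W, x);                 // frozen path
  const delta = matVec(B, matVec(A, x));     // low-rank update path
  return base.map((y, i) => y + (alpha / r) * delta[i]);
}
```

Because `B` starts at zero, a freshly created adapter leaves the model's behavior unchanged; learning then only touches the tiny `A`/`B` matrices, which is why adaptation is fast and memory-cheap.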

Key Features

| Feature | What It Does | Why It Matters |
|---|---|---|
| Adaptive Learning | Learns from user feedback in real-time | Improves accuracy over time without retraining |
| Memory System | Stores context with instant similarity search | Finds relevant information in microseconds |
| Smart Routing | Picks optimal model/settings per query | Reduces costs, improves response quality |
| SIMD Acceleration | Uses CPU vector instructions (AVX2/NEON) | 10-50x faster vector operations |
| Federated Learning | Train across devices without sharing data | Privacy-preserving distributed learning |
| LoRA Adapters | Parameter-efficient fine-tuning with low-rank matrices | Fast adaptation with minimal memory |
| EWC++ Protection | Elastic Weight Consolidation prevents forgetting | Learn new tasks without losing old knowledge |
| SafeTensors Export | HuggingFace-compatible model serialization | Share models with the ML ecosystem |
| Training Pipeline | Full training infrastructure with schedulers | Production-ready model training |
| Session Management | Stateful conversations with streaming | Build chat applications easily |

Installation

npm install @ruvector/ruvllm

Or run directly:

npx @ruvector/ruvllm info

Quick Start Tutorial

1. Basic Query

import { RuvLLM } from '@ruvector/ruvllm';

const llm = new RuvLLM();

// Ask a question - routing happens automatically
const response = llm.query('Explain neural networks simply');
console.log(response.text);
// Output: "Neural networks are computing systems inspired by..."

console.log(`Used model: ${response.model}`);
console.log(`Confidence: ${(response.confidence * 100).toFixed(1)}%`);

2. Teaching the System

// Query and get a response
const response = llm.query('What is the capital of France?');

// Provide feedback - the system learns from this
llm.feedback({
  requestId: response.requestId,
  rating: 5,  // 1-5 scale
  correction: 'Paris is the capital and largest city of France'
});

// Future similar queries will be more accurate

3. Using Memory

// Store important context
llm.addMemory('Company policy: All returns accepted within 30 days', {
  category: 'policy',
  department: 'customer-service'
});

llm.addMemory('Product X launched in March 2024 with features A, B, C', {
  category: 'product',
  name: 'Product X'
});

// Search memory for relevant context
const results = llm.searchMemory('return policy', 5);
console.log(results[0].content);
// Output: "Company policy: All returns accepted within 30 days"
console.log(`Relevance: ${(results[0].score * 100).toFixed(1)}%`);
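To make the retrieval step less magical: searchMemory embeds the query and returns the stored entries whose embeddings are most similar to it. A naive exact-search equivalent looks like the sketch below (hypothetical helper names, not the library API; RuvLLM replaces the linear scan with an HNSW index):

```typescript
// Minimal exact-search vector store: holds (embedding, content) pairs and
// returns the k entries whose embeddings are most cosine-similar to the query.
interface Entry { content: string; vec: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1); // guard against zero vectors
}

function searchTopK(store: Entry[], query: number[], k: number) {
  return store
    .map(e => ({ content: e.content, score: cosine(e.vec, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The linear scan is O(n) per query; HNSW gets the same top-k answers (approximately) in roughly logarithmic time, which is what makes microsecond-scale search over large stores possible.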

4. Computing Similarity

import { SimdOps } from '@ruvector/ruvllm';

const simd = new SimdOps();

// Compare two texts
const score = llm.similarity(
  'How do I reset my password?',
  'I forgot my login credentials'
);
console.log(`Similarity: ${(score * 100).toFixed(1)}%`);
// Output: "Similarity: 78.3%"

// Fast vector operations
const embedding1 = llm.embed('machine learning');
const embedding2 = llm.embed('deep learning');
const similarity = simd.cosineSimilarity(embedding1, embedding2);

5. Batch Processing

// Process multiple queries efficiently
const batch = llm.batchQuery({
  queries: [
    'What is AI?',
    'Explain machine learning',
    'How do neural networks work?'
  ],
  config: { temperature: 0.7 }
});

batch.responses.forEach((r, i) => {
  console.log(`Query ${i + 1}: ${r.text.slice(0, 50)}...`);
});
console.log(`Total time: ${batch.totalLatencyMs}ms`);

CLI Commands

# Get system information
ruvllm info

# Query the model
ruvllm query "What is quantum computing?"

# Generate text with custom settings
ruvllm generate "Write a product description for:" --temperature 0.8 --max-tokens 200

# Memory operations
ruvllm memory add "Important fact to remember"
ruvllm memory search "fact" --k 10

# Compare texts
ruvllm similarity "hello world" "hi there"

# Get embeddings
ruvllm embed "your text here"

# Run performance benchmark
ruvllm benchmark --dims 768 --iterations 5000

# View statistics
ruvllm stats --json

Benchmarks

Benchmarked in Docker (node:20-alpine, x64) - December 2024

Core Operations

| Operation | Time | Throughput |
|---|---|---|
| Query (short) | 1.49μs | 670K ops/s |
| Query (long) | 874ns | 1.14M ops/s |
| Generate | 88ns | 11.4M ops/s |
| Route | 92ns | 10.9M ops/s |
| Embed (256d) | 10.6μs | 94K ops/s |
| Embed (768d) | 7.1μs | 140K ops/s |

SIMD Vector Operations

| Operation | 128d | 256d | 512d | 768d |
|---|---|---|---|---|
| Dot Product | 214ns / 4.67M ops/s | 318ns / 3.15M ops/s | 609ns / 1.64M ops/s | 908ns / 1.10M ops/s |
| Cosine Similarity | 233ns / 4.30M ops/s | 335ns / 2.99M ops/s | 652ns / 1.53M ops/s | 972ns / 1.03M ops/s |
| L2 Distance | 195ns / 5.14M ops/s | 315ns / 3.18M ops/s | 612ns / 1.63M ops/s | 929ns / 1.08M ops/s |

LoRA Adapter Performance

| Operation | 64d | 128d | 256d |
|---|---|---|---|
| Forward (r=4) | 6.09μs / 164K ops/s | 2.74μs / 365K ops/s | 4.83μs / 207K ops/s |
| Forward (r=8) | 2.17μs / 462K ops/s | 4.30μs / 233K ops/s | 8.99μs / 111K ops/s |
| Forward (r=16) | 4.85μs / 206K ops/s | 9.05μs / 111K ops/s | 18.3μs / 55K ops/s |
| Backward (r=8) | - | 110μs / 9.1K ops/s | - |
| Batch (100) | - | 467μs / 2.1K ops/s | - |

Memory Operations

| Operation | Time | Throughput |
|---|---|---|
| Add Memory | 5.3μs | 189K ops/s |
| Search (k=5) | 45.6μs | 21.9K ops/s |
| Search (k=10) | 28.3μs | 35.3K ops/s |
| Search (k=20) | 33.1μs | 30.2K ops/s |

SONA Learning System

| Operation | Time | Throughput |
|---|---|---|
| Pattern Store | 14.4μs | 69.5K ops/s |
| Pattern Find Similar | 224μs | 4.5K ops/s |
| EWC Register Task | 6.5μs | 154K ops/s |
| EWC Compute Penalty | 501μs | 2.0K ops/s |
| Trajectory Build | 1.24μs | 807K ops/s |

Federated Learning

| Operation | Time | Throughput |
|---|---|---|
| Agent Create | 7.8μs | 128K ops/s |
| Process Task | 7.9μs | 126K ops/s |
| Apply LoRA | 12.6μs | 79.6K ops/s |
| Export State | 48.9μs | 20.4K ops/s |
| Aggregate | 5.26ms | 190 ops/s |

Session & Streaming

| Operation | Time | Throughput |
|---|---|---|
| Session Create | 1.45μs | 690K ops/s |
| Session Chat | 3.28μs | 305K ops/s |
| Session Export | 3.91ms | 255 ops/s |
| Session Import | 1.60ms | 625 ops/s |

Training Pipeline

| Operation | Time |
|---|---|
| Pipeline Create | 70.6μs |
| Add Data (100 samples) | 70.6μs |
| Train (32 samples, 3 epochs) | 1.33s |

Export/Import

| Operation | Time | Throughput |
|---|---|---|
| SafeTensors Write | 67.3μs | 14.9K ops/s |
| SafeTensors Read | 102μs | 9.8K ops/s |
| LoRA to JSON | 87.9μs | 11.4K ops/s |
| LoRA from JSON | 86.0μs | 11.6K ops/s |

Performance Highlights

  • Fastest: Generate at 11.4M ops/s, Route at 10.9M ops/s
  • Vector Ops: Up to 5.14M ops/s for L2 distance (128d)
  • LoRA Forward: Up to 462K ops/s (64d, rank-8)
  • Memory Search: 35K ops/s (k=10)
  • Session Create: 690K ops/s

Configuration

const llm = new RuvLLM({
  // Embedding settings
  embeddingDim: 768,        // Vector dimensions (384, 768, 1024)

  // Memory settings
  hnswM: 16,                // Graph connectivity (higher = better recall, more memory)
  hnswEfConstruction: 100,  // Build quality (higher = better index, slower build)
  hnswEfSearch: 64,         // Search quality (higher = better recall, slower search)

  // Learning settings
  learningEnabled: true,    // Enable adaptive learning
  qualityThreshold: 0.7,    // Min confidence to skip learning
  ewcLambda: 2000,          // Memory protection strength

  // Router settings
  routerHiddenDim: 128,     // Router network size
});

Platform Support

Native acceleration available on:

| Platform | Architecture | SIMD Support |
|---|---|---|
| macOS | Apple Silicon (M1/M2/M3) | NEON |
| macOS | Intel x64 | AVX2, SSE4.1 |
| Linux | x64 | AVX2, AVX-512, SSE4.1 |
| Linux | ARM64 | NEON |
| Windows | x64 | AVX2, SSE4.1 |

Falls back to optimized JavaScript on unsupported platforms.

Real-World Use Cases

Customer Support Bot

// Store FAQ and policies
faqs.forEach(faq => llm.addMemory(faq.answer, { question: faq.question }));

// Answer questions with context
function answerQuestion(question: string) {
  const context = llm.searchMemory(question, 3);
  const prompt = `Context:\n${context.map(c => c.content).join('\n')}\n\nQuestion: ${question}`;
  return llm.query(prompt);
}

Document Search

// Index documents
documents.forEach(doc => {
  llm.addMemory(doc.content, {
    title: doc.title,
    path: doc.path
  });
});

// Semantic search
const results = llm.searchMemory('quarterly revenue growth', 10);

Personalized Recommendations

// Learn from user interactions
function recordInteraction(userId: string, itemId: string, rating: number) {
  const response = llm.query(`User ${userId} rated ${itemId}`);
  llm.feedback({ requestId: response.requestId, rating });
}

// Get recommendations
function recommend(userId: string) {
  return llm.searchMemory(`preferences for user ${userId}`, 10);
}

API Reference

RuvLLM Class

| Method | Description |
|---|---|
| `query(text, config?)` | Query with automatic model routing |
| `generate(prompt, config?)` | Generate text with given prompt |
| `route(text)` | Get routing decision without executing |
| `addMemory(content, metadata?)` | Store content in vector memory |
| `searchMemory(text, k?)` | Find similar content (default k=10) |
| `feedback(fb)` | Submit feedback for learning |
| `embed(text)` | Get embedding vector for text |
| `similarity(t1, t2)` | Compute similarity between texts |
| `stats()` | Get engine statistics |
| `forceLearn()` | Trigger immediate learning cycle |

SimdOps Class

| Method | Description |
|---|---|
| `dotProduct(a, b)` | Vector dot product |
| `cosineSimilarity(a, b)` | Cosine similarity (0-1) |
| `l2Distance(a, b)` | Euclidean distance |
| `normalize(v)` | Normalize to unit length |
| `softmax(v)` | Softmax activation |
| `relu(v)` | ReLU activation |
| `gelu(v)` | GELU activation |
| `layerNorm(v, eps?)` | Layer normalization |
| `matvec(m, v)` | Matrix-vector multiply |
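For reference, here is what two of these methods compute, sketched in plain TypeScript. This shows the semantics only, not the accelerated implementation; the max-subtraction in softmax is the standard numerical-stability trick:

```typescript
// Reference semantics for SimdOps.softmax and SimdOps.layerNorm (no SIMD).
// softmax: exponentiate (shifted by the max to avoid overflow) and normalize.
function softmax(v: number[]): number[] {
  const m = Math.max(...v);
  const exps = v.map(x => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

// layerNorm: center by the mean, scale by the standard deviation (plus epsilon).
function layerNorm(v: number[], eps = 1e-5): number[] {
  const mean = v.reduce((a, b) => a + b, 0) / v.length;
  const variance = v.reduce((a, x) => a + (x - mean) ** 2, 0) / v.length;
  return v.map(x => (x - mean) / Math.sqrt(variance + eps));
}
```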

Troubleshooting

Q: Native module not loading?

ruvllm info  # Check if native is loaded

If "Native: Fallback", install platform-specific package manually:

npm install @ruvector/ruvllm-darwin-arm64  # For Apple Silicon

Q: Memory usage too high? Reduce HNSW parameters:

const llm = new RuvLLM({ hnswM: 8, hnswEfConstruction: 50 });

Q: Learning not improving results? Check that feedback is being processed:

const stats = llm.stats();
console.log(`Patterns learned: ${stats.patternsLearned}`);

License

MIT OR Apache-2.0