ruv committed on
Commit 1452d87 · verified · 1 Parent(s): 70d1236

Add TurboQuant compatibility, v2.1.0 ecosystem tags

Files changed (1)
  1. README.md +56 -0
README.md CHANGED
@@ -16,6 +16,17 @@ tags:
@@ -415,3 +426,48 @@ Apache 2.0 - Free for commercial and personal use.
  - llama-cpp
  - text-generation-inference
  - first-of-its-kind
+ - turboquant
+ - kv-cache-compression
+ - flash-attention
+ - speculative-decoding
+ - graph-rag
+ - hybrid-search
+ - vector-database
+ - ruvector
+ - diskann
+ - mamba-ssm
+ - colbert
  pipeline_tag: text-generation
  model-index:
  - name: ruvltra-claude-code
 
  *The future of AI-assisted development is self-learning.*
  
  </div>
+
+
+ ---
+
+ ## ⚡ TurboQuant KV-Cache Compression
+
+ RuvLTRA models are fully compatible with **TurboQuant**: 2-4 bit KV-cache quantization that reduces inference memory by 6-8x with <0.5% quality loss.
+
+ | Quantization | Compression | Quality Loss | Best For |
+ |-------------|-------------|--------------|----------|
+ | 3-bit | 10.7x | <1% | **Recommended**: best balance |
+ | 4-bit | 8x | <0.5% | High quality, long context |
+ | 2-bit | 32x | ~2% | Edge devices, max savings |
+
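As a rough check on the table above, the ideal ratio from bit width alone is the baseline bit width divided by the quantized bit width. The sketch below assumes a 32-bit float baseline and ignores any per-block scale or metadata overhead. `ideal_ratio` is a hypothetical helper for illustration, not part of ruvllm. Under that assumption the 3-bit and 4-bit rows come out at about 10.7x and 8x, matching the table, while 2-bit comes out at 16x. The 32x figure presumably rests on a different baseline or accounting that the README does not state.

```rust
/// Ideal compression ratio from bit widths alone.
/// Hypothetical helper for illustration; not part of the ruvllm API.
/// Ignores scale factors, codebooks, and any other per-block metadata.
fn ideal_ratio(baseline_bits: f64, quantized_bits: f64) -> f64 {
    baseline_bits / quantized_bits
}

/// Ratios for a list of bit widths against an assumed 32-bit float baseline.
fn ratios_vs_f32(bit_widths: &[f64]) -> Vec<f64> {
    bit_widths.iter().map(|&b| ideal_ratio(32.0, b)).collect()
}
```

Calling `ratios_vs_f32(&[4.0, 3.0, 2.0])` gives the three ratios in table order. Swapping the baseline to 16 halves every ratio.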
+ ### Usage with RuvLLM
+
+ ```bash
+ cargo add ruvllm # Rust
+ npm install @ruvector/ruvllm # Node.js
+ ```
+
+ ```rust
+ use ruvllm::quantize::turbo_quant::{TurboQuantCompressor, TurboQuantConfig, TurboQuantBits};
+
+ let config = TurboQuantConfig {
+     bits: TurboQuantBits::Bit3_5, // 10.7x compression
+     use_qjl: true,
+     ..Default::default()
+ };
+ let compressor = TurboQuantCompressor::new(config)?;
+ let compressed = compressor.compress_batch(&kv_vectors)?;
+ let scores = compressor.inner_product_batch_optimized(&query, &compressed)?;
+ ```
+
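For readers new to low-bit quantization, here is a minimal sketch of what storing a vector at a few bits per element involves. It uses plain uniform min-max scalar quantization, which is a deliberately simpler technique than TurboQuant and is not how ruvllm implements it. The `Quantized` type and both functions are hypothetical names introduced only for this illustration, and codes are left unpacked (one byte each) for readability.

```rust
/// A vector quantized to a fixed number of bits per element,
/// with one shared offset (`min`) and step size (`scale`).
/// Hypothetical type for illustration; not part of ruvllm.
struct Quantized {
    codes: Vec<u8>, // one code per element, left unpacked for clarity
    min: f32,
    scale: f32,
}

/// Uniform min-max quantization to `bits` bits per element (1..=8).
fn quantize(values: &[f32], bits: u32) -> Quantized {
    assert!((1..=8).contains(&bits), "bits must be in 1..=8");
    let levels = ((1u32 << bits) - 1) as f32; // highest code value
    let min = values.iter().copied().fold(f32::INFINITY, f32::min);
    let max = values.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Guard against constant or empty input, where max is not above min.
    let scale = if max > min { (max - min) / levels } else { 1.0 };
    let codes = values
        .iter()
        .map(|&v| ((v - min) / scale).round().clamp(0.0, levels) as u8)
        .collect();
    Quantized { codes, min, scale }
}

/// Map codes back to approximate floats.
fn dequantize(q: &Quantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

With this scheme each reconstructed element differs from the original by at most about half a step (`scale / 2`), so fewer bits mean a coarser step and a larger error.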
+ ### v2.1.0 Ecosystem
+
+ - **Hybrid Search**: Sparse + dense vectors with RRF fusion (20-49% better retrieval)
+ - **Graph RAG**: Knowledge graph + community detection for multi-hop queries
+ - **DiskANN**: Billion-scale SSD-backed ANN with <10ms latency
+ - **FlashAttention-3**: IO-aware tiled attention, O(N) memory
+ - **MLA**: Multi-Head Latent Attention (~93% KV-cache compression)
+ - **Mamba SSM**: Linear-time selective state space models
+ - **Speculative Decoding**: 2-3x generation speedup
+
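The Hybrid Search entry in the list above mentions RRF (Reciprocal Rank Fusion). As a minimal sketch of the general technique, each document scores the sum of `1 / (k + rank)` over every ranked list it appears in, with 1-based ranks. `rrf_fuse` is a hypothetical function written for this illustration and is not the RuVector API. The constant `k = 60` used in the note below is a commonly used default, assumed here rather than taken from the README.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over several ranked lists of document ids.
/// score(d) = sum over lists of 1 / (k + rank_of_d), with 1-based ranks.
/// Returns (id, fused score) pairs sorted best first.
/// Hypothetical helper for illustration; not part of RuVector.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by score descending; break ties by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

For example, fusing a sparse ranking `["a", "b", "c"]` with a dense ranking `["b", "c", "a"]` at `k = 60.0` places `b` first, because it ranks near the top of both lists. Only ranks enter the formula, so the raw sparse and dense scores never need to be on a comparable scale.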
+ [RuVector GitHub](https://github.com/ruvnet/ruvector) | [ruvllm crate](https://crates.io/crates/ruvllm) | [@ruvector/ruvllm npm](https://www.npmjs.com/package/@ruvector/ruvllm)