SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving Paper • 2604.19157 • Published 30 days ago • 1
A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work Paper • 2604.18555 • Published about 1 month ago • 1
Polynomial-Time Optimal Group Selection via the Double-Commutator Eigenvalue Problem Paper • 2605.00834 • Published 13 days ago • 1
Approximating Uniform Random Rotations by Two-Block Structured Hadamard Rotations in High Dimensions Paper • 2604.23418 • Published 26 days ago • 1
The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval Paper • 2502.11276 • Published Feb 16, 2025 • 1
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World Paper • 2603.19223 • Published Mar 19 • 33
Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments Paper • 2604.19528 • Published 21 days ago • 1
PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective Paper • 2505.21799 • Published Feb 5 • 1
Spectrum-Adaptive Generalization Bounds for Trained Deep Transformers Paper • 2605.07297 • Published 13 days ago • 1
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 14 days ago • 185
Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion Paper • 2605.07013 • Published 14 days ago • 2
Granite 4.1 Language Models Collection Efficient language models for multilingual generation, coding, RAG, and AI assistant workflows. • 6 items • Updated 21 days ago • 51
Global Lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers Paper • 2410.08304 • Published Oct 10, 2024 • 1