UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs Paper • 2512.03383 • Published Dec 3, 2025 • 4
Performance Prediction for Large Systems via Text-to-Text Regression Paper • 2506.21718 • Published Jun 26, 2025 • 6
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval Paper • 2502.20969 • Published Feb 28, 2025 • 11
SCBench: A KV Cache-Centric Analysis of Long-Context Methods Paper • 2412.10319 • Published Dec 13, 2024 • 11