Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 10 days ago • 138
Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context Paper • 2605.13831 • Published 25 days ago • 87
Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models Paper • 2602.12586 • Published Feb 13 • 2
Learning GUI Grounding with Spatial Reasoning from Visual Feedback Paper • 2509.21552 • Published Sep 25, 2025 • 11
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models Paper • 2507.08800 • Published Jul 11, 2025 • 81
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Paper • 2506.03295 • Published Jun 3, 2025 • 17
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published May 15, 2025 • 55
view article Article 🦸🏻#1: Open-endedness and AI Agents – A Path from Generative to Creative AI? Kseniase • Dec 25, 2024 • 16
Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression Paper • 2503.02812 • Published Mar 4, 2025 • 10
Q-Filters Collection Pre-computed Q-Filters for efficient KV cache compression. • 15 items • Updated Mar 3, 2025 • 7
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations Paper • 2410.18860 • Published Oct 24, 2024 • 11
Analysing the Residual Stream of Language Models Under Knowledge Conflicts Paper • 2410.16090 • Published Oct 21, 2024 • 8
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering Paper • 2410.15999 • Published Oct 21, 2024 • 20
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 Paper • 2408.05147 • Published Aug 9, 2024 • 42
view article Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 +4 RQlee, ArthurZ, achikundu, lwtr, rganti, mayank-mishra • Aug 21, 2024 • 41
🔍 Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 136 items • Updated 11 days ago • 119
view article Article Introducing RWKV - An RNN with the advantages of a transformer +2 BlinkDL, Hazzzardous, sgugger, ybelkada • May 15, 2023 • 25