MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods Paper • 2601.21821 • Published 7 days ago • 57
DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding Paper • 2601.23161 • Published 6 days ago • 10
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing Paper • 2601.21957 • Published 7 days ago • 14
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning Paper • 2601.18631 • Published 10 days ago • 47
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints Paper • 2601.18137 • Published 11 days ago • 25
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs Paper • 2601.17058 • Published 14 days ago • 184
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents Paper • 2601.16746 • Published 13 days ago • 89
SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer Paper • 2601.16515 • Published 14 days ago • 15
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning Paper • 2506.04034 • Published Jun 4, 2025 • 4
DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset Paper • 2601.10305 • Published 21 days ago • 36
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers Paper • 2512.17351 • Published Dec 19, 2025 • 28
MolmoAct Collection All models for the MolmoAct (Multimodal Open Language Model for Action) release. • 10 items • Updated Dec 23, 2025 • 35
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 7 items • Updated about 23 hours ago • 131