When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models Paper • 2604.08546 • Published 1 day ago • 104
FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On Paper • 2604.08526 • Published 1 day ago • 11
Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization Paper • 2604.08476 • Published about 22 hours ago • 3
Small Vision-Language Models are Smart Compressors for Long Video Understanding Paper • 2604.08120 • Published 1 day ago • 2
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks Paper • 2604.08539 • Published 1 day ago • 12
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published 3 days ago • 74
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web Paper • 2604.08516 • Published 1 day ago • 5
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published 1 day ago • 137
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering Paper • 2604.08224 • Published 1 day ago • 30
Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills Paper • 2604.05333 • Published 4 days ago • 16
FlowInOne:Unifying Multimodal Generation as Image-in, Image-out Flow Matching Paper • 2604.06757 • Published 3 days ago • 6
Combee: Scaling Prompt Learning for Self-Improving Language Model Agents Paper • 2604.04247 • Published 6 days ago • 24
VenusBench-Mobile: A Challenging and User-Centric Benchmark for Mobile GUI Agents with Capability Diagnostics Paper • 2604.06182 • Published Feb 6 • 1
FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling Paper • 2604.06916 • Published 3 days ago • 12
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling Paper • 2604.07209 • Published 3 days ago • 23
TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders Paper • 2604.07340 • Published 3 days ago • 11
MARS: Enabling Autoregressive Models Multi-Token Generation Paper • 2604.07023 • Published 3 days ago • 26