Papers
MMGR: Multi-Modal Generative Reasoning
Paper
• 2512.14691
• Published • 121
KlingAvatar 2.0 Technical Report
Paper
• 2512.13313
• Published • 44
SemanticGen: Video Generation in Semantic Space
Paper
• 2512.20619
• Published • 94
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper
• 2512.16676
• Published • 222
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
Paper
• 2512.19693
• Published • 67
LongVideoAgent: Multi-Agent Reasoning with Long Videos
Paper
• 2512.20618
• Published • 56
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
Paper
• 2512.19673
• Published • 66
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs
Paper
• 2601.03559
• Published • 14
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders
Paper
• 2601.10332
• Published • 31
Self-Refining Video Sampling
Paper
• 2601.18577
• Published • 25
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
Paper
• 2602.02493
• Published • 46
LatentMem: Customizing Latent Memory for Multi-Agent Systems
Paper
• 2602.03036
• Published • 14
Reinforcement World Model Learning for LLM-based Agents
Paper
• 2602.05842
• Published • 27
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
Paper
• 2602.21548
• Published • 49
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
Paper
• 2603.03205
• Published • 13
Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling
Paper
• 2603.04791
• Published • 18
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
Paper
• 2602.23166
• Published • 45
Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory
Paper
• 2603.04257
• Published • 19
Dynamic Chunking Diffusion Transformer
Paper
• 2603.06351
• Published • 15
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
Paper
• 2603.09877
• Published • 47
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings
Paper
• 2603.13594
• Published • 147
LoST: Level of Semantics Tokenization for 3D Shapes
Paper
• 2603.17995
• Published • 31
GPA: Learning GUI Process Automation from Demonstrations
Paper
• 2604.01676
• Published • 16
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving
Paper
• 2604.02190
• Published • 25
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis
Paper
• 2603.29620
• Published • 46