LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published 13 days ago • 53
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text Paper • 2512.16924 • Published 18 days ago • 25
DocReward: A Document Reward Model for Structuring and Stylizing Paper • 2510.11391 • Published Oct 13, 2025 • 27
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18, 2025 • 111
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19, 2025 • 118
OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion Paper • 2507.06165 • Published Jul 8, 2025 • 58
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Paper • 2507.07982 • Published Jul 10, 2025 • 33
ImgEdit: A Unified Image Editing Dataset and Benchmark Paper • 2505.20275 • Published May 26, 2025 • 18
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published Apr 11, 2025 • 42
An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published Apr 8, 2025 • 64
Large Motion Video Autoencoding with Cross-modal Video VAE Paper • 2412.17805 • Published Dec 23, 2024 • 24
RoLoRA Collection [EMNLP2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization • 3 items • Updated Sep 26, 2024 • 3