SS4D: Native 4D Generative Model via Structured Spacetime Latents • Paper • 2512.14284 • Published 18 days ago • 13
LongVie 2: Multimodal Controllable Ultra-Long Video World Model • Paper • 2512.13604 • Published 19 days ago • 72
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning • Paper • 2512.05111 • Published about 1 month ago • 47
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation • Paper • 2512.03036 • Published Dec 2, 2025 • 20
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence • Paper • 2510.24693 • Published Oct 28, 2025 • 18
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation • Paper • 2510.01284 • Published Sep 30, 2025 • 34
SPARK: Synergistic Policy And Reward Co-Evolving Framework • Paper • 2509.22624 • Published Sep 26, 2025 • 17
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning • Paper • 2509.22647 • Published Sep 26, 2025 • 32
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases • Paper • 2312.15011 • Published Dec 22, 2023 • 18
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models • Paper • 2503.21745 • Published Mar 27, 2025 • 1
Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity • Paper • 2508.05609 • Published Aug 7, 2025 • 29
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience • Paper • 2508.04700 • Published Aug 6, 2025 • 52
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction • Paper • 2507.15852 • Published Jul 21, 2025 • 38
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference • Paper • 2502.18411 • Published Feb 25, 2025 • 74
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM • Paper • 2503.14478 • Published Mar 18, 2025 • 48
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography • Paper • 2504.07083 • Published Apr 9, 2025 • 22
CineBrain: A Large-Scale Multi-Modal Brain Dataset During Naturalistic Audiovisual Narrative Processing • Paper • 2503.06940 • Published Mar 10, 2025 • 11
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation • Paper • 2408.13252 • Published Aug 23, 2024 • 26
Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation • Paper • 2409.18261 • Published Sep 26, 2024 • 1