Vision - a neutrino12 Collection

updated Oct 1, 2025

Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation

Paper • 2508.07981 • Published Aug 11, 2025 • 58
CharacterShot: Controllable and Consistent 4D Character Animation

Paper • 2508.07409 • Published Aug 10, 2025 • 39
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing

Paper • 2508.10881 • Published Aug 14, 2025 • 52
Puppeteer: Rig and Animate Your 3D Models

Paper • 2508.10898 • Published Aug 14, 2025 • 33
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

Paper • 2507.15852 • Published Jul 21, 2025 • 38
Yume: An Interactive World Generation Model

Paper • 2507.17744 • Published Jul 23, 2025 • 88
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention

Paper • 2507.17745 • Published Jul 23, 2025 • 35
Multi-Agent Game Generation and Evaluation via Audio-Visual Recordings

Paper • 2508.00632 • Published Aug 1, 2025 • 3
Matrix-3D: Omnidirectional Explorable 3D World Generation

Paper • 2508.08086 • Published Aug 11, 2025 • 75
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning

Paper • 2508.05405 • Published Aug 7, 2025 • 64
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization

Paper • 2508.14811 • Published Aug 20, 2025 • 42
Waver: Wave Your Way to Lifelike Video Generation

Paper • 2508.15761 • Published Aug 21, 2025 • 36
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

Paper • 2508.13009 • Published Aug 18, 2025 • 25
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26, 2025 • 43
ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

Paper • 2508.08240 • Published Aug 11, 2025 • 45
Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels

Paper • 2508.17437 • Published Aug 20, 2025 • 38
MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation

Paper • 2508.19320 • Published Aug 26, 2025 • 29
Mixture of Contexts for Long Video Generation

Paper • 2508.21058 • Published Aug 28, 2025 • 35
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation

Paper • 2508.17472 • Published Aug 24, 2025 • 26
Do What? Teaching Vision-Language-Action Models to Reject the Impossible

Paper • 2508.16292 • Published Aug 22, 2025 • 9
ROSE: Remove Objects with Side Effects in Videos

Paper • 2508.18633 • Published Aug 26, 2025 • 7
Collaborative Multi-Modal Coding for High-Quality 3D Generation

Paper • 2508.15228 • Published Aug 21, 2025 • 4
MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting

Paper • 2508.17811 • Published Aug 25, 2025 • 6
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

Paper • 2509.17627 • Published Sep 22, 2025 • 66