neutrino12's Collections
Vision
Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
Paper • 2508.07981 • Published • 58
CharacterShot: Controllable and Consistent 4D Character Animation
Paper • 2508.07409 • Published • 39
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
Paper • 2508.10881 • Published • 52
Puppeteer: Rig and Animate Your 3D Models
Paper • 2508.10898 • Published • 33
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction
Paper • 2507.15852 • Published • 38
Yume: An Interactive World Generation Model
Paper • 2507.17744 • Published • 88
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention
Paper • 2507.17745 • Published • 35
Multi-Agent Game Generation and Evaluation via Audio-Visual Recordings
Paper • 2508.00632 • Published • 3
Matrix-3D: Omnidirectional Explorable 3D World Generation
Paper • 2508.08086 • Published • 75
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning
Paper • 2508.05405 • Published • 64
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
Paper • 2508.14811 • Published • 42
Waver: Wave Your Way to Lifelike Video Generation
Paper • 2508.15761 • Published • 36
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
Paper • 2508.13009 • Published • 25
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
Paper • 2508.19247 • Published • 43
ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks
Paper • 2508.08240 • Published • 45
Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels
Paper • 2508.17437 • Published • 38
MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation
Paper • 2508.19320 • Published • 29
Mixture of Contexts for Long Video Generation
Paper • 2508.21058 • Published • 35
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
Paper • 2508.17472 • Published • 26
Do What? Teaching Vision-Language-Action Models to Reject the Impossible
Paper • 2508.16292 • Published • 9
ROSE: Remove Objects with Side Effects in Videos
Paper • 2508.18633 • Published • 7
Collaborative Multi-Modal Coding for High-Quality 3D Generation
Paper • 2508.15228 • Published • 4
MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting
Paper • 2508.17811 • Published • 6
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models
Paper • 2509.17627 • Published • 66