DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning Paper • 2603.12257 • Published 8 days ago • 31
Accent Vector: Controllable Accent Manipulation for Multilingual TTS Without Accented Data Paper • 2603.07534 • Published 12 days ago • 5
End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions Paper • 2601.17640 • Published Jan 25 • 5
daVinci-Dev: Agent-native Mid-training for Software Engineering Paper • 2601.18418 • Published Jan 26 • 126
Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis Paper • 2601.14417 • Published Jan 20 • 5
HeartMuLa: A Family of Open Sourced Music Foundation Models Paper • 2601.10547 • Published Jan 15 • 48
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published Jan 6 • 49
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 214
Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe Paper • 2508.01691 • Published Aug 3, 2025 • 10