My notification - a nithin12342 Collection

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Paper • 2601.15369 • Published 21 days ago • 20

Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs

Paper • 2601.17058 • Published 20 days ago • 187

Less is More: Optimizing Function Calling for LLM Execution on Edge Devices

Paper • 2411.15399 • Published Nov 23, 2024 • 1

nvidia/personaplex-7b-v1

Audio-to-Audio • Updated 2 days ago • 228k • 1.73k

Qwen/Qwen3-ASR-0.6B

Automatic Speech Recognition • 0.9B • Updated 12 days ago • 49.6k • 192

Qwen3-ASR Technical Report

Paper • 2601.21337 • Published 13 days ago • 33

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 15 days ago • 23

DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

Paper • 2601.22153 • Published 13 days ago • 68

Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models

Paper • 2601.20354 • Published 14 days ago • 110

Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation

Paper • 2601.21406 • Published 13 days ago • 4

Revisiting Parameter Server in LLM Post-Training

Paper • 2601.19362 • Published 15 days ago • 7

ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Paper • 2601.21420 • Published 13 days ago • 42

SERA: Soft-Verified Efficient Repository Agents

Paper • 2601.20789 • Published 14 days ago • 11

moonshotai/Kimi-K2.5

Image-Text-to-Text • 171B • Updated 6 days ago • 504k • 2.01k

DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation

Paper • 2601.22904 • Published 12 days ago • 15

Phr00t/LTX2-Rapid-Merges

Image-Text-to-Video • Updated 7 days ago • 290

ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought

Paper • 2601.23184 • Published 12 days ago • 35

FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space

Paper • 2602.02092 • Published 9 days ago • 18

PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Paper • 2602.02493 • Published 9 days ago • 41

TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Paper • 2601.22628 • Published 12 days ago • 33

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Paper • 2602.02488 • Published 9 days ago • 31

Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models

Paper • 2602.02185 • Published 9 days ago • 124

Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

Paper • 2601.21358 • Published 13 days ago • 7

Balancing Understanding and Generation in Discrete Diffusion Models

Paper • 2602.01362 • Published 10 days ago • 14

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

Paper • 2602.03796 • Published 8 days ago • 55

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

Paper • 2602.01785 • Published 9 days ago • 90

LIVE: Long-horizon Interactive Video World Modeling

Paper • 2602.03747 • Published 8 days ago • 12

Qwen/Qwen3-Coder-Next

Text Generation • 80B • Updated 8 days ago • 141k • 751

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Paper • 2602.03510 • Published 8 days ago • 26

RISE-Video: Can Video Generators Decode Implicit World Rules?

Paper • 2602.05986 • Published 6 days ago • 26

FASA: Frequency-aware Sparse Attention

Paper • 2602.03152 • Published 8 days ago • 143

DFlash: Block Diffusion for Flash Speculative Decoding

Paper • 2602.06036 • Published 6 days ago • 40