- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 34
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 27
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 126
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 22

Collections
Discover the best community collections!
Collections including paper arxiv:2501.06282

- How to Synthesize Text Data without Model Collapse?
  Paper • 2412.14689 • Published • 53
- SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
  Paper • 2412.12094 • Published • 11
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
  Paper • 2306.07691 • Published • 13
- iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
  Paper • 2203.02395 • Published • 1

- FLAME: Factuality-Aware Alignment for Large Language Models
  Paper • 2405.01525 • Published • 29
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
  Paper • 2405.14333 • Published • 43
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 54
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
  Paper • 2405.18991 • Published • 12

- MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
  Paper • 2501.06282 • Published • 53
- defog/sqlcoder-7b-2
  Text Generation • 7B • Updated • 79.7k • 420
- microsoft/speecht5_tts
  Text-to-Speech • Updated • 101k • 823
- neo4j/text2cypher-gemma-2-9b-it-finetuned-2024v1
  Text Generation • Updated • 567 • 33

- MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
  Paper • 2501.06282 • Published • 53
- SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
  Paper • 2502.13128 • Published • 41
- PAFT: Prompt-Agnostic Fine-Tuning
  Paper • 2502.12859 • Published • 15
- SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices
  Paper • 2601.08303 • Published • 18

- iVideoGPT: Interactive VideoGPTs are Scalable World Models
  Paper • 2405.15223 • Published • 17
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 55
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- Matryoshka Multimodal Models
  Paper • 2405.17430 • Published • 34