-
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Paper • 2409.06666 • Published • 60 -
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
Paper • 2505.02625 • Published • 23 -
ICTNLP/Llama-3.1-8B-Omni
Updated • 71 • 418 -
ICTNLP/LLaMA-Omni2-7B
9B • Updated • 44 • 4
Collections
Collections including paper arxiv:2409.06666
-
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Paper • 2409.06666 • Published • 60 -
LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content
Paper • 2410.10783 • Published • 26 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 322 -
OpenClaw-RL: Train Any Agent Simply by Talking
Paper • 2603.10165 • Published • 151
-
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Paper • 2405.20340 • Published • 20 -
Spectrally Pruned Gaussian Fields with Neural Compensation
Paper • 2405.00676 • Published • 10 -
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Paper • 2404.18212 • Published • 30 -
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper • 2405.00732 • Published • 122
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 17 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 55 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 34
-
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
Paper • 2405.18503 • Published • 9 -
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
Paper • 2405.20289 • Published • 11 -
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Paper • 2406.02897 • Published • 16 -
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
Paper • 2406.03344 • Published • 22
-
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper • 2404.12253 • Published • 55 -
FlowMind: Automatic Workflow Generation with LLMs
Paper • 2404.13050 • Published • 34 -
How Far Can We Go with Practical Function-Level Program Repair?
Paper • 2404.12833 • Published • 7 -
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Paper • 2404.18796 • Published • 71