EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation
📄 Paper: [arXiv:2507.03905](https://arxiv.org/abs/2507.03905)
```bash
conda create -n echomimic_v3 python=3.10
conda activate echomimic_v3
pip install -r requirements.txt
```
| Models | Download Link | Notes |
|---|---|---|
| Wan2.1-Fun-1.3B-InP | 🤗 Huggingface | Base model |
| wav2vec2-base | 🤗 Huggingface | Audio encoder |
| EchoMimicV3 | 🤗 Huggingface | Our weights |
The weights are organized as follows:

```
./models/
├── Wan2.1-Fun-1.3B-InP
├── wav2vec2-base-960h
└── transformer
    └── diffusion_pytorch_model.safetensors
```
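Before running inference, it can save time to verify that every expected weight file is in place. The sketch below (not part of the repo, standard library only) checks the layout shown above and reports anything missing:

```python
from pathlib import Path

# Expected entries under ./models/, mirroring the tree above.
EXPECTED = [
    "Wan2.1-Fun-1.3B-InP",
    "wav2vec2-base-960h",
    "transformer/diffusion_pytorch_model.safetensors",
]

def missing_weights(models_dir="./models"):
    """Return the expected entries that are not present under models_dir."""
    root = Path(models_dir)
    return [p for p in EXPECTED if not (root / p).exists()]

if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All weights found.")
```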
### 🔑 Quick Inference
```bash
python infer.py
```
> Tips
> - Audio CFG: works best between 2 and 3. Increase it for better lip synchronization; decrease it for better visual quality.
> - Text CFG: works best between 4 and 6. Increase it for better prompt following; decrease it for better visual quality.
> - TeaCache: the optimal range for `--teacache_thresh` is 0~0.1.
> - Sampling steps: 5 steps suffice for a talking head; use 15~25 steps for a talking body.
> - Long video generation: to generate a video longer than 138 frames, use Long Video CFG.
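The recommended ranges above can be enforced programmatically. A minimal illustrative helper (not part of the repo; the setting names other than `teacache_thresh` are hypothetical labels for the CFG values discussed in the tips):

```python
# Recommended ranges from the tips above: (low, high) per setting.
RECOMMENDED = {
    "audio_cfg": (2.0, 3.0),        # higher -> better lip sync, lower -> better visuals
    "text_cfg": (4.0, 6.0),         # higher -> better prompt following
    "teacache_thresh": (0.0, 0.1),  # TeaCache threshold
}

def clamp_to_recommended(name, value):
    """Clamp a setting into its recommended range; unknown names pass through."""
    lo, hi = RECOMMENDED.get(name, (value, value))
    return min(max(value, lo), hi)

print(clamp_to_recommended("audio_cfg", 5.0))  # clamped down to 3.0
print(clamp_to_recommended("teacache_thresh", 0.05))  # already in range
```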
## 📝 TODO List
| Status | Milestone |
|:--------:|:-------------------------------------------------------------------------|
| ✅ 2025.08.08 | The inference code of EchoMimicV3 released on GitHub |
| 🚀 | Preview pretrained models (English and Chinese) on HuggingFace |
| 🚀 | Preview pretrained models (English and Chinese) on ModelScope |
| 🚀 | 720P pretrained models (English and Chinese) on HuggingFace |
| 🚀 | 720P pretrained models (English and Chinese) on ModelScope |
| 🚀 | The training code of EchoMimicV3 released on GitHub |
## 📒 Citation
If you find our work useful for your research, please consider citing the paper:

```bibtex
@misc{meng2025echomimicv3,
  title={EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation},
  author={Rang Meng and Yan Wang and Weipeng Wu and Ruobing Zheng and Yuming Li and Chenguang Ma},
  year={2025},
  eprint={2507.03905},
  archivePrefix={arXiv}
}
```
## 🌟 Star History
[Star History Chart](https://star-history.com/#antgroup/echomimic_v3&Date)