# EchoMimic

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Zhiyuan Chen\*, Jiajiong Cao\*, Zhiquan Chen, Yuming Li, Chenguang Ma

\*Equal Contribution.
Terminal Technology Department, Alipay, Ant Group.
## Gallery

### Audio Driven (Singing)
### Audio Driven (English)
### Audio Driven (Chinese)
### Landmark Driven
### Audio + Selected Landmark Driven

**(Some demo images above are sourced from image websites. If there is any infringement, we will immediately remove them and apologize.)**

## Installation

### Download the Code

```bash
git clone https://github.com/BadToBest/EchoMimic
cd EchoMimic
```

### Python Environment Setup

- Tested system environments: CentOS 7.2 / Ubuntu 22.04, CUDA >= 11.7
- Tested GPUs: A100 (80G) / RTX 4090D (24G) / V100 (16G)
- Tested Python versions: 3.8 / 3.10 / 3.11

Create a conda environment (recommended):

```bash
conda create -n echomimic python=3.8
conda activate echomimic
```

Install the packages with `pip`:

```bash
pip install -r requirements.txt
```
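Before moving on, a quick sanity check can confirm the environment is usable. This is a minimal sketch, assuming `requirements.txt` pins a CUDA-enabled PyTorch build (EchoMimic is a diffusion-based pipeline, so this is expected):

```bash
# Sanity check (assumes torch was installed via requirements.txt):
# prints the PyTorch version and whether a CUDA device is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```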
### Download ffmpeg-static

Download and decompress [ffmpeg-static](https://www.johnvansickle.com/ffmpeg/old-releases/ffmpeg-4.4-amd64-static.tar.xz), then set `FFMPEG_PATH` to the extracted directory:

```
export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
```
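For example, on a typical Linux machine the whole step can be scripted as follows (the destination directory is illustrative; adjust it to your setup):

```bash
# Download and unpack the static ffmpeg 4.4 build linked above,
# then export FFMPEG_PATH so EchoMimic can find the binaries
wget https://www.johnvansickle.com/ffmpeg/old-releases/ffmpeg-4.4-amd64-static.tar.xz
tar -xf ffmpeg-4.4-amd64-static.tar.xz
export FFMPEG_PATH="$(pwd)/ffmpeg-4.4-amd64-static"
"$FFMPEG_PATH/ffmpeg" -version   # sanity check: should report version 4.4
```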
### Download pretrained weights

```shell
git lfs install
git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights
```

The **pretrained_weights** directory is organized as follows:

```
./pretrained_weights/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── face_locator.pth
├── sd-vae-ft-mse
│   └── ...
├── sd-image-variations-diffusers
│   └── ...
└── audio_processor
    └── whisper_tiny.pt
```

Here **denoising_unet.pth**, **reference_unet.pth**, **motion_module.pth**, and **face_locator.pth** are the main EchoMimic checkpoints. The other models can also be downloaded from their original hubs, thanks to their authors' brilliant work:

- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)
- [sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)
- [audio_processor (whisper)](https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt)

### Audio-Driven Algo Inference

Run the Python inference script:

```bash
python -u infer_audio2vid.py
```

### Audio-Driven Algo Inference on Your Own Cases

Edit the inference config file **./configs/prompts/animation.yaml** and add your own case:

```yaml
test_cases:
  "path/to/your/image":
    - "path/to/your/audio"
```

Then run the Python inference script:

```bash
python -u infer_audio2vid.py
```

## Release Plans

| Status | Milestone | ETA |
|:--------:|:-------------------------------------------------------------------------|:--:|
| ✅ | Inference source code of the audio-driven algorithm released on GitHub | 9th July, 2024 |
| ✅ | Pretrained models trained on English and Mandarin Chinese released | 9th July, 2024 |
| 🚀 | Inference source code of the pose-driven algorithm to be released on GitHub | 13th July, 2024 |
| 🚀 | Pretrained models with better pose control to be released | 13th July, 2024 |
| 🚀 | Pretrained models with better singing performance to be released | TBD |
| 🚀 | Accelerated models to be released | TBD |
| 🚀 | Large-scale, high-resolution Chinese talking-head dataset | TBD |

## Acknowledgements

We would like to thank the contributors to the [AnimateDiff](https://github.com/guoyww/AnimateDiff), [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone) and [MuseTalk](https://github.com/TMElyralab/MuseTalk) repositories for their open research and exploration. We are also grateful to [V-Express](https://github.com/tencent-ailab/V-Express) and [hallo](https://github.com/fudan-generative-vision/hallo) for their outstanding work in the area of diffusion-based talking heads.

If we have missed any open-source projects or related articles, we will add them to the acknowledgements of this work promptly.

## EchoMimic Communities

Many developers are actively building projects around EchoMimic, and we are deeply grateful for their contributions. We highlight a selection of these repositories below; they have significantly extended EchoMimic's capabilities and versatility.

- WebUI version: https://github.com/greengerong/EchoMimic

## Citation

If you find our work useful for your research, please consider citing the paper:

```
@misc{chen2024echomimic,
  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
  author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
  year={2024},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```