TL;DR: single portrait image 🙎‍♂️ + audio 🎤 = talking head video 🎞.
https://user-images.githubusercontent.com/4397546/231495639-5d4bb925-ea64-4a36-a519-6389917dac29.mp4
Full image mode is online! Check it out here for more details.
[Demo videos: still + enhancer in v0.0.1 | still + enhancer in v0.0.2 | input image @bagbag1815]
🔥 Several new modes (e.g., still mode, reference mode and resize mode) are now online for better and more customizable applications.
🔥 We are happy to see more community demos on bilibili, YouTube and Twitter (#sadtalker).
[2023.06.12]: Added more new features to the WebUI extension; see the discussion here.
[2023.06.05]: Released a new 512 beta face model, fixed some bugs, and improved performance.
[2023.04.15]: Added an Automatic1111 Colab by @camenduru. Thanks for this awesome Colab!
[2023.04.12]: Added a more detailed sd-webui installation document and fixed a reinstallation problem.
[2023.04.12]: Fixed the sd-webui safe issues caused by third-party packages and optimized the output path in sd-webui-extension.
[2023.04.08]: ❗️❗️❗️ In v0.0.2, we added a logo watermark to generated videos to prevent abuse, since the results are very realistic.
[2023.04.08]: v0.0.2 adds full image animation and a Baidu Netdisk link for downloading checkpoints, and optimizes the enhancer logic.
Tutorials from the community: Windows tutorial (Chinese) | Japanese course
1. Install Anaconda, Python and git.
2. Create the environment and install the requirements:
git clone https://github.com/Winfredy/SadTalker.git
cd SadTalker
conda create -n sadtalker python=3.8
conda activate sadtalker
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
conda install ffmpeg
pip install -r requirements.txt
### tts is optional for gradio demo.
### pip install TTS
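Before moving on, it can help to confirm that the tools used in the steps above are actually on your PATH. The function below is a minimal sketch, not a script shipped with SadTalker; `check_prereqs` is a hypothetical name, and it only checks that each command exists, not which version it is.

```shell
#!/bin/sh
# Hypothetical helper (not part of SadTalker): report which of the tools
# used in the install steps are missing from PATH.
check_prereqs() {
    # With no arguments, check the tools named in the install steps.
    if [ "$#" -eq 0 ]; then
        set -- git python ffmpeg
    fi
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Non-zero exit status if any tool was not found.
    return "$missing"
}
```

For example, run `check_prereqs` inside the activated `sadtalker` environment; it returns a non-zero status if any tool is missing.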
On Windows:
1. Install git (or run `scoop install git` via scoop).
2. Install ffmpeg, following this instruction (or run `scoop install ffmpeg` via scoop).
3. Download the SadTalker repository, for example with `git clone https://github.com/Winfredy/SadTalker.git`.
4. Download the checkpoint and gfpgan models below↓.
5. Run `start.bat` from Windows Explorer as a normal, non-administrator user; a gradio WebUI demo will be started.

More tips about installation on MacBook and the Dockerfile can be found here.
You can run the following script to put all the models in the right place.
bash scripts/download_models.sh
Other alternatives:
We also provide an offline patch (gfpgan/), so no model will be downloaded during generation.
Google Drive: download our pre-trained models from this link (main checkpoints) and gfpgan (offline patch).
GitHub Release Page: download all the files from the latest GitHub release page and put them in ./checkpoints.
Baidu Netdisk (百度云盘): we provide the models in checkpoints (extraction code: sadt) and gfpgan (extraction code: sadt).
Model explanations:

New version:

Model | Description |
---|---|
checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
checkpoints/SadTalker_V0.0.2_256.safetensors | Packaged SadTalker checkpoints of the old version (256 face render). |
checkpoints/SadTalker_V0.0.2_512.safetensors | Packaged SadTalker checkpoints of the old version (512 face render). |
gfpgan/weights | Face detection and enhancement models used in facexlib and gfpgan. |

Old version:

Model | Description |
---|---|
checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from the reproduction of face-vid2vid. |
checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in Deep3DFaceReconstruction. |
checkpoints/wav2lip.pth | Highly accurate lip-sync model in Wav2Lip. |
checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in dlib. |
checkpoints/BFM | 3DMM library files. |
checkpoints/hub | Face detection models used in face alignment. |
gfpgan/weights | Face detection and enhancement models used in facexlib and gfpgan. |
The final folder will be shown as: [figure: final checkpoints folder structure]
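To confirm that the downloads landed where the model tables expect them, a quick existence check can save a failed first run. This is a hypothetical helper (`check_checkpoints` is not part of the repository) and assumes the new-version packaged checkpoints plus the gfpgan offline patch; adjust the file list if you use the old-version files instead. It only tests that the paths exist, not that the files are complete.

```shell
#!/bin/sh
# Hypothetical helper (not part of SadTalker): verify that the new-version
# model files from the table exist under a SadTalker checkout.
check_checkpoints() {
    root=${1:-.}   # path to the SadTalker repository; defaults to the current dir
    rc=0
    for f in \
        checkpoints/mapping_00229-model.pth.tar \
        checkpoints/mapping_00109-model.pth.tar \
        checkpoints/SadTalker_V0.0.2_256.safetensors \
        checkpoints/SadTalker_V0.0.2_512.safetensors
    do
        if [ ! -f "$root/$f" ]; then
            echo "missing file: $f" >&2
            rc=1
        fi
    done
    # gfpgan/weights is a directory holding the face detection/enhancement models.
    if [ ! -d "$root/gfpgan/weights" ]; then
        echo "missing directory: gfpgan/weights" >&2
        rc=1
    fi
    return "$rc"
}
```

For example, `check_checkpoints ~/SadTalker` after running `bash scripts/download_models.sh`.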
Online: Hugging Face | SDWebUI-Colab | Colab
Local Automatic1111 stable-diffusion-webui extension: please refer to the Automatic1111 stable-diffusion-webui docs.
Local gradio demo (highly recommended!): similar to our Hugging Face demo, it can be run with:
## you need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install tts` in advance.
python app.py
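Alternatively, launch the local gradio demo with the bundled scripts:
On Windows, run `webui.bat`; the requirements will be installed automatically.
On Linux/macOS, run `bash webui.sh` to start the WebUI.

To animate a portrait image with the default config:
python inference.py --driven_audio <audio.wav> \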
--source_image <video.mp4 or picture.png> \
--enhancer gfpgan
The results will be saved in `results/$SOME_TIMESTAMP/*.mp4`.
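Because each run writes into a new timestamped folder, picking out the latest video by hand gets tedious. A minimal sketch, assuming the `results/<timestamp>/*.mp4` layout described above; `latest_result` is a hypothetical helper, and it relies on file modification times rather than parsing the folder names.

```shell
#!/bin/sh
# Hypothetical helper (not part of SadTalker): print the most recently
# modified video under <results_dir>/<timestamp>/.
latest_result() {
    results_dir=${1:-results}
    # ls -t sorts its operands by modification time, newest first. If the
    # glob matches nothing, ls fails and nothing is printed on stdout.
    newest=$(ls -t "$results_dir"/*/*.mp4 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no .mp4 found under $results_dir" >&2
        return 1
    fi
    printf '%s\n' "$newest"
}
```

For example, `latest_result` from the repository root prints the path of the video from your last run.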
Full body/image generation: use `--still` together with `--preprocess full` to generate a natural full-body video. You can add `--enhancer gfpgan` to improve the quality of the generated video.
python inference.py --driven_audio <audio.wav> \
--source_image <video.mp4 or picture.png> \
--result_dir <a folder to store results> \
--still \
--preprocess full \
--enhancer gfpgan
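To run the full-body command over several portraits with the same audio, a small loop is enough. This is a hypothetical wrapper, not part of SadTalker: it uses only the flags shown above, gives each image its own `--result_dir` named after the file, and supports a `DRY_RUN=1` switch (an assumption of this sketch, not an `inference.py` option) that prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of SadTalker): animate several source images
# with one audio file using the full-body flags. Set DRY_RUN=1 to only print
# the commands. Usage: animate_full_body <audio.wav> <out_root> <image>...
animate_full_body() {
    audio=$1
    out_root=$2
    shift 2
    run=""
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo   # prefix every command with echo instead of executing it
    fi
    for image in "$@"; do
        name=$(basename "$image")
        name=${name%.*}   # drop the extension, e.g. photos/me.png -> me
        $run mkdir -p "$out_root/$name" || return 1
        $run python inference.py --driven_audio "$audio" \
            --source_image "$image" \
            --result_dir "$out_root/$name" \
            --still \
            --preprocess full \
            --enhancer gfpgan || return 1
    done
}
```

For example, `DRY_RUN=1 animate_full_body speech.wav results/batch photos/a.png photos/b.png` previews the two runs before you launch them for real.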
More examples, configurations and tips can be found in the >>> best practice documents <<<.
If you find our work useful in your research, please consider citing:
@article{zhang2022sadtalker,
title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
journal={arXiv preprint arXiv:2211.12194},
year={2022}
}
The face render code borrows heavily from zhanglonghao's reproduction of face-vid2vid and from PIRender. We thank the authors for sharing their wonderful code. During training, we also use models from Deep3DFaceReconstruction and Wav2Lip. We thank them for their wonderful work.
See also these wonderful third-party libraries we use:
This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.
LOGO: color and font suggestions by ChatGPT; logo font: Montserrat Alternates.
All copyrights of the demo images and audio belong to community users or come from Stable Diffusion generations. Feel free to contact us if you feel uncomfortable with any of them.