Our model only works on REAL people or portrait images similar to a REAL person. The anime talking-head generation method will be released in the future.
### Advanced configuration options for `inference.py`
Name | Configuration | Default | Explanation |
---|---|---|---|
Enhance Mode | `--enhancer` | None | Use `gfpgan` or `RestoreFormer` to enhance the generated face via a face restoration network. |
Background Enhancer | `--background_enhancer` | None | Use `realesrgan` to enhance the full video. |
Still Mode | `--still` | False | Use the same pose parameters as the original image; less head motion. |
Expressive Mode | `--expression_scale` | 1.0 | A larger value makes the expression motion stronger. |
save path | `--result_dir` | `./results` | The location where the results will be saved. |
preprocess | `--preprocess` | `crop` | Run and produce the results on the cropped input image. Other choices: `resize`, where the image is resized to a specific resolution, and `full`, which runs the full image animation; use with `--still` for better results. |
ref Mode (eye) | `--ref_eyeblink` | None | A video path; we borrow the eye blinks from this reference video to provide more natural eyebrow movement. |
ref Mode (pose) | `--ref_pose` | None | A video path; we borrow the head pose from this reference video. |
3D Mode | `--face3dvis` | False | Needs additional installation. More details on generating the 3D face can be found here. |
free-view Mode | `--input_yaw`, `--input_pitch`, `--input_roll` | None | Generate a novel-view or free-view 4D talking head from a single image. More details can be found here. |
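The flags in the table above can be summarized as a small argument parser. The sketch below is hypothetical — it mirrors the documented names and defaults for illustration and is not the actual parser inside `inference.py`:

```python
# Hypothetical sketch of the documented flags as an argparse parser.
# Names and defaults follow the table above; this is NOT the real
# parser from inference.py.
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="SadTalker inference flags (sketch)")
    p.add_argument("--enhancer", default=None, choices=["gfpgan", "RestoreFormer"])
    p.add_argument("--background_enhancer", default=None, choices=["realesrgan"])
    p.add_argument("--still", action="store_true")            # default False
    p.add_argument("--expression_scale", type=float, default=1.0)
    p.add_argument("--result_dir", default="./results")
    p.add_argument("--preprocess", default="crop", choices=["crop", "resize", "full"])
    p.add_argument("--ref_eyeblink", default=None)            # path to a reference video
    p.add_argument("--ref_pose", default=None)                # path to a reference video
    p.add_argument("--face3dvis", action="store_true")        # default False
    p.add_argument("--input_yaw", nargs="+", type=int, default=None)
    p.add_argument("--input_pitch", nargs="+", type=int, default=None)
    p.add_argument("--input_roll", nargs="+", type=int, default=None)
    return p

args = build_parser().parse_args(["--preprocess", "full", "--still"])
```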
### `--preprocess`

Our system automatically handles the input image via one of three modes: `crop`, `resize`, or `full`.
In `crop` mode, we generate only the cropped image via the facial keypoints and produce the animated facial avatar. The animation of both expression and head pose is realistic. Still mode suppresses the eye blinking and head pose movement.
input image @bagbag1815 | crop | crop w/still |
---|---|---|
![]() | ![]() | ![]() |
In `resize` mode, we resize the whole image to generate the full talking-head video. Thus, an image similar to an ID photo can be processed well. ⚠️ It will produce bad results for full-body images.
![]() | ![]() |
---|---|
❌ not suitable for resize mode | ✅ good for resize mode |
![]() | ![]() |
In `full` mode, our model automatically processes the cropped region and pastes it back into the original image. Remember to use `--still` to keep the original head pose.
input | `--still` | `--still` & enhancer |
---|---|---|
![]() | ![]() | ![]() |
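The three preprocess modes above can be sketched as a simple dispatch. The helper names (`crop_face`, `animate`, `paste_back`) are stand-ins for illustration, not functions from the SadTalker codebase; the real pipeline crops via facial keypoints and renders with the generative model:

```python
# Hypothetical sketch of the three --preprocess modes described above.
# All helpers are stubs that record what was done to the input.
def crop_face(img):          # crop via facial keypoints (stubbed)
    return f"crop({img})"

def rescale(img):            # resize the whole image (stubbed)
    return f"resize({img})"

def animate(img):            # run the talking-head animation (stubbed)
    return f"anim({img})"

def paste_back(face, img):   # composite the animated crop into the original frame
    return f"paste({face},{img})"

def preprocess(image, mode="crop"):
    if mode == "crop":       # animate only the face crop
        return animate(crop_face(image))
    if mode == "resize":     # rescale the whole image, then animate
        return animate(rescale(image))
    if mode == "full":       # animate the crop, paste it back into the frame
        return paste_back(animate(crop_face(image)), image)
    raise ValueError(f"unknown preprocess mode: {mode}")
```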
### `--enhancer`

For higher resolution, we integrate `gfpgan` and `real-esrgan` for different purposes: simply add `--enhancer <gfpgan or RestoreFormer>` to enhance the face, or `--background_enhancer <realesrgan>` to enhance the full image.
```bash
# make sure the packages below are available:
pip install gfpgan
pip install realesrgan
```
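As a sanity check before passing `--enhancer` or `--background_enhancer`, you can verify that the packages installed above are importable. This helper is a sketch, not part of `inference.py`:

```python
# Sketch: check that the enhancement packages are importable before
# enabling the corresponding flags. Uses only the standard library.
import importlib.util

def has_package(name):
    """Return True if a top-level module or package can be imported."""
    return importlib.util.find_spec(name) is not None

for pkg in ("gfpgan", "realesrgan"):
    if not has_package(pkg):
        print(f"warning: {pkg} is not installed; run `pip install {pkg}`")
```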
### `--face3dvis`

This flag indicates that we can generate the 3D-rendered face and its 3D facial landmarks. More details can be found here.
Input | Animated 3d face |
---|---|
![]() |
Please enable the audio manually, as GitHub does not autoplay embedded videos.
Input, w/ reference video, reference video |
---|
![]() |
If the reference video is shorter than the input audio, we will loop the reference video. |
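The looping behavior described above can be sketched as follows: if the reference video has fewer frames than the driven audio requires, repeat it cyclically. Frames are modeled as a plain list purely for illustration:

```python
# Sketch of looping a short reference video to cover the audio length.
# "Frames" are list items here; the real pipeline works on video frames.
from itertools import cycle, islice

def loop_to_length(ref_frames, n_frames):
    """Repeat ref_frames cyclically until n_frames frames are produced."""
    return list(islice(cycle(ref_frames), n_frames))
```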
We use `--input_yaw`, `--input_pitch`, and `--input_roll` to control the head pose. For example, `--input_yaw -20 30 10` means the input head yaw degree changes from -20 to 30 and then from 30 to 10.
```bash
python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a file to store results> \
                    --input_yaw -20 30 10
```
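The yaw schedule above (-20 → 30 → 10) can be illustrated as keyframe interpolation. This is a sketch only: `frames_per_segment` is a made-up knob, and in the real pipeline the frame budget comes from the driven audio length, not a fixed count:

```python
# Illustration (not the actual inference.py logic): expand pose
# keyframes like --input_yaw -20 30 10 into a per-frame sequence
# by linear interpolation between consecutive keyframes.
def expand_pose(keyframes, frames_per_segment):
    seq = []
    for a, b in zip(keyframes, keyframes[1:]):
        for i in range(frames_per_segment):
            seq.append(a + (b - a) * i / frames_per_segment)
    seq.append(keyframes[-1])  # end exactly on the last keyframe
    return seq

yaws = expand_pose([-20, 30, 10], frames_per_segment=5)
```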
Results, Free-view results, Novel view results |
---|
![]() |