Our model only works on REAL people or portrait images similar to a REAL person. The anime talking-head generation method will be released in the future.
### Advanced configuration options for `inference.py`
Name | Configuration | Default | Explanation |
---|---|---|---|
Enhance Mode | `--enhancer` | None | Use `gfpgan` or `RestoreFormer` to enhance the generated face via a face restoration network. |
Background Enhancer | `--background_enhancer` | None | Use `realesrgan` to enhance the full video. |
Still Mode | `--still` | False | Use the same pose parameters as the original image; less head motion. |
Expressive Mode | `--expression_scale` | 1.0 | A larger value makes the expression motion stronger. |
save path | `--result_dir` | `./results` | The location where the results will be saved. |
preprocess | `--preprocess` | `crop` | Run and produce the results on the cropped input image. Other choices: `resize`, where the image is resized to a specific resolution, and `full`, which runs the full image animation; use with `--still` for better results. |
ref Mode (eye) | `--ref_eyeblink` | None | A video path; we borrow the eye blinks from this reference video to provide more natural eyebrow movement. |
ref Mode (pose) | `--ref_pose` | None | A video path; we borrow the head pose from this reference video. |
3D Mode | `--face3dvis` | False | Needs additional installation. More details on generating the 3D face can be found here. |
free-view Mode | `--input_yaw`, `--input_pitch`, `--input_roll` | None | Generate a novel-view or free-view 4D talking head from a single image. More details can be found here. |
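The flags in the table above can be summarized as a small argument parser. The sketch below is hypothetical — it mirrors the documented names and defaults for illustration and is not the actual parser inside `inference.py`:

```python
# Hypothetical sketch of the documented flags as an argparse parser.
# Names and defaults follow the table above; this is NOT the real
# parser from inference.py.
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="SadTalker inference flags (sketch)")
    p.add_argument("--enhancer", default=None, choices=["gfpgan", "RestoreFormer"])
    p.add_argument("--background_enhancer", default=None, choices=["realesrgan"])
    p.add_argument("--still", action="store_true")            # default False
    p.add_argument("--expression_scale", type=float, default=1.0)
    p.add_argument("--result_dir", default="./results")
    p.add_argument("--preprocess", default="crop", choices=["crop", "resize", "full"])
    p.add_argument("--ref_eyeblink", default=None)            # path to a reference video
    p.add_argument("--ref_pose", default=None)                # path to a reference video
    p.add_argument("--face3dvis", action="store_true")        # default False
    p.add_argument("--input_yaw", nargs="+", type=int, default=None)
    p.add_argument("--input_pitch", nargs="+", type=int, default=None)
    p.add_argument("--input_roll", nargs="+", type=int, default=None)
    return p

args = build_parser().parse_args(["--preprocess", "full", "--still"])
```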
### `--preprocess`

Our system automatically handles the input image via one of three modes: `crop`, `resize`, or `full`.
In `crop` mode, we generate only the cropped image via the facial keypoints and produce the animated facial avatar. The animation of both expression and head pose is realistic. Still mode suppresses the eye blinking and head pose movement.
input image @bagbag1815 | crop | crop w/still |
---|---|---|
![]() | ![]() | ![]() |
In `resize` mode, we resize the whole image to generate the full talking-head video. Thus, an image similar to an ID photo can be processed well. ⚠️ It will produce bad results for full-body images.
![]() | ![]() |
---|---|
❌ not suitable for resize mode | ✅ good for resize mode |
![]() | ![]() |
In `full` mode, our model automatically processes the cropped region and pastes it back into the original image. Remember to use `--still` to keep the original head pose.
input | `--still` | `--still` & enhancer |
---|---|---|
![]() | ![]() | ![]() |
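The three preprocess modes above can be sketched as a simple dispatch. The helper names (`crop_face`, `animate`, `paste_back`) are stand-ins for illustration, not functions from the SadTalker codebase; the real pipeline crops via facial keypoints and renders with the generative model:

```python
# Hypothetical sketch of the three --preprocess modes described above.
# All helpers are stubs that record what was done to the input.
def crop_face(img):          # crop via facial keypoints (stubbed)
    return f"crop({img})"

def rescale(img):            # resize the whole image (stubbed)
    return f"resize({img})"

def animate(img):            # run the talking-head animation (stubbed)
    return f"anim({img})"

def paste_back(face, img):   # composite the animated crop into the original frame
    return f"paste({face},{img})"

def preprocess(image, mode="crop"):
    if mode == "crop":       # animate only the face crop
        return animate(crop_face(image))
    if mode == "resize":     # rescale the whole image, then animate
        return animate(rescale(image))
    if mode == "full":       # animate the crop, paste it back into the frame
        return paste_back(animate(crop_face(image)), image)
    raise ValueError(f"unknown preprocess mode: {mode}")
```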
### `--enhancer`

For higher resolution, we integrate `gfpgan` and `real-esrgan` for different purposes: simply add `--enhancer <gfpgan or RestoreFormer>` to enhance the face, or `--background_enhancer <realesrgan>` to enhance the full image.
```bash
# make sure the packages below are available:
pip install gfpgan
pip install realesrgan
```
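As a sanity check before passing `--enhancer` or `--background_enhancer`, you can verify that the packages installed above are importable. This helper is a sketch, not part of `inference.py`:

```python
# Sketch: check that the enhancement packages are importable before
# enabling the corresponding flags. Uses only the standard library.
import importlib.util

def has_package(name):
    """Return True if a top-level module or package can be imported."""
    return importlib.util.find_spec(name) is not None

for pkg in ("gfpgan", "realesrgan"):
    if not has_package(pkg):
        print(f"warning: {pkg} is not installed; run `pip install {pkg}`")
```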
### `--face3dvis`

This flag indicates that we can generate the 3D-rendered face and its 3D facial landmarks. More details can be found here.
Input | Animated 3d face |
---|---|
![]() |
Please enable the audio manually, as GitHub does not autoplay embedded videos.
Input, w/ reference video, reference video |
---|
![]() |
If the reference video is shorter than the input audio, we will loop the reference video. |
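The looping behavior described above can be sketched as follows: if the reference video has fewer frames than the driven audio requires, repeat it cyclically. Frames are modeled as a plain list purely for illustration:

```python
# Sketch of looping a short reference video to cover the audio length.
# "Frames" are list items here; the real pipeline works on video frames.
from itertools import cycle, islice

def loop_to_length(ref_frames, n_frames):
    """Repeat ref_frames cyclically until n_frames frames are produced."""
    return list(islice(cycle(ref_frames), n_frames))
```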
We use `--input_yaw`, `--input_pitch`, and `--input_roll` to control the head pose. For example, `--input_yaw -20 30 10` means the input head yaw degree changes from -20 to 30 and then from 30 to 10.
```bash
python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a file to store results> \
                    --input_yaw -20 30 10
```
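The yaw schedule above (-20 → 30 → 10) can be illustrated as keyframe interpolation. This is a sketch only: `frames_per_segment` is a made-up knob, and in the real pipeline the frame budget comes from the driven audio length, not a fixed count:

```python
# Illustration (not the actual inference.py logic): expand pose
# keyframes like --input_yaw -20 30 10 into a per-frame sequence
# by linear interpolation between consecutive keyframes.
def expand_pose(keyframes, frames_per_segment):
    seq = []
    for a, b in zip(keyframes, keyframes[1:]):
        for i in range(frames_per_segment):
            seq.append(a + (b - a) * i / frames_per_segment)
    seq.append(keyframes[-1])  # end exactly on the last keyframe
    return seq

yaws = expand_pose([-20, 30, 10], frames_per_segment=5)
```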
Results, Free-view results, Novel view results |
---|
![]() |