# audio-driven-talkingface-project

**Repository Path**: Talking-Face-Project/audio-driven-talkingface-project

## Basic Information

- **Project Name**: audio-driven-talkingface-project
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 2
- **Created**: 2020-08-13
- **Last Updated**: 2022-03-20

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# code note

--------

## UPDATE 2020.07.31

- The main README for this code is now the `pipeline.py` file; the sections below are kept for reference only.

--------------------

## UPDATE 2020.4.1

## 1. Fine-tune on a target person's short video

### a). Prepare a talking-face video that satisfies:

> 1). contains a single person
>
> 2). 25 fps
>
> 3). longer than 12 seconds (i.e. > 300 frames)
>
> 4). without large body translation (e.g. moving from the left to the right of the screen)

Rename it to `[person_id].mp4` and copy it to the `data` folder.

**NOTE**: You can convert a video to 25 fps with:

```
ffmpeg -i xxx.mp4 -r 25 xxx-25fps.mp4
```

### b). Extract frames and landmarks:

```
python utils/extract_video_frames.py --video_path ./data/xxx.mp4
```

### c).
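The requirements in step a) can be checked programmatically before fine-tuning. Below is a minimal sketch; the helper `meets_requirements` is hypothetical (not part of this repo), and it assumes you have already read the clip's fps and frame count, e.g. with ffprobe or OpenCV:

```python
# Hypothetical helper: validate that a clip meets the fine-tuning
# requirements (25 fps, longer than 12 s, i.e. > 300 frames) before
# copying it into the `data` folder.

def meets_requirements(fps: float, n_frames: int) -> bool:
    """Return True if the clip is 25 fps and has more than 300 frames."""
    return fps == 25 and n_frames > 300

# A 14-second clip at 25 fps has 350 frames and passes.
print(meets_requirements(25, 350))   # True
# A 30 fps clip needs re-encoding with ffmpeg first.
print(meets_requirements(30, 350))   # False
```

If the check fails on fps, re-encode with the `ffmpeg -r 25` command above before proceeding.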
Conduct 3D face reconstruction

- (1) Compile the code
> Follow the [readme](Deep3DFaceReconstruction/tf_mesh_renderer/mesh_renderer/kernels)
> to compile the `tf_mesh_renderer` kernel to a **.so** file
- (2) Modify the directory path in [rasterize_triangles.py](Deep3DFaceReconstruction/tf_mesh_renderer/mesh_renderer/rasterize_triangles.py)
- (3) Reconstruct the 3D face of the input video (takes about 2 min on a Titan XP)
> will output `coeff` and `render` files

```
cd Deep3DFaceReconstruction

################ for News dataset
# (1) extract frames
python dataset/extract_frames_News.py
# (2) reconstruction
CUDA_VISIBLE_DEVICES=1 python demo_19news.py -v ../data/31

################ for LRW dataset
# (1) extract frames
python dataset/extract_frames_LRW.py
# (2) reconstruction
CUDA_VISIBLE_DEVICES=1 python run_lrw.py
```

**NOTE:** GPU memory should be >= 11 GB: it failed on a GTX 2070 but worked on a Titan X.

### d). Fine-tune the Audio Networks

- (1) Modify the directory path in [rasterize_triangles.py](Audio/code/mesh_renderer/rasterize_triangles.py)
- (2) Prepare data

```
cd audio

####### for LRW dataset
# (0) reconstruct the LRW dataset (see the reconstruction step above)
# (1) combine coeff of the LRW dataset
python examples/LRW/run_combine_coeff.py
# (2) pack the train list into .pkl files
python examples/LRW/run_generate_coeff_list.py
# (3) extract MFCC features
python examples/LRW/run_extract_mfcc.py

####### for News dataset
# (1)
python examples/News/run_check_news.py --person_id 31
```

- (3) Train the model

```
cd audio

####### for LRW dataset
bash scripts/train_lrw.py

####### for News dataset
bash scripts/train_19news.py
```

### e).
Fine-tune the GAN network

- (1) Blend original_image + render_image

```
cd render
bash dataset/S1_blend_iamges.sh
```

- (2) Prepare the train/test image lists

```
python dataset/S2_get_image_list.py
```

- (3) Get the images' ArcFace features

```
bash dataset/S3_get_arcface_features.py
```

- (4) Run the code

```
cd render-to-video
python train_19news_1.py [person_id] [gpu_id]
```

NOTE: the saved models are in `render-to-video/checkpoints/memory_seq_p2p/[person_id]`

### f). Test on target person

- (1) Place the audio file for testing under `data/audio/input`
- (2) Run the code

```
cd Audio/code/
python test_personalized.py [audio] [person_id] [gpu_id]
```
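The training and test scripts above take bare positional arguments. A minimal sketch of how such a positional CLI can be structured with `argparse` (this parser is illustrative, not the repo's actual `test_personalized.py`):

```python
# Illustrative sketch of a positional CLI shaped like
# `test_personalized.py [audio] [person_id] [gpu_id]`.
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Test on a target person")
    p.add_argument("audio", help="audio file placed under data/audio/input")
    p.add_argument("person_id", help="ID used during fine-tuning, e.g. 31")
    p.add_argument("gpu_id", type=int, help="CUDA device index")
    return p

args = build_parser().parse_args(["speech.wav", "31", "0"])
print(args.audio, args.person_id, args.gpu_id)  # speech.wav 31 0
```

Keeping `person_id` a string (not an int) matters here, because it is interpolated into checkpoint paths such as `render-to-video/checkpoints/memory_seq_p2p/[person_id]`.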