# audio-driven-talkingface-project

**Repository Path**: Talking-Face-Project/audio-driven-talkingface-project

## Basic Information

- **Project Name**: audio-driven-talkingface-project
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 2
- **Created**: 2020-08-13
- **Last Updated**: 2022-03-20

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# code note

--------

## UPDATE 2020.07.31

- The main README for this code is now the `pipeline.py` file; the sections below are kept for reference only.

--------------------

## UPDATE 2020.4.1

## 1. Fine-tune on a target person's short video

### a). Prepare a talking-face video that satisfies:

> 1). contains a single person
>
> 2). 25 fps
>
> 3). longer than 12 seconds (i.e. > 300 frames)
>
> 4). without large body translation (e.g. moving from the left to the right of the screen)

Rename it to `[person_id].mp4` and copy it to the `data` folder.

**NOTE**: You can convert a video to 25 fps with:

```
ffmpeg -i xxx.mp4 -r 25 xxx-25fps.mp4
```

### b). Extract frames and landmarks:

```
python utils/extract_video_frames.py --video_path ./data/xxx.mp4
```

### c).
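The requirements in step a) can be checked programmatically before fine-tuning. Below is a minimal sketch; the helper `meets_requirements` is hypothetical (not part of this repo), and it assumes you have already read the clip's fps and frame count, e.g. with ffprobe or OpenCV:

```python
# Hypothetical helper: validate that a clip meets the fine-tuning
# requirements (25 fps, longer than 12 s, i.e. > 300 frames) before
# copying it into the `data` folder.

def meets_requirements(fps: float, n_frames: int) -> bool:
    """Return True if the clip is 25 fps and has more than 300 frames."""
    return fps == 25 and n_frames > 300

# A 14-second clip at 25 fps has 350 frames and passes.
print(meets_requirements(25, 350))   # True
# A 30 fps clip needs re-encoding with ffmpeg first.
print(meets_requirements(30, 350))   # False
```

If the check fails on fps, re-encode with the `ffmpeg -r 25` command above before proceeding.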
Conduct 3D face reconstruction

- (1) Compile the code
> Follow the [readme](Deep3DFaceReconstruction/tf_mesh_renderer/mesh_renderer/kernels)
> to compile the `tf_mesh_renderer` kernel to a **.so** file
- (2) Modify the directory path in [rasterize_triangles.py](Deep3DFaceReconstruction/tf_mesh_renderer/mesh_renderer/rasterize_triangles.py)
- (3) Reconstruct the 3D face of the input video (takes about 2 min on a Titan XP)
> will output `coeff` and `render` files

```
cd Deep3DFaceReconstruction

################ for News dataset
# (1) extract frames
python dataset/extract_frames_News.py
# (2) reconstruction
CUDA_VISIBLE_DEVICES=1 python demo_19news.py -v ../data/31

################ for LRW dataset
# (1) extract frames
python dataset/extract_frames_LRW.py
# (2) reconstruction
CUDA_VISIBLE_DEVICES=1 python run_lrw.py
```

**NOTE:** GPU memory should be >= 11 GB: it failed on a GTX 2070 but worked on a Titan X.

### d). Fine-tune the Audio Networks

- (1) Modify the directory path in [rasterize_triangles.py](Audio/code/mesh_renderer/rasterize_triangles.py)
- (2) Prepare data

```
cd audio

####### for LRW dataset
# (0) reconstruct the LRW dataset (see the reconstruction step above)
# (1) combine coeff of the LRW dataset
python examples/LRW/run_combine_coeff.py
# (2) pack the train list into .pkl files
python examples/LRW/run_generate_coeff_list.py
# (3) extract MFCC features
python examples/LRW/run_extract_mfcc.py

####### for News dataset
# (1)
python examples/News/run_check_news.py --person_id 31
```

- (3) Train the model

```
cd audio

####### for LRW dataset
bash scripts/train_lrw.py

####### for News dataset
bash scripts/train_19news.py
```

### e).
Fine-tune the GAN network

- (1) Blend original_image + render_image

```
cd render
bash dataset/S1_blend_iamges.sh
```

- (2) Prepare the train/test image lists

```
python dataset/S2_get_image_list.py
```

- (3) Get the images' ArcFace features

```
bash dataset/S3_get_arcface_features.py
```

- (4) Run the code

```
cd render-to-video
python train_19news_1.py [person_id] [gpu_id]
```

NOTE: the saved models are in `render-to-video/checkpoints/memory_seq_p2p/[person_id]`

### f). Test on target person

- (1) Place the audio file for testing under `data/audio/input`
- (2) Run the code

```
cd Audio/code/
python test_personalized.py [audio] [person_id] [gpu_id]
```
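The training and test scripts above take bare positional arguments. A minimal sketch of how such a positional CLI can be structured with `argparse` (this parser is illustrative, not the repo's actual `test_personalized.py`):

```python
# Illustrative sketch of a positional CLI shaped like
# `test_personalized.py [audio] [person_id] [gpu_id]`.
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Test on a target person")
    p.add_argument("audio", help="audio file placed under data/audio/input")
    p.add_argument("person_id", help="ID used during fine-tuning, e.g. 31")
    p.add_argument("gpu_id", type=int, help="CUDA device index")
    return p

args = build_parser().parse_args(["speech.wav", "31", "0"])
print(args.audio, args.person_id, args.gpu_id)  # speech.wav 31 0
```

Keeping `person_id` a string (not an int) matters here, because it is interpolated into checkpoint paths such as `render-to-video/checkpoints/memory_seq_p2p/[person_id]`.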