# Audio2Face

## Notice

The test assets and the UE project for xiaomei created by FACEGOOD are not available for commercial use; they are provided for testing purposes only.

## Description

---

![ue](rsc/ue.png)

The case video: [video](https://youtu.be/g28pvNaxN0I)

https://user-images.githubusercontent.com/11623487/217445125-41ccc812-d93c-4789-b93a-73b1a008d821.mp4

[High resolution](https://youtu.be/g28pvNaxN0I)

This project transforms audio into blendshape weights and uses them to drive the digital human xiaomei in a UE project.

## Base Module

---

![figure1](rsc/net.png)
![figure2](rsc/layers.png)

The framework we use contains three parts. The formant network performs fixed-function analysis of the input audio clip. The articulation network concatenates an emotional state vector to the output of each convolution layer after its ReLU activation. The fully connected layers at the end expand the 256+E abstract features into blendshape weights.

## Usage

---

![pipeline](/rsc/pipeline.PNG)

This pipeline shows how we use FACEGOOD Audio2Face.

[Test video 1](https://www.youtube.com/watch?v=f6DcsZCsOWM&ab_channel=MicrosoftDeveloper)
[Test video 2](https://www.youtube.com/watch?v=bvT9yg3Uab4) by Ryan Yun from columbia.edu

### Prepare data

- Step 1: Record voice and video, and create the animation from the video in Maya. Note: the recording must contain vowels, exaggerated talking, and normal talking, and the dialogue should cover as many pronunciations as possible.
- Step 2: Process the voice with LPC, splitting it into segment frames that correspond to the animation frames in Maya.

### Input data

Use ExportBsWeights.py to export the weight files from Maya; this produces BS_name.npy and BS_value.npy.

Use step1_LPC.py to process the wav files into lpc_*.npy, i.e. to preprocess the wav into 2D data.

### Train

We recommend using FACEGOOD Avatary (http://www.avatary.com) to produce the training data; it is fast and accurate.

All training code is in the folder code/train, and the training data is stored in dataSet.

On Windows, you can simply download the dataset from the links in the Data section and then use the script to train the model:

```powershell
./train.bat
```

Or you can use the following commands to generate the data, then train and test the model. (Note: you need to change the paths in each file to your own paths.)

```shell
cd code/train
python step1_LPC.py                  # process the wav files to get lpc_*.npy
python step3_concat_select_split.py  # generate the training data and labels
python step4_train.py                # train the model
python step5_inference.py            # run inference with the trained model
```

### Test

In the folder /test, we supply a test application named AiSpeech and provide a pretrained model, zsmeif.pb.

In the folder /example/ueExample, we provide a packaged UE project containing a digital human created by FACEGOOD that can be driven by /AiSpeech/zsmeif.py.

Follow the steps below to use it:

1. Make sure a microphone is connected to the computer.
2. Run the script in a terminal: `python zsmeif.py`
3. When the terminal shows the message "run main", run FaceGoodLiveLink.exe, located in the /example/ueExample/ folder.
4. Click and hold the left mouse button on the screen in the UE project; you can then talk to the AI model and wait for the voice and animation response.
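If you want to call the pretrained model from your own Python code instead of going through zsmeif.py, a frozen TF1-style graph such as zsmeif.pb can be wrapped with the TensorFlow 2 compat APIs. The sketch below only illustrates that general technique and is not taken from this repository: the tensor names `input:0` and `output:0` are placeholders (inspect the graph to find the real ones), and the (1, 32, 64, 1) input shape is assumed from the dataSet dimensions listed under Data.

```python
# Minimal sketch: wrapping a frozen TF1 graph (e.g. zsmeif.pb) in TensorFlow 2.
# The tensor names below are placeholders, not the real names inside zsmeif.pb.
import numpy as np
import tensorflow as tf

with open("zsmeif.pb", "rb") as f:
    graph_def = tf.compat.v1.GraphDef()
    graph_def.ParseFromString(f.read())

def _import_graph():
    tf.compat.v1.import_graph_def(graph_def, name="")

wrapped = tf.compat.v1.wrap_function(_import_graph, [])
# Uncomment to list node names and find the real input/output tensors:
# print([n.name for n in graph_def.node])
infer = wrapped.prune(
    feeds=wrapped.graph.as_graph_element("input:0"),     # placeholder name
    fetches=wrapped.graph.as_graph_element("output:0"))  # placeholder name

# One LPC feature window shaped like the training samples: (batch, 32, 64, 1).
lpc_window = np.zeros((1, 32, 64, 1), dtype=np.float32)
blendshape_weights = infer(tf.constant(lpc_window))
print(blendshape_weights.shape)
```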
## Dependencies

- tensorflow-gpu 2.6
- cudatoolkit 11.3.1
- cudnn 8.2.1
- scipy 1.7.1
- Python libs: pyaudio, requests, websocket, websocket-client

Note: the test can also run on CPU.

## Data

---

dataSet dimensions:

| dataSet | train_data | train_label_var | val_data | val_label_var |
| --- | --- | --- | --- | --- |
| 1 | (24370, 32, 64, 1) | (24370, 38) | (1000, 32, 64, 1) | (1000, 38) |
| 2 | (27271, 32, 64, 1) | (27271, 38) | (1000, 32, 64, 1) | (1000, 38) |
| 3 | (12994, 32, 64, 1) | (12994, 38) | (1000, 32, 64, 1) | (1000, 38) |
| 4 | (12994, 32, 64, 1) | (12994, 37) | (1000, 32, 64, 1) | (1000, 37) |
| 5 | (22075, 32, 64, 1) | (22075, 37) | (1000, 32, 64, 1) | (1000, 37) |
| 6 | (24976, 32, 64, 1) | (24976, 37) | (1000, 32, 64, 1) | (1000, 37) |
| 7 | (22075, 32, 64, 1) | (22075, 37) | (1000, 32, 64, 1) | (1000, 37) |
| 8 | (24976, 32, 64, 1) | (24976, 37) | (1000, 32, 64, 1) | (1000, 37) |
| 9 | (17368, 32, 64, 1) | (17368, 37) | (1000, 32, 64, 1) | (1000, 37) |
| 10 | (20269, 32, 64, 1) | (20269, 37) | (1000, 32, 64, 1) | (1000, 37) |
| 11 | (25370, 32, 64, 1) | (25370, 37) | (1000, 32, 64, 1) | (1000, 37) |
| 12 | (28271, 32, 64, 1) | (28271, 37) | (1000, 32, 64, 1) | (1000, 37) |
| 13 | (20019, 32, 64, 1) | (20019, 37) | (1000, 32, 64, 1) | (1000, 37) |
| 14 | (22920, 32, 64, 1) | (22920, 37) | (1000, 32, 64, 1) | (1000, 37) |
| 15 | (22920, 32, 64, 1) | (22920, 37) | (1000, 32, 64, 1) | (1000, 37) |
| 16 | (25485, 32, 64, 1) | (25485, 37) | (1000, 32, 64, 1) | (1000, 37) |

Only dataSet 4 to dataSet 16 are used for training.

The testing data, Maya model, and UE4 test project can be downloaded from the links below.

[data_all](https://pan.baidu.com/s/1CGSzn639PUE7cUYnX4I3fQ) (extraction code: n6ty)

[GoogleDrive](https://drive.google.com/drive/folders/1r7b7sfMebhtG0NSZk1yHzMaHRosb8xd1?usp=sharing)

Add the Maya model to the maya-model folder.

## Update

Uploaded the LPC source into the code folder.

## Reference

---

[Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion](https://research.nvidia.com/sites/default/files/publications/karras2017siggraph-paper_0.pdf)

## Contact

Wechat: FACEGOOD_CHINA

Email: support@facegood.cc

Discord: https://discord.gg/V46y6uTdw8

## License

Audio2Face Core is released under the terms of the MIT license. See COPYING for more information or see https://opensource.org/licenses/MIT.