# Txt2Vid

**Repository Path**: xuebashuoge/txt2vid

## Basic Information

- **Project Name**: Txt2Vid
- **Description**: Paddle version of Txt2Vid
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-08-28
- **Last Updated**: 2022-08-29

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Prepare for ffmpeg

## Install x264

Clone from GitHub: https://github.com/corecodec/x264

Build from source:

```bash
../configure --enable-shared --disable-asm --prefix= --disable-opencl
```

Add the library to the loader path:

```bash
export LD_LIBRARY_PATH=$HOME/App/x264/lib:$LD_LIBRARY_PATH
```

Point pkg-config at the x264 install:

```bash
export PKG_CONFIG_PATH=$HOME/App/x264/lib/pkgconfig
```

## Install SDL2

Download from https://github.com/libsdl-org/SDL/releases/tag/release-2.0.22

Build from source:

```bash
../configure --prefix=
```

Add to path:

```bash
export PATH=$HOME/App/SDL2/bin:$PATH
```

## Install ffmpeg

Download from https://git.ffmpeg.org/ffmpeg.git (version 4.4.1)

Build from source:

```bash
../configure --prefix= --enable-shared --enable-libx264 --enable-gpl --extra-cflags=-I/include --enable-sdl2 --enable-ffplay
```

Add to path:

```bash
export LD_LIBRARY_PATH=$HOME/App/ffmpeg/lib:$LD_LIBRARY_PATH
export PATH=$HOME/App/ffmpeg/bin:$PATH
```

Install the Python bindings:

```bash
pip install ffmpeg-python
```

# Prepare for OpenCV

```bash
pip install opencv-python
```

# Prepare for PaddleSpeech

## Install conda dependencies for PaddleSpeech

```bash
conda install -y -c conda-forge sox libsndfile swig bzip2
```

## Install PaddlePaddle

You can choose the PaddlePaddle version based on your system.
For example, for CUDA 11.1 and cuDNN 8.0, install paddlepaddle-gpu 2.2.2:

```bash
python -m pip install paddlepaddle-gpu==2.2.2.post111 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```

## Install PaddleSpeech

https://github.com/PaddlePaddle/PaddleSpeech

You can install PaddleSpeech with the following commands, and then use the ready-made examples in PaddleSpeech:

```bash
# Some users may fail to install `kaldiio` due to the default download source; install `pytest-runner` first
pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple
# Make sure you are in the root directory of PaddleSpeech
pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple
```

# Examples

## From pre-recorded text and PaddleSpeech TTS

```bash
python inference_streaming_pipeline.py -it text \
    -TTS Paddle \
    --checkpoint_path checkpoints/wav2lip_gan.pth \
    --face sample_data/einstein.jpg \
    -tif file \
    --text_file_path sample_data/einstein_text.txt \
    -vot file \
    --video_file_out results/test_video.mp4 \
    -am fastspeech2_ljspeech \
    -voc pwgan_ljspeech
```

## Streaming txt2vid video on a port using a text or audio file available at the server
```
server (AV-synced streamed video) -----> receiver (view AV stream)
   ^
   |
pre-recorded audio/text + driving picture/video
```
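In this topology the server does not push frames on its own: it blocks on its output port until a viewer connects, and only then starts writing the stream. A minimal sketch of that blocking behaviour (a hypothetical illustration, not the repository code; the port and payload here are made up, and the real pipeline streams AV data rather than raw bytes):

```python
import socket

def serve_once(payload: bytes, port: int = 8080) -> None:
    """Block until one client connects, then write the payload and close.

    Hypothetical sketch of the server-side 'halt until a viewer asks'
    behaviour described below; not the actual pipeline code.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("127.0.0.1", port))
        srv.listen(1)
        conn, _addr = srv.accept()  # execution halts here until a viewer connects
        with conn:
            conn.sendall(payload)
```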
On the server, launch the streaming inference script and port-forward to stream the generated txt2vid video.

**Example Code:** Using PaddleSpeech for TTS

- On the receiver, ssh into the server with port-forwarding enabled.
- This step is not needed if server = receiver (i.e. the receiver is the local machine and has GPU access).

```bash
ssh -Y -L localhost:8080:localhost:8080 <username>@<server>
```

`8080` is the default port, but any port should work with an appropriately modified `--output_port` argument in the `inference_streaming_pipeline` call below.

- On the server, launch the streaming inference script and write the txt2vid video to a port.

```bash
python inference_streaming_pipeline.py -it text \
    -TTS Paddle \
    --checkpoint_path checkpoints/wav2lip_gan.pth \
    --face sample_data/einstein.jpg \
    -tif file \
    --text_file_path sample_data/einstein_text.txt \
    -vot socket \
    --output_port 8080 \
    -am fastspeech2_ljspeech \
    -voc pwgan_ljspeech
```

Wait until it says `Model Loaded`. The code will halt there, waiting for the ffplay command to ask for streaming content.

- View the streaming output on the receiver. Ensure `ffplay` is installed.

```bash
ffplay -f avi http://localhost:8080
```

## Streaming txt2vid video on a port using streaming input from a sender
```
sender -----> server (AV-synced streamed video) -----> receiver (view AV stream)
   ^             ^
   |             |
audio/video   driving picture/video
```
On the server, launch the streaming inference script and port-forward to stream the generated txt2vid video.

**Example Code:** Using PaddleSpeech for TTS

- On the receiver, ssh into the server with port-forwarding enabled.
- This step is not needed if server = receiver (i.e. the receiver is the local machine and has GPU access).

```bash
ssh -Y -L localhost:8080:localhost:8080 <username>@<server>
```

`8080` is the default port, but any port should work with an appropriately modified `--output_port` argument in the `inference_streaming_pipeline` call below.

- On the server, launch the streaming inference script and write the txt2vid video to a port.

```bash
python inference_streaming_pipeline.py -it text \
    -TTS Paddle \
    --checkpoint_path checkpoints/wav2lip_gan.pth \
    --face sample_data/einstein.jpg \
    -tif socket \
    --text_port 50007 \
    -vot socket \
    --output_port 8080 \
    -am fastspeech2_ljspeech \
    -voc pwgan_ljspeech
```

Wait until it says `Model Loaded`. The code will halt there, waiting for the ffplay command to ask for streaming content.
`50007` is the default port over which the text is streamed to the server. It can be set to something else, but then also change `PORT` in the command below.

```bash
python input_stream_socket.py -it text \
    -tif terminal \
    --HOST 192.168.123.159 \
    --PORT 50007
```

# TODO: text to speech missing (maybe)
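The sender role played by `input_stream_socket.py` above can be sketched as a plain TCP client. This is a hypothetical illustration only: `send_text` and its raw-UTF-8 framing are assumptions for the sketch, and the actual script may frame the text differently on the wire.

```python
import socket

def send_text(text: str, host: str = "127.0.0.1", port: int = 50007) -> None:
    """Open a TCP connection to the server's text port and send UTF-8 text.

    Hypothetical sketch of the sender side; the real input_stream_socket.py
    may use a different wire format.
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(text.encode("utf-8"))
```

Closing the connection signals end-of-input to the receiving side in this sketch, which is why the socket is managed with a `with` block.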