# Tiktok Short Video Summarization
**Repository Path**: Howie0126/tiktok-short-video-summarization
## Basic Information
- **Project Name**: Tiktok Short Video Summarization
- **Description**: Short video summarization for the Douyin platform
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-08-28
- **Last Updated**: 2024-08-28
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Tiktok Summarizer
This repository contains a project that generates summaries of Douyin (TikTok) short videos. It takes a short video as input and outputs a summarized video with a synthesized voice-over, together with the summary text.
## Features
- Video Input: Accepts a Douyin short video as input.
- Text Summary: Generates a concise textual summary of the video content.
- Voice Synthesis: Converts the textual summary into a voice-over.
- Merged Output: Combines the original video with the synthesized voice-over and the text summary to produce a summarized version of the video.
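The four stages above form a simple pipeline. The sketch below is illustrative only: the function names are hypothetical stubs standing in for the repository's actual modules, not its real API.

```python
# Illustrative pipeline sketch. Every stage function here is a hypothetical
# stub standing in for the repository's real modules.

def transcribe(video_path):
    """Stage 1: speech-to-text on the input video (stub)."""
    return f"transcript of {video_path}"

def summarize(transcript):
    """Stage 2: condense the transcript into a short summary (stub)."""
    return transcript[:40]

def synthesize(summary):
    """Stage 3: text-to-speech; returns a path to the voice-over audio (stub)."""
    return "summary_video.wav"

def merge(video_path, audio_path, summary):
    """Stage 4: overlay the summary text and voice-over on the video (stub)."""
    return "summary_video.mp4"

def run_pipeline(video_path):
    transcript = transcribe(video_path)
    summary = summarize(transcript)
    audio = synthesize(summary)
    output = merge(video_path, audio, summary)
    return output, summary
```

The real project wires these stages together via `run.sh` (see Usage below); the sketch only shows how the outputs of one stage feed the next.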
## Updates
Coming soon.
## Getting Started
### Prerequisites
Before you begin, ensure you have met the following requirements:
- Python 3.8 or higher
- Required Python libraries (listed in `requirements.txt`)
#### Pre-trained models
Link: [Weights](https://drive.google.com/drive/folders/1Z0WV_IJAHXV16sAGW7TmC9J_iFZQ9NSs?usp=drive_link)
You can download our pre-trained CSTA weights from the link above.
There are 5 weights for the SumMe dataset and another 5 for the TVSum dataset (one weight per split).
As shown in the paper, we tested everything 10 times (without fixing the seed) but uploaded only a single representative model per split for your convenience.
The uploaded weights were obtained with seed 123456, and their results are almost identical to those in our paper.
Put the 5 SumMe weights in `weights/SumMe` and the 5 TVSum weights in `weights/TVSum`.
The directory structure must look like this:
```
weights
├── SumMe
│   ├── split1.pt
│   ├── split2.pt
│   ├── split3.pt
│   ├── split4.pt
│   └── split5.pt
└── TVSum
    ├── split1.pt
    ├── split2.pt
    ├── split3.pt
    ├── split4.pt
    └── split5.pt
```
### Installation
1. Clone the repository:
```bash
git clone https://gitee.com/Howie0126/tiktok-short-video-summarization.git
cd tiktok-short-video-summarization
```
2. Install the required packages:
```bash
pip install -r requirements.txt
```
### Usage
1. **Prepare your video**: Place the Douyin short video in the `video/` directory.
2. **Run the summarization script**:
```bash
sh run.sh
```
3. **Output**: The summarized video will be saved in the `final_output/` directory. The output will include:
   - `summary_video.wav`: The synthesized voice-over audio of the textual summary.
- `summary_video.mp4`: The video combined with both the text and voice-over.
### How to Generate Summary Videos
> Reference: [CSTA: CNN-based Spatiotemporal Attention for Video Summarization](https://github.com/thswodnjs3/CSTA)
You can generate summary videos using our models.
You can use either videos from public datasets or custom videos.
With the code below, you can apply our pre-trained models to raw videos to produce summary videos.
```bash
python generate_video.py --input_is_file True or False \
                         --file_path 'path to input video' \
                         --dir_path 'directory of input videos' \
                         --ext 'video file extension' \
                         --save_path 'path to save summary videos' \
                         --weight_path 'path to model weights'

# e.g.
# 1) Using a directory
python generate_video.py --input_is_file False --dir_path './videos' --ext 'mp4' --save_path './summary_videos' --weight_path './weights/SumMe/split4.pt'

# 2) Using a single video file
python generate_video.py --input_is_file True --file_path './videos/Jumps.mp4' --save_path './summary_videos' --weight_path './weights/SumMe/split4.pt'
```
The arguments are explained below.
If you change the 'ext' argument when inputting a directory of videos, you must modify the ['fourcc'](https://github.com/thswodnjs3/CSTA/blob/7227ee36a460b0bdc4aa83cb446223779365df45/generate_video.py#L34) variable in the 'produce_video' function in 'generate_video.py'.
You must also update it when inputting a single video file with an extension other than 'mp4'.
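One way to keep the extension and codec in sync is a small lookup table. The helper below is a hypothetical sketch, not the repository's code, and the extension-to-fourcc pairings are common conventions that may vary with your OpenCV build:

```python
# Hypothetical helper: map a video extension to a commonly paired OpenCV
# fourcc code. In generate_video.py the fourcc is hard-coded, so changing
# '--ext' means editing it by hand; a table like this keeps them in sync.

FOURCC_BY_EXT = {
    "mp4": "mp4v",  # MPEG-4
    "avi": "XVID",  # Xvid
    "mov": "mp4v",
    "mkv": "X264",  # requires an H.264-capable OpenCV build
}

def fourcc_for(ext):
    """Return a fourcc string for a file extension, or raise if unknown."""
    key = ext.lower().lstrip(".")
    try:
        return FOURCC_BY_EXT[key]
    except KeyError:
        raise ValueError(f"No fourcc known for extension '{ext}'") from None

# The returned string would then be passed to cv2.VideoWriter_fourcc(*code).
```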
```
1. input_is_file (bool): True or False
Indicates whether the input is a file or a directory.
If this is True, the 'file_path' argument is required.
If this is False, the 'dir_path' and 'ext' arguments are required.
2. file_path (str) e.g. './SumMe/Jumps.mp4'
The path of the video file.
This is only used when 'input_is_file' is True.
3. dir_path (str) e.g. './SumMe'
The path of the directory where video files are located.
This is only used when 'input_is_file' is False.
4. ext (str) e.g. 'mp4'
The file extension of the video files.
This is only used when 'input_is_file' is False.
5. sample_rate (int) e.g. 15
The interval between selected frames in a video.
For example, if the video has 30 fps, it will become 2 fps with a sample_rate of 15.
6. save_path (str) e.g. './summary_videos'
The path where the summary videos are saved.
7. weight_path (str) e.g. './weights/SumMe/split4.pt'
The path where the model weights are loaded from.
```
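The effect of `sample_rate` described above can be illustrated with a short, self-contained sketch (not the repository's code): sampling keeps every Nth frame, which divides the effective frame rate by N.

```python
# Sketch of how a sample_rate thins out frames: keep every Nth frame.

def sampled_frame_indices(total_frames, sample_rate):
    """Indices of the frames kept when sampling every `sample_rate`-th frame."""
    return list(range(0, total_frames, sample_rate))

def effective_fps(fps, sample_rate):
    """Frame rate of the sampled sequence."""
    return fps / sample_rate

# A 30 fps video sampled at rate 15 becomes a 2 fps sequence.
```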
We referenced the KTS code from [DSNet](https://github.com/li-plus/DSNet).
However, DSNet applies KTS to downsampled videos (2 fps), which can shift the detected shot change points and sometimes makes videos impossible to summarize.
We revised the code to compute change points over all frames.
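The discrepancy arises because a change point detected on the downsampled sequence can only map back to a multiple of the sample rate, so a true shot boundary in between is off by up to `sample_rate - 1` frames. A toy illustration (not the KTS code itself):

```python
# Toy illustration: change points found on a downsampled sequence (every
# 15th frame) map back only to multiples of 15, so a true shot boundary at
# an in-between frame is missed by up to sample_rate - 1 frames.

SAMPLE_RATE = 15

def upscale_change_points(downsampled_cps, sample_rate):
    """Map change points from the sampled sequence back to original frames."""
    return [cp * sample_rate for cp in downsampled_cps]

true_boundary = 52                                   # actual shot change
detected = upscale_change_points([3], SAMPLE_RATE)   # nearest detectable: 45
error = abs(detected[0] - true_boundary)             # 7 frames off
```

Computing KTS over all frames avoids this quantization entirely, at the cost of more computation.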
## Customization
You can customize the summary generation process by modifying the following:
- Text Summarization: The summarization model and parameters can be adjusted in `summarization.py`.
- Voice Synthesis: Change the voice properties in `summarization.py` to alter the tone and speed of the generated voice-over.
## Contributing
Contributions are welcome! Please open an issue or submit a pull request for any improvements or suggestions.
## Acknowledgements
This project utilizes the following technologies:
- [CSTA](https://github.com/thswodnjs3/CSTA): a CNN-based spatiotemporal attention for video summarization.
- [FastWhisper](https://github.com/FamousDirector/FastWhisper): an optimized implementation of OpenAI's Whisper for multilingual transcription.
- [FastTextRank](https://github.com/ArtistScript/FastTextRank) for textual summarization.
- [ZHTTS](https://github.com/Jackiexiao/zhtts) for text-to-speech conversion.
- [MoviePy](https://zulko.github.io/moviepy/) for video processing.