# mfcc_boaw

**Repository Path**: cppowboy_admin/mfcc_boaw

## Basic Information

- **Project Name**: mfcc_boaw
- **Description**: Extract MFCCs from videos and make bag-of-audio-words (BoAW) representations.
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## README

Extract MFCCs from videos and build bag-of-audio-words (BoAW) representations.

MFCCs are extracted from the mono-channel audio of each video, then a K-means model is trained on all of the MFCCs. BoAW representations are computed with the method from the paper [Softening quantization in bag-of-audio-words](https://ieeexplore.ieee.org/document/6853821).

## Usage

Hypothetical sketches of each step appear under *Implementation sketches* at the end of this README.

### 0. Split the video dataset with `split_video_dataset.py`

```
usage: split_video_dataset.py [-h] vid_dir num_splits split_file

positional arguments:
  vid_dir     the video directory
  num_splits  the number of splits
  split_file  the split stored as pickle file

optional arguments:
  -h, --help  show this help message and exit
```

Sample usage: `python split_video_dataset.py <vid_dir> 1 split-sample.pkl`

**Note**

There is no need for more than one split unless your video dataset is really large (e.g. more than 100k videos), since extracting audio and MFCCs is fast.

### 1. Extract audio and then MFCCs with `vid2mfcc.py`

```
usage: vid2mfcc.py [-h] [-n NFFT] [-w WINLEN] [-s WINSTEP] split_file split mfcc_db

positional arguments:
  split_file            the pickled split file
  split                 the split to use, e.g. split-0
  mfcc_db               the database to store extracted frames, HDF5 format

optional arguments:
  -h, --help            show this help message and exit
  -n NFFT, --nfft NFFT  the FFT size
  -w WINLEN, --winlen WINLEN
                        the length of the analysis window in seconds
  -s WINSTEP, --winstep WINSTEP
                        the step between successive windows in seconds
```

Sample usage: `python vid2mfcc.py split-sample.pkl split-0 mfcc_db.hdf5 -n 2048 -w 0.04 -s 0.02`

**Note**

* There are three tunable MFCC parameters: `nfft`, `winlen` and `winstep`.
* The recommended values are 2048, 0.04 and 0.02, respectively. Adjust them if you need to.

### 2. Train the K-means model with `kmeans.py`

```
usage: kmeans.py [-h] [-k CLUSTER] [-n NSAMPLE] mfcc_db

positional arguments:
  mfcc_db               the database to store extracted frames, HDF5 format

optional arguments:
  -h, --help            show this help message and exit
  -k CLUSTER, --cluster CLUSTER
                        the number of clusters for kmeans
  -n NSAMPLE, --nsample NSAMPLE
                        the number of samples
```

Sample usage: `python kmeans.py mfcc_db.hdf5 -k 128 -n 250000`

**Note**

* The recommended value for `cluster` is 128, which is also the dimensionality of the BoAW representations.
* The number of samples to use depends on the total amount of MFCC data in `mfcc_db`; usually one tenth of it should suffice.

### 3. Compute BoAW representations with `boaw_extract.py`

```
usage: boaw_extract.py [-h] mfcc_db boaw_db kmeans_model

positional arguments:
  mfcc_db       the database to store extracted frames, HDF5 format
  boaw_db       the database to store bag-of-audio-words, HDF5 format
  kmeans_model  the trained kmeans model

optional arguments:
  -h, --help    show this help message and exit
```

Sample usage: `python boaw_extract.py vtt_mfcc.hdf5 vtt_boaw.hdf5 kmeans_model.pkl`

## Dependencies

* Python 2.7
* FFmpeg: install on [Ubuntu](https://tecadmin.net/install-ffmpeg-on-linux/). Other [platforms](https://www.google.com/).
* Python libraries: `pip install -r requirements.txt`.
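
## Implementation sketches

The scripts in this repository are the source of truth; the sketches below are minimal, hypothetical reconstructions of each step, shown only to make the pipeline concrete.

### Step 0: splitting the dataset

A guess at what `split_video_dataset.py` might do, assuming the pickle holds a dict mapping split names like `split-0` (the naming used by `vid2mfcc.py`) to lists of video paths; the file-extension list is also an assumption:

```python
import os
import pickle
import sys

def split_video_dataset(vid_dir, num_splits, split_file):
    # Collect video file names (the extension list is an assumption).
    videos = sorted(f for f in os.listdir(vid_dir)
                    if f.lower().endswith((".mp4", ".avi", ".mkv", ".webm")))
    splits = {}
    for i in range(num_splits):
        # Round-robin assignment keeps the splits roughly balanced.
        splits["split-%d" % i] = [os.path.join(vid_dir, v)
                                  for v in videos[i::num_splits]]
    with open(split_file, "wb") as f:
        pickle.dump(splits, f)

if __name__ == "__main__":
    split_video_dataset(sys.argv[1], int(sys.argv[2]), sys.argv[3])
```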
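
### Step 1: audio and MFCC extraction

A sketch of the per-video work in `vid2mfcc.py`, assuming FFmpeg extracts the mono audio and the `python_speech_features` library computes the MFCCs. The HDF5 layout (one dataset per video, keyed by file name) and the 16 kHz sample rate are assumptions:

```python
import os
import subprocess
import tempfile

import h5py
import scipy.io.wavfile as wav
from python_speech_features import mfcc

def video_to_mfcc(video_path, mfcc_db, nfft=2048, winlen=0.04, winstep=0.02):
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        # -vn drops the video stream, -ac 1 downmixes to mono,
        # -ar 16000 resamples to 16 kHz (an assumed rate).
        subprocess.check_call([
            "ffmpeg", "-y", "-i", video_path,
            "-vn", "-ac", "1", "-ar", "16000", wav_path])
        rate, signal = wav.read(wav_path)
        feats = mfcc(signal, samplerate=rate,
                     winlen=winlen, winstep=winstep, nfft=nfft)
        with h5py.File(mfcc_db, "a") as db:
            key = os.path.splitext(os.path.basename(video_path))[0]
            db.create_dataset(key, data=feats)  # shape: (n_frames, 13)
    finally:
        if os.path.exists(wav_path):
            os.remove(wav_path)
```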
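
### Step 2: K-means training

A sketch of `kmeans.py`, assuming scikit-learn. Whether the repository uses `KMeans` or `MiniBatchKMeans`, and how it samples frames from the database, are assumptions:

```python
import pickle

import h5py
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def train_kmeans(mfcc_db, n_clusters=128, n_samples=250000,
                 model_path="kmeans_model.pkl"):
    with h5py.File(mfcc_db, "r") as db:
        keys = list(db.keys())
        # Sample frames roughly uniformly across videos.
        per_video = max(1, n_samples // len(keys))
        chunks = []
        for key in keys:
            feats = db[key][...]
            idx = np.random.choice(len(feats),
                                   min(per_video, len(feats)),
                                   replace=False)
            chunks.append(feats[idx])
    sample = np.vstack(chunks)[:n_samples]
    model = MiniBatchKMeans(n_clusters=n_clusters).fit(sample)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model
```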
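
### Step 3: BoAW extraction

A sketch of `boaw_extract.py`. Hard quantization would simply count, per video, how many MFCC frames fall into each of the 128 clusters; the cited paper softens this by letting each frame vote for several nearby codewords. The scheme below (equal weight `1/N` over the `N` nearest codewords) is one common variant and is an assumption, as is `n_soft=5`; consult the paper for the exact weighting:

```python
import pickle

import h5py
import numpy as np

def extract_boaw(mfcc_db, boaw_db, kmeans_model, n_soft=5):
    with open(kmeans_model, "rb") as f:
        model = pickle.load(f)
    centers = model.cluster_centers_  # shape: (n_clusters, n_coeffs)
    with h5py.File(mfcc_db, "r") as src, h5py.File(boaw_db, "w") as dst:
        for key in src:
            feats = src[key][...]  # shape: (n_frames, n_coeffs)
            # Squared distances between every frame and every codeword.
            d2 = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            nearest = np.argsort(d2, axis=1)[:, :n_soft]
            hist = np.zeros(len(centers))
            for row in nearest:
                # Each frame votes 1/N for each of its N nearest codewords.
                hist[row] += 1.0 / n_soft
            # L1-normalize so videos of different lengths are comparable.
            dst.create_dataset(key, data=hist / max(hist.sum(), 1e-12))
```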