# mmtm

**Repository Path**: d754406193/mmtm

## Basic Information

- **Project Name**: mmtm
- **Description**: Implementation of CVPR 2020 paper "MMTM: Multimodal Transfer Module for CNN Fusion"
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2020-12-09
- **Last Updated**: 2020-12-20

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mmtm-multimodal-transfer-module-for-cnn/action-recognition-in-videos-on-ntu-rgbd)](https://paperswithcode.com/sota/action-recognition-in-videos-on-ntu-rgbd?p=mmtm-multimodal-transfer-module-for-cnn)

## MMTM: Multimodal Transfer Module for CNN Fusion

Code for the paper [MMTM: Multimodal Transfer Module for CNN Fusion](https://arxiv.org/abs/1911.08670). This is a reimplementation of the original MMTM code to reproduce the results on NTU RGB+D dataset in Table 5 of the paper.

If you use this code, please cite the paper:

```
@inproceedings{vaezi20mmtm,
 author = {Vaezi Joze, Hamid Reza and Shaban, Amirreza and Iuzzolino, Michael L. and Koishida, Kazuhito},
 booktitle = {Conference on Computer Vision and Pattern Recognition ({CVPR})},
 title = {MMTM: Multimodal Transfer Module for CNN Fusion},
 year = {2020}
}
```

## Installation
This code has been tested on Ubuntu 16.04 with Python 3.8.3 and PyTorch 1.5.0.
* Install [Pytorch](https://pytorch.org).
* Install [tqdm](https://github.com/tqdm/tqdm) by running `pip install tqdm`.
* Install opencv by running `pip install opencv-python`.
* Install matplotlib by running `pip install matplotlib`.
* Install sklearn by running `pip install sklearn`.

## Download the pre-trained checkpoints and prepare NTU RGB+D dataset
* Clone this repository along with [MFAS](https://github.com/juanmanpr/mfas) submodule by running `git clone --recurse-submodules https://github.com/haamoon/mmtm.git`
* Download and uncompress the [checkpoints](https://gtvault-my.sharepoint.com/:u:/g/personal/ashaban6_gatech_edu/EZQR-QfpPqZPnK_ClGGkbtYBuDqWgWUdlsdun5p316uHIQ?e=1Nz8FI) and place them in 'ROOT/checkpoint' dicrectory.
* Download [NTU RGB+D](http://rose1.ntu.edu.sg/datasets/actionrecognition.asp) dataset.
* Copy all skeleton files to `ROOT/NUT/nturgbd_skeletons/` directory. 
* Change all video clips resolution to 256x256 30fps and copy them to `ROOT/NTU/nturgbd_rgb/avi_256x256_30/` directory.

## Evaluation
* Run `python main_mmtm_ntu.py --datadir ROOT/NTU --checkpointdir ROOT/checkpoints --test_cp fusion_mmtm_epoch_8_val_loss_0.1873.checkpoint --no_bad_skel`.
* Reduce the batch size if run out of memeory e.g. `--batchsize 1`.
* Add `--use_dataparallel` to use multiple GPUs.

## Training
* Run `python main_mmtm_ntu.py --datadir ROOT/NTU --checkpointdir ROOT/checkpoints --train --ske_cp skeleton_32frames_85.24.checkpoint --rgb_cp rgb_8frames_83.91.checkpoint`.
* We have trained the model with `--batchsize 20 --use_dataparallel` options on 4 GPUs.