Last updated by Andy on 2023-05-24 18:03 (update readme)

MindVideo


English|中文

Introduction

MindVideo is an open-source video toolbox for computer vision research and development based on MindSpore. It collects a series of classic and SoTA vision models, such as C3D and ARN, along with their pre-trained weights and training strategies. With the decoupled module design, it is easy to apply or adapt MindVideo to your own CV tasks.

Major Features

  • Modular Design

We decompose the video framework into different components and one can easily construct a customized video framework by combining different modules.
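To illustrate the decoupled design, here is a minimal sketch of a registry pattern that lets components be registered and assembled by name. The class and component names below are hypothetical, not MindVideo's actual API:

```python
# Illustrative sketch only: components register themselves in a registry,
# and a customized framework is built by combining independently chosen parts.
# (Names are hypothetical; see the mindvideo source for the real modules.)

class Registry:
    """Maps component names to their classes."""
    def __init__(self):
        self._components = {}

    def register(self, cls):
        self._components[cls.__name__] = cls
        return cls

    def build(self, name, **kwargs):
        return self._components[name](**kwargs)

BACKBONES = Registry()

@BACKBONES.register
class C3DBackbone:
    def __init__(self, in_channels=3):
        self.in_channels = in_channels

# Swapping a component does not touch the rest of the framework:
backbone = BACKBONES.build("C3DBackbone", in_channels=3)
print(backbone.in_channels)  # 3
```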

ModularDesign.png

Currently, MindVideo supports action recognition, video tracking, and video segmentation.

result.gif

result.png

MOT17_09_SDP.gif

Benchmark Results

The performance of the models trained with MindVideo is summarized in benchmark.md, where the training recipes and weights are both available.

Installation

Dependencies

Use the following commands to install dependencies:

git clone https://gitee.com/ZJUT-ERCISS/zjut_mindvideo.git
cd zjut_mindvideo

# If you use VisTR, the version of Python should be 3.7
# Please first install mindspore according to instructions on the official website: https://www.mindspore.cn/install

pip install -r requirements.txt
pip install -e .

Dataset Preparation

The datasets supported by MindVideo can be downloaded from:

Then put all training and evaluation data into one directory and change "data_root" in data.json to point to that directory, like this:

"data_root": "/home/publicfile/dataset/tracking"

Within MindVideo, the data processing methods for each supported dataset can be found under the data folder.

Quick Start

Running

Each model supported by MindVideo has a runnable module for beginners. After installing MindSpore and the dependencies required by this repository, look under the tutorials folder for a subfolder named after each model; these contain learning modules designed for beginners, and you can open the .ipynb file and run the code. We also support some parameter configurations for a quick start: the training and inference interfaces for all models live under the tools folder and consume the YAML file containing each model's parameters. Taking I3D as an example, run the following command to train:

cd tools/classification
python train.py -c ../../mindvideo/config/i3d/i3d_rgb.yaml

and run the following commands for evaluation:

cd tools/classification
python eval.py -c ../../mindvideo/config/i3d/i3d_rgb.yaml

and run the following commands for inference:

cd tools/classification
python infer.py -c ../../mindvideo/config/i3d/i3d_rgb.yaml
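All three entry points share the same -c/--config convention. Here is a minimal sketch of that command-line handling with argparse (the real scripts under tools/ may parse additional options):

```python
import argparse

# Minimal illustration of the -c/--config flag used by train.py,
# eval.py, and infer.py; not the actual tools/ implementation.
parser = argparse.ArgumentParser(description="MindVideo tool entry point")
parser.add_argument("-c", "--config", required=True,
                    help="path to a model YAML config, e.g. i3d_rgb.yaml")

# Simulate: python train.py -c ../../mindvideo/config/i3d/i3d_rgb.yaml
args = parser.parse_args(["-c", "../../mindvideo/config/i3d/i3d_rgb.yaml"])
print(args.config)  # ../../mindvideo/config/i3d/i3d_rgb.yaml
```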

Papers with Code is also a good resource for browsing the models within MindVideo; each can be found at:

| Model | Link |
| ----- | ---- |
| ARN | https://paperswithcode.com/paper/few-shot-action-recognition-via-improved#code |
| C3D | https://paperswithcode.com/paper/learning-spatiotemporal-features-with-3d#code |
| FairMOT | https://paperswithcode.com/paper/a-simple-baseline-for-multi-object-tracking#code |
| I3D | https://paperswithcode.com/paper/quo-vadis-action-recognition-a-new-model-and#code |
| NonLocal | https://paperswithcode.com/paper/non-local-neural-networks |
| R(2+1)D | https://paperswithcode.com/paper/a-closer-look-at-spatiotemporal-convolutions |
| ViST | https://paperswithcode.com/paper/video-swin-transformer#code |
| X3D | https://paperswithcode.com/paper/x3d-expanding-architectures-for-efficient |
| VisTR | https://paperswithcode.com/paper/end-to-end-video-instance-segmentation-with#code |

Model Checkpoints

The links to download the pre-trained models are as follows:

| Model | Link |
| ----- | ---- |
| ARN | arn.ckpt |
| C3D | c3d.ckpt |
| FairMOT | fairmot_dla34-30_886.ckpt |
| I3D | i3d_rgb_kinetics400.ckpt |
| NonLocal | nonlocal_mindspore.ckpt |
| R(2+1)D | r2plus1d18_kinetic400.ckpt |
| VisTR | vistr_r50_all.ckpt |
| X3D | x3d_l_kinetics400.ckpt |
|  | x3d_m_kinetics400.ckpt |
|  | x3d_s_kinetics400.ckpt |
|  | x3d_xs_kinetics400.ckpt |
| ViST | ms_swin_base_patch244_window877_kinetics400_22k.ckpt |
|  | ms_swin_small_patch244_window877_kinetics400_1k.ckpt |
|  | ms_swin_tiny_patch244_window877_kinetics400_1k.ckpt |
Model List

  • C3D for Action Recognition.

  • I3D for Action Recognition.

  • X3D for Action Recognition.

  • R(2+1)D for Action Recognition.

  • NonLocal for Action Recognition.

  • ViST for Action Recognition.

  • FairMOT for One-shot Tracking.

  • VisTR for Instance Segmentation.

  • ARN for Few-shot Action Recognition.

The master branch works with MindSpore 1.5+.

Build Documentation

  1. Clone mindvideo:

git clone https://gitee.com/ZJUT-ERCISS/zjut_mindvideo.git
cd zjut_mindvideo

  2. Install the building dependencies of documentation:

pip install -r requirements.txt

  3. Build documentation:

make html

  4. Open build/html/index.html with a browser.

License

This project is released under the Apache 2.0 license.

Supported Algorithms

  • Action Recognition
  • Video Tracking
  • Video Segmentation

Base Structure

MindVideo is a MindSpore-based Python package that provides high-level features:

  • Base backbones of models such as C3D and the ResNet series.
  • Domain-oriented rich dataset interfaces.
  • Rich visualization and I/O (Input/Output) interfaces.

BaseArchitecture.png

Feedbacks and Contact

The dynamic version is still under development. If you find any issue or have an idea for new features, please don't hesitate to contact us via Gitee Issues.

Contributing

We appreciate all contributions to improve MindVideo. Please refer to CONTRIBUTING.md for the contributing guideline.


Acknowledgement

MindSpore is an open source project that welcomes any contribution and feedback. We hope that this toolbox and benchmark can serve the growing research community by providing a flexible as well as standardized toolkit to reimplement existing methods and develop new computer vision methods. The contributors are listed in CONTRIBUTERS.md.

Citation

If you find this project useful in your research, please consider citing:

@misc{mindvideo2022,
    title={{MindVideo}: MindVideo Toolbox and Benchmark},
    author={MindVideo Contributors},
    howpublished={\url{https://gitee.com/ZJUT-ERCISS/zjut_mindvideo}},
    year={2022}
}