TokenFusion Description
Model Architecture
Dataset
Environment Requirements
Script Description
- Script and Sample Code
  - Training Process
  - Evaluation Process
Model Description
- Performance
  - Evaluation Performance
Description of Random Situation
ModelZoo Homepage

TokenFusion Description

TokenFusion is a multimodal token fusion method tailored for transformer-based vision tasks. To effectively fuse multiple modalities, TokenFusion dynamically detects uninformative tokens and substitutes these tokens with projected and aggregated inter-modal features. Residual positional alignment is also adopted to enable explicit utilization of the inter-modal alignments after fusion. The design of TokenFusion allows the transformer to learn correlations among multimodal features, while the single-modal transformer architecture remains largely intact. Extensive experiments are conducted on a variety of homogeneous and heterogeneous modalities and demonstrate that TokenFusion surpasses state-of-the-art methods in three typical vision tasks: multimodal image-to-image translation, RGB-depth semantic segmentation, and 3D object detection with point cloud and images.
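The substitution step described above can be illustrated with a minimal NumPy sketch. It is not the code in models/modules.py: the function name `fuse_tokens`, the per-token importance scores, and the `threshold` value are hypothetical, and the inter-modal projection is simplified to an identity mapping between spatially aligned tokens.

```python
import numpy as np


def fuse_tokens(x_a, x_b, score_a, score_b, threshold=0.5):
    """Replace low-scoring tokens of each modality with the other modality's tokens.

    x_a, x_b:         (num_tokens, dim) token features, assumed spatially aligned.
    score_a, score_b: (num_tokens,) importance scores; how they are predicted
                      is outside this sketch (assumption).
    threshold:        illustrative cut-off below which a token counts as uninformative.
    """
    # Boolean masks of tokens to keep, broadcast over the feature dimension.
    keep_a = (score_a >= threshold)[:, None]
    keep_b = (score_b >= threshold)[:, None]
    # Kept tokens stay as they are; the rest are taken from the other modality
    # (identity projection assumed here).
    fused_a = np.where(keep_a, x_a, x_b)
    fused_b = np.where(keep_b, x_b, x_a)
    return fused_a, fused_b
```

Because tokens are swapped position by position, the token count and feature dimension of each modality are unchanged, which is consistent with the single-modal transformer staying largely intact.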

Paper: Yikai Wang, Xinghao Chen, Lele Cao, Wenbing Huang, Fuchun Sun, Yunhe Wang. Multimodal Token Fusion for Vision Transformers. In CVPR 2022.

Model Architecture

The overall architecture of TokenFusion is shown below:

Dataset

Dataset used: NYUDv2

Dataset size: RGB images and depth images, with labels for 40 segmentation classes
- Train: 795 samples
- Test: 654 samples
Data format: image files
- Note: Data is processed in utils/datasets.py

Environment Requirements

Hardware (GPU)
Framework
- MindSpore
For more information, please check the resources below:
- MindSpore Tutorials
- MindSpore Python API

Script Description

Script and Sample Code

.TokenFusion
├── README.md               # descriptions about TokenFusion
├── models
│   ├── mix_transformer.py  # definition of backbone model
│   ├── segformer.py        # definition of segmentation model
│   └── modules.py          # TokenFusion operations
├── utils
│   ├── datasets.py         # data loader
│   ├── helpers.py          # utility functions
│   ├── transforms.py       # data preprocessing functions
│   └── meter.py            # utility functions
├── eval.py                 # evaluation interface
├── cfg.py                  # configuration file
└── config.py               # configuration file

Training Process

To Be Done

Evaluation Process

Launch

# inference example

python eval.py --checkpoint_path [CHECKPOINT_PATH]

The checkpoint can be downloaded here or from MindSpore Hub.

Result

result: IoU=54.8, ckpt= ./tokenfusion_ascend_v180_nyudv2_research_cv_acc54.8.ckpt

Parameters	Ascend
Model	TokenFusion
Model Version	tokenfusion_seg_mitb3_nyudv2
Resource	Ascend 910
Uploaded Date	2022-08-10
MindSpore Version	1.8.0
Dataset	NYUDv2
Outputs	probability
Accuracy	1pc: 54.8%
Speed	1pc: 1 s/step
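For readers unfamiliar with the metric, the sketch below shows one common way to compute mean IoU for semantic segmentation from a confusion matrix. Whether eval.py computes the reported figure this way is an assumption; the helper names and the `ignore_index` value of 255 are hypothetical.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Accumulate a (num_classes, num_classes) matrix: rows are ground truth, columns are predictions."""
    valid = target != ignore_index  # skip unlabeled pixels (assumed label value)
    idx = target[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(cm):
    """Average per-class IoU over the classes that occur in the labels or predictions."""
    inter = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    present = union > 0  # avoid dividing by zero for absent classes
    return float((inter[present] / union[present]).mean())
```

For NYUDv2 as described above, `num_classes` would be 40, and the confusion matrices of all test images would be summed before calling `mean_iou`.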

Description of Random Situation

The random seed is set in utils/datasets.py.
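As a rough illustration of what fixing the seed involves, the sketch below seeds the Python and NumPy random number generators. The function name `set_global_seed` is hypothetical, and this is not the code in datasets.py. MindSpore also provides `mindspore.set_seed` and `mindspore.dataset.config.set_seed`; they are mentioned only in a comment so that the sketch runs without MindSpore installed.

```python
import random

import numpy as np


def set_global_seed(seed):
    """Seed the Python and NumPy generators so that random augmentations repeat."""
    random.seed(seed)
    np.random.seed(seed)
    # In a MindSpore script one would typically also call
    # mindspore.set_seed(seed) and mindspore.dataset.config.set_seed(seed);
    # omitted here to keep the sketch dependency-free.
```

Calling `set_global_seed` with the same value before building the data pipeline should make random crops and flips repeat across runs, assuming they draw from these generators.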

ModelZoo Homepage

Please check the official homepage.
