Contents

TokenFusion Description

TokenFusion is a multimodal token fusion method tailored for transformer-based vision tasks. To effectively fuse multiple modalities, TokenFusion dynamically detects uninformative tokens and substitutes these tokens with projected and aggregated inter-modal features. Residual positional alignment is also adopted to enable explicit utilization of the inter-modal alignments after fusion. The design of TokenFusion allows the transformer to learn correlations among multimodal features, while the single-modal transformer architecture remains largely intact. Extensive experiments are conducted on a variety of homogeneous and heterogeneous modalities and demonstrate that TokenFusion surpasses state-of-the-art methods in three typical vision tasks: multimodal image-to-image translation, RGB-depth semantic segmentation, and 3D object detection with point cloud and images.
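The substitution step described above can be illustrated with a small NumPy sketch. It is not the code in models/modules.py: the function name `fuse_tokens`, the `threshold` value, and the use of a plain linear projection matrix are all assumptions made for illustration, and the per-token importance scores are taken as given rather than predicted by a network.

```python
import numpy as np


def fuse_tokens(tokens_a, tokens_b, scores_a, proj_b_to_a, threshold=0.02):
    """Replace uninformative tokens of modality A with projected tokens of modality B.

    tokens_a, tokens_b: arrays of shape (num_tokens, dim), aligned by position.
    scores_a: array of shape (num_tokens,), one importance score per token of A.
    proj_b_to_a: array of shape (dim, dim), a hypothetical linear projection.
    threshold: hypothetical cut-off below which a token counts as uninformative.
    """
    keep = scores_a >= threshold            # True where the token of A is kept
    projected_b = tokens_b @ proj_b_to_a    # map B's features into A's space
    # Broadcast the per-token mask over the feature dimension.
    return np.where(keep[:, None], tokens_a, projected_b)


# Toy example: 4 tokens with 2 features each, identity projection.
rgb = np.arange(8, dtype=np.float32).reshape(4, 2)
depth = -np.ones((4, 2), dtype=np.float32)
scores = np.array([0.9, 0.01, 0.5, 0.0], dtype=np.float32)
fused = fuse_tokens(rgb, depth, scores, np.eye(2, dtype=np.float32))
```

In this toy run, tokens 1 and 3 fall below the threshold and are taken from the depth stream, while tokens 0 and 2 keep their RGB features.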

Paper: Yikai Wang, Xinghao Chen, Lele Cao, Wenbing Huang, Fuchun Sun, Yunhe Wang. Multimodal Token Fusion for Vision Transformers. In CVPR 2022.

Model architecture

The overall architecture of TokenFusion is shown below:

Dataset

Dataset used: NYUDv2

  • Dataset size: RGB images and depth images, with labels for 40 segmentation classes
    • Train: 795 samples
    • Test: 654 samples
  • Data format: image files
    • Note: Data is processed in utils/datasets.py

Environment Requirements

Script Description

Script and Sample Code

.TokenFusion
├── README.md               # description of TokenFusion
├── models
│   ├── mix_transformer.py  # definition of backbone model
│   ├── segformer.py        # definition of segmentation model
│   └── modules.py          # TokenFusion operations
├── utils
│   ├── datasets.py         # data loader
│   ├── helpers.py          # utility functions
│   ├── transforms.py       # data preprocessing functions
│   └── meter.py            # utility functions
├── eval.py                 # evaluation interface
├── cfg.py                  # configuration file
└── config.py               # configuration file

Training process

To Be Done

Evaluation Process

Launch

# inference example
python eval.py --checkpoint_path [CHECKPOINT_PATH]

The checkpoint can be downloaded here or from MindSpore Hub.
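For readers who want to see how a script might accept the `--checkpoint_path` flag shown in the launch command, here is a minimal sketch using the standard `argparse` module. It is not the contents of eval.py; the function name `parse_args` and the sample path are hypothetical, and loading the checkpoint into the network is left out.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line of a hypothetical evaluation script."""
    parser = argparse.ArgumentParser(description="Evaluation sketch (hypothetical)")
    parser.add_argument(
        "--checkpoint_path",
        type=str,
        required=True,
        help="path to a trained .ckpt file",
    )
    return parser.parse_args(argv)


# Simulate: python eval.py --checkpoint_path ./model.ckpt
args = parse_args(["--checkpoint_path", "./model.ckpt"])
```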

Result

result: IoU=54.8, ckpt=./tokenfusion_ascend_v180_nyudv2_research_cv_acc54.8.ckpt

| Parameters        | Ascend                       |
| ----------------- | ---------------------------- |
| Model             | TokenFusion                  |
| Model Version     | tokenfusion_seg_mitb3_nyudv2 |
| Resource          | Ascend 910                   |
| Uploaded Date     | 2022-08-10                   |
| MindSpore Version | 1.8.0                        |
| Dataset           | NYUDv2                       |
| Outputs           | probability                  |
| Accuracy          | 1pc: 54.8%                   |
| Speed             | 1pc: 1 s/step                |
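To make the reported metric concrete, the sketch below computes intersection-over-union from a confusion matrix, assuming the reported IoU is the mean over the segmentation classes. The function names, the `ignore_index` value of 255, and the toy labels are illustrative assumptions; the repository's own metric code may differ.

```python
import numpy as np


def confusion_matrix(pred, label, num_classes, ignore_index=255):
    """Count (label, prediction) pairs; rows are true classes, columns are predictions."""
    valid = label != ignore_index  # drop pixels without a usable label
    idx = label[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0  # skip classes absent from both labels and predictions
    return float((tp[present] / union[present]).mean())


# Toy example with 3 classes; the last pixel is ignored.
label = np.array([0, 0, 1, 1, 2, 255])
pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(pred, label, num_classes=3)
miou = mean_iou(cm)
```

Here the per-class IoU values are 1/2, 2/3, and 1, so `miou` is their average. For NYUDv2 the same functions would be called with `num_classes=40`.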

Description of Random Situation

The random seed is set in utils/datasets.py.
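As an illustration of what fixing a seed typically involves, the following sketch seeds Python's `random` module and NumPy and checks that two runs produce the same draws. The helper name `set_seed` and the seed value are assumptions, not the code in datasets.py. In a MindSpore program, `mindspore.set_seed(seed)` would usually be called as well; it is only mentioned in a comment so the sketch runs without MindSpore installed.

```python
import random

import numpy as np


def set_seed(seed):
    """Seed the Python and NumPy random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    # Assumption: with MindSpore available, mindspore.set_seed(seed) would go here.


# Drawing after re-seeding with the same value reproduces the same numbers.
set_seed(42)
first = (random.random(), float(np.random.rand()))
set_seed(42)
second = (random.random(), float(np.random.rand()))
```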

ModelZoo Homepage

Please check the official homepage.
