108 Star 864 Fork 1.5K

MindSpore/models

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
文件
.gitee
.github
.ipynb_checkpoints
.jenkins
benchmark/ascend
community
how_to_contribute
official
research
3d/DeepLM
audio
cv
3D_DenseNet
3dcnn
ADNet
APDrawingGAN
AVA_cifar
AVA_hpa
AdaBin
AdderNGD
AdderQuant
Alexnet
AlignedReID++
AlignedReID
AlphaPose
ArbitraryStyleTransfer
ArtTrack
AttGAN
AttentionCluster
Auto-DeepLab
AutoSlim
BMN
BiMLP
C3D
CBAM
CFDT
CGAN
CLIFF
CMT
CSNL
CTSDG
CascadeRCNN
Cnet
ConstrainedBayesian
DBPN
DDAG
DDM
DDRNet
DRNet
DecoMR
DeepID
Deepsort
DeiT
DepthNet
DnCNN
DynamicQuant
E-NET
ECENet
EF-ENAS
EGnet
ESRGAN
EfficientDet_d0
FCANet
FCN8s
FCOS
FDA-BNN
FSAF
FaceAttribute
FaceDetection
FaceNet
FaceQualityAssessment
FaceRecognition
FaceRecognitionForTracking
Focus-DETR
FreeAnchor
GENet_Res50
GMMBN
GhostSR
Gold_YOLO
GridRCNN
HDR-Transformer
HMR
HRNetV2
HRNetW48_cls
HRNetW48_seg
HiFaceGAN
HireMLP
HourNAS
I3D
ICNet
IPT
IRN
ISyNet
Inception-v2
IndexNet
Instaboost
IntegralNeuralNetworks
JDE
LEO
LearningToSeeInTheDark
LightCNN
MAML
MCNN
MGN
MIMO-UNet
MODNet
MTCNN
MVD
ManiDP
MaskedFaceRecognition
NFNet
Neighbor2Neighbor
Non_local
OSVOS
OctSqueeze
PAGENet
PAMTRI
PDarts
PGAN
PSPNet
PTQ4SR
PVAnet
PWCNet
PaDiM
PatchCore
Pix2Pix
Pix2PixHD
PoseNet
ProtoNet
PyramidBox
RAOD
RCAN
RDN
REDNet30
RNA
ReIDStrongBaseline
RefSR-NeRF
RefineDet
RefineNet
RepVGG
ResNeSt50
ResNeXt
ResidualAttentionNet
S-GhostNet
SDNet
SE-Net
SE_ResNeXt50
SPADE
SPC-Net
SPPNet
SRGAN
SSIM-AE
STGAN
STPM
Segformer
SemanticHumanMatting
SiamFC
SinGAN
SpineNet
Spnas
StackedHourglass
StarGAN
StyTr-2
TCN
TNT
TinySAM
TokenFusion
fig
models
utils
README.md
cfg.py
config.py
eval.py
Twins
U-GAT-IT
UNet3+
Unet3d
VehicleNet
ViG
WGAN_GP
Yolact++
adelaide_ea
advanced_east
aecrnet
ats
augvit
autoaugment
beit
brdnet
cait
cct
cdp
centerface
centernet
centernet_det
centernet_resnet101
centernet_resnet50_v1
cmr
cnn_direction_model
cnnctc
conformer
convmixer
convnext
crnn_seq2seq_ocr
csd
cspdarknet53
darknet53
dcgan
dcrnn
delf
dem
densenet
detr
dgcnet_res101
dlinknet
dncnn
dnet_nas
dpn
east
ecolite
efficientnet-b0
eppmvsnet
erfnet
esr_ea
essay-recogination
faceboxes
fairmot
faster_rcnn_dcn
faster_rcnn_ssod
fastflow
fastscnn
fishnet99
flownet2
foveabox
frustum-pointnet
gan
ghostnet
ghostnet_d
ghostnet_quant
ghostnetv2
glore_res
googlenet
guided_anchoring
hardnet
hed
hlg
ibnnet
inception_resnet_v2
ivpf
lenet
lerf
libra-rcnn
lite-hrnet
llnet
lresnet100e_ir
m2det
meta-baseline
metric_learn
midas
mifnet
mnasnet
mobilenetV3_small_x1_0
mobilenetv3_large
ms_rcnn
nas-fpn
nasnet
nima
nima_vgg16
nnUNet
ntsnet
osnet
pcb
pcb_rpp
pnasnet
pointpillars
pointtransformer
predrnn++
proxylessnas
psenet
r2plus1d
ras
rbpn
rcnn
relationnet
renas
repvgg
res2net
res2net_deeplabv3
res2net_faster_rcnn
res2net_yolov3
resnet3d
resnet50_adv_pruning
resnet50_bam
resnetv2
resnetv2_50_frn
resnext152_64x4d
retinaface
retinanet_resnet101
retinanet_resnet152
rfcn
se_resnext50
siamRPN
simclr
simple_baselines
simple_pose
single_path_nas
sknet
slowfast
snn_mlp
sphereface
squeezenet
squeezenet1_1
sr_ea
srcnn
ssc_resnet50
ssd_ghostnet
ssd_inception_v2
ssd_inceptionv2
ssd_mobilenetV2
ssd_mobilenetV2_FPNlite
ssd_resnet34
ssd_resnet50
ssd_resnet_34
stgcn
stnet
swenet
t2t-vit
tall
textfusenet
tgcn
tinydarknet
tinynet
tnt
tracktor++
tracktor
trn
tsm
tsn
u2net
uni-uvpt
unisiam
vanillanet
vit_base
vnet
warpctc
wave_mlp
wdsr
wideresnet
yolov3_resnet18
yolov3_tiny
gflownets
gnn
hpc
l2o/hem-learning-to-cut
mm
nerf
nlp
recommend
rl
.gitkeep
README.md
README_CN.md
__init__.py
utils
.clang-format
.gitignore
CONTRIBUTING.md
CONTRIBUTING_CN.md
LICENSE
OWNERS
README.md
README_CN.md
克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README

Contents

TokenFusion Description

TokenFusion is a multimodal token fusion method tailored for transformer-based vision tasks. To effectively fuse multiple modalities, TokenFusion dynamically detects uninformative tokens and substitutes these tokens with projected and aggregated inter-modal features. Residual positional alignment is also adopted to enable explicit utilization of the inter-modal alignments after fusion. The design of TokenFusion allows the transformer to learn correlations among multimodal features, while the single-modal transformer architecture remains largely intact. Extensive experiments are conducted on a variety of homogeneous and heterogeneous modalities and demonstrate that TokenFusion surpasses state-of-the-art methods in three typical vision tasks: multimodal image-to-image translation, RGB-depth semantic segmentation, and 3D object detection with point cloud and images.

Paper: Yikai Wang, Xinghao Chen, Lele Cao, Wenbing Huang, Fuchun Sun, Yunhe Wang. Multimodal Token Fusion for Vision Transformers. In CVPR 2022.

Model architecture

The overall architecture of TokenFusion is show below:

Dataset

Dataset used: NYUDv2

  • Dataset size:colorful images and depth images, with labels in 40 segmentation classes
    • Train:795 samples
    • Test:654 samples
  • Data format:image files
    • Note:Data will be processed in utils/datasets.py

Environment Requirements

Script Description

Script and Sample Code

.TokenFusion
├── README.md               # descriptions about TokenFusion
├── models
│   ├── mix_transformer.py  # definition of backbone model
│   ├── segformer.py        # definition of segmentation model
│   └── modules.py          # TokenFusion operations
├── utils
│   ├── datasets.py         # data loader
│   ├── helpers.py          # utility functions
│   ├── transforms.py       # data preprocessing functions
│   └── meter.py            # utility functions
├── eval.py                 # evaluation interface
├── cfg.py                  # configure file
├── config.py               # configure file

Training process

To Be Done

Evaluation Process

Launch

# infer example

python eval.py --checkpoint_path  [CHECKPOINT_PATH]

Checkpoint can be downloaded at here or Mindspore Hub.

Result

result: IoU=54.8, ckpt= ./tokenfusion_ascend_v180_nyudv2_research_cv_acc54.8.ckpt
Parameters Ascend
Model TokenFusion
Model Version tokenfusion_seg_mitb3_nyudv2
Resource Ascend 910
Uploaded Date 2022-08-10
MindSpore Version 1.8.0
Dataset NYUDv2
Outputs probability
Accuracy 1pc: 54.8%
Speed 1pc:1s/step

Description of Random Situation

We set the seed inside datasets.py.

ModelZoo Homepage

Please check the official homepage.

马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
1
https://gitee.com/mindspore/models.git
git@gitee.com:mindspore/models.git
mindspore
models
models
master

搜索帮助