# ICCV2021-Papers-with-Code
**Repository Path**: PythonIoT/ICCV2021-Papers-with-Code
## Basic Information
- **Project Name**: ICCV2021-Papers-with-Code
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 1
- **Forks**: 0
- **Created**: 2021-09-21
- **Last Updated**: 2022-11-07
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# ICCV2021-Papers-with-Code
A collection of [ICCV 2021](http://iccv2021.thecvf.com/) papers and open-source projects (papers with code)!
1617 papers accepted - 25.9% acceptance rate
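IDs of papers accepted to ICCV 2021: https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRfaTmsNweuaA0Gjyu58H_Cx56pGwFhcTYII0u1pg0U7MbhlgY0R6Y-BbK3xFhAiwGZ26u3TAtN5MnS/pubhtml
> Note 1: Everyone is welcome to open an issue to share ICCV 2021 papers and open-source projects!
>
> Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision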
## ICCV 2021 Papers and Open-Source Projects: Table of Contents
- [Backbone](#Backbone)
- [Transformer](#Transformer)
- [Performance Boosters](#Cool)
- [GAN](#GAN)
- [NAS](#NAS)
- [NeRF](#NeRF)
- [Loss](#Loss)
- [Zero-Shot Learning](#Zero-Shot-Learning)
- [Few-Shot Learning](#Few-Shot-Learning)
- [Long-tailed](#Long-tailed)
- [Vision and Language](#VL)
- [Unsupervised/Self-Supervised](#Un/Self-Supervised)
- [Multi-Label Image Recognition](#MLIR)
- [2D Object Detection](#Object-Detection)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Medical Image Segmentation](#Medical-Image-Segmentation)
- [Video Object Segmentation](#VOS)
- [Few-shot Segmentation](#Few-shot-Segmentation)
- [Human Motion Segmentation](#HMS)
- [Object Tracking](#Object-Tracking)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#Point-Cloud-Object-Detection)
- [3D Semantic Segmentation](#Point-Cloud-Semantic-Segmentation)
- [3D Instance Segmentation](#Point-Cloud-Instance-Segmentation)
- [3D Multi-Object Tracking](#Point-Cloud-Multi-Object-Tracking)
- [Point Cloud Denoising](#Point-Cloud-Denoising)
- [Point Cloud Registration](#Point-Cloud-Registration)
- [Point Cloud Completion](#PCC)
- [Radar Semantic Segmentation](#RSS)
- [Image Restoration](#Image-Restoration)
- [Super-Resolution](#Super-Resolution)
- [Denoising](#Denoising)
- [Medical Image Denoising](#Medical-Image-Denoising)
- [Deblurring](#Deblurring)
- [Shadow Removal](#Shadow-Removal)
- [Video Frame Interpolation](#VFI)
- [Video Inpainting](#Video-Inpainting)
- [Person Re-identification](#Re-ID)
- [Person Search](#Person-Search)
- [2D/3D Human Pose Estimation](#Human-Pose-Estimation)
- [3D Head Reconstruction](#3D-Head-Reconstruction)
- [Face Recognition](#FR)
- [Facial Expression Recognition](#FER)
- [Action Recognition](#Action-Recognition)
- [Temporal Action Localization](#Temporal-Action-Localization)
- [Action Detection](#Action-Detection)
- [Group Activity Recognition](#GAR)
- [Sign Language Recognition](#SLR)
- [Text Detection](#Text-Detection)
- [Text Recognition](#Text-Recognition)
- [Text Replacement](#TR)
- [Visual Question Answering (VQA)](#Visual-Question-Answering)
- [Adversarial Attack](#Adversarial-Attack)
- [Depth Estimation](#Depth-Estimation)
- [Gaze Estimation](#Gaze-Estimation)
- [Crowd Counting](#Crowd-Counting)
- [Lane Detection](#Lane-Detection)
- [Trajectory Prediction](#Trajectory-Prediction)
- [Anomaly Detection](#Anomaly-Detection)
- [Scene Graph Generation](#Scene-Graph-Generation)
- [Image Editing](#Image-Editing)
- [Image Synthesis](#Image-Synthesis)
- [Image Retrieval](#Image-Retrieval)
- [3D Reconstruction](#3D-R)
- [Video Stabilization](#Video-Stabilization)
- [Fine-Grained Recognition](#FGR)
- [Style Transfer](#Style-Transfer)
- [Neural Painting](#Neural-Painting)
- [Feature Matching](#FM)
- [Semantic Correspondence](#Semantic-Correspondence)
- [Edge Detection](#Edge-Detection)
- [Camera Calibration](#Camera-Calibration)
- [Image Quality Assessment](#IQA)
- [Metric Learning](#ML)
- [Unsupervised Domain Adaptation](#UDA)
- [Video Rescaling](#Video-Rescaling)
- [Hand-Object Interaction](#Hand-Object-Interaction)
- [Vision-and-Language Navigation](#VLN)
- [Datasets](#Datasets)
- [Others](#Others)
# Backbone
**Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions**
- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT
**AutoFormer: Searching Transformers for Visual Recognition**
- Paper: https://arxiv.org/abs/2107.00651
- Code: https://github.com/microsoft/AutoML
**Bias Loss for Mobile Neural Networks**
- Paper: https://arxiv.org/abs/2107.11170
- Code: None
**Vision Transformer with Progressive Sampling**
- Paper: https://arxiv.org/abs/2108.01684
- Code: https://github.com/yuexy/PS-ViT
**Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet**
- Paper: https://arxiv.org/abs/2101.11986
- Code: https://github.com/yitu-opensource/T2T-ViT
**Rethinking Spatial Dimensions of Vision Transformers**
- Paper: https://arxiv.org/abs/2103.16302
- Code: https://github.com/naver-ai/pit
**Swin Transformer: Hierarchical Vision Transformer using Shifted Windows**
- Paper: https://arxiv.org/abs/2103.14030
- Code: https://github.com/microsoft/Swin-Transformer
**Conformer: Local Features Coupling Global Representations for Visual Recognition**
- Paper: https://arxiv.org/abs/2105.03889
- Code: https://github.com/pengzhiliang/Conformer
**MicroNet: Improving Image Recognition with Extremely Low FLOPs**
- Paper: https://arxiv.org/abs/2108.05894
- Code: https://github.com/liyunsheng13/micronet
**Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition**
- Paper: https://arxiv.org/abs/2102.01063
- Code: https://github.com/idstcv/ZenNAS
# Visual Transformer
**Swin Transformer: Hierarchical Vision Transformer using Shifted Windows**
- Paper: https://arxiv.org/abs/2103.14030
- Code: https://github.com/microsoft/Swin-Transformer
**An Empirical Study of Training Self-Supervised Vision Transformers**
- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None
**Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions**
- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT
**Group-Free 3D Object Detection via Transformers**
- Paper: https://arxiv.org/abs/2104.00678
- Code: None
**Spatial-Temporal Transformer for Dynamic Scene Graph Generation**
- Paper: https://arxiv.org/abs/2107.12309
- Code: None
**Rethinking and Improving Relative Position Encoding for Vision Transformer**
- Paper: https://arxiv.org/abs/2107.14222
- Code: https://github.com/microsoft/AutoML/tree/main/iRPE
**Emerging Properties in Self-Supervised Vision Transformers**
- Paper: https://arxiv.org/abs/2104.14294
- Code: https://github.com/facebookresearch/dino
**Learning Spatio-Temporal Transformer for Visual Tracking**
- Paper: https://arxiv.org/abs/2103.17154
- Code: https://github.com/researchmm/Stark
**Fast Convergence of DETR with Spatially Modulated Co-Attention**
- Paper: https://arxiv.org/abs/2101.07448
- Code: https://github.com/abc403/SMCA-replication
**Vision Transformer with Progressive Sampling**
- Paper: https://arxiv.org/abs/2108.01684
- Code: https://github.com/yuexy/PS-ViT
**Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet**
- Paper: https://arxiv.org/abs/2101.11986
- Code: https://github.com/yitu-opensource/T2T-ViT
**Rethinking Spatial Dimensions of Vision Transformers**
- Paper: https://arxiv.org/abs/2103.16302
- Code: https://github.com/naver-ai/pit
**The Right to Talk: An Audio-Visual Transformer Approach**
- Paper: https://arxiv.org/abs/2108.03256
- Code: None
**Joint Inductive and Transductive Learning for Video Object Segmentation**
- Paper: https://arxiv.org/abs/2108.03679
- Code: https://github.com/maoyunyao/JOINT
**Conformer: Local Features Coupling Global Representations for Visual Recognition**
- Paper: https://arxiv.org/abs/2105.03889
- Code: https://github.com/pengzhiliang/Conformer
**Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer**
- Paper: https://arxiv.org/abs/2108.03032
- Code: https://github.com/zhiheLu/CWT-for-FSS
**Paint Transformer: Feed Forward Neural Painting with Stroke Prediction**
- Paper: https://arxiv.org/abs/2108.03798
- Code: https://github.com/wzmsltw/PaintTransformer
**Conditional DETR for Fast Training Convergence**
- Paper: https://arxiv.org/abs/2108.06152
- Code: https://github.com/Atten4Vis/ConditionalDETR
**MUSIQ: Multi-scale Image Quality Transformer**
- Paper: https://arxiv.org/abs/2108.05997
- Code: https://github.com/google-research/google-research/tree/master/musiq
**SOTR: Segmenting Objects with Transformers**
- Paper: https://arxiv.org/abs/2108.06747
- Code: https://github.com/easton-cau/SOTR
**PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers**
- Paper(Oral): https://arxiv.org/abs/2108.08839
- Code: https://github.com/yuxumin/PoinTr
**SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer**
- Paper: https://arxiv.org/abs/2108.04444
- Code: https://github.com/AllenXiangX/SnowflakeNet
**Improving 3D Object Detection with Channel-wise Transformer**
- Paper: https://arxiv.org/abs/2108.10723
- Code: https://github.com/hlsheng1/CT3D
**TransFER: Learning Relation-aware Facial Expression Representations with Transformers**
- Paper: https://arxiv.org/abs/2108.11116
- Code: None
**GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer**
- Paper: https://arxiv.org/abs/2108.12630
- Code: https://github.com/xueyee/GroupFormer
**Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**
- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d
**Voxel Transformer for 3D Object Detection**
- Paper: https://arxiv.org/abs/2109.02497
- Code: None
**3D Human Texture Estimation from a Single Image with Transformers**
- Homepage: https://www.mmlab-ntu.com/project/texformer/
- Paper(Oral): https://arxiv.org/abs/2109.02563
- Code: None
**FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting**
- Paper: https://arxiv.org/abs/2109.02974
- Code: https://github.com/ruiliu-ai/FuseFormer
**CTRL-C: Camera calibration TRansformer with Line-Classification**
- Paper: https://arxiv.org/abs/2109.02259
- Code: https://github.com/jwlee-vcl/CTRL-C
**An End-to-End Transformer Model for 3D Object Detection**
- Homepage: https://facebookresearch.github.io/3detr/
- Paper: https://arxiv.org/abs/2109.08141
- Code: https://github.com/facebookresearch/3detr
**Eformer: Edge Enhancement based Transformer for Medical Image Denoising**
- Paper: https://arxiv.org/abs/2109.08044
- Code: None
**PnP-DETR: Towards Efficient Visual Analysis with Transformers**
- Paper: https://arxiv.org/abs/2109.07036
- Code: https://github.com/twangnh/pnp-detr
# Performance Boosters
**FaPN: Feature-aligned Pyramid Network for Dense Image Prediction**
- Paper: https://arxiv.org/abs/2108.07058
- Code: https://github.com/EMI-Group/FaPN
**Unifying Nonlocal Blocks for Neural Networks**
- Paper: https://arxiv.org/abs/2108.02451
- Code: https://github.com/zh460045050/SNL_ICCV2021
**Towards Learning Spatially Discriminative Feature Representations**
- Paper: https://arxiv.org/abs/2109.01359
- Code: None
# GAN
**Labels4Free: Unsupervised Segmentation using StyleGAN**
- Homepage: https://rameenabdal.github.io/Labels4Free/
- Paper: https://arxiv.org/abs/2103.14968
**GNeRF: GAN-based Neural Radiance Field without Posed Camera**
- Paper(Oral): https://arxiv.org/abs/2103.15606
- Code: https://github.com/MQ66/gnerf
**EigenGAN: Layer-Wise Eigen-Learning for GANs**
- Paper: https://arxiv.org/abs/2104.12476
- Code: https://github.com/LynnHo/EigenGAN-Tensorflow
**From Continuity to Editability: Inverting GANs with Consecutive Images**
- Paper: https://arxiv.org/abs/2107.13812
- Code: https://github.com/Qingyang-Xu/InvertingGANs_with_ConsecutiveImgs
**Sketch Your Own GAN**
- Homepage: https://peterwang512.github.io/GANSketching/
- Paper: https://arxiv.org/abs/2108.02774
- Code: https://github.com/peterwang512/GANSketching
**Manifold Matching via Deep Metric Learning for Generative Modeling**
- Paper: https://arxiv.org/abs/2106.10777
- Code: https://github.com/dzld00/pytorch-manifold-matching
**Dual Projection Generative Adversarial Networks for Conditional Image Generation**
- Paper: https://arxiv.org/abs/2108.09016
- Code: None
**GAN Inversion for Out-of-Range Images with Geometric Transformations**
- Paper: https://arxiv.org/abs/2108.08998
- Code: None
**ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement**
- Homepage: https://yuval-alaluf.github.io/restyle-encoder/
- Paper: https://arxiv.org/abs/2104.02699
- Code: https://github.com/yuval-alaluf/restyle-encoder
**StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery**
- Paper(Oral): https://arxiv.org/abs/2103.17249
- Code: https://github.com/orpatashnik/StyleCLIP
**Image Synthesis via Semantic Composition**
- Homepage: https://shepnerd.github.io/scg/
- Paper: https://arxiv.org/abs/2109.07053
- Code: https://github.com/dvlab-research/SCGAN
# NAS
**AutoFormer: Searching Transformers for Visual Recognition**
- Paper: https://arxiv.org/abs/2107.00651
- Code: https://github.com/microsoft/AutoML
**BN-NAS: Neural Architecture Search with Batch Normalization**
- Paper: https://arxiv.org/abs/2108.07375
- Code: https://github.com/bychen515/BNNAS
**Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition**
- Paper: https://arxiv.org/abs/2102.01063
- Code: https://github.com/idstcv/ZenNAS
# NeRF
**GNeRF: GAN-based Neural Radiance Field without Posed Camera**
- Paper(Oral): https://arxiv.org/abs/2103.15606
- Code: https://github.com/MQ66/gnerf
**KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs**
- Paper: https://arxiv.org/abs/2103.13744
- Code: https://github.com/creiser/kilonerf
**In-Place Scene Labelling and Understanding with Implicit Scene Representation**
- Homepage: https://shuaifengzhi.com/Semantic-NeRF/
- Paper(Oral): https://arxiv.org/abs/2103.15875
**Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis**
- Homepage: https://ajayj.com/dietnerf
- Paper(DietNeRF): https://arxiv.org/abs/2104.00677
**BARF: Bundle-Adjusting Neural Radiance Fields**
- Homepage: https://chenhsuanlin.bitbucket.io/bundle-adjusting-NeRF/
- Paper(Oral): https://arxiv.org/abs/2104.06405
- Code: https://github.com/chenhsuanlin/bundle-adjusting-NeRF
**Self-Calibrating Neural Radiance Fields**
- Paper: https://arxiv.org/abs/2108.13826
- Code: https://github.com/POSTECH-CVLab/SCNeRF
**Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**
- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d
**Neural Articulated Radiance Field**
- Paper: https://arxiv.org/abs/2104.03110
- Code: https://github.com/nogu-atsu/NARF
**NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo**
- Paper(Oral): https://arxiv.org/abs/2109.01129
- Code: https://github.com/weiyithu/NerfingMVS
# Loss
**Rank & Sort Loss for Object Detection and Instance Segmentation**
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
**Bias Loss for Mobile Neural Networks**
- Paper: https://arxiv.org/abs/2107.11170
- Code: None
**A Robust Loss for Point Cloud Registration**
- Paper: https://arxiv.org/abs/2108.11682
- Code: None
**Reconcile Prediction Consistency for Balanced Object Detection**
- Paper: https://arxiv.org/abs/2108.10809
- Code: None
# Zero-Shot Learning
**FREE: Feature Refinement for Generalized Zero-Shot Learning**
- Paper: https://arxiv.org/abs/2107.13807
- Code: https://github.com/shiming-chen/FREE
**Discriminative Region-based Multi-Label Zero-Shot Learning**
- Paper: https://arxiv.org/abs/2108.09301
- Code: https://github.com/akshitac8/BiAM
# Few-Shot Learning
**Relational Embedding for Few-Shot Classification**
- Paper: https://arxiv.org/abs/2108.09666
- Code: https://github.com/dahyun-kang/renet
**Few-Shot and Continual Learning with Attentive Independent Mechanisms**
- Paper: https://arxiv.org/abs/2107.14053
- Code: https://github.com/huang50213/AIM-Fewshot-Continual
# Long-tailed
**Parametric Contrastive Learning**
- Paper: https://arxiv.org/abs/2107.12028
- Code: https://github.com/jiequancui/Parametric-Contrastive-Learning
# Vision and Language
**VLGrammar: Grounded Grammar Induction of Vision and Language**
- Paper: https://arxiv.org/abs/2103.12975
- Code: https://github.com/evelinehong/VLGrammar
# Unsupervised/Self-Supervised
**An Empirical Study of Training Self-Supervised Vision Transformers**
- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None
**DetCo: Unsupervised Contrastive Learning for Object Detection**
- Paper: https://arxiv.org/abs/2102.04803
- Code: https://github.com/xieenze/DetCo
**Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization**
- Paper: https://arxiv.org/abs/2108.02183
- Code: None
**Improving Contrastive Learning by Visualizing Feature Transformation**
- Paper(Oral): https://arxiv.org/abs/2108.02982
- Code: https://github.com/DTennant/CL-Visualizing-Feature-Transformation
**Self-Supervised Visual Representations Learning by Contrastive Mask Prediction**
- Paper: https://arxiv.org/abs/2108.08012
- Code: None
**Temporal Knowledge Consistency for Unsupervised Visual Representation Learning**
- Paper: https://arxiv.org/abs/2108.10668
- Code: None
**MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving**
- Paper: https://arxiv.org/abs/2108.12178
- Code: https://github.com/KaiChen1998/MultiSiam
**Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds**
- Homepage: https://siyuanhuang.com/STRL/
- Paper: https://arxiv.org/abs/2109.00179
- Code: https://github.com/yichen928/STRL
**Self-supervised Product Quantization for Deep Unsupervised Image Retrieval**
- Paper: https://arxiv.org/abs/2109.02244
- Code: https://github.com/youngkyunJang/SPQ
# Multi-Label Image Recognition
**Residual Attention: A Simple but Effective Method for Multi-Label Recognition**
- Paper: https://arxiv.org/abs/2108.02456
- Code: https://github.com/Kevinz-code/CSRA
# 2D Object Detection
**DetCo: Unsupervised Contrastive Learning for Object Detection**
- Paper: https://arxiv.org/abs/2102.04803
- Code: https://github.com/xieenze/DetCo
**Detecting Invisible People**
- Homepage: http://www.cs.cmu.edu/~tkhurana/invisible.htm
- Paper: https://arxiv.org/abs/2012.08419
**Active Learning for Deep Object Detection via Probabilistic Modeling**
- Paper: https://arxiv.org/abs/2103.16130
- Code: None
**Conditional Variational Capsule Network for Open Set Recognition**
- Paper: https://arxiv.org/abs/2104.09159
- Code: https://github.com/guglielmocamporese/cvaecaposr
**MDETR: Modulated Detection for End-to-End Multi-Modal Understanding**
- Homepage: https://ashkamath.github.io/mdetr_page/
- Paper(Oral): https://arxiv.org/abs/2104.12763
- Code: https://github.com/ashkamath/mdetr
**Rank & Sort Loss for Object Detection and Instance Segmentation**
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
**SimROD: A Simple Adaptation Method for Robust Object Detection**
- Paper(Oral): https://arxiv.org/abs/2107.13389
- Code: None
**GraphFPN: Graph Feature Pyramid Network for Object Detection**
- Paper: https://arxiv.org/abs/2108.00580
- Code: None
**Fast Convergence of DETR with Spatially Modulated Co-Attention**
- Paper: https://arxiv.org/abs/2101.07448
- Code: https://github.com/abc403/SMCA-replication
**Conditional DETR for Fast Training Convergence**
- Paper: https://arxiv.org/abs/2108.06152
- Code: https://github.com/Atten4Vis/ConditionalDETR
**TOOD: Task-aligned One-stage Object Detection**
- Paper(Oral): https://arxiv.org/abs/2108.07755
- Code: https://github.com/fcjian/TOOD
**Reconcile Prediction Consistency for Balanced Object Detection**
- Paper: https://arxiv.org/abs/2108.10809
- Code: None
**Mutual Supervision for Dense Object Detection**
- Paper: https://arxiv.org/abs/2109.05986
- Code: https://github.com/MCG-NJU/MuSu-Detection
**PnP-DETR: Towards Efficient Visual Analysis with Transformers**
- Paper: https://arxiv.org/abs/2109.07036
- Code: https://github.com/twangnh/pnp-detr
## Semi-Supervised Object Detection
**End-to-End Semi-Supervised Object Detection with Soft Teacher**
- Paper: https://arxiv.org/abs/2106.09018
- Code: None
## Oriented Object Detection
**Oriented R-CNN for Object Detection**
- Paper: https://arxiv.org/abs/2108.05699
- Code: https://github.com/jbwang1997/OBBDetection
## Few-Shot Object Detection
**DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection**
- Paper: https://arxiv.org/abs/2108.09017
- Code: https://github.com/er-muyue/DeFRCN
# Semantic Segmentation
**Personalized Image Semantic Segmentation**
- Paper: https://arxiv.org/abs/2107.13978
- Code: https://github.com/zhangyuygss/PIS
- Dataset: https://github.com/zhangyuygss/PIS
**Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation**
- Paper(Oral): https://arxiv.org/abs/2107.11264
- Code: https://github.com/shjung13/Standardized-max-logits
**Enhanced Boundary Learning for Glass-like Object Segmentation**
- Paper: https://arxiv.org/abs/2103.15734
- Code: https://github.com/hehao13/EBLNet
**Self-Regulation for Semantic Segmentation**
- Paper: https://arxiv.org/abs/2108.09702
- Code: https://github.com/dongzhang89/SR-SS
**Mining Contextual Information Beyond Image for Semantic Segmentation**
- Paper: https://arxiv.org/abs/2108.11819
- Code: https://github.com/CharlesPikachu/mcibi
**ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation**
- Paper: https://arxiv.org/abs/2108.12382
- Code: https://github.com/SegmentationBLWX/sssegmentation
## Unsupervised Domain Adaptation Semantic Segmentation
**Multi-Anchor Active Domain Adaptation for Semantic Segmentation**
- Paper(Oral): https://arxiv.org/abs/2108.08012
- Code: https://github.com/munanning/MADA
## Few-Shot Semantic Segmentation
**Learning Meta-class Memory for Few-Shot Semantic Segmentation**
- Paper: https://arxiv.org/abs/2108.02958
- Code: None
**Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer**
- Paper: https://arxiv.org/abs/2108.03032
- Code: https://github.com/zhiheLu/CWT-for-FSS
## Semi-supervised Semantic Segmentation
**Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation**
- Paper(Oral): https://arxiv.org/abs/2107.11279
- Code: https://github.com/CVMI-Lab/DARS
**Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation**
- Paper: https://arxiv.org/abs/2108.09025
- Code: None
## Weakly Supervised Semantic Segmentation
**Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation**
- Paper: https://arxiv.org/abs/2107.11787
- Code: None
**Complementary Patch for Weakly Supervised Semantic Segmentation**
- Paper: https://arxiv.org/abs/2108.03852
- Code: None
## Unsupervised Segmentation
**Labels4Free: Unsupervised Segmentation using StyleGAN**
- Homepage: https://rameenabdal.github.io/Labels4Free/
- Paper: https://arxiv.org/abs/2103.14968
# Instance Segmentation
**Instances as Queries**
- Paper: https://arxiv.org/abs/2105.01928
- Code: https://github.com/hustvl/QueryInst
**Crossover Learning for Fast Online Video Instance Segmentation**
- Paper: https://arxiv.org/abs/2104.05970
- Code: https://github.com/hustvl/CrossVIS
**Rank & Sort Loss for Object Detection and Instance Segmentation**
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
**SOTR: Segmenting Objects with Transformers**
- Paper: https://arxiv.org/abs/2108.06747
- Code: https://github.com/easton-cau/SOTR
# Medical Image Segmentation
**Recurrent Mask Refinement for Few-Shot Medical Image Segmentation**
- Paper: https://arxiv.org/abs/2108.00622
- Code: https://github.com/uci-cbcl/RP-Net
# Video Object Segmentation
**Full-Duplex Strategy for Video Object Segmentation**
- Homepage: http://dpfan.net/FSNet/
- Paper: https://arxiv.org/abs/2108.03151
- Code: https://github.com/GewelsJI/FSNet
**Joint Inductive and Transductive Learning for Video Object Segmentation**
- Paper: https://arxiv.org/abs/2108.03679
- Code: https://github.com/maoyunyao/JOINT
# Few-shot Segmentation
**Mining Latent Classes for Few-shot Segmentation**
- Paper(Oral): https://arxiv.org/abs/2103.15402
- Code: https://github.com/LiheYoung/MiningFSS
# Human Motion Segmentation
**Graph Constrained Data Representation Learning for Human Motion Segmentation**
- Paper: https://arxiv.org/abs/2107.13362
- Code: None
# Object Tracking
**Learning to Track Objects from Unlabeled Videos**
- Paper: https://arxiv.org/abs/2108.12711
- Code: https://github.com/VISION-SJTU/USOT
**Learning Spatio-Temporal Transformer for Visual Tracking**
- Paper: https://arxiv.org/abs/2103.17154
- Code: https://github.com/researchmm/Stark
**Learning to Adversarially Blur Visual Object Tracking**
- Paper: https://arxiv.org/abs/2107.12085
- Code: https://github.com/tsingqguo/ABA
**HiFT: Hierarchical Feature Transformer for Aerial Tracking**
- Paper: https://arxiv.org/abs/2108.00202
- Code: https://github.com/vision4robotics/HiFT
**Learn to Match: Automatic Matching Network Design for Visual Tracking**
- Paper: https://arxiv.org/abs/2108.00803
- Code: https://github.com/JudasDie/SOTS
**Saliency-Associated Object Tracking**
- Paper: https://arxiv.org/abs/2108.03637
- Code: https://github.com/ZikunZhou/SAOT.git
## RGBD Object Tracking
**DepthTrack: Unveiling the Power of RGBD Tracking**
- Paper: https://arxiv.org/abs/2108.13962
- Code: https://github.com/xiaozai/DeT
- Dataset: https://github.com/xiaozai/DeT
# 3D Point Cloud
**Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds**
- Homepage: https://siyuanhuang.com/STRL/
- Paper: https://arxiv.org/abs/2109.00179
- Code: https://github.com/yichen928/STRL
**Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion**
- Homepage: https://hansen7.github.io/OcCo/
- Paper: https://arxiv.org/abs/2010.01089
- Code: https://github.com/hansen7/OcCo
**DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation**
- Paper: https://arxiv.org/abs/2108.04023
- Code: None
**Adaptive Graph Convolution for Point Cloud Analysis**
- Paper: https://arxiv.org/abs/2108.08035
- Code: https://github.com/hrzhou2/AdaptConv-master
# 3D Object Detection
**Group-Free 3D Object Detection via Transformers**
- Paper: https://arxiv.org/abs/2104.00678
- Code: None
**Improving 3D Object Detection with Channel-wise Transformer**
- Paper: https://arxiv.org/abs/2108.10723
- Code: https://github.com/hlsheng1/CT3D
**AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection**
- Paper: https://arxiv.org/abs/2108.11127
- Code: https://github.com/zongdai/AutoShape
**4D-Net for Learned Multi-Modal Alignment**
- Paper: https://arxiv.org/abs/2109.01066
- Code: None
**Voxel Transformer for 3D Object Detection**
- Paper: https://arxiv.org/abs/2109.02497
- Code: None
**Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection**
- Paper: https://arxiv.org/abs/2109.02499
- Code: None
**An End-to-End Transformer Model for 3D Object Detection**
- Homepage: https://facebookresearch.github.io/3detr/
- Paper: https://arxiv.org/abs/2109.08141
- Code: https://github.com/facebookresearch/3detr
## 3D Semantic Segmentation
**ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation**
- Paper: https://arxiv.org/abs/2107.11769
- Code: None
**Learning with Noisy Labels for Robust Point Cloud Segmentation**
- Homepage: https://shuquanye.com/PNAL_website/
- Paper(Oral): https://arxiv.org/abs/2107.14230
**VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation**
- Paper(Oral): https://arxiv.org/abs/2107.13824
- Code: https://github.com/hzykent/VMNet
**Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation**
- Paper: https://arxiv.org/abs/2107.14724
- Code: https://github.com/leolyj/DsCML
**DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation**
- Paper: https://arxiv.org/abs/2108.04023
- Code: None
**Adaptive Graph Convolution for Point Cloud Analysis**
- Paper: https://arxiv.org/abs/2108.08035
- Code: https://github.com/hrzhou2/AdaptConv-master
## 3D Instance Segmentation
**Hierarchical Aggregation for 3D Instance Segmentation**
- Paper: https://arxiv.org/abs/2108.02350
- Code: https://github.com/hustvl/HAIS
## 3D Multi-Object Tracking
**Exploring Simple 3D Multi-Object Tracking for Autonomous Driving**
- Paper: https://arxiv.org/abs/2108.10312
- Code: https://github.com/qcraftai/simtrack
## Point Cloud Denoising
**Score-Based Point Cloud Denoising**
- Paper: https://arxiv.org/abs/2107.10981
- Code: None
## Point Cloud Registration
**HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration**
- Homepage: https://ispc-group.github.io/hregnet
- Paper: https://arxiv.org/abs/2107.11992
- Code: https://github.com/ispc-lab/HRegNet
**A Robust Loss for Point Cloud Registration**
- Paper: https://arxiv.org/abs/2108.11682
- Code: None
# Point Cloud Completion
**PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers**
- Paper(Oral): https://arxiv.org/abs/2108.08839
- Code: https://github.com/yuxumin/PoinTr
**SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer**
- Paper: https://arxiv.org/abs/2108.04444
- Code: https://github.com/AllenXiangX/SnowflakeNet
# Radar Semantic Segmentation
**Multi-View Radar Semantic Segmentation**
- Paper: https://arxiv.org/abs/2103.16214
- Code: https://github.com/valeoai/MVRSS
# Image Restoration
**Dynamic Attentive Graph Learning for Image Restoration**
- Paper: https://arxiv.org/abs/2109.06620
- Code: https://github.com/jianzhangcs/DAGL
# Super-Resolution
**Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks**
- Paper: https://arxiv.org/abs/2004.03791
- Code: https://github.com/LongguangWang/ArbSR
**Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution**
- Paper: https://arxiv.org/abs/2108.05302
- Code: https://github.com/JingyunLiang/MANet
**Deep Reparametrization of Multi-Frame Super-Resolution and Denoising**
- Paper(Oral): https://arxiv.org/abs/2108.08286
- Code: None
**Dual-Camera Super-Resolution with Aligned Attention Modules**
- Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
- Paper: https://arxiv.org/abs/2109.01349
- Code: https://github.com/Tengfei-Wang/DualCameraSR
- Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
**Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme**
- Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
- Code: https://github.com/IanYeung/RealVSR
- Dataset: https://github.com/IanYeung/RealVSR
# Denoising
**Deep Reparametrization of Multi-Frame Super-Resolution and Denoising**
- Paper(Oral): https://arxiv.org/abs/2108.08286
- Code: None
**Rethinking Deep Image Prior for Denoising**
- Paper: https://arxiv.org/abs/2108.12841
- Code: https://github.com/gistvision/DIP-denosing
# Medical Image Denoising
**Eformer: Edge Enhancement based Transformer for Medical Image Denoising**
- Paper: https://arxiv.org/abs/2109.08044
- Code: None
# Deblurring
**Rethinking Coarse-to-Fine Approach in Single Image Deblurring**
- Paper: https://arxiv.org/abs/2108.05054
- Code: https://github.com/chosj95/MIMO-UNet
**Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions**
- Paper: https://arxiv.org/abs/2108.09108
- Code: None
# Shadow Removal
**CANet: A Context-Aware Network for Shadow Removal**
- Paper: https://arxiv.org/abs/2108.09894
- Code: https://github.com/Zipei-Chen/CANet
# Video Frame Interpolation
**XVFI: eXtreme Video Frame Interpolation**
- Paper(Oral): https://arxiv.org/abs/2103.16206
- Code: https://github.com/JihyongOh/XVFI
- Dataset: https://github.com/JihyongOh/XVFI
**Asymmetric Bilateral Motion Estimation for Video Frame Interpolation**
- Paper: https://arxiv.org/abs/2108.06815
- Code: https://github.com/JunHeum/ABME
# Video Inpainting
**FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting**
- Paper: https://arxiv.org/abs/2109.02974
- Code: https://github.com/ruiliu-ai/FuseFormer
# Person Re-identification
**TransReID: Transformer-based Object Re-Identification**
- Paper: https://arxiv.org/abs/2102.04378
- Code: https://github.com/heshuting555/TransReID
**IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID**
- Paper(Oral): https://arxiv.org/abs/2108.02413
- Code: https://github.com/SikaStar/IDM
# Person Search
**Weakly Supervised Person Search with Region Siamese Networks**
- Paper: https://arxiv.org/abs/2109.06109
- Code: None
# 2D/3D Human Pose Estimation
## 2D Human Pose Estimation
**Human Pose Regression with Residual Log-likelihood Estimation**
- Paper(Oral): https://arxiv.org/abs/2107.11291
- Code(RLE): https://github.com/Jeff-sjtu/res-loglikelihood-regression
**Online Knowledge Distillation for Efficient Pose Estimation**
- Paper: https://arxiv.org/abs/2108.02092
- Code: None
## 3D Human Pose Estimation
**Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows**
- Paper: https://arxiv.org/abs/2107.13788
- Code: https://github.com/twehrbein/Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows
**Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images**
- Paper: https://arxiv.org/abs/2109.05885
- Code: None
# 3D Head Reconstruction
**H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction**
- Homepage: https://crisalixsa.github.io/h3d-net/
- Paper: https://arxiv.org/abs/2107.12512
# Face Recognition
**SynFace: Face Recognition with Synthetic Data**
- Paper: https://arxiv.org/abs/2108.07960
- Code: None
# Facial Expression Recognition
**TransFER: Learning Relation-aware Facial Expression Representations with Transformers**
- Paper: https://arxiv.org/abs/2108.11116
- Code: None
# Action Recognition
**MGSampler: An Explainable Sampling Strategy for Video Action Recognition**
- Paper: https://arxiv.org/abs/2104.09952
- Code: None
**Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition**
- Paper: https://arxiv.org/abs/2107.12213
- Code: https://github.com/Uason-Chen/CTR-GCN
**Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization**
- Paper: https://arxiv.org/abs/2108.02183
- Code: None
**Dynamic Network Quantization for Efficient Video Inference**
- Homepage: https://cs-people.bu.edu/sunxm/VideoIQ/project.html
- Paper: https://arxiv.org/abs/2108.10394
- Code: https://github.com/sunxm2357/VideoIQ
# Temporal Action Localization
**Enriching Local and Global Contexts for Temporal Action Localization**
- Paper: https://arxiv.org/abs/2107.12960
- Code: None
# Action Detection
**Class Semantics-based Attention for Action Detection**
- Paper: https://arxiv.org/abs/2109.02613
- Code: None
# Group Activity Recognition
**GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer**
- Paper: https://arxiv.org/abs/2108.12630
- Code: https://github.com/xueyee/GroupFormer
# Sign Language Recognition
**Visual Alignment Constraint for Continuous Sign Language Recognition**
- Paper: https://arxiv.org/abs/2104.02330
- Code: https://github.com/ycmin95/VAC_CSLR
# Text Detection
**Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection**
- Paper: https://arxiv.org/abs/2107.12664
- Code: https://github.com/GXYM/TextBPN
# Text Recognition
**Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition**
- Paper: https://arxiv.org/abs/2107.12090
- Code: None
# Text Replacement
**STRIVE: Scene Text Replacement In Videos**
- Homepage: https://striveiccv2021.github.io/STRIVE-ICCV2021/
- Paper: https://arxiv.org/abs/2109.02762
- Code: https://github.com/striveiccv2021/STRIVE-ICCV2021/
- Dataset: https://github.com/striveiccv2021/STRIVE-ICCV2021/
# Visual Question Answering (VQA)
**Greedy Gradient Ensemble for Robust Visual Question Answering**
- Paper: https://arxiv.org/abs/2107.12651
- Code: https://github.com/GeraldHan/GGE
# Adversarial Attack
**Feature Importance-aware Transferable Adversarial Attacks**
- Paper: https://arxiv.org/abs/2107.14185
- Code: https://github.com/hcguoO0/FIA
**AdvDrop: Adversarial Attack to DNNs by Dropping Information**
- Paper: https://arxiv.org/abs/2108.09034
- Code: https://github.com/RjDuan/AdvDrop
# Depth Estimation
**NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo**
- Paper(Oral): https://arxiv.org/abs/2109.01129
- Code: https://github.com/weiyithu/NerfingMVS
## Monocular Depth Estimation
**MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments**
- Paper: https://arxiv.org/abs/2107.12429
- Code: None
**Towards Interpretable Deep Networks for Monocular Depth Estimation**
- Paper: https://arxiv.org/abs/2108.05312
- Code: https://github.com/youzunzhi/InterpretableMDE
**Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark**
- Paper: https://arxiv.org/abs/2108.03830
- Code: https://github.com/w2kun/RNW
**Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation**
- Paper: https://arxiv.org/abs/2108.07628
- Code: https://github.com/LINA-lln/ADDS-DepthNet
**StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation**
- Paper: https://arxiv.org/abs/2108.08574
- Code: https://github.com/SJTU-ViSYS/StructDepth
# Gaze Estimation
**Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation**
- Paper: https://arxiv.org/abs/2107.13780
- Code: https://github.com/DreamtaleCore/PnP-GA
# Crowd Counting
**Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework**
- Paper(Oral): https://arxiv.org/abs/2107.12746
- Code(P2PNet): https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet
**Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting**
- Paper: https://arxiv.org/abs/2107.12619
- Code: https://github.com/TencentYoutuResearch/CrowdCounting-UEPNet
# Lane Detection
**VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection**
- Paper: https://arxiv.org/abs/2108.08482
- Code: https://github.com/yujun0-0/MMA-Net
- Dataset: https://github.com/yujun0-0/MMA-Net
# Trajectory Prediction
**Human Trajectory Prediction via Counterfactual Analysis**
- Paper: https://arxiv.org/abs/2107.14202
- Code: https://github.com/CHENGY12/CausalHTP
**Personalized Trajectory Prediction via Distribution Discrimination**
- Paper: https://arxiv.org/abs/2107.14204
- Code: https://github.com/CHENGY12/DisDis
**MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction**
- Paper: https://arxiv.org/abs/2108.09274
- Code: https://github.com/selflein/MG-GAN
**Social NCE: Contrastive Learning of Socially-aware Motion Representations**
- Paper: https://arxiv.org/abs/2012.11717
- Code: https://github.com/vita-epfl/social-nce
**Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving**
- Paper: https://arxiv.org/abs/2109.01510
- Code: https://github.com/xrenaa/Safety-Aware-Motion-Prediction
# Anomaly Detection
**Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning**
- Paper: https://arxiv.org/abs/2101.10030
- Code: https://github.com/tianyu0207/RTFM
# Scene Graph Generation
**Spatial-Temporal Transformer for Dynamic Scene Graph Generation**
- Paper: https://arxiv.org/abs/2107.12309
- Code: None
# Image Editing
**Sketch Your Own GAN**
- Homepage: https://peterwang512.github.io/GANSketching/
- Paper: https://arxiv.org/abs/2108.02774
- Code: https://github.com/peterwang512/GANSketching
# Image Synthesis
**Image Synthesis via Semantic Composition**
- Homepage: https://shepnerd.github.io/scg/
- Paper: https://arxiv.org/abs/2109.07053
- Code: https://github.com/dvlab-research/SCGAN
# Image Retrieval
**Self-supervised Product Quantization for Deep Unsupervised Image Retrieval**
- Paper: https://arxiv.org/abs/2109.02244
- Code: https://github.com/youngkyunJang/SPQ
# 3D Reconstruction
**Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**
- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d
# Video Stabilization
**Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization**
- Paper: https://arxiv.org/abs/2108.09041
- Code: https://github.com/Annbless/OVS_Stabilization
# Fine-Grained Recognition
**Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach**
- Paper: https://arxiv.org/abs/2108.02399
- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
- Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
# Style Transfer
**AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer**
- Paper: https://arxiv.org/abs/2108.03647
- Paddle Code: https://github.com/PaddlePaddle/PaddleGAN
- PyTorch Code: https://github.com/Huage001/AdaAttN
# Neural Painting
**Paint Transformer: Feed Forward Neural Painting with Stroke Prediction**
- Paper: https://arxiv.org/abs/2108.03798
- Code: https://github.com/wzmsltw/PaintTransformer
# Feature Matching
**Learning to Match Features with Seeded Graph Matching Network**
- Paper: https://arxiv.org/abs/2108.08771
- Code: https://github.com/vdvchen/SGMNet
# Semantic Correspondence
**Multi-scale Matching Networks for Semantic Correspondence**
- Paper: https://arxiv.org/abs/2108.00211
- Code: https://github.com/wintersun661/MMNet
# Edge Detection
**Pixel Difference Networks for Efficient Edge Detection**
- Paper: https://arxiv.org/abs/2108.07009
- Code: https://github.com/zhuoinoulu/pidinet
**RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth**
- Paper: https://arxiv.org/abs/2108.00616
- Code: https://github.com/MengyangPu/RINDNet
- Dataset: https://github.com/MengyangPu/RINDNet
# Camera Calibration
**CTRL-C: Camera calibration TRansformer with Line-Classification**
- Paper: https://arxiv.org/abs/2109.02259
- Code: https://github.com/jwlee-vcl/CTRL-C
# Image Quality Assessment
**MUSIQ: Multi-scale Image Quality Transformer**
- Paper: https://arxiv.org/abs/2108.05997
- Code: https://github.com/google-research/google-research/tree/master/musiq
# Metric Learning
**Deep Relational Metric Learning**
- Paper: https://arxiv.org/abs/2108.10026
- Code: https://github.com/zbr17/DRML
**Towards Interpretable Deep Metric Learning with Structural Matching**
- Paper: https://arxiv.org/abs/2108.05889
- Code: https://github.com/wl-zhao/DIML
# Unsupervised Domain Adaptation
**Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation**
- Paper(Oral): https://arxiv.org/abs/2107.13467
- Code: None
# Video Rescaling
**Self-Conditioned Probabilistic Learning of Video Rescaling**
- Paper: https://arxiv.org/abs/2107.11639
- Code: None
# Hand-Object Interaction
**Learning a Contact Potential Field to Model the Hand-Object Interaction**
- Paper: https://arxiv.org/abs/2012.00924
- Code: https://lixiny.github.io/CPF
# Vision-and-Language Navigation
**Airbert: In-domain Pretraining for Vision-and-Language Navigation**
- Paper: https://arxiv.org/abs/2108.09105
- Code: https://airbert-vln.github.io/
- Dataset: https://airbert-vln.github.io/
# Datasets
**RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth**
- Paper: https://arxiv.org/abs/2108.00616
- Code: https://github.com/MengyangPu/RINDNet
- Dataset: https://github.com/MengyangPu/RINDNet
**Panoptic Narrative Grounding**
- Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
- Paper(Oral): https://arxiv.org/abs/2109.04988
- Code: https://github.com/BCV-Uniandes/PNG
- Dataset: https://github.com/BCV-Uniandes/PNG
**STRIVE: Scene Text Replacement In Videos**
- Homepage: https://striveiccv2021.github.io/STRIVE-ICCV2021/
- Paper: https://arxiv.org/abs/2109.02762
- Code: https://github.com/striveiccv2021/STRIVE-ICCV2021/
- Dataset: https://github.com/striveiccv2021/STRIVE-ICCV2021/
**Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme**
- Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
- Code: https://github.com/IanYeung/RealVSR
- Dataset: https://github.com/IanYeung/RealVSR
**Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes**
- Paper: https://arxiv.org/abs/2109.03585
- Code: None
**Dual-Camera Super-Resolution with Aligned Attention Modules**
- Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
- Paper: https://arxiv.org/abs/2109.01349
- Code: https://github.com/Tengfei-Wang/DualCameraSR
- Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
**DepthTrack: Unveiling the Power of RGBD Tracking**
- Paper: https://arxiv.org/abs/2108.13962
- Code: https://github.com/xiaozai/DeT
- Dataset: https://github.com/xiaozai/DeT
**Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**
- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d
**BioFors: A Large Biomedical Image Forensics Dataset**
- Paper: https://arxiv.org/abs/2108.12961
- Code: None
- Dataset: None
**Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach**
- Paper: https://arxiv.org/abs/2108.02399
- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
- Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
**Airbert: In-domain Pretraining for Vision-and-Language Navigation**
- Paper: https://arxiv.org/abs/2108.09105
- Code: https://airbert-vln.github.io/
- Dataset: https://airbert-vln.github.io/
**Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation**
- Paper: http://arxiv.org/abs/2108.08202
- Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
- Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
**VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection**
- Paper: https://arxiv.org/abs/2108.08482
- Code: https://github.com/yujun0-0/MMA-Net
- Dataset: https://github.com/yujun0-0/MMA-Net
**XVFI: eXtreme Video Frame Interpolation**
- Paper(Oral): https://arxiv.org/abs/2103.16206
- Code: https://github.com/JihyongOh/XVFI
- Dataset: https://github.com/JihyongOh/XVFI
**Personalized Image Semantic Segmentation**
- Paper: https://arxiv.org/abs/2107.13978
- Code: https://github.com/zhangyuygss/PIS
- Dataset: https://github.com/zhangyuygss/PIS
**H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction**
- Homepage: https://crisalixsa.github.io/h3d-net/
- Paper: https://arxiv.org/abs/2107.12512
# Others
**Panoptic Narrative Grounding**
- Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
- Paper(Oral): https://arxiv.org/abs/2109.04988
- Code: https://github.com/BCV-Uniandes/PNG
- Dataset: https://github.com/BCV-Uniandes/PNG
**NEAT: Neural Attention Fields for End-to-End Autonomous Driving**
- Paper: https://arxiv.org/abs/2109.04456
- Code: https://github.com/autonomousvision/neat
**Keep CALM and Improve Visual Feature Attribution**
- Paper: https://arxiv.org/abs/2106.07861
- Code: https://github.com/naver-ai/calm
**YouRefIt: Embodied Reference Understanding with Language and Gesture**
- Paper: https://arxiv.org/abs/2109.03413
- Code: None
**Pri3D: Can 3D Priors Help 2D Representation Learning?**
- Paper: https://arxiv.org/abs/2104.11225
- Code: https://github.com/Sekunde/Pri3D
**Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain**
- Paper: https://arxiv.org/abs/2108.08487
- Code: https://github.com/iCGY96/APR
**Continual Learning for Image-Based Camera Localization**
- Paper: https://arxiv.org/abs/2108.09112
- Code: None
**Multi-Task Self-Training for Learning General Representations**
- Paper: https://arxiv.org/abs/2108.11353
- Code: None
**A Unified Objective for Novel Class Discovery**
- Homepage: https://ncd-uno.github.io/
- Paper(Oral): https://arxiv.org/abs/2108.08536
- Code: https://github.com/DonkeyShot21/UNO
**Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs**
- Paper: https://arxiv.org/abs/2108.07884
- Code: https://github.com/islamamirul/PermuteNet
**Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation**
- Paper: http://arxiv.org/abs/2108.08202
- Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
- Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
**Impact of Aliasing on Generalization in Deep Convolutional Networks**
- Paper: https://arxiv.org/abs/2108.03489
- Code: None
**Out-of-Core Surface Reconstruction via Global TGV Minimization**
- Paper: https://arxiv.org/abs/2107.14790
- Code: None
**Progressive Correspondence Pruning by Consensus Learning**
- Homepage: https://sailor-z.github.io/projects/CLNet.html
- Paper: https://arxiv.org/abs/2101.00591
- Code: https://github.com/sailor-z/CLNet
**Energy-Based Open-World Uncertainty Modeling for Confidence Calibration**
- Paper: https://arxiv.org/abs/2107.12628
- Code: None
**Generalized Shuffled Linear Regression**
- Paper: https://drive.google.com/file/d/1Qu21VK5qhCW8WVjiRnnBjehrYVmQrDNh/view?usp=sharing
- Code: https://github.com/SILI1994/Generalized-Shuffled-Linear-Regression
**Discovering 3D Parts from Image Collections**
- Homepage: https://chhankyao.github.io/lpd/
- Paper: https://arxiv.org/abs/2107.13629
**Semi-Supervised Active Learning with Temporal Output Discrepancy**
- Paper: https://arxiv.org/abs/2107.14153
- Code: https://github.com/siyuhuang/TOD
**Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?**
- Paper: https://arxiv.org/abs/2105.02498
- Code: https://github.com/KingJamesSong/DifferentiableSVD
**Hand-Object Contact Consistency Reasoning for Human Grasps Generation**
- Homepage: https://hwjiang1510.github.io/GraspTTA/
- Paper(Oral): https://arxiv.org/abs/2104.03304
- Code: None
**Equivariant Imaging: Learning Beyond the Range Space**
- Paper(Oral): https://arxiv.org/abs/2103.14756
- Code: https://github.com/edongdongchen/EI
**Just Ask: Learning to Answer Questions from Millions of Narrated Videos**
- Paper(Oral): https://arxiv.org/abs/2012.00451
- Code: https://github.com/antoyang/just-ask