# erase
**Repository Path**: jdlc105/erase
## Basic Information
- **Project Name**: erase
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-09-01
- **Last Updated**: 2024-09-01
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# [CIKM-2024] ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance
Official code for "ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance".
[Ling-Hao Chen](https://lhchen.top/)<sup>1,2</sup>, [Yuanshuo Zhang](https://youngsoul0731.github.io/)<sup>2,3</sup>, Taohua Huang<sup>3</sup>, [Liangcai Su](https://liangcaisu.github.io/)<sup>1</sup>, Zeyi Lin<sup>2,3</sup>, Xi Xiao<sup>1</sup>, [Xiaobo Xia](https://xiaoboxia.github.io/)<sup>4</sup>, [Tongliang Liu](https://tongliang-liu.github.io/)<sup>4</sup>

<sup>1</sup>Tsinghua University, <sup>2</sup>SwanHub.co, <sup>3</sup>Xidian University, <sup>4</sup>The University of Sydney
## Abstract
Deep learning has achieved remarkable success in graph-related tasks, yet this accomplishment heavily relies on large-scale high-quality annotated datasets. However, acquiring such datasets can be cost-prohibitive, leading to the practical use of labels obtained from economically efficient sources such as web searches and user tags. Unfortunately, these labels often come with noise, compromising the generalization performance of deep networks.
To tackle this challenge and enhance the robustness of deep learning models against label noise in graph-based tasks, we propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE). The core idea of ERASE is to learn representations with error tolerance by maximizing coding rate reduction. Particularly, we introduce a decoupled label propagation method for learning representations. Before training, noisy labels are pre-corrected through structural denoising. During training, ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience, which significantly improves the generalization performance in node classification. The proposed method allows us to more effectively withstand errors caused by mislabeled nodes, thereby strengthening the robustness of deep networks in handling noisy graph data. Extensive experimental results show that our method can outperform multiple baselines with clear margins in broad noise levels and enjoy great scalability. Codes are released at https://github.com/eraseai/erase.
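To make "maximizing coding rate reduction" concrete, here is a minimal NumPy sketch of the generic coding rate reduction objective from the MCR2 line of work that this repository credits. It is an illustration under stated assumptions, not the ERASE training loss: the function names, the row-vector layout of `Z`, and the default `eps` are all choices made for this example.

```python
import numpy as np


def coding_rate(Z, eps=0.5):
    """Coding rate of n representations stored as the rows of Z (shape n x d).

    R(Z, eps) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z)
    """
    n, d = Z.shape
    gram = Z.T @ Z  # (d, d) second-moment matrix
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps ** 2)) * gram)
    return 0.5 * logdet


def coding_rate_reduction(Z, labels, eps=0.5):
    """Rate of the whole set minus the size-weighted rates of each class."""
    n = Z.shape[0]
    whole = coding_rate(Z, eps)
    per_class = 0.0
    for c in np.unique(labels):
        Zc = Z[labels == c]
        per_class += (Zc.shape[0] / n) * coding_rate(Zc, eps)
    return whole - per_class


# Toy data: ten points on the first axis, ten on the second.
Z = np.vstack([np.tile([1.0, 0.0], (10, 1)), np.tile([0.0, 1.0], (10, 1))])
clean = np.array([0] * 10 + [1] * 10)   # labels that match the geometry
mixed = np.array([0, 1] * 10)           # labels that ignore the geometry

delta_clean = coding_rate_reduction(Z, clean)  # ln(5/3), about 0.511
delta_mixed = coding_rate_reduction(Z, mixed)  # 0 up to rounding
print(delta_clean, delta_mixed)
```

On this toy input, labels that agree with the geometry give a larger reduction than labels that cut across it. That gap is the quantity a rate-reduction objective pushes representations to maximize.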
## Preparation
### Data
Datasets for Cora, CiteSeer, PubMed, CoraFull, and [OGBn-arxiv](https://ogb.stanford.edu/docs/nodeprop/#ogbn-arxiv) are integrated in [PyG](https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html#torch_geometric.datasets.Planetoid), and the code downloads them automatically. After you run the [training code](https://github.com/eraseai/erase?tab=readme-ov-file#-training), your `./data` directory has the following structure:
```
data
├── Planetoid
│   ├── Cora
│   │   ├── processed
│   │   │   ├── data.pt
│   │   │   ├── pre_filter.pt
│   │   │   └── pre_transform.pt
│   │   └── raw
│   ├── CiteSeer
│   │   ├── processed
│   │   │   ├── data.pt
│   │   │   ├── pre_filter.pt
│   │   │   └── pre_transform.pt
│   │   └── raw
│   └── PubMed
│       ├── processed
│       │   ├── data.pt
│       │   ├── pre_filter.pt
│       │   └── pre_transform.pt
│       └── raw
├── CitationFull
│   └── cora
│       ├── processed
│       │   ├── data.pt
│       │   ├── pre_filter.pt
│       │   └── pre_transform.pt
│       └── raw
└── Ogb
    └── ogbn-arxiv
        ├── mapping
        ├── processed
        │   ├── geometric_data_processed.pt
        │   ├── pre_filter.pt
        │   └── pre_transform.pt
        ├── raw
        └── split
```
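To check which datasets have already been downloaded and processed, a small standard-library helper can compare the layout above against the disk. This is a sketch: the helper name `missing_datasets` is hypothetical, and the mapping from dataset name to processed file (in particular `CoraFull` to `CitationFull/cora`) is inferred from the tree rather than taken from the repository code.

```python
from pathlib import Path

# Main processed file per dataset, relative to the data root.
# The name-to-path mapping is an assumption read off the directory tree.
EXPECTED_PROCESSED = {
    "Cora": "Planetoid/Cora/processed/data.pt",
    "CiteSeer": "Planetoid/CiteSeer/processed/data.pt",
    "PubMed": "Planetoid/PubMed/processed/data.pt",
    "CoraFull": "CitationFull/cora/processed/data.pt",
    "ogbn-arxiv": "Ogb/ogbn-arxiv/processed/geometric_data_processed.pt",
}


def missing_datasets(root="data"):
    """Return the dataset names whose processed file is absent under root."""
    base = Path(root)
    return [
        name
        for name, rel in EXPECTED_PROCESSED.items()
        if not (base / rel).is_file()
    ]


if __name__ == "__main__":
    print("Not processed yet:", missing_datasets())
```

A dataset that appears in the output has not been prepared yet; running its training command should trigger the download.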
### Environment Setup
```
sh install.sh
```
## Training
For Cora:
```
chmod +x run.sh
sh run.sh Cora
```
For CiteSeer:
```
chmod +x run.sh
sh run.sh CiteSeer
```
For PubMed:
```
chmod +x run.sh
sh run.sh PubMed
```
For CoraFull:
```
chmod +x run.sh
sh run.sh CoraFull
```
For OGBn-arxiv:
```
chmod +x run.sh
sh run.sh ogbn-arxiv
```
After running the command, a directory named `<DATASET>_<ID>` is created in the `./exp_output` directory, where `<DATASET>` is one of `{'Cora', 'CiteSeer', 'PubMed', 'CoraFull', 'ogbn-arxiv'}` and `<ID>` equals the number of directories in `./exp_output`. The subdirectory `./exp_output/<DATASET>_<ID>/asymm_noise_ratio_0.1` stores the results for an asymmetric noise ratio of 0.1. In this directory, logs are written to `train_log.txt`, and model checkpoints are stored in `ckpt`.
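Because each run gets its own output directory, it can help to locate the most recent one programmatically. The sketch below assumes run directories are named with the dataset name, an underscore, and an integer suffix. That naming is an assumption, and both helper names are hypothetical. Only `asymm_noise_ratio_0.1`, `train_log.txt`, and `ckpt` come from the description above.

```python
from pathlib import Path


def latest_run_dir(dataset, root="exp_output"):
    """Return the run directory with the highest numeric suffix, or None.

    Assumes directory names of the form '<dataset>_<integer>'.
    """
    prefix = dataset + "_"
    runs = []
    for path in Path(root).glob(prefix + "*"):
        suffix = path.name[len(prefix):]
        if path.is_dir() and suffix.isdigit():
            runs.append((int(suffix), path))
    if not runs:
        return None
    return max(runs, key=lambda item: item[0])[1]


def result_paths(run_dir, noise_type="asymm", ratio=0.1):
    """Return (log file, checkpoint directory) for one noise setting."""
    sub = Path(run_dir) / f"{noise_type}_noise_ratio_{ratio}"
    return sub / "train_log.txt", sub / "ckpt"


if __name__ == "__main__":
    run = latest_run_dir("Cora")
    if run is None:
        print("No Cora runs found under exp_output")
    else:
        log_file, ckpt_dir = result_paths(run)
        print(log_file, ckpt_dir)
```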
## Visualization
Before generating visualizations, make sure you have saved the pre-trained models. The following example produces visualization results on Cora with an asymmetric noise ratio of 0.1.
```
python scripts/visualize.py --dataset Cora --resume exp_output/Cora/asymm_noise_ratio_0.1/ckpt/best_model.pth --corrupt_type asymm --corrupt_ratio 0.1
```
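For readers unfamiliar with the `asymm` setting, the sketch below shows one common way to define asymmetric label noise: pair flipping, where each label moves to the next class with probability equal to the noise ratio. This is an illustration only. The repository may use a different transition matrix, and the function name `pair_flip` is hypothetical.

```python
import random


def pair_flip(labels, num_classes, ratio, seed=0):
    """Flip each label to the next class (mod num_classes) with prob. ratio."""
    rng = random.Random(seed)  # local generator keeps the result reproducible
    noisy = []
    for y in labels:
        if rng.random() < ratio:
            noisy.append((y + 1) % num_classes)
        else:
            noisy.append(y)
    return noisy


if __name__ == "__main__":
    clean = [i % 7 for i in range(1000)]  # toy labels over 7 classes
    noisy = pair_flip(clean, num_classes=7, ratio=0.1)
    changed = sum(a != b for a, b in zip(clean, noisy))
    print(f"{changed} of {len(clean)} labels were flipped")
```

Unlike uniform (symmetric) noise, every corrupted label here lands in one specific wrong class, which tends to make the noise harder to detect.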
### Example of Visualization Results
- Cosine Similarity Matrix Visualization
- t-SNE Visualization
- PCA Visualization
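The quantities behind the cosine similarity and PCA views are simple to compute from a matrix of node representations. The NumPy sketch below is not the repository's `scripts/visualize.py`. The function names are hypothetical, and it assumes representations are stored as the rows of `Z` with one integer label per row.

```python
import numpy as np


def cosine_similarity_matrix(Z, labels):
    """Pairwise cosine similarities, with rows and columns grouped by class."""
    order = np.argsort(labels, kind="stable")  # put same-class nodes together
    Zs = Z[order]
    norms = np.linalg.norm(Zs, axis=1, keepdims=True)
    Zn = Zs / np.clip(norms, 1e-12, None)      # guard against zero vectors
    return Zn @ Zn.T


def pca_2d(Z):
    """Project representations onto their top two principal components.

    Requires at least two samples and two feature dimensions.
    """
    centered = Z - Z.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


if __name__ == "__main__":
    Z = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]])
    labels = np.array([0, 1, 0, 1])
    print(cosine_similarity_matrix(Z, labels))
    print(pca_2d(Z).shape)
```

When representations of different classes are close to orthogonal, the grouped similarity matrix shows bright diagonal blocks and dark off-diagonal blocks, which is the pattern such a heatmap is meant to reveal.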
## License
The code is distributed under a non-commercial [LICENSE](https://github.com/eraseai/erase/blob/master/LICENSE). Note that our code depends on other libraries and datasets, each with its own license that must also be followed. For commercial usage, please contact [Ling-Hao Chen](https://lhchen.top).
## Acknowledgments
The authors sincerely thank the [MCR2 authors](https://github.com/ryanchankh/mcr2/blob/master) and the [G2R authors](https://github.com/ahxt/G2R) for providing significant references and codebases. Portions of this code were adapted from these open-source projects.
## Citation
If you find the code useful in your research, please cite us:
```bibtex
@article{chen2023erase,
  title={ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance},
  author={Chen, Ling-Hao and Zhang, Yuanshuo and Huang, Taohua and Su, Liangcai and Lin, Zeyi and Xiao, Xi and Xia, Xiaobo and Liu, Tongliang},
  journal={Arxiv 2312.08852},
  year={2023}
}
```
## Star History
If you have any questions, please contact thu [DOT] lhchen [AT] gmail [DOT] com.