# CReST in Tensorflow 2

Code for the paper: "[CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning](https://arxiv.org/abs/2102.09559)" by Chen Wei, Kihyuk Sohn, Clayton Mellina, Alan Yuille and Fan Yang.

- **This is not an officially supported Google product.**

## Install dependencies

```bash
sudo apt install python3-dev python3-virtualenv python3-tk imagemagick
virtualenv -p python3 --system-site-packages env3
. env3/bin/activate
pip install -r requirements.txt
```

- The code has been tested on Ubuntu 18.04 with CUDA 10.2.

## Environment setting

```bash
. env3/bin/activate
export ML_DATA=/path/to/your/data
export ML_DIR=/path/to/your/code
export RESULT=/path/to/your/result
export PYTHONPATH=$PYTHONPATH:$ML_DIR
```

## Datasets

Download or generate the datasets as follows:

- CIFAR10 and CIFAR100: Follow the [steps](https://github.com/google-research/fixmatch/blob/master/README.md#install-datasets) to download and generate the balanced CIFAR10 and CIFAR100 datasets. Put them under `${ML_DATA}/cifar`, for example, `${ML_DATA}/cifar/cifar10-test.tfrecord`.
- Long-tailed CIFAR10 and CIFAR100: Follow the [steps](https://github.com/richardaecn/class-balanced-loss#datasets) to download the datasets prepared by Cui et al. Put them under `${ML_DATA}/cifar-lt`, for example, `${ML_DATA}/cifar-lt/cifar-10-data-im-0.1`.

## Running experiments on long-tailed CIFAR10 and CIFAR100

Run [MixMatch](mixmatch.py) ([paper](https://arxiv.org/abs/1905.02249)) and [FixMatch](fixmatch.py) ([paper](https://arxiv.org/abs/2001.07685)):

- Specify the method to run via `--method`. It can be `fixmatch` or `mixmatch`.
- Specify the dataset via `--dataset`. It can be `cifar10lt` or `cifar100lt`.
- Specify the class imbalance ratio, i.e., the number of training samples in the rarest class divided by that in the most frequent class, via `--class_im_ratio`.
- Specify the percentage of labeled data via `--percent_labeled`.
- Specify the number of self-training generations via `--num_generation` (see the class-rebalancing sketch after the command below).
- Specify whether to use distribution alignment via `--do_distalign`.
- Specify the initial distribution alignment temperature via `--dalign_t`.
- Specify how distribution alignment is applied via `--how_dalign`. It can be `constant` or `adaptive` (see the distribution alignment sketch after the command below).

```bash
python -m train_and_eval_loop \
  --model_dir=/tmp/model \
  --method=fixmatch \
  --dataset=cifar10lt \
  --input_shape=32,32,3 \
  --class_im_ratio=0.01 \
  --percent_labeled=0.1 \
  --fold=1 \
  --num_epoch=64 \
  --num_generation=6 \
  --sched_level=1 \
  --dalign_t=0.5 \
  --how_dalign=adaptive \
  --do_distalign=True
```
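Across generations, CReST grows the labeled set with pseudo-labeled samples, keeping samples from rarer classes at higher rates than those from frequent classes. Below is a minimal NumPy sketch of such a class-rebalancing selection rate; the function name `crest_sampling_rates`, the exponent `alpha`, and the mirrored-rank formula reflect our reading of the paper, not the repo's API.

```python
import numpy as np

def crest_sampling_rates(class_counts, alpha=3.0):
    """Hypothetical sketch of per-class pseudo-label keep rates.

    class_counts: labeled-set sizes per class. The rarest class gets a
    rate of 1 (keep everything); the most frequent class gets the lowest
    rate, so each self-training generation rebalances the labeled set.
    """
    counts = np.asarray(class_counts, dtype=np.float64)
    order = np.argsort(-counts)  # class indices, most frequent first
    rates = np.empty_like(counts)
    # Rank i (0 = most frequent) gets the normalized count of its
    # mirror rank (K-1-i), raised to the power alpha.
    rates[order] = (counts[order][::-1] / counts[order][0]) ** alpha
    return rates

# Example: with counts [500, 100, 10] and alpha=1, the rates are
# [0.02, 0.2, 1.0]: the rarest class keeps all of its pseudo-labels.
```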
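The `--do_distalign`, `--dalign_t`, and `--how_dalign` flags control distribution alignment of the pseudo-labels. Below is a minimal sketch of one common formulation (ReMixMatch-style alignment with a temperature-scaled target distribution); the function signature is hypothetical, and the annealing schedule used by `--how_dalign=adaptive` is not reproduced here.

```python
import numpy as np

def align_distribution(probs, target_dist, model_marginal, t=0.5, eps=1e-8):
    """Hypothetical sketch of temperature-scaled distribution alignment.

    probs:          [batch, K] model predictions on unlabeled examples.
    target_dist:    [K] estimated class distribution of the labeled data.
    model_marginal: [K] running average of the model's predictions.
    t:              temperature; t=1 aligns fully to target_dist, while
                    t=0 pushes the target toward a uniform distribution.
    """
    target = target_dist ** t
    target = target / target.sum()  # tempered, renormalized target
    aligned = probs * (target / (model_marginal + eps))[None, :]
    return aligned / aligned.sum(axis=1, keepdims=True)
```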
## Results

The code reproduces the main results of the paper. For all settings and methods, we run experiments on 5 different folds and report the mean and standard deviation. Note that the numbers may not exactly match those in the paper, as there is extra randomness from training.

**Results on long-tailed CIFAR10 with 10% labeled data (Table 1 in the paper).**

|          | gamma=50    | gamma=100   | gamma=200   |
|----------|-------------|-------------|-------------|
| FixMatch | 79.4 (0.98) | 66.2 (0.83) | 59.9 (0.44) |
| CReST    | 83.7 (0.40) | 75.4 (1.62) | 63.9 (0.67) |
| CReST+   | 84.5 (0.41) | 77.7 (1.22) | 67.5 (1.36) |

## Training with Multiple GPUs

- Simply set `CUDA_VISIBLE_DEVICES=0,1,2,3` or any number of GPUs.
- Make sure that the batch size is divisible by the number of GPUs.

## Augmentation

- Different augmentation shortkeys can be concatenated to compose an augmentation sequence:
  - `d`: default augmentation, resize and shift.
  - `h`: horizontal flip.
  - `ra`: random augment with all augmentation ops.
  - `rc`: random augment with color augmentation ops only.
  - `rg`: random augment with geometric augmentation ops only.
  - `c`: cutout.
- For example, `dhrac` applies shift and flip, then random augment with all ops, followed by cutout.

## Citing this work

```bibtex
@article{wei2021crest,
  title={CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning},
  author={Chen Wei and Kihyuk Sohn and Clayton Mellina and Alan Yuille and Fan Yang},
  journal={arXiv preprint arXiv:2102.09559},
  year={2021},
}
```