# Code for DetCon

This repository contains code for the ICCV 2021 paper ["Efficient Visual Pretraining with Contrastive Detection"](https://arxiv.org/abs/2103.10957) by Olivier J. Hénaff, Skanda Koppula, Jean-Baptiste Alayrac, Aaron van den Oord, Oriol Vinyals, and João Carreira.

This repository includes sample code to run pretraining with DetCon. In particular, we provide a sample script for generating the Felzenszwalb segmentations for ImageNet images (using `skimage`) and a pre-training experiment setup (dataloader, augmentation pipeline, optimization config, and loss definition) that describes the DetCon-B(YOL) model from the paper. The original code uses a large grid of TPUs and internal infrastructure for training, but we have extracted the key DetCon loss and experiment into this folder so that external groups have a reference should they want to explore a similar approach.

This repository builds heavily on the [BYOL open source release](https://github.com/deepmind/deepmind-research/tree/master/byol), so speed-up tricks and features in that setup will likely translate to the code here.

## Running this code

Running `./setup.sh` will create and activate a virtualenv and install all necessary dependencies. To enter the environment after running `setup.sh`, run `source /tmp/detcon_venv/bin/activate`.

Running `bash test.sh` will run a single training step on a mock image/Felzenszwalb mask, as a simple check that all dependencies are set up correctly and that DetCon pre-training runs smoothly. On our 16-core machine, running on CPU, this takes around 3-4 minutes.

A TFRecord dataset containing each ImageNet image, its label, and its corresponding Felzenszwalb segmentation/mask can then be generated using the `generate_fh_masks_for_imagenet.py` script. You will first have to download two pieces of ImageNet metadata into the same directory as the script:

```
wget https://raw.githubusercontent.com/tensorflow/models/master/research/slim/datasets/imagenet_metadata.txt
wget https://raw.githubusercontent.com/tensorflow/models/master/research/slim/datasets/imagenet_lsvrc_2015_synsets.txt
```

Then run the multi-threaded mask generation script:

```
python generate_fh_masks_for_imagenet.py -- \
  --train_directory=imagenet-train \
  --output_directory=imagenet-train-fh
```

This single-machine, multi-threaded version of the mask generation script takes 2-3 days on a 16-core machine to process the ImageNet training and validation sets on CPU. The script assumes the same ImageNet directory structure as [github.com/tensorflow/models/blob/master/research/slim/datasets/build_imagenet_data.py](https://github.com/tensorflow/models/blob/master/research/slim/datasets/build_imagenet_data.py) (more details in the link).
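To get a feel for the segmentations the script produces, here is a minimal sketch using `skimage.segmentation.felzenszwalb`. The image path and the `scale`/`sigma`/`min_size` values are illustrative; they are not necessarily the hyperparameters used by `generate_fh_masks_for_imagenet.py`.

```python
from skimage import io
from skimage.segmentation import felzenszwalb

# Illustrative path: any RGB ImageNet JPEG will do.
image = io.imread('n01440764_10026.JPEG')

# Felzenszwalb-Huttenlocher graph-based segmentation. Larger `scale` and
# `min_size` values produce fewer, coarser regions.
segments = felzenszwalb(image, scale=1000, sigma=0.8, min_size=1000)

# `segments` is an (H, W) integer array; each value labels one region.
print(segments.shape, segments.max() + 1, 'regions')
```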
You can then run the main training loop and execute multiple DetCon-B training steps by running the following command from the parent directory:

```bash
python -m detcon.main_loop \
  --dataset_directory='/tmp/imagenet-fh-train' \
  --pretrain_epochs=100
```

Note that you will need to update the `dataset_directory` flag to point to the Felzenszwalb/image TFRecord dataset generated in the previous step. Additionally, to use accelerators, you will need to install the correct version of jaxlib with CUDA support.

## Pre-trained checkpoints

For convenience, we provide [ResNet-50](https://storage.googleapis.com/dm-detcon/resnet50_detcon_b_imagenet_1k.npy) and [ResNet-200](https://storage.googleapis.com/dm-detcon/resnet200_detcon_b_imagenet_1k.npy) checkpoints pre-trained on ImageNet with DetCon. We also provide a number of COCO-pretrained ResNet-50 checkpoints in the same [GCS bucket](https://storage.googleapis.com/dm-detcon/). A Colab demonstrating how to load the model weights and run a forward pass on a sample image is available [here](https://colab.research.google.com/drive/1Gd3sxOJXENo74iPz5TlywEcsfXX1gB8W?usp=sharing); a minimal sketch for inspecting the checkpoint files directly is included in the appendix at the end of this README.

## Citing this work

If you use this code in your work, please consider referencing our paper:

```
@article{henaff2021efficient,
  title={{Efficient Visual Pretraining with Contrastive Detection}},
  author={H{\'e}naff, Olivier J and Koppula, Skanda and Alayrac, Jean-Baptiste and Oord, Aaron van den and Vinyals, Oriol and Carreira, Jo{\~a}o},
  journal={International Conference on Computer Vision},
  year={2021}
}
```

## Disclaimer

This is not an officially supported Google product.
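## Appendix: inspecting a released checkpoint

As referenced above, the following is a minimal sketch for loading one of the released `.npy` checkpoints outside the Colab. It assumes the file stores a pickled Python object (for example, a dict of parameter arrays); the path and variable names are illustrative.

```python
import numpy as np

# Illustrative path: the ResNet-50 checkpoint downloaded from the GCS bucket.
ckpt_path = 'resnet50_detcon_b_imagenet_1k.npy'

# The file is assumed to hold a pickled Python object rather than a plain
# array, so allow_pickle=True is required.
loaded = np.load(ckpt_path, allow_pickle=True)

# np.load wraps pickled objects in a 0-d object array; unwrap it if needed.
state = loaded.item() if loaded.dtype == object else loaded

# Inspect the top-level structure to locate the parameter tree.
print(type(state))
if isinstance(state, dict):
    print(sorted(state.keys()))
```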