# Cross-Modal Contrastive Learning for Text-to-Image Generation

This repository hosts the open source [JAX](https://github.com/google/jax) implementation of [XMC-GAN](https://arxiv.org/abs/2101.04702).

## Setup instructions

### Environment

Set up and activate a virtualenv:

```
virtualenv venv
source venv/bin/activate
```

Add the XMC-GAN library to PYTHONPATH:

```
export PYTHONPATH=$PYTHONPATH:/home/path/to/xmcgan/root/
```

### JAX Installation

Note: Please follow the [official JAX instructions](https://github.com/google/jax#pip-installation) for installing a GPU-compatible version of JAX.

### Other Dependencies

After installing JAX, install the remaining dependencies with:

```
pip install -r requirements.txt
```

### Preprocess COCO-2014

To create the training and eval data, first create a data directory. By default, the training scripts expect to save results in `data/` in the base directory.

```
mkdir data/
```

The TFRecords required for training and validation on COCO-2014 can be created by running a preprocessing script over the [TFDS coco_captions dataset](https://www.tensorflow.org/datasets/catalog/coco_captions):

```
python preprocess_data.py
```

This may take a while to complete, as it runs a pretrained BERT model over the captions and stores the embeddings. With a GPU, it takes about 2.5 hours for the train split and 1 hour for validation. Once it is done, the train and validation TFRecord files will be saved in the `data/` directory. The train files require around 58G of disk space, and the validation files require 29G.

Note: If you run into an error related to TensorFlow gfile, one workaround is to edit `site-packages/bert/tokenization.py` and change `tf.gfile.GFile` to `tf.io.gfile.GFile`. For more details, refer to this [issue comment](https://github.com/google-research/bert/issues/1133#issuecomment-703818257).

If you run into a `tensorflow.python.framework.errors_impl.ResourceExhaustedError` about having too many open files, you may have to increase the machine's open file limits. To do so, open the limit configuration file for editing:

```
vi /etc/security/limits.conf
```

and append the following lines to the end of the file:

```
*       hard    nofile  500000
*       soft    nofile  500000
root    hard    nofile  500000
root    soft    nofile  500000
```

You may have to adjust the limit values depending on your machine, and you will need to log out and log back in for the new limits to take effect.

### Download Pretrained ResNet

To train XMC-GAN, we need a network pretrained on ImageNet to extract features. For our purposes, we train a ResNet-50 network for this. To download the weights, run:

```
gsutil cp gs://gresearch/xmcgan/resnet_pretrained.npy data/
```

If you would like to pretrain your own network on ImageNet, please refer to the [official Flax ImageNet example](https://github.com/google/flax/tree/master/examples/imagenet).

### Training

Start a training run by first editing `train.sh` to specify an appropriate work directory.
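For orientation, the kind of edit involved might look roughly like the following. This is a hypothetical sketch only: the variable names, entry point, and flags shown here are assumptions, so defer to whatever `train.sh` in this repository actually contains.

```
# Hypothetical sketch of a work-directory setting inside train.sh.
# The real variable names, entry point, and flags may differ.
EXP_NAME=$1                       # experiment name passed on the command line
WORKDIR=/path/to/exp/${EXP_NAME}  # checkpoints and Tensorboard logs go here
python main.py --config=xmcgan/configs/coco_xmc.py --workdir=${WORKDIR} --mode=train
```
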
By default, the script assumes that 8 GPUs are available, and runs training on the first 7 GPUs, while `test.sh` assumes testing will run on the last GPU. After configuring the training job, start an experiment by running:

```
mkdir exp
bash train.sh exp_name &> train.txt
```

Checkpoints and Tensorboard logs will be saved in `/path/to/exp/exp_name`. By default, the `configs/coco_xmc.py` config is used, which runs an experiment for 128px images. This accommodates a batch size of 8 on each GPU, and achieves an FID of around 10.5 - 11.0 with the EMA weights. To reproduce the full results on 256px images in our paper, the full model needs to be run using a 32-core Pod slice of [Google Cloud TPU v3](https://cloud.google.com/tpu) devices.

### Evaluation

To run an evaluation job, update `test.sh` with the settings used in the training script. Then, execute

```
bash test.sh exp_name &> eval.txt
```

to start the evaluation. All checkpoints in `workdir` will be evaluated for FID and Inception Score. If you can spare the GPUs, you can also run `train.sh` and `test.sh` in parallel, which will continuously evaluate new checkpoints saved into the work directory. Scores will be written to Tensorboard and output to `eval.txt`.

### Tensorboard

To start a Tensorboard for monitoring training progress, run:

```
tensorboard --logdir /path/to/exp/exp_name
```

## Citation

If you find this work useful, please consider citing:

```
@inproceedings{zhang2021cross,
  title={Cross-Modal Contrastive Learning for Text-to-Image Generation},
  author={Zhang, Han and Koh, Jing Yu and Baldridge, Jason and Lee, Honglak and Yang, Yinfei},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}
```

## Disclaimer

Not an official Google product.