# gemconf-dev

**Repository Path**: pharmamind/gemconf-dev

## Basic Information

- **Project Name**: gemconf-dev
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 1
- **Created**: 2024-07-20
- **Last Updated**: 2026-01-31

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# GEM-Conf

A modified version of DMCG (https://github.com/DirectMolecularConfGen/DMCG). Compared to DMCG, we add additional loss terms, including an angle loss and a dihedral loss. Inspired by the GEM architecture, we designed multiscale message blocks consisting of a node update block, an edge update block, an angle update block, and a dihedral update block. These revisions aim to generate more reasonable conformations without force-field optimization.

## Requirements and Installation

```
conda install -c conda-forge graph-tool
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118
pip install torch_geometric torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.3.0+cu118.html
pip install tensorboard rdkit
```

## Dataset

## Training and inference

### Reproduce small-scale GEOM-Drugs

The first time you run this code, specify the data path with `--base-path`; the code will then binarize the data.

```shell
# Training.
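# Optional pre-flight check (our addition, not part of the original recipe):
# confirm the unzipped data directory exists before launching training,
# since a wrong --base-path only fails once binarization starts. The path
# below should match the DATA value used in the commands that follow.
[ -d "./data/drugs" ] && echo "data dir found" || echo "data dir missing: ./data/drugs"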
# We place the unzipped data folder in /workspace/drugs_processed
DATA="./data/drugs"
CUDA_VISIBLE_DEVICES=1 python train_debug.py --model-ver v3 \
    --processed-path dataset_v3 \
    --lr-warmup --use-bn --use-adamw --no-3drot \
    --aux-loss 0.2 --num-layers 6 --batch-size 128 \
    --vae-beta-min 0.0001 --vae-beta-max 0.03 --reuse-prior \
    --node-attn --data-split confgf --pred-pos-residual \
    --dataset-name drugs --remove-hs --shared-output \
    --ang-lam 0.2 --bond-lam 0.2 --dihedral-lam 0.2 \
    --base-path ${DATA} --checkpoint-dir output/model/v2_drugs

# Inference.
CKPT="{your_checkpoint_path}"
python evaluate_debug.py --model-ver v3 --use-bn \
    --processed-path dataset_v3 \
    --num-layers 6 --eval-from $CKPT --workers 20 --batch-size 128 \
    --reuse-prior --node-attn --data-split confgf --dataset-name drugs --remove-hs \
    --shared-output --pred-pos-residual --sample-beta 1.2
```

### Reproduce small-scale GEOM-QM9

```shell
# Training. We place the unzipped data folder in /workspace/qm9
DATA="data/qm9"
CUDA_VISIBLE_DEVICES=4 python train_debug.py --model-ver v3 \
    --processed-path dataset_v3 \
    --lr-warmup --use-bn --use-adamw --no-3drot \
    --aux-loss 0.2 --num-layers 6 --batch-size 128 \
    --vae-beta-min 0.0001 --vae-beta-max 0.03 --reuse-prior \
    --node-attn --data-split confgf --pred-pos-residual \
    --dataset-name qm9 --remove-hs --shared-output \
    --ang-lam 0.2 --bond-lam 0.2 --dihedral-lam 0.2 \
    --base-path ${DATA} --checkpoint-dir output/model/ --lr 2e-4

# Inference.
CKPT="{your_checkpoint_path}"
python evaluate_debug.py --model-ver v3 --use-bn \
    --processed-path dataset_v3 \
    --num-layers 6 --eval-from $CKPT --workers 20 --batch-size 128 \
    --reuse-prior --node-attn --data-split confgf --dataset-name qm9 --remove-hs \
    --shared-output --pred-pos-residual --sample-beta 1.2
```

### Reproduce Large-scale dataset

```shell
# Training.
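# Optional (our addition, and an assumption on our part: the "_disk" model
# version suggests processed data is written to disk, and the combined
# dataset is large). Check free space in the working directory first.
df -h .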
# We place the unzipped data folder in /workspace/qm9
DATA="data/allsets"  # includes drugs, qm9, quandb, crystal structure
CUDA_VISIBLE_DEVICES=4 python train_debug.py --model-ver v3_disk \
    --processed-path dataset_v3_disk \
    --lr-warmup --use-bn --use-adamw --no-3drot \
    --aux-loss 0.2 --num-layers 6 --batch-size 128 \
    --vae-beta-min 0.0001 --vae-beta-max 0.03 --reuse-prior \
    --node-attn --data-split confgf --pred-pos-residual \
    --dataset-name qm9 --remove-hs --shared-output \
    --ang-lam 0.2 --bond-lam 0.2 --dihedral-lam 0.2 \
    --base-path ${DATA} --checkpoint-dir output/model/ --lr 2e-4

# Inference.
CKPT="{your_checkpoint_path}"
python evaluate_debug.py --model-ver v3 --use-bn \
    --processed-path dataset_v3 \
    --num-layers 6 --eval-from $CKPT --workers 20 --batch-size 128 \
    --reuse-prior --node-attn --data-split confgf --dataset-name qm9 --remove-hs \
    --shared-output --pred-pos-residual --sample-beta 1.2
```

## Citation

If you find this work helpful in your research, please use the following BibTeX entries to cite these papers.

```
@article{YANG2024100074,
  title = {Conf-GEM: A geometric information-assisted direct conformation generation model},
  journal = {Artificial Intelligence Chemistry},
  volume = {2},
  number = {2},
  pages = {100074},
  year = {2024},
  issn = {2949-7477},
  doi = {https://doi.org/10.1016/j.aichem.2024.100074},
  url = {https://www.sciencedirect.com/science/article/pii/S2949747724000320},
  author = {Zhijiang Yang and Youjun Xu and Li Pan and Tengxin Huang and Yunfan Wang and Junjie Ding and Liangliang Wang and Junhua Xiao},
}

@article{zhu2022direct,
  title = {Direct Molecular Conformation Generation},
  author = {Jinhua Zhu and Yingce Xia and Chang Liu and Lijun Wu and Shufang Xie and Yusong Wang and Tong Wang and Tao Qin and Wengang Zhou and Houqiang Li and Haiguang Liu and Tie-Yan Liu},
  journal = {Transactions on Machine Learning Research},
  year = {2022},
}
```
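## Appendix: angle and dihedral targets

The angle and dihedral losses mentioned in the README (weighted by `--ang-lam` and `--dihedral-lam`) penalize errors in geometric quantities derived from atom coordinates. As a rough illustration of what those quantities are, here is a minimal NumPy sketch; the function names are ours and do not appear in this repository.

```python
import numpy as np

def bond_angle(p0, p1, p2):
    """Angle at p1 (radians) formed by atoms p0-p1-p2."""
    u, v = p0 - p1, p2 - p1
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against rounding

def dihedral_angle(p0, p1, p2, p3):
    """Signed torsion (radians) about the p1-p2 bond."""
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2
    n1, n2 = np.cross(b0, b1), np.cross(b1, b2)  # plane normals
    m = np.cross(n1, b1 / np.linalg.norm(b1))
    return np.arctan2(np.dot(m, n2), np.dot(n1, n2))

# Quick checks: a right angle and a trans (180 degree) torsion.
a = bond_angle(np.array([1., 0, 0]), np.array([0., 0, 0]), np.array([0., 1, 0]))
d = dihedral_angle(np.array([0., 1, 0]), np.array([0., 0, 0]),
                   np.array([1., 0, 0]), np.array([1., -1, 0]))
print(round(np.degrees(a)), round(abs(np.degrees(d))))  # 90 180
```

A training loss would then compare such values between predicted and reference conformations, e.g. as an L1/L2 difference, alongside DMCG's position-based objective.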