# deepfrag

**Repository Path**: liuyin-91/deepfrag

## Basic Information

- **Project Name**: deepfrag
- **Description**: Copy of the GitHub deepfrag repository
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: 1.0
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-07-16
- **Last Updated**: 2024-06-15

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


# DeepFrag

This repository contains code for machine-learning-based lead optimization.

# Examples

See [this Colab](https://colab.research.google.com/drive/1XWin26iDXqZ2ioGtwDRuO4iRomGVpdte) for an interactive example of how to use a pre-trained DeepFrag model to generate predictions.

# Overview

- `config`: fixed configuration information (e.g. TRAIN/VAL/TEST partitions)
- `configurations`: benchmark model configurations (see [`configurations/README.md`](configurations/README.md))
- `data`: training/inference data (see [`data/README.md`](data/README.md))
- `leadopt`: main module code
    - `models`: PyTorch architecture definitions
    - `data_util.py`: utility code for reading packed fragment/fingerprint data files
    - `grid_util.py`: GPU-accelerated grid generation code
    - `metrics.py`: PyTorch implementations of several metrics
    - `model_conf.py`: code to configure and train models
    - `util.py`: utility code for RDKit/Open Babel processing
- `scripts`: data processing scripts (see [`scripts/README.md`](scripts/README.md))
- `train.py`: command-line interface for launching training runs

# Dependencies

You can create a virtual environment and install the requirements:

```sh
$ python3 -m venv leadopt_env
$ source ./leadopt_env/bin/activate
$ pip install -r requirements.txt
```

Note: CUDA 10.1 is required for training.
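Before launching training, you can check which CUDA version your PyTorch build reports. The sketch below is not part of this repository: the helper names `cuda_version_ok` and `check_torch_cuda` are hypothetical. It relies only on `torch.cuda.is_available()` and `torch.version.cuda` (a version string, or `None` on CPU-only builds) and compares just the major.minor components.

```python
import importlib.util
from typing import Optional


def cuda_version_ok(reported: Optional[str], required: str = "10.1") -> bool:
    """Return True if a reported CUDA version matches the required major.minor.

    `reported` is a string such as "10.1" or "10.1.243", or None
    (PyTorch reports None for CPU-only builds).
    """
    if reported is None:
        return False
    # Compare only the major and minor version components.
    return reported.split(".")[:2] == required.split(".")[:2]


def check_torch_cuda(required: str = "10.1") -> str:
    """Describe whether the installed PyTorch build matches the required CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    if not torch.cuda.is_available():
        return "PyTorch cannot see a CUDA device"
    if not cuda_version_ok(torch.version.cuda, required):
        return f"PyTorch was built for CUDA {torch.version.cuda}, expected {required}"
    return f"OK: CUDA {torch.version.cuda}"


if __name__ == "__main__":
    print(check_torch_cuda())
```

Calling `check_torch_cuda()` inside the activated `leadopt_env` returns a short status string instead of raising, so it can be dropped into a setup script.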

# Training

To train a model, use the `train.py` script. You can specify model parameters as command-line arguments or load them from a JSON configuration file such as `args.json`.

```bash
python train.py \
    --save_path=/path/to/model \
    --wandb_project=my_project \
    {model_type} \
    --model_arg1=x \
    --model_arg2=y \
    ...
```

or

```bash
python train.py \
    --save_path=/path/to/model \
    --wandb_project=my_project \
    --configuration=./configurations/args.json
```

`--save_path` is the directory where the best model is saved; it is created if it does not exist. If this option is not provided, the model is not saved.

`--wandb_project` is an optional Weights & Biases (wandb) project name. If provided, the run is logged to wandb.
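If you launch many runs from Python (for example, one per file in `configurations/`), a small wrapper that assembles the command line can help. This is a hypothetical helper, not part of the repository; it uses only the `--save_path`, `--wandb_project`, and `--configuration` flags documented here and omits the optional flags when they are not given. It assumes it is run from the repository root, where `train.py` lives.

```python
import subprocess
import sys
from typing import List, Optional


def build_train_command(
    configuration: str,
    save_path: Optional[str] = None,
    wandb_project: Optional[str] = None,
) -> List[str]:
    """Assemble a train.py invocation that loads parameters from a JSON file."""
    cmd = [sys.executable, "train.py"]
    # Optional: without --save_path the model is not saved.
    if save_path is not None:
        cmd.append(f"--save_path={save_path}")
    # Optional: without --wandb_project the run is not logged to wandb.
    if wandb_project is not None:
        cmd.append(f"--wandb_project={wandb_project}")
    cmd.append(f"--configuration={configuration}")
    return cmd


def run_training(configuration: str, **kwargs: Optional[str]) -> int:
    """Run train.py and return its exit code."""
    return subprocess.run(build_train_command(configuration, **kwargs)).returncode
```

For example, `run_training("./configurations/args.json", save_path="/path/to/model")` reproduces the second command above without wandb logging. Building the argument list explicitly (rather than a shell string) avoids quoting problems with paths that contain spaces.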

See below for available models and model-specific parameters:

# Leadopt Models

In this repository, trainable models are subclasses of `model_conf.LeadoptModel`. This class encapsulates model configuration arguments and PyTorch models, and supports saving and loading models made of multiple components.

```py
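from leadopt.model_conf import LeadoptModel, MODELS

# Look up a model class by name and construct it from its configuration arguments.
model = MODELS['voxel']({args...})
# Train the model, saving it to ./mymodel.
model.train(save_path='./mymodel')

...

# Later, restore the saved model from disk.
model2 = LeadoptModel.load('./mymodel')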
```

Internally, model arguments are configured by setting up an `argparse` parser and passing around a `dict` of configuration parameters in `self._args`.
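The configuration pattern described above (an `argparse` parser whose result is kept as a `dict` in `self._args`, plus save/load and a name-to-class lookup like `MODELS`) can be sketched in plain Python. This is an illustration only: `SketchModel`, `REGISTRY`, the saved filename, and the argument defaults are hypothetical, and the real `LeadoptModel` also saves PyTorch weights, which are omitted here. The two flags mirror `-lr`/`--learning_rate` and `-b`/`--batch_size` from the VoxelNet list below.

```python
import argparse
import json
import os
from typing import Any, Dict, List, Optional


class SketchModel:
    """Minimal stand-in for a model that keeps its configuration in self._args."""

    model_type = "sketch"  # hypothetical registry key

    @staticmethod
    def setup_parser(parser: argparse.ArgumentParser) -> None:
        # Each model declares its own command-line arguments (defaults are placeholders).
        parser.add_argument("-lr", "--learning_rate", type=float, default=1e-4)
        parser.add_argument("-b", "--batch_size", type=int, default=16)

    def __init__(self, args: Dict[str, Any]):
        self._args = dict(args)  # plain dict of configuration parameters

    @classmethod
    def from_argv(cls, argv: Optional[List[str]] = None) -> "SketchModel":
        parser = argparse.ArgumentParser()
        cls.setup_parser(parser)
        # vars() turns the argparse Namespace into a dict.
        return cls(vars(parser.parse_args(argv)))

    def save(self, path: str) -> None:
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "args.json"), "w") as f:
            json.dump({"model_type": self.model_type, "args": self._args}, f)
        # A real model would also save each component's weights here.

    @staticmethod
    def load(path: str) -> "SketchModel":
        with open(os.path.join(path, "args.json")) as f:
            saved = json.load(f)
        # Look up the class by name so one loader can restore any registered model.
        return REGISTRY[saved["model_type"]](saved["args"])


REGISTRY = {SketchModel.model_type: SketchModel}
```

Keeping the configuration as a JSON-serializable dict is what makes a generic `load` possible: the saved type name selects the class, and the stored dict rebuilds it.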

## VoxelNet

```
--no_partitions     If set, disable the use of TRAIN/VAL partitions during
                    training.
-f FRAGMENTS, --fragments FRAGMENTS
                    Path to fragments file.
-fp FINGERPRINTS, --fingerprints FINGERPRINTS
                    Path to fingerprints file.
-lr LEARNING_RATE, --learning_rate LEARNING_RATE
--num_epochs NUM_EPOCHS
                    Number of epochs to train for.
--test_steps TEST_STEPS
                    Number of evaluation steps per epoch.
-b BATCH_SIZE, --batch_size BATCH_SIZE
--grid_width GRID_WIDTH
--grid_res GRID_RES
--fdist_min FDIST_MIN
                    Ignore fragments closer to the receptor than this
                    distance (Angstroms).
--fdist_max FDIST_MAX
                    Ignore fragments further from the receptor than this
                    distance (Angstroms).
--fmass_min FMASS_MIN
                    Ignore fragments smaller than this mass (Daltons).
--fmass_max FMASS_MAX
                    Ignore fragments larger than this mass (Daltons).
--ignore_receptor
--ignore_parent
-rec_typer {single,single_h,simple,simple_h,desc,desc_h}
-lig_typer {single,single_h,simple,simple_h,desc,desc_h}
-rec_channels REC_CHANNELS
-lig_channels LIG_CHANNELS
--in_channels IN_CHANNELS
--output_size OUTPUT_SIZE
--pad
--blocks BLOCKS [BLOCKS ...]
--fc FC [FC ...]
--use_all_labels
--dist_fn {mse,bce,cos,tanimoto}
--loss {direct,support_v1}
```
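The `--fdist_min`/`--fdist_max` and `--fmass_min`/`--fmass_max` options drop fragments by distance to the receptor (Angstroms) and by mass (Daltons). The sketch below shows that filtering logic under stated assumptions: the `Fragment` record and its field names are hypothetical, bounds are treated as inclusive, and an unset bound (`None`) disables that filter. The repository's own implementation, including how it measures fragment-receptor distance, may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Fragment:
    """Hypothetical fragment record."""
    name: str
    dist: float  # distance to the receptor, in Angstroms
    mass: float  # fragment mass, in Daltons


def filter_fragments(
    fragments: Iterable[Fragment],
    fdist_min: Optional[float] = None,
    fdist_max: Optional[float] = None,
    fmass_min: Optional[float] = None,
    fmass_max: Optional[float] = None,
) -> List[Fragment]:
    """Keep fragments that satisfy every bound that is set."""
    kept = []
    for f in fragments:
        if fdist_min is not None and f.dist < fdist_min:
            continue  # closer to the receptor than fdist_min
        if fdist_max is not None and f.dist > fdist_max:
            continue  # further from the receptor than fdist_max
        if fmass_min is not None and f.mass < fmass_min:
            continue  # smaller than fmass_min
        if fmass_max is not None and f.mass > fmass_max:
            continue  # larger than fmass_max
        kept.append(f)
    return kept
```

For example, `filter_fragments(frags, fdist_max=4.0, fmass_max=150.0)` keeps only fragments within 4 Angstroms of the receptor and no heavier than 150 Da; the numbers are illustrative, not recommended settings.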