# DeepPurpose

**Repository Path**: mirrors_kexinhuang12345/DeepPurpose

## Basic Information

- **Project Name**: DeepPurpose
- **Description**: A Deep Learning Toolkit for DTI, Drug Property, PPI, DDI, Protein Function Prediction (Bioinformatics)
- **Primary Language**: Unknown
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2022-01-05
- **Last Updated**: 2025-10-06

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

A Deep Learning Library for Compound and Protein Modeling

DTI, Drug Property, PPI, DDI, Protein Function Prediction

Applications in Drug Repurposing, Virtual Screening, QSAR, Side Effect Prediction and More

---
This repository hosts DeepPurpose, a deep learning based molecular modeling and prediction toolkit for drug-target interaction (DTI) prediction, compound property prediction, protein-protein interaction (PPI) prediction, and protein function prediction, built on PyTorch. We focus on DTI and its applications in drug repurposing and virtual screening, but also support various other molecular encoding tasks. The library is designed to be easy to use (a few lines of code) to facilitate deep learning for life science research.
### News!
- [05/21] `0.1.2` Support 5 new graph neural network based models for compound encoding (DGL_GCN, DGL_NeuralFP, DGL_GIN_AttrMasking, DGL_GIN_ContextPred, DGL_AttentiveFP), implemented using [DGL Life Science](https://github.com/awslabs/dgl-lifesci)! An example is provided [here](DEMO/GNN_Models_Release_Example.ipynb)!
- [12/20] DeepPurpose is now supported by the TDC data loader, which contains a large collection of ML for therapeutics datasets, including many drug property and DTI datasets. Here is a [tutorial](https://github.com/mims-harvard/TDC/blob/master/tutorials/TDC_104_ML_Model_DeepPurpose.ipynb)!
- [12/20] DeepPurpose can now be installed via `pip`!
- [11/20] DeepPurpose is published in [Bioinformatics](https://doi.org/10.1093/bioinformatics/btaa1005)!
- [11/20] Added 5 more pretrained models on BindingDB IC50 units (around 1 million data points).
- [10/20] Google Colab installation instructions are provided [here](https://colab.research.google.com/drive/1eF60BwGX6PnB91vpx5dRxFa72e6-MYuZ?usp=sharing). Thanks to @hima111997!
- [10/20] Using DeepPurpose, we built a human-in-the-loop molecular design web UI, check it out! \[[Website](http://deeppurpose.sunlab.org/), [paper](https://arxiv.org/abs/2010.03951)\]
- [09/20] DeepPurpose now supports three more tasks: DDI, PPI and Protein Function Prediction! Simply call `from DeepPurpose import DDI/PPI/ProteinPred` to use them, and check out the examples below!
- [07/20] A simple web UI for DTI prediction can be created in under 10 lines using [Gradio](https://github.com/gradio-app/gradio)! A demo is provided [here](https://github.com/kexinhuang12345/DeepPurpose/blob/master/DEMO/web_ui_gradio.ipynb).
- [07/20] A [blog](https://towardsdatascience.com/drug-discovery-with-deep-learning-under-10-lines-of-codes-742ee306732a) is posted on the Towards Data Science Medium column, check this out!
- [07/20] Two tutorials are online to go through DeepPurpose's framework to do drug-target interaction prediction and drug property prediction ([DTI](Tutorial_1_DTI_Prediction.ipynb), [Drug Property](Tutorial_2_Drug_Property_Pred_Assay_Data.ipynb)).
- [05/20] Support drug property prediction for screening data that does not have target proteins, such as bacteria! An example using RDKit2D with DNN for training and repurposing for *Pseudomonas aeruginosa* (MIT AI Cures' [open task](https://www.aicures.mit.edu/data)) is provided as a [demo](DEMO/Drug_Property_Prediction_Bacterial_Activity-RDKit2D_MIT_AiCures.ipynb).
- [05/20] Now supports hyperparameter tuning via Bayesian Optimization through the [Ax platform](https://ax.dev/)! A demo is provided [here](DEMO/Drug_Property_Pred-Ax-Hyperparam-Tune.ipynb).
### Features
- 15+ powerful encodings for drugs and proteins, ranging from deep neural networks on classic cheminformatics fingerprints, CNNs, and transformers to message passing graph neural networks, with 50+ combined models! Most of these encoding combinations have not yet appeared in existing works. All of this in under 10 lines, but with lots of flexibility! Switching encodings is as simple as changing the encoding names!
- Realistic and user-friendly design:
    - supports DTI, DDI, PPI, molecular property prediction, and protein function prediction!
    - automatically identifies whether the task is drug-target binding affinity prediction (regression) or drug-target interaction prediction (binary classification).
    - supports cold-target and cold-drug settings for robust model evaluation, and supports a single-target high throughput sequencing assay data setup.
    - many dataset loading/downloading/unzipping scripts to ease tedious preprocessing, including antiviral, COVID19 targets, BindingDB, DAVIS, KIBA, ...
    - many pretrained checkpoints.
    - easy monitoring of the training process with detailed training metrics output such as test set figures (AUCs) and tables; early stopping is also supported.
    - detailed output records, such as a ranked list for repurposing results.
    - various evaluation metrics: ROC-AUC, PR-AUC, F1 for binary tasks; MSE, R-squared, Concordance Index for regression tasks.
    - label unit conversion for skewed label distributions such as Kd.
    - time reference for computationally expensive encodings.
    - PyTorch based; supports CPU, GPU, and multi-GPU.
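
The label unit conversion listed above can be illustrated with a short sketch. It assumes affinities are given in nM and maps them to a negative log10 molar scale (often written pKd), which compresses a heavily skewed range. The helper name `nm_to_p` and the small `eps` guard are illustrative assumptions, not necessarily what DeepPurpose does internally.

```python
import math

def nm_to_p(values_nm, eps=1e-10):
    """Convert affinities in nM to a -log10(molar) scale (hypothetical helper)."""
    # 1 nM = 1e-9 M; eps (an assumed guard) avoids log10(0) for zero-valued labels.
    return [-math.log10(v * 1e-9 + eps) for v in values_nm]

# Toy Kd values spanning four orders of magnitude.
kd_nm = [1.0, 100.0, 10000.0]
p_kd = nm_to_p(kd_nm)
print([round(p, 2) for p in p_kd])
```

On the log scale, stronger binders (smaller Kd) get larger values, and the spread between the toy values shrinks from four orders of magnitude to a few units.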
*NOTE: We are actively looking for constructive advice, user feedback, and experiences with using DeepPurpose! Please open an issue or [contact us](mailto:kexinhuang@hsph.harvard.edu).*
## Cite Us
If you find this package useful, please cite [our paper](https://doi.org/10.1093/bioinformatics/btaa1005):
```
@article{huang2020deeppurpose,
title={DeepPurpose: A Deep Learning Library for Drug-Target Interaction Prediction},
author={Huang, Kexin and Fu, Tianfan and Glass, Lucas M and Zitnik, Marinka and Xiao, Cao and Sun, Jimeng},
journal={Bioinformatics},
year={2020}
}
```
## Installation
Try it on [Binder](https://mybinder.org)! Binder is a cloud Jupyter Notebook interface that installs our environment dependencies for you.

[Launch on Binder](https://mybinder.org/v2/gh/kexinhuang12345/DeepPurpose/master)

[Video tutorial](https://www.youtube.com/watch?v=ghUyZknxq5o) to install Binder.

We recommend installing locally, since Binder has to be refreshed on every launch. To install locally, we recommend using `pip`:
### `pip`
```bash
conda create -n DeepPurpose python=3.6
conda activate DeepPurpose
conda install -c conda-forge notebook
pip install git+https://github.com/bp-kelley/descriptastorus
pip install DeepPurpose
```
### Build from Source
First time:
```bash
git clone https://github.com/kexinhuang12345/DeepPurpose.git ## Download code repository
cd DeepPurpose ## Change directory to DeepPurpose
conda env create -f environment.yml ## Build virtual environment with all packages installed using conda
conda activate DeepPurpose ## Activate conda environment (use "source activate DeepPurpose" for anaconda 4.4 or earlier)
jupyter notebook ## open the jupyter notebook with the conda env
## run our code, e.g. click a file in the DEMO folder
... ...
conda deactivate ## when done, exit conda environment
```
In the future:
```bash
cd DeepPurpose ## Change directory to DeepPurpose
conda activate DeepPurpose ## Activate conda environment
jupyter notebook ## open the jupyter notebook with the conda env
## run our code, e.g. click a file in the DEMO folder
... ...
conda deactivate ## when done, exit conda environment
```
[Video tutorial](https://youtu.be/bqinehjnWvE) to install locally from source.
## Example
### Case Study 1(a): A Framework for Drug-Target Interaction Prediction, in Fewer Than 10 Lines of Code
In addition to DTI prediction, we also provide repurposing and virtual screening functions to rapidly generate predictions.
```python
from DeepPurpose import DTI as models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *
import os

SAVE_PATH = './saved_path'
if not os.path.exists(SAVE_PATH):
    os.makedirs(SAVE_PATH)
# Load data: an array of SMILES strings for drugs, an array of amino acid sequences for targets, and an array of binding values or 0/1 labels.
# e.g. ['Cc1ccc(CNS(=O)(=O)c2ccc(s2)S(N)(=O)=O)cc1', ...], ['MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTH...', ...], [0.46, 0.49, ...]
# In this example, BindingDB with Kd binding score is used.
X_drug, X_target, y = process_BindingDB(download_BindingDB(SAVE_PATH),
                                        y = 'Kd',
                                        binary = False,
                                        convert_to_log = True)
# Type in the encoding names for drug/protein.
drug_encoding, target_encoding = 'CNN', 'Transformer'
# Data processing; here we select the cold protein split setup.
train, val, test = data_process(X_drug, X_target, y,
                                drug_encoding, target_encoding,
                                split_method='cold_protein',
                                frac=[0.7,0.1,0.2])
# Generate new model using default parameters; also allow model tuning via input parameters.
config = generate_config(drug_encoding, target_encoding, transformer_n_layer_target = 8)
net = models.model_initialize(**config)
# Train the new model.
# Detailed output, including a tidy table of validation loss and metrics and AUC curve figures, is stored in the ./result folder.
net.train(train, val, test)
# Or simply load a pretrained model from a model directory path or a reproduced model name such as DeepDTA.
# MODEL_PATH_DIR and MODEL_NAME are placeholders; substitute your own value before uncommenting.
# net = models.model_pretrained(MODEL_PATH_DIR or MODEL_NAME)
# Repurpose using the trained model or pre-trained model
# In this example, loading repurposing dataset using Broad Repurposing Hub and SARS-CoV 3CL Protease Target.
X_repurpose, drug_name, drug_cid = load_broad_repurposing_hub(SAVE_PATH)
target, target_name = load_SARS_CoV_Protease_3CL()
_ = models.repurpose(X_repurpose, target, net, drug_name, target_name)
# Virtual screening using the trained model or pre-trained model
X_repurpose, drug_name, target, target_name = ['CCCCCCCOc1cccc(c1)C([O-])=O', ...], ['16007391', ...], ['MLARRKPVLPALTINPTIAEGPSPTSEGASEANLVDLQKKLEEL...', ...], ['P36896', 'P00374']
_ = models.virtual_screening(X_repurpose, target, net, drug_name, target_name)
```
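
The `cold_protein` split used in the example can be pictured as follows. This is a minimal sketch of the general idea (every target sequence lands in exactly one of train, validation, or test, so evaluation targets are unseen during training), not DeepPurpose's own implementation. The function `cold_target_split` and the toy sequence labels are hypothetical.

```python
import random

def cold_target_split(targets, frac=(0.7, 0.1, 0.2), seed=1):
    """Assign sample indices to folds so that no target is shared across folds."""
    unique_targets = sorted(set(targets))
    rng = random.Random(seed)
    rng.shuffle(unique_targets)
    # Fractions are applied to the unique targets, not to the samples.
    n_train = int(len(unique_targets) * frac[0])
    n_val = int(len(unique_targets) * frac[1])
    train_targets = set(unique_targets[:n_train])
    val_targets = set(unique_targets[n_train:n_train + n_val])
    folds = {"train": [], "val": [], "test": []}
    for index, target in enumerate(targets):
        if target in train_targets:
            folds["train"].append(index)
        elif target in val_targets:
            folds["val"].append(index)
        else:
            folds["test"].append(index)
    return folds

# Toy data: one target label per drug-target pair (placeholders, not real sequences).
targets = ["SEQ_A", "SEQ_B", "SEQ_A", "SEQ_C", "SEQ_D", "SEQ_B",
           "SEQ_E", "SEQ_F", "SEQ_G", "SEQ_H", "SEQ_I", "SEQ_J"]
folds = cold_target_split(targets)
for name, indices in folds.items():
    print(name, sorted({targets[i] for i in indices}))
```

Because the split is made over unique targets, the fold sizes in samples only approximate `frac` when some targets have many more pairs than others.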
Drug property prediction follows the same pattern, using the `CompoundPred` module:
```python
from DeepPurpose import CompoundPred as models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *
import os

SAVE_PATH = './saved_path'
if not os.path.exists(SAVE_PATH):
    os.makedirs(SAVE_PATH)
# load AID1706 Assay Data
X_drugs, _, y = load_AID1706_SARS_CoV_3CL()
drug_encoding = 'rdkit_2d_normalized'
train, val, test = data_process(X_drug = X_drugs, y = y,
                                drug_encoding = drug_encoding,
                                split_method='random',
                                random_seed = 1)
config = generate_config(drug_encoding = drug_encoding,
                         cls_hidden_dims = [512],
                         train_epoch = 20,
                         LR = 0.001,
                         batch_size = 128,
                        )
model = models.model_initialize(**config)
model.train(train, val, test)
X_repurpose, drug_name, drug_cid = load_broad_repurposing_hub(SAVE_PATH)
_ = models.repurpose(X_repurpose, model, drug_name)
```
Drug-drug interaction (DDI) prediction, using the `DDI` module:
```python
from DeepPurpose import DDI as models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *
# load DB Binary Data
X_drugs, X_drugs_, y = read_file_training_dataset_drug_drug_pairs("toy_data/ddi.txt")
drug_encoding = 'rdkit_2d_normalized'
train, val, test = data_process(X_drug = X_drugs, X_drug_ = X_drugs_, y = y,
                                drug_encoding = drug_encoding,
                                split_method='random',
                                random_seed = 1)
config = generate_config(drug_encoding = drug_encoding,
                         cls_hidden_dims = [512],
                         train_epoch = 20,
                         LR = 0.001,
                         batch_size = 128,
                        )
model = models.model_initialize(**config)
model.train(train, val, test)
```
Protein-protein interaction (PPI) prediction, using the `PPI` module:
```python
from DeepPurpose import PPI as models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *
# load DB Binary Data
X_targets, X_targets_, y = read_file_training_dataset_protein_protein_pairs("toy_data/ppi.txt")
target_encoding = 'CNN'
train, val, test = data_process(X_target = X_targets, X_target_ = X_targets_, y = y,
                                target_encoding = target_encoding,
                                split_method='random',
                                random_seed = 1)
config = generate_config(target_encoding = target_encoding,
                         cls_hidden_dims = [512],
                         train_epoch = 20,
                         LR = 0.001,
                         batch_size = 128,
                        )
model = models.model_initialize(**config)
model.train(train, val, test)
```
Protein function prediction, using the `ProteinPred` module:
```python
from DeepPurpose import ProteinPred as models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *
# load DB Binary Data
X_targets, y = read_file_protein_function()
target_encoding = 'CNN'
train, val, test = data_process(X_target = X_targets, y = y,
                                target_encoding = target_encoding,
                                split_method='random',
                                random_seed = 1)
config = generate_config(target_encoding = target_encoding,
                         cls_hidden_dims = [512],
                         train_epoch = 20,
                         LR = 0.001,
                         batch_size = 128,
                        )
model = models.model_initialize(**config)
model.train(train, val, test)
```
Drug repurposing for the SARS-CoV2 3CL protease in one line, using the `oneliner` module:
```python
from DeepPurpose import oneliner
from DeepPurpose.dataset import *
oneliner.repurpose(*load_SARS_CoV2_Protease_3CL(), *load_antiviral_drugs(no_cid = True))
```
```
----output----
Drug Repurposing Result for SARS-CoV2 3CL Protease
+------+----------------------+------------------------+---------------+
| Rank | Drug Name            | Target Name            | Binding Score |
+------+----------------------+------------------------+---------------+
| 1    | Sofosbuvir           | SARS-CoV2 3CL Protease | 190.25        |
| 2    | Daclatasvir          | SARS-CoV2 3CL Protease | 214.58        |
| 3    | Vicriviroc           | SARS-CoV2 3CL Protease | 315.70        |
| 4    | Simeprevir           | SARS-CoV2 3CL Protease | 396.53        |
| 5    | Etravirine           | SARS-CoV2 3CL Protease | 409.34        |
| 6    | Amantadine           | SARS-CoV2 3CL Protease | 419.76        |
| 7    | Letermovir           | SARS-CoV2 3CL Protease | 460.28        |
| 8    | Rilpivirine          | SARS-CoV2 3CL Protease | 470.79        |
| 9    | Darunavir            | SARS-CoV2 3CL Protease | 472.24        |
| 10   | Lopinavir            | SARS-CoV2 3CL Protease | 473.01        |
| 11   | Maraviroc            | SARS-CoV2 3CL Protease | 474.86        |
| 12   | Fosamprenavir        | SARS-CoV2 3CL Protease | 487.45        |
| 13   | Ritonavir            | SARS-CoV2 3CL Protease | 492.19        |
....
```
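One-line repurposing with a model trained on custom assay data (`pretrained = False`), here AID1706 for the SARS-CoV 3CL protease: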
```python
from DeepPurpose import oneliner
from DeepPurpose.dataset import *
oneliner.repurpose(*load_SARS_CoV_Protease_3CL(), *load_antiviral_drugs(no_cid = True), *load_AID1706_SARS_CoV_3CL(),
                   split='HTS', convert_y = False, frac=[0.8,0.1,0.1], pretrained = False, agg = 'max_effect')
```
```
----output----
Drug Repurposing Result for SARS-CoV 3CL Protease
+------+----------------------+-----------------------+-------------+-------------+
| Rank | Drug Name            | Target Name           | Interaction | Probability |
+------+----------------------+-----------------------+-------------+-------------+
| 1    | Remdesivir           | SARS-CoV 3CL Protease | YES         | 0.99        |
| 2    | Efavirenz            | SARS-CoV 3CL Protease | YES         | 0.98        |
| 3    | Vicriviroc           | SARS-CoV 3CL Protease | YES         | 0.98        |
| 4    | Tipranavir           | SARS-CoV 3CL Protease | YES         | 0.96        |
| 5    | Methisazone          | SARS-CoV 3CL Protease | YES         | 0.94        |
| 6    | Letermovir           | SARS-CoV 3CL Protease | YES         | 0.88        |
| 7    | Idoxuridine          | SARS-CoV 3CL Protease | YES         | 0.77        |
| 8    | Loviride             | SARS-CoV 3CL Protease | YES         | 0.76        |
| 9    | Baloxavir            | SARS-CoV 3CL Protease | YES         | 0.74        |
| 10   | Ibacitabine          | SARS-CoV 3CL Protease | YES         | 0.70        |
| 11   | Taribavirin          | SARS-CoV 3CL Protease | YES         | 0.65        |
| 12   | Indinavir            | SARS-CoV 3CL Protease | YES         | 0.62        |
| 13   | Podophyllotoxin      | SARS-CoV 3CL Protease | YES         | 0.60        |
....
```
Expected file formats for loading your own data:

For drug target pairs:
```
Drug1_SMILES Target1_Seq Score/Label
Drug2_SMILES Target2_Seq Score/Label
....
```
Then, use
```python
from DeepPurpose import dataset
X_drug, X_target, y = dataset.read_file_training_dataset_drug_target_pairs(PATH)
```
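
To make the drug-target pair layout concrete, here is a minimal sketch that writes a two-row toy file and reads it back with plain Python. It assumes the three columns are separated by whitespace; `parse_pairs`, the file name, and the toy SMILES and sequences are hypothetical, and in practice the `dataset` reader shown above is the one to use.

```python
import os
import tempfile

def parse_pairs(path):
    """Parse 'SMILES sequence label' lines into three parallel lists."""
    drugs, targets, labels = [], [], []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            drugs.append(fields[0])
            targets.append(fields[1])
            labels.append(float(fields[2]))
    return drugs, targets, labels

# Toy rows (placeholders): SMILES, amino acid sequence, score or 0/1 label.
toy_rows = [
    "CCO MKTAYIAKQR 0.46",
    "c1ccccc1 MSHHWGYGKH 0.49",
]
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "dti_toy.txt")
    with open(toy_path, "w") as handle:
        handle.write("\n".join(toy_rows) + "\n")
    X_drug, X_target, y = parse_pairs(toy_path)

print(X_drug, X_target, y)
```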
For bioassay training data:
```
Target_Seq
Drug1_SMILES Score/Label
Drug2_SMILES Score/Label
....
```
Then, use
```python
from DeepPurpose import dataset
X_drug, X_target, y = dataset.read_file_training_dataset_bioassay(PATH)
```
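
The bioassay layout differs in that the target sequence appears once, on the first line, and every following row is a drug measured against it. The sketch below shows one way to read such a file, assuming whitespace-separated columns. `parse_bioassay` is a hypothetical helper, and repeating the target once per drug is an assumption made here for illustration; the library reader may return the target differently.

```python
import io

def parse_bioassay(handle):
    """Parse a bioassay file: first line is the target, the rest are 'SMILES label'."""
    lines = [line.strip() for line in handle if line.strip()]
    target_seq = lines[0]
    drugs, labels = [], []
    for line in lines[1:]:
        smiles, label = line.split()
        drugs.append(smiles)
        labels.append(float(label))
    # Assumption: pair every drug with the single target sequence.
    targets = [target_seq] * len(drugs)
    return drugs, targets, labels

# Toy content (placeholders): one target line, then two drug rows.
toy_file = io.StringIO("MKTAYIAKQR\nCCO 1\nc1ccccc1 0\n")
X_drug, X_target, y = parse_bioassay(toy_file)
print(X_drug, X_target, y)
```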
For drug property prediction training data:
```
Drug1_SMILES Score/Label
Drug2_SMILES Score/Label
....
```
Then, use
```python
from DeepPurpose import dataset
X_drug, y = dataset.read_file_compound_property(PATH)
```
For protein function prediction training data:
```
Target1_Seq Score/Label
Target2_Seq Score/Label
....
```
Then, use
```python
from DeepPurpose import dataset
X_target, y = dataset.read_file_protein_function(PATH)
```
For drug-drug pairs:
```
Drug1_SMILES Drug1_SMILES_ Score/Label
Drug2_SMILES Drug2_SMILES_ Score/Label
....
```
Then, use
```python
from DeepPurpose import dataset
X_drug, X_drug_, y = dataset.read_file_training_dataset_drug_drug_pairs(PATH)
```
For protein-protein pairs:
```
Target1_Seq Target1_Seq_ Score/Label
Target2_Seq Target2_Seq_ Score/Label
....
```
Then, use
```python
from DeepPurpose import dataset
X_target, X_target_, y = dataset.read_file_training_dataset_protein_protein_pairs(PATH)
```
For drug repurposing library:
```
Drug1_Name Drug1_SMILES
Drug2_Name Drug2_SMILES
....
```
Then, use
```python
from DeepPurpose import dataset
X_drug, X_drug_names = dataset.read_file_repurposing_library(PATH)
```
For target sequence to be repurposed:
```
Target_Name Target_seq
```
Then, use
```python
from DeepPurpose import dataset
Target_seq, Target_name = dataset.read_file_target_sequence(PATH)
```
For virtual screening library:
```
Drug1_SMILES Drug1_Name Target1_Seq Target1_Name
Drug2_SMILES Drug2_Name Target2_Seq Target2_Name
....
```
Then, use
```python
from DeepPurpose import dataset
X_drug, X_target, X_drug_names, X_target_names = dataset.read_file_virtual_screening_drug_target_pairs(PATH)
```
The following models are supported:

|Model Name|
|------|
|CNN_CNN_BindingDB_IC50|
|Morgan_CNN_BindingDB_IC50|
|Morgan_AAC_BindingDB_IC50|
|MPNN_CNN_BindingDB_IC50|
|Daylight_AAC_BindingDB_IC50|
|CNN_CNN_DAVIS|
|CNN_CNN_BindingDB|
|Morgan_CNN_BindingDB|
|Morgan_CNN_KIBA|
|Morgan_CNN_DAVIS|
|MPNN_CNN_BindingDB|
|MPNN_CNN_KIBA|
|MPNN_CNN_DAVIS|
|Transformer_CNN_BindingDB|
|Daylight_AAC_DAVIS|
|Daylight_AAC_KIBA|
|Daylight_AAC_BindingDB|
|Morgan_AAC_BindingDB|
|Morgan_AAC_KIBA|
|Morgan_AAC_DAVIS|
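
The names in the table appear to follow a `<drug encoding>_<target encoding>_<dataset>` pattern, with an optional `_IC50` suffix. That pattern is inferred from the table rather than documented here, so the sketch below is only a convenience for browsing the list; `parse_model_name` is a hypothetical helper and the list is a subset of the table.

```python
def parse_model_name(name):
    """Split a model name into its apparent parts (inferred naming pattern)."""
    parts = name.split("_")
    return {
        "drug_encoding": parts[0],
        "target_encoding": parts[1],
        "dataset": parts[2],
        # Only some names carry a label suffix such as IC50.
        "label": parts[3] if len(parts) > 3 else None,
    }

# A few names copied from the table above.
model_names = [
    "CNN_CNN_BindingDB_IC50",
    "MPNN_CNN_KIBA",
    "Daylight_AAC_DAVIS",
    "Transformer_CNN_BindingDB",
]
parsed = [parse_model_name(name) for name in model_names]

# Example: list the models in this subset whose dataset part is BindingDB.
binding_db = [n for n, p in zip(model_names, parsed) if p["dataset"] == "BindingDB"]
print(binding_db)
```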