# DeepPurpose

- **Repository Path**: greitzmann/DeepPurpose
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Created**: 2021-01-16
- **Last Updated**: 2021-01-16
A Drug-Target Interaction Prediction Toolkit with state-of-the-art Deep Learning Methods
and its Applications in Drug Repurposing and Virtual Screening
---
This repository hosts DeepPurpose, a deep learning based drug-target interaction (DTI) prediction toolkit, built on PyTorch, with applications in drug repurposing and virtual screening. It lets non-computational domain researchers obtain a list of potential drugs with a single line of code, and it supports deep learning method research on this topic by providing a flexible framework (fewer than 10 lines of code) and baselines.
### News!
- [07/20] A [blog post](https://towardsdatascience.com/drug-discovery-with-deep-learning-under-10-lines-of-codes-742ee306732a) is up on the Towards Data Science Medium column. Check it out!
- [07/20] Two tutorials are online that walk through DeepPurpose's framework for drug-target interaction prediction and drug property prediction ([DTI](Tutorial_1_DTI_Prediction.ipynb), [Drug Property](Tutorial_2_Drug_Property_Pred_Assay_Data.ipynb)).
- [05/20] Support for drug property prediction on screening data that has no target protein, such as bacterial assays! An example using RDKit2D with a DNN for training and repurposing against *Pseudomonas aeruginosa* (MIT AI Cures' [open task](https://www.aicures.mit.edu/data)) is provided as a [demo](DEMO/Drug_Property_Prediction_Bacterial_Activity-RDKit2D_MIT_AiCures.ipynb).
- [05/20] Hyperparameter tuning via Bayesian optimization through the [Ax platform](https://ax.dev/) is now supported! A demo is provided [here](DEMO/Drug_Property_Pred-Ax-Hyperparam-Tune.ipynb).
### Features
- For non-computational researchers: ONE line of code takes you from raw data to a drug repurposing/virtual screening result, so that wet-lab biochemists can leverage the power of deep learning. The result is an ensemble of five pretrained deep learning models!
- For computational researchers: 15+ powerful encodings for drugs and proteins, ranging from deep neural networks on classic cheminformatics fingerprints, CNNs, and transformers to message passing graph neural networks, with 50+ combined models! Most of these encoding combinations have not yet appeared in existing work. All of this in under 10 lines, with lots of flexibility! Switching encodings is as simple as changing the encoding names!
- Realistic and user-friendly design:
    - automatic identification of whether the task is drug-target binding affinity prediction (regression) or drug-target interaction prediction (binary classification).
    - support for cold target and cold drug settings for robust model evaluation, and for single-target high-throughput screening (HTS) assay data setups.
    - many dataset loading/downloading/unzipping scripts to ease tedious preprocessing, including antiviral, COVID19 targets, BindingDB, DAVIS, KIBA, ...
    - many pretrained checkpoints.
    - easy monitoring of the training process with detailed training metrics output, such as test set figures (AUCs) and tables; early stopping is also supported.
    - detailed output records, such as a ranked list for repurposing results.
    - various evaluation metrics: ROC-AUC, PR-AUC, and F1 for binary tasks; MSE, R-squared, and Concordance Index for regression tasks.
    - label unit conversion for skewed label distributions such as Kd.
    - time reference for computationally expensive encodings.
- PyTorch based, with support for CPU, GPU, and multi-GPU.
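To make the label unit conversion for skewed Kd distributions concrete, here is a minimal sketch. It assumes Kd values are given in nanomolar and maps them to the log scale pKd = -log10(Kd in mol/L). The helper `kd_nm_to_pkd` is a hypothetical name used for illustration only and is not part of DeepPurpose; the toolkit's own conversion may differ in detail.

```python
import math

def kd_nm_to_pkd(kd_nm):
    """Convert a Kd in nanomolar to pKd = -log10(Kd in mol/L). Hypothetical helper."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_nm * 1e-9)

# Raw Kd values spanning several orders of magnitude (a skewed distribution).
kd_values_nm = [1.0, 10.0, 1000.0, 100000.0]

# After conversion the labels are spread evenly on a compact scale.
pkd_values = [round(kd_nm_to_pkd(kd), 2) for kd in kd_values_nm]
print(pkd_values)
```

On the log scale, a tenfold change in Kd becomes a one-unit change in the label, which is usually easier for a regression model to fit than raw values spread over five orders of magnitude.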
*NOTE: We are actively looking for constructive advice, user feedback, and experience reports on using DeepPurpose! Please open an issue or [contact us](mailto:kexinhuang@hsph.harvard.edu).*
## Example
### Case Study 1 (a): Antiviral Drugs Repurposing for SARS-CoV2 3CLPro, using One Line.
Given a new target sequence (e.g. SARS-CoV2 3CL Protease), retrieve a ranked list of repurposing candidates from a curated library of 81 antiviral drugs. The Binding Score is the predicted Kd value. Results are aggregated from five models pretrained on the BindingDB dataset!
```python
from DeepPurpose import oneliner
from DeepPurpose.dataset import *
oneliner.repurpose(*load_SARS_CoV2_Protease_3CL(), *load_antiviral_drugs(no_cid = True))
```
```
----output----
Drug Repurposing Result for SARS-CoV2 3CL Protease
+------+----------------------+------------------------+---------------+
| Rank | Drug Name | Target Name | Binding Score |
+------+----------------------+------------------------+---------------+
| 1 | Sofosbuvir | SARS-CoV2 3CL Protease | 190.25 |
| 2 | Daclatasvir | SARS-CoV2 3CL Protease | 214.58 |
| 3 | Vicriviroc | SARS-CoV2 3CL Protease | 315.70 |
| 4 | Simeprevir | SARS-CoV2 3CL Protease | 396.53 |
| 5 | Etravirine | SARS-CoV2 3CL Protease | 409.34 |
| 6 | Amantadine | SARS-CoV2 3CL Protease | 419.76 |
| 7 | Letermovir | SARS-CoV2 3CL Protease | 460.28 |
| 8 | Rilpivirine | SARS-CoV2 3CL Protease | 470.79 |
| 9 | Darunavir | SARS-CoV2 3CL Protease | 472.24 |
| 10 | Lopinavir | SARS-CoV2 3CL Protease | 473.01 |
| 11 | Maraviroc | SARS-CoV2 3CL Protease | 474.86 |
| 12 | Fosamprenavir | SARS-CoV2 3CL Protease | 487.45 |
| 13 | Ritonavir | SARS-CoV2 3CL Protease | 492.19 |
....
```
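The ranked table is described as an aggregate over five pretrained models. The sketch below shows one simple way such an aggregation could work: average each drug's predicted Kd across models, then sort ascending so the strongest predicted binder comes first. The function `aggregate_and_rank` and the toy numbers are hypothetical; this illustrates the idea and is not DeepPurpose's actual aggregation code.

```python
from statistics import mean

def aggregate_and_rank(per_model_scores):
    """Average each drug's predicted score across models and rank ascending.

    per_model_scores maps a drug name to a list of predictions, one per model.
    A lower Kd means stronger predicted binding, so the lowest mean ranks first.
    """
    averaged = {drug: mean(scores) for drug, scores in per_model_scores.items()}
    return sorted(averaged.items(), key=lambda item: item[1])

# Made-up predictions from five hypothetical models (not real DeepPurpose output).
toy_scores = {
    "drug_A": [200.0, 180.0, 220.0, 190.0, 210.0],
    "drug_B": [400.0, 420.0, 380.0, 410.0, 390.0],
    "drug_C": [100.0, 120.0, 90.0, 110.0, 80.0],
}

ranked = aggregate_and_rank(toy_scores)
for rank, (drug, score) in enumerate(ranked, start=1):
    print(rank, drug, round(score, 2))
```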
The same call with only a target (here MMP9) and no explicit drug library:
```python
from DeepPurpose import oneliner
from DeepPurpose.dataset import *
oneliner.repurpose(*load_MMP9())
```
```
----output----
Drug Repurposing Result for MMP9
+------+-------------+-------------+---------------+
| Rank | Drug Name | Target Name | Binding Score |
+------+-------------+-------------+---------------+
| 1 | 6917849.0 | MMP9 | 5.42 |
| 2 | 441336.0 | MMP9 | 6.97 |
| 3 | 441335.0 | MMP9 | 8.37 |
| 4 | 27924.0 | MMP9 | 9.84 |
| 5 | 16490.0 | MMP9 | 9.86 |
| 6 | 23947600.0 | MMP9 | 10.11 |
| 7 | 5743.0 | MMP9 | 12.44 |
| 8 | 3288.0 | MMP9 | 15.91 |
| 9 | 129009989.0 | MMP9 | 18.01 |
| 10 | 129009925.0 | MMP9 | 23.13 |
| 11 | 40467076.0 | MMP9 | 23.48 |
| 12 | 6917974.0 | MMP9 | 24.50 |
| 13 | 73707512.0 | MMP9 | 26.83 |
```
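Repurposing with a model trained from scratch (`pretrained = False`) on custom assay data (`load_AID1706_SARS_CoV_3CL`), again in one call: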
```python
from DeepPurpose import oneliner
from DeepPurpose.dataset import *

oneliner.repurpose(*load_SARS_CoV_Protease_3CL(), *load_antiviral_drugs(no_cid = True), *load_AID1706_SARS_CoV_3CL(),
                   split='HTS', convert_y = False, frac=[0.8,0.1,0.1], pretrained = False, agg = 'max_effect')
```
```
----output----
Drug Repurposing Result for SARS-CoV 3CL Protease
+------+----------------------+-----------------------+-------------+-------------+
| Rank | Drug Name | Target Name | Interaction | Probability |
+------+----------------------+-----------------------+-------------+-------------+
| 1 | Remdesivir | SARS-CoV 3CL Protease | YES | 0.99 |
| 2 | Efavirenz | SARS-CoV 3CL Protease | YES | 0.98 |
| 3 | Vicriviroc | SARS-CoV 3CL Protease | YES | 0.98 |
| 4 | Tipranavir | SARS-CoV 3CL Protease | YES | 0.96 |
| 5 | Methisazone | SARS-CoV 3CL Protease | YES | 0.94 |
| 6 | Letermovir | SARS-CoV 3CL Protease | YES | 0.88 |
| 7 | Idoxuridine | SARS-CoV 3CL Protease | YES | 0.77 |
| 8 | Loviride | SARS-CoV 3CL Protease | YES | 0.76 |
| 9 | Baloxavir | SARS-CoV 3CL Protease | YES | 0.74 |
| 10 | Ibacitabine | SARS-CoV 3CL Protease | YES | 0.70 |
| 11 | Taribavirin | SARS-CoV 3CL Protease | YES | 0.65 |
| 12 | Indinavir | SARS-CoV 3CL Protease | YES | 0.62 |
| 13 | Podophyllotoxin | SARS-CoV 3CL Protease | YES | 0.60 |
....
```
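For a binary task the output table pairs each drug with a YES/NO interaction call and a probability. As a rough sketch of how such a table can be derived from raw probabilities, the snippet below labels each drug with a cutoff and sorts by descending probability. The cutoff of 0.5 and the name `label_interactions` are assumptions made for illustration; the source does not state which threshold the toolkit applies.

```python
def label_interactions(probabilities, threshold=0.5):
    """Return (name, 'YES'/'NO', probability) rows sorted by descending probability.

    The 0.5 default threshold is an assumption for this sketch.
    """
    rows = [
        (name, "YES" if prob >= threshold else "NO", prob)
        for name, prob in probabilities.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Made-up probabilities for three hypothetical drugs.
toy_probs = {"drug_A": 0.35, "drug_B": 0.91, "drug_C": 0.62}

table = label_interactions(toy_probs)
for rank, (name, call, prob) in enumerate(table, start=1):
    print(rank, name, call, prob)
```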
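A full workflow with the framework API, covering data loading, encoding selection, training, repurposing, and virtual screening: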
```python
from DeepPurpose import models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *

# Load data: an array of drug SMILES, an array of target amino acid sequences, and an array of binding values or 0/1 labels.
# e.g. ['Cc1ccc(CNS(=O)(=O)c2ccc(s2)S(N)(=O)=O)cc1', ...], ['MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTH...', ...], [0.46, 0.49, ...]
# In this example, BindingDB with Kd binding scores is used.
# SAVE_PATH is a placeholder for a local directory of your choice.
X_drug, X_target, y = process_BindingDB(download_BindingDB(SAVE_PATH),
                                        y = 'Kd',
                                        binary = False,
                                        convert_to_log = True)

# Type in the encoding names for drug/protein.
drug_encoding, target_encoding = 'MPNN', 'Transformer'

# Data processing; here we select the cold protein split setup.
train, val, test = data_process(X_drug, X_target, y,
                                drug_encoding, target_encoding,
                                split_method='cold_protein',
                                frac=[0.7,0.1,0.2])

# Generate a new model using default parameters; model tuning via input parameters is also supported.
config = generate_config(drug_encoding, target_encoding, transformer_n_layer_target = 8)
net = models.model_initialize(**config)

# Train the new model.
# Detailed output, including a table of validation loss and metrics and AUC curve figures, is stored in the ./result folder.
net.train(train, val, test)

# Or simply load a pretrained model from a model directory path or a reproduced model name such as DeepDTA.
# MODEL_PATH_DIR and MODEL_NAME are placeholders; supply one of them.
net = models.model_pretrained(MODEL_PATH_DIR or MODEL_NAME)

# Repurpose using the trained or pretrained model.
# In this example, the repurposing library is the Broad Repurposing Hub and the target is SARS-CoV 3CL Protease.
X_repurpose, drug_name, drug_cid = load_broad_repurposing_hub(SAVE_PATH)
target, target_name = load_SARS_CoV_Protease_3CL()
_ = models.repurpose(X_repurpose, target, net, drug_name, target_name)

# Virtual screening using the trained or pretrained model.
X_repurpose = ['CCCCCCCOc1cccc(c1)C([O-])=O', ...]
drug_name = ['16007391', ...]
target = ['MLARRKPVLPALTINPTIAEGPSPTSEGASEANLVDLQKKLEEL...', ...]
target_name = ['P36896', 'P00374']
_ = models.virtual_screening(X_repurpose, target, net, drug_name, target_name)
```
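The `split_method='cold_protein'` option used in the workflow evaluates the model on targets it never saw during training. The sketch below illustrates the idea in plain Python: the unique target sequences are partitioned first, and each drug-target row follows its target into train, validation, or test. The function `cold_protein_split` and the toy data are hypothetical, and DeepPurpose's own implementation may differ in sampling details.

```python
import random

def cold_protein_split(drugs, targets, labels, frac=(0.7, 0.1, 0.2), seed=0):
    """Split rows so that no target sequence is shared between train/val/test."""
    unique_targets = sorted(set(targets))
    random.Random(seed).shuffle(unique_targets)
    n_train = round(len(unique_targets) * frac[0])
    n_val = round(len(unique_targets) * frac[1])
    train_targets = set(unique_targets[:n_train])
    val_targets = set(unique_targets[n_train:n_train + n_val])

    splits = {"train": [], "val": [], "test": []}
    for row in zip(drugs, targets, labels):
        if row[1] in train_targets:
            splits["train"].append(row)
        elif row[1] in val_targets:
            splits["val"].append(row)
        else:
            splits["test"].append(row)  # all remaining targets go to the test set
    return splits["train"], splits["val"], splits["test"]

# Toy data: 10 hypothetical targets, each paired with 3 hypothetical drugs.
toy_targets = [f"SEQ{i}" for i in range(10) for _ in range(3)]
toy_drugs = [f"DRUG{j}" for _ in range(10) for j in range(3)]
toy_labels = [float(i) for i in range(30)]

train, val, test = cold_protein_split(toy_drugs, toy_targets, toy_labels)
print(len(train), len(val), len(test))
```

Splitting by target rather than by row is what makes the evaluation "cold": a random row-level split would let the same protein appear in both training and test data and tend to overstate performance on new targets.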
## Installation
First time:
```bash
git clone https://github.com/kexinhuang12345/DeepPurpose.git
## Download code repository
cd DeepPurpose
## Change directory to DeepPurpose
conda env create -f environment.yml
## Build virtual environment with all packages installed using conda
conda activate DeepPurpose
## Activate conda environment (use "source activate DeepPurpose" for anaconda 4.4 or earlier)
jupyter notebook
## open the jupyter notebook with the conda env
## run our code, e.g. click a file in the DEMO folder
... ...
conda deactivate
## when done, exit conda environment
```
On subsequent uses:
```bash
cd DeepPurpose
## Change directory to DeepPurpose
conda activate DeepPurpose
## Activate conda environment
jupyter notebook
## open the jupyter notebook with the conda env
## run our code, e.g. click a file in the DEMO folder
... ...
conda deactivate
## when done, exit conda environment
```
## Expected Data Format
For drug target pairs:
```
Drug1_SMILES Target1_Seq Score/Label
Drug2_SMILES Target2_Seq Score/Label
....
```
Then, use
```python
from DeepPurpose import dataset
X_drug, X_target, y = dataset.read_file_training_dataset_drug_target_pairs(PATH)
```
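To show what a file in the drug-target pair layout looks like on disk, the sketch below writes two toy rows and parses them back into three parallel lists. It assumes fields are separated by whitespace, which the layout above suggests but does not state. `read_pairs` is a hypothetical stand-in written for illustration, not the library's reader, and the sequences are made-up fragments.

```python
import os
import tempfile

def read_pairs(path):
    """Parse lines of 'SMILES sequence score' into three parallel lists."""
    drugs, targets, scores = [], [], []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 3:
                raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
            drugs.append(fields[0])
            targets.append(fields[1])
            scores.append(float(fields[2]))
    return drugs, targets, scores

# Toy rows; the sequences are made-up fragments, not real proteins.
rows = [("CCO", "MKTAYIAK", 7.2), ("c1ccccc1", "MSHHWGYG", 5.0)]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for smiles, seq, score in rows:
        tmp.write(f"{smiles} {seq} {score}\n")
    toy_path = tmp.name

pair_drugs, pair_targets, pair_scores = read_pairs(toy_path)
os.remove(toy_path)
print(pair_drugs, pair_targets, pair_scores)
```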
For bioassay training data:
```
Target_Seq
Drug1_SMILES Score/Label
Drug2_SMILES Score/Label
....
```
Then, use
```python
from DeepPurpose import dataset
X_drug, X_target, y = dataset.read_file_training_dataset_bioassay(PATH)
```
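The bioassay layout differs from the pair layout in that the single target sequence appears once, on the first line, followed by one drug per line. A minimal parsing sketch under that reading is shown below. `parse_bioassay` is a hypothetical helper, and repeating the target once per drug is a choice made here to keep the lists aligned; the library's own return shape is not specified in this document.

```python
import io

def parse_bioassay(handle):
    """First non-empty line is the target sequence; the rest are 'SMILES score'."""
    lines = [line.split() for line in handle if line.strip()]
    if not lines or len(lines[0]) != 1:
        raise ValueError("first line must contain only the target sequence")
    target_seq = lines[0][0]
    drugs = [fields[0] for fields in lines[1:]]
    scores = [float(fields[1]) for fields in lines[1:]]
    # Repeat the single target so the three lists stay aligned row by row.
    targets = [target_seq] * len(drugs)
    return drugs, targets, scores

# An in-memory toy file with a made-up sequence fragment and 0/1 labels.
toy_file = io.StringIO("MKTAYIAK\nCCO 1\nc1ccccc1 0\n")
assay_drugs, assay_targets, assay_labels = parse_bioassay(toy_file)
print(assay_drugs, assay_targets, assay_labels)
```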
For drug repurposing library:
```
Drug1_Name Drug1_SMILES
Drug2_Name Drug2_SMILES
....
```
Then, use
```python
from DeepPurpose import dataset
X_drug, X_drug_names = dataset.read_file_repurposing_library(PATH)
```
For target sequence to be repurposed:
```
Target_Name Target_seq
```
Then, use
```python
from DeepPurpose import dataset
Target_seq, Target_name = dataset.read_file_target_sequence(PATH)
```
For virtual screening library:
```
Drug1_SMILES Drug1_Name Target1_Seq Target1_Name
Drug2_SMILES Drug2_Name Target2_Seq Target2_Name
....
```
Then, use
```python
from DeepPurpose import dataset
X_drug, X_target, X_drug_names, X_target_names = dataset.read_file_virtual_screening_drug_target_pairs(PATH)
```
## Supported Models
|Model Name|
|------|
|DeepDTA_DAVIS|
|CNN_CNN_BindingDB|
|Morgan_CNN_BindingDB|
|Morgan_CNN_KIBA|
|Morgan_CNN_DAVIS|
|MPNN_CNN_BindingDB|
|MPNN_CNN_KIBA|
|MPNN_CNN_DAVIS|
|Transformer_CNN_BindingDB|
|Daylight_AAC_DAVIS|
|Daylight_AAC_KIBA|
|Daylight_AAC_BindingDB|
|Morgan_AAC_BindingDB|
|Morgan_AAC_KIBA|
|Morgan_AAC_DAVIS|
|CNN_Transformer_DAVIS|
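
Most names in the table appear to follow a `DrugEncoding_TargetEncoding_Dataset` pattern, with `DeepDTA_DAVIS` as a two-part exception that names a published method instead of an encoding pair. That convention is inferred from the table rather than documented here, so the sketch below is only a convenience for browsing the list; `split_model_name` is a hypothetical helper.

```python
def split_model_name(name):
    """Split a model name into its parts, assuming the inferred naming pattern."""
    parts = name.split("_")
    if len(parts) == 3:
        drug_encoding, target_encoding, dataset = parts
        return {
            "drug_encoding": drug_encoding,
            "target_encoding": target_encoding,
            "dataset": dataset,
        }
    if len(parts) == 2:
        # e.g. 'DeepDTA_DAVIS': a method name followed by the training dataset.
        method, dataset = parts
        return {"method": method, "dataset": dataset}
    raise ValueError(f"unrecognized model name: {name!r}")

# A few names copied from the table above.
names = ["DeepDTA_DAVIS", "MPNN_CNN_KIBA", "Daylight_AAC_BindingDB"]
parsed = [split_model_name(name) for name in names]
for name, info in zip(names, parsed):
    print(name, info)
```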