# ConPLex_dev

**Repository Path**: wwz-2000/ConPLex_dev

## Basic Information

- **Project Name**: ConPLex_dev
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-10-08
- **Last Updated**: 2024-10-08

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Adapting Protein Language Models for Rapid DTI Prediction

This repository documents the code used to generate the results for our [PNAS](https://www.pnas.org/doi/10.1073/pnas.2220778120) article. The updated package, which is under continuous development, can be found at [this repository](https://github.com/samsledje/ConPLex). Please submit an issue or email samsl@mit.edu with any questions.

### Sample Usage

`python train_DTI.py --exp-id ExperimentName --config configs/default_config.yaml`

(An illustrative end-to-end workflow sketch appears at the end of this README.)

### Repository Organization

- `src`: Python files containing protein and molecular featurizers, prediction architectures, and data loading
- `scripts`: Bash files to run benchmarking tasks
  - `CMD_BENCHMARK_DAVIS.sh` -- Run DTI classification benchmarks on the DAVIS data set; easily modified for other classification data sets
  - `CMD_BENCHMARK_TDC_DTI_DG.sh` -- Run benchmarks for the [TDC](https://tdcommons.ai) [DTI-DG](https://tdcommons.ai/benchmark/dti_dg_group/bindingdb_patent/) regression task
  - `CMD_BENCHMARK_DUDE_CROSSTYPE.sh` -- Evaluate a trained model on [DUDe](http://dude.docking.org) decoy performance for kinase and GPCR targets
  - `CMD_BENCHMARK_DUDE_WITHINTYPE.sh` -- Same as above, but with half of the kinase, GPCR, protease, and nuclear targets
- `models`: Pre-trained protein language models
- `dataset`: Data sets to benchmark on, most from [MolTrans](https://academic.oup.com/bioinformatics/article/37/6/830/5929692)
  - `DAVIS`
  - `BindingDB`
  - `BIOSNAP`
  - `DUDe`
- `nb`: Jupyter notebooks for data generation and exploration
- `train_DTI.py` -- Main training script to run DTI classification benchmarks
- `DUDE_evaluate_decoys.py` -- Compare predictions of a trained model between a target and known true binders/decoys, and visualize the embedding space
- `DUDE_summarize_decoys.py` -- Given a directory of protein targets, summarize active/decoy discriminative performance by target type

### Reference

- Described in [our PNAS paper](https://www.pnas.org/doi/10.1073/pnas.2220778120)
- Previously appeared in [NeurIPS MLSB 2021](https://www.mlsb.io/papers_2021/MLSB2021_Adapting_protein_language_models.pdf) and [NeurIPS MLSB 2022](https://www.biorxiv.org/content/10.1101/2022.11.03.515086v1), and on [bioRxiv](https://www.biorxiv.org/content/10.1101/2022.12.06.519374v1)
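
### Example Workflow (illustrative sketch)

A minimal sketch of how the pieces above might fit together, assuming the repository root as the working directory and an environment with the project dependencies installed. The experiment name is a placeholder, and the benchmark scripts may expect additional arguments or environment variables not shown here; check the script contents before running.

```bash
# Train a DTI classification model with the provided default configuration
python train_DTI.py --exp-id MyExperiment --config configs/default_config.yaml

# Run the DAVIS classification benchmark
# (see scripts/ for the TDC DTI-DG and DUDe benchmark scripts)
bash scripts/CMD_BENCHMARK_DAVIS.sh
```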