# PepMimic
**Repository Path**: Billsfriend/PepMimic
## Basic Information
- **Project Name**: PepMimic
- **Description**: Fork of https://github.com/kxz18/PepMimic
- **Default Branch**: main
- **Created**: 2025-09-02
- **Last Updated**: 2025-09-03
## README
# Peptide Mimicry

## Running Locally
### Environment
:warning: The code has been tested under CUDA 11.7.
```bash
conda env create -f env.yaml
conda activate pepmimic
```
:warning: You also need to set up the FoldX5 suite manually (FoldX5.1 is compatible) after acquiring an [academic license](https://foldxsuite.crg.eu/academic-license-info). Download the suite and extract it under `evaluation/dG/foldx5`:
```bash
evaluation/dG/foldx5/
├── foldx_20251231
├── molecules
└── yasaraPlugin.zip
```
The suffix "20251231" denotes the last valid day of usage (2025/12/31). FoldX only provides 1-year licenses for academic usage, so the license must be renewed yearly. After each renewal, update the path in `globals.py` to match the new suffix.
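As a minimal sketch of how the date suffix can be interpreted, the snippet below parses the binary name into an expiry date and checks it against a given day. The function names `foldx_expiry` and `is_license_valid` are hypothetical helpers for illustration and are not part of this repository.

```python
import re
from datetime import date, datetime


def foldx_expiry(binary_name: str) -> date:
    """Parse the YYYYMMDD suffix of a FoldX binary name into a date."""
    match = re.fullmatch(r"foldx_(\d{8})", binary_name)
    if match is None:
        raise ValueError(f"unexpected FoldX binary name: {binary_name!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def is_license_valid(binary_name: str, today: date) -> bool:
    """The suffix is the last valid day, so that day itself still counts."""
    return today <= foldx_expiry(binary_name)


if __name__ == "__main__":
    # Check the example binary name against the current date.
    print(is_license_valid("foldx_20251231", date.today()))
```

A check like this could be run before a long generation job so that an expired binary is noticed early rather than at the evaluation stage.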
### Checkpoints
The model weights can be downloaded from the [release page](https://github.com/kxz18/PepMimic/releases/download/v1.0/checkpoints.zip):
```bash
wget https://github.com/kxz18/PepMimic/releases/download/v1.0/checkpoints.zip
unzip checkpoints.zip
```
### Mimicking Given References
Prepare an input folder containing the reference complexes in PDB format, an index file listing the chain IDs of the target protein and the ligand, and a configuration file specifying parameters such as the peptide length and the number of generations. We have prepared an example folder under `example_data/CD38`:
```bash
example_data/
└── CD38
    ├── 4cmh.pdb
    ├── 5f1o.pdb
    ├── config.yaml
    └── index.txt
```
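Before launching a run, it can help to confirm that a folder follows this layout. The following is a rough sketch, assuming only the file names shown in the tree above; `missing_inputs` is a hypothetical helper, not a function provided by the repository.

```python
from pathlib import Path
from typing import List


def missing_inputs(ref_dir: str) -> List[str]:
    """Return a list of problems found in a reference folder (empty if none)."""
    root = Path(ref_dir)
    problems = []
    # The two fixed file names come from the example folder layout.
    for name in ("config.yaml", "index.txt"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    # At least one reference complex in PDB format is expected.
    if not any(root.glob("*.pdb")):
        problems.append("no .pdb reference complexes found")
    return problems


if __name__ == "__main__":
    for problem in missing_inputs("example_data/CD38"):
        print(problem)
```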
Here we also illustrate the meaning of each entry in the `config.yaml`:
```yaml
dataset:
  test:
    class: MimicryDataset
    ref_dir: ./example_data/CD38  # Directory holding all reference complexes: either a path relative to the project folder or an absolute path
    # Number of generations for each reference complex. The value 20 is a toy
    # setting for a quick tour. For practical usage, we recommend generating
    # more than 100,000 candidates in total before ranking and selecting the
    # top-scoring ones for wet-lab tests. With the two reference complexes
    # here, that means setting n_sample_per_cplx to at least 50,000.
    n_sample_per_cplx: 20
    length_lb: 10  # Lower bound of peptide length (inclusive)
    length_ub: 12  # Upper bound of peptide length (inclusive)

dataloader:
  num_workers: 4  # Number of CPU workers for data processing. Usually 4 is enough.
  batch_size: 32  # If the GPU runs out of memory, try reducing the batch size
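The arithmetic behind the recommended `n_sample_per_cplx` can be sketched as follows: divide the desired total number of candidates by the number of reference complexes and round up. The helper name `samples_per_complex` is illustrative only and does not exist in the repository.

```python
import math


def samples_per_complex(total_candidates: int, n_references: int) -> int:
    """Smallest per-complex sample count that reaches the requested total."""
    if n_references <= 0:
        raise ValueError("at least one reference complex is required")
    return math.ceil(total_candidates / n_references)


if __name__ == "__main__":
    # Two reference complexes (4cmh and 5f1o) and a target of 100,000 candidates.
    print(samples_per_complex(100_000, 2))
```

With the two example complexes this yields 50,000 per complex, matching the figure quoted in the configuration comment.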
Each line of `index.txt` describes the filename (without `.pdb`), the target chains, the reference ligand chains, and custom annotations for a reference complex, separated by `\t`. For example, the line for `4cmh.pdb` looks like:
```
4cmh A B,C HEAVY CHAIN OF SAR650984-FAB FRAGMENT,LIGHT CHAIN OF SAR650984-FAB FRAGMENT
```
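To make the field layout concrete, here is a minimal parsing sketch for one line of `index.txt`. It assumes that multiple chain IDs within a field are separated by commas, as `B,C` in the example suggests, and that the annotation field is optional; both are assumptions. `IndexEntry` and `parse_index_line` are hypothetical names, not the repository's own loader.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    pdb_id: str               # filename without the .pdb extension
    target_chains: List[str]  # chains of the target protein
    ligand_chains: List[str]  # chains of the reference ligand
    annotation: str           # free-form custom annotation


def parse_index_line(line: str) -> IndexEntry:
    """Split one tab-separated index line into its four fields."""
    # Limit the split to 3 so that any tabs inside the annotation are preserved.
    fields = line.rstrip("\n").split("\t", 3)
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 tab-separated fields: {line!r}")
    pdb_id, targets, ligands = fields[:3]
    annotation = fields[3] if len(fields) > 3 else ""
    return IndexEntry(pdb_id, targets.split(","), ligands.split(","), annotation)


if __name__ == "__main__":
    example = "4cmh\tA\tB,C\tHEAVY CHAIN OF SAR650984-FAB FRAGMENT,LIGHT CHAIN OF SAR650984-FAB FRAGMENT"
    print(parse_index_line(example))
```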
Once these input files are ready, you can run the generation with the following script:
```bash
# The final argument (10) is the number of top-ranked candidates to keep as the output
GPU=0 bash scripts/mimic.sh example_data/CD38 10
```
The results will be saved under `example_data/CD38/final_output`. You may also check `example_data/CD38/results` for unfiltered results.
## Running on Google Colab
We have also prepared an online version for users who prefer Google Colab. However, we still recommend the local version because of various constraints on Google Colab (e.g., running time restrictions).