# guacamol_baselines **Repository Path**: greitzmann/guacamol_baselines ## Basic Information - **Project Name**: guacamol_baselines - **Description**: Baselines models for GuacaMol benchmarks - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-01-17 - **Last Updated**: 2021-01-17 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # GuacaMol Baselines A series of baseline model implementations for the [`guacamol`](https://github.com/BenevolentAI/guacamol) benchmark for generative chemistry. A more in depth explanation of the benchmarks and scores for these baselines is can be found in our [paper](https://arxiv.org/abs/1811.09621). ## Dependencies To install all dependencies: ```bash conda install rdkit -c rdkit pip install -r requirements.txt ``` ## Dataset Some baselines require the `guacamol` dataset to run, to get it run: ```bash bash fetch_guacamol_dataset.sh ``` ## Random Sampler Dummy baseline, always returning random molecules form the `guacamol` training set. To execute the goal-directed generation benchmarks: ```bash python -m random_smiles_sampler.goal_directed_generation ``` To execute the distribution learning benchmarks: ```bash python -m random_smiles_sampler.distribution_learning ``` ## Best from ChEMBL Dummy baseline that simply returns the molecules from the `guacamol` training set that best satisfy the score of a goal-directed benchmark. There is no model nor training, its only purpose is to establish a lower bound on the benchmark scores. To execute the goal-directed generation benchmarks: ```bash python -m best_from_chembl.goal_directed_generation ``` No distribution learning benchmark available. ## SMILES GA Genetic algorithm on SMILES as described in: https://www.journal.csj.jp/doi/10.1246/cl.180665 Implementation adapted from: https://github.com/tsudalab/ChemGE To execute the goal-directed generation benchmarks: ```bash python -m smiles_ga.goal_directed_generation ``` No distribution learning benchmark available. ## Graph GA Genetic algoritm on molecule graphs as described in: https://doi.org/10.26434/chemrxiv.7240751 Implementation adapted from: https://github.com/jensengroup/GB-GA To execute the goal-directed generation benchmarks: ```bash python -m graph_ga.goal_directed_generation ``` No distribution learning benchmark available. ## Graph MCTS Monte Carlo Tree Search on molecule graphs as described in: https://doi.org/10.26434/chemrxiv.7240751 Implementation adapted from: https://github.com/jensengroup/GB-GB To execute the goal-directed generation benchmarks: ```bash python -m graph_mcts.goal_directed_generation ``` To execute the distribution learning benchmarks: ```bash python -m graph_mcts.distribution_learning ``` To re-generate the distribution statistics as pickle files: ```bash python -m graph_mcts.analyze_dataset ``` ## SMILES LSTM Hill Climbing Long-short term memory on SMILES as described in: https://arxiv.org/abs/1701.01329 This implementation optimizes using *hill climbing* algorithm. Implementation by [BenevolentAI](https://benevolent.ai/) A pre-trained model is provided in: [smiles_lstm/pretrained_model](https://github.com/BenevolentAI/guacamol_baselines/tree/master/smiles_lstm_hc/pretrained_model) To execute the goal-directed generation benchmarks: ```bash python -m smiles_lstm_hc.goal_directed_generation ``` To execute the distribution learning benchmark: ```bash python -m smiles_lstm_hc.distribution_learning ``` To train a model from scratch: ```bash python -m smiles_lstm_hc.train_smiles_lstm_model ``` ## SMILES LSTM PPO Long-short term memory on SMILES as described in: https://arxiv.org/abs/1701.01329 This implementation optimizes using [*proximal policy optimization*](https://arxiv.org/pdf/1707.06347.pdf) algorithm. Implementation by [BenevolentAI](https://benevolent.ai/) A pre-trained model is provided in: [smiles_lstm/pretrained_model](https://github.com/BenevolentAI/guacamol_baselines/tree/master/smiles_lstm_ppo/pretrained_model) To execute the goal-directed generation benchmarks: ```bash python -m smiles_lstm_ppo.goal_directed_generation ```