# grammarVAE
**Repository Path**: wakeuppp/grammarVAE
## Basic Information
- **Project Name**: grammarVAE
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-06-18
- **Last Updated**: 2020-12-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Grammar Variational Autoencoder
This repository contains training and sampling code for the paper *Grammar Variational Autoencoder* (Kusner, Paige & Hernández-Lobato, ICML 2017).
## Requirements
Python 2.7
Install (CPU version) using `pip install -r requirements.txt`
For GPU compatibility, replace the fourth line in requirements.txt with: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
## Creating datasets
### Molecules
To create the molecule datasets, call:
* `python make_zinc_dataset_grammar.py`
* `python make_zinc_dataset_str.py`
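Each script writes its dataset to an HDF5 file. The snippet below is a hypothetical sanity check of the grammar dataset; the file name `zinc_grammar_dataset.h5` and the `'data'` key are assumptions about what the script produces and may need adjusting:

```python
import h5py

# File name and dataset key are assumptions; adjust to the script's actual output.
with h5py.File('zinc_grammar_dataset.h5', 'r') as f:
    data = f['data'][:]
    # One-hot encoded production-rule sequences:
    # (num_molecules, max_sequence_length, num_production_rules)
    print(data.shape)
```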
### Equations
The equation dataset can be downloaded here: [grammar](https://www.dropbox.com/s/yq1gpygw3oq1grq/eq2_grammar_dataset.h5?dl=0), [string](https://www.dropbox.com/s/gn3iq2ykrs0dqwb/eq2_str_dataset.h5?dl=0)
## Training
### Molecules
To train the molecule models, call:
* `python train_zinc.py` % the grammar model
* `python train_zinc.py --latent_dim=2 --epochs=50` % train a model with a 2D latent space and 50 epochs
* `python train_zinc_str.py`
### Equations
* `python train_eq.py` % the grammar model
* `python train_eq.py --latent_dim=2 --epochs=50` % train a model with a 2D latent space and 50 epochs
* `python train_eq_str.py`
## Sampling
### Molecules
The file molecule_vae.py can be used to encode and decode SMILES strings. For a demo, run:
* `python encode_decode_zinc.py`
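For programmatic use, the sketch below shows the intended encode/decode round trip; the `ZincGrammarModel` class name and the pretrained weights path are assumptions about this repository's layout and may need adjusting:

```python
import molecule_vae

# Assumed path to pretrained grammar-VAE weights; adjust to your checkout.
grammar_weights = "pretrained/zinc_vae_grammar_L56_E100_val.hdf5"
grammar_model = molecule_vae.ZincGrammarModel(grammar_weights)

smiles = ["CC(C)(C)c1ccc2occ(CC(=O)Nc3ccccc3F)c2c1"]
z = grammar_model.encode(smiles)      # SMILES strings -> latent vectors
for mol in grammar_model.decode(z):   # latent vectors -> SMILES strings
    print(mol)
```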
### Equations
The analogous file equation_vae.py can encode and decode equation strings. Run:
* `python encode_decode_eq.py`
## Bayesian optimization
The Bayesian optimization experiments use sparse Gaussian processes coded in Theano.
We use a modified version of Theano with a few add-ons, e.g. to compute
the log determinant of a positive definite matrix in a numerically stable
manner. The modified version of Theano can be installed by going to the folder
Theano-master and typing
* `python setup.py install`
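For reference, the numerically stable log determinant mentioned above is typically obtained from a Cholesky factorization. The snippet below is an illustrative NumPy sketch of that idea, not the code in the modified Theano:

```python
import numpy as np

def stable_logdet(K):
    # For positive definite K = L L^T (Cholesky), log|K| = 2 * sum(log(diag(L))).
    # Summing logs avoids the overflow/underflow that np.linalg.det(K) can hit.
    L = np.linalg.cholesky(K)
    return 2.0 * np.sum(np.log(np.diag(L)))
```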
The experiments with molecules require the rdkit library, which can be installed as described in
http://www.rdkit.org/docs/Install.html.
The Bayesian optimization experiments can be replicated as follows:
1 - Generate the latent representations of molecules and equations. For this, go to the folders
* `molecule_optimization/latent_features_and_targets_grammar/`
* `molecule_optimization/latent_features_and_targets_character/`
* `equation_optimization/latent_features_and_targets_grammar/`
* `equation_optimization/latent_features_and_targets_character/`
and type
* `python generate_latent_features_and_targets.py`
2 - Go to the folders
* `molecule_optimization/simulation1/grammar/`
* `molecule_optimization/simulation1/character/`
* `equation_optimization/simulation1/grammar/`
* `equation_optimization/simulation1/character/`
and type
* `nohup python run_bo.py &`
Repeat this step for all the simulation folders (simulation2, ..., simulation10). For speed, it is recommended to run these in parallel on a computer cluster.
3 - Extract the results by going to the folders
* `molecule_optimization/`
* `equation_optimization/`
and typing
* `python get_final_results.py`
* `./get_average_test_RMSE_LL.sh`