# SMILES Transformer

[SMILES Transformer](http://arxiv.org/abs/1911.04738) extracts molecular fingerprints from string representations of chemical molecules. Through an autoencoding task, the Transformer learns a latent representation that is useful for various downstream tasks.

## Requirements

This project requires the following libraries:

- NumPy
- Pandas
- PyTorch > 1.2
- tqdm
- RDKit

## Dataset

Canonical SMILES of 1.7 million molecules from the ChEMBL24 dataset, each no longer than 100 characters, were used. These canonical SMILES were transformed randomly every epoch with [SMILES-enumeration](https://github.com/EBjerrum/SMILES-enumeration) by E. J. Bjerrum; a minimal randomization sketch is given at the end of this README.

## Pre-training

After preparing the SMILES corpus for pre-training, run:

```
$ python pretrain_trfm.py
```

The pre-trained model is available [here](https://drive.google.com/file/d/1LwE2BzvtDaPGYv0OR6iBjmsqoloH885N/view?usp=sharing).

## Downstream Tasks

See `experiments/` for the example codes. A sketch of the fingerprint-extraction workflow is also given at the end of this README.

## Cite

```
@article{honda2019smiles,
  title={SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery},
  author={Shion Honda and Shoi Shi and Hiroki R. Ueda},
  year={2019},
  eprint={1911.04738},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```
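
## Example: SMILES Randomization

The per-epoch SMILES randomization described in the Dataset section can also be reproduced with RDKit alone. The following is a minimal sketch, not the project's actual pipeline (which uses Bjerrum's SMILES-enumeration package); `randomize_smiles` is a hypothetical helper name, and the `doRandom` flag requires RDKit 2019.03 or later.

```python
# Minimal sketch of per-epoch SMILES randomization using RDKit alone.
# Note: the project itself uses E. J. Bjerrum's SMILES-enumeration package;
# randomize_smiles is a hypothetical helper written for illustration.
from rdkit import Chem

def randomize_smiles(smiles: str) -> str:
    """Return a random, non-canonical SMILES for the same molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f'Invalid SMILES: {smiles}')
    # doRandom=True makes RDKit start the atom traversal at a random atom,
    # yielding a different but chemically equivalent string on each call.
    return Chem.MolToSmiles(mol, canonical=False, doRandom=True)

# Each epoch, map the canonical corpus through randomize_smiles to get
# fresh input strings, e.g.:
print(randomize_smiles('CC(=O)Oc1ccccc1C(=O)O'))  # aspirin
```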
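
## Example: Fingerprint Extraction

The sketch below outlines how pre-trained fingerprints might be extracted for a downstream task. It is illustrative only: the module and class names (`build_vocab.WordVocab`, `pretrain_trfm.TrfmSeq2seq` and its `encode` method), the hyperparameters, and the file paths are assumptions that should be checked against the code in this repository and in `experiments/`.

```python
# Hypothetical sketch of fingerprint extraction; module names, the encode()
# method, hyperparameters, and paths are assumptions based on this repo's
# layout and must be checked against the actual source and experiments/.
import torch

from build_vocab import WordVocab      # assumed: vocabulary helper in this repo
from pretrain_trfm import TrfmSeq2seq  # assumed: pre-trained model class

vocab = WordVocab.load_vocab('data/vocab.pkl')       # placeholder path
model = TrfmSeq2seq(len(vocab), 256, len(vocab), 4)  # assumed hyperparameters
model.load_state_dict(torch.load('trfm.pkl', map_location='cpu'))
model.eval()

# Dummy token ids standing in for tokenized SMILES, shape (seq_len, batch);
# real tokenization depends on the repo's utilities and is omitted here.
ids = torch.randint(0, len(vocab), (50, 2))
with torch.no_grad():
    fingerprints = model.encode(ids)  # assumed to return one vector per molecule
print(fingerprints.shape)
```

The resulting fingerprint vectors can then be fed to any lightweight downstream model, which is the low-data setting the paper targets.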