# semla-flow

**Repository Path**: ahlih_admin/semla-flow

## Basic Information

- **Project Name**: semla-flow
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-04
- **Last Updated**: 2025-07-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# SemlaFlow - Efficient Molecular Generation with Flow Matching and Semla

This project introduces Semla, a novel equivariant attention-based message passing architecture for molecular design and dynamics tasks. We train a molecular generation model, SemlaFlow, using flow matching with optimal transport to generate realistic 3D molecular structures.

## Installation

All of the code was run using a mamba/conda environment. You can of course use a different environment manager; all core requirements are contained in the `environment.yaml` file. Using mamba/conda you can recreate the environment as follows:

1. `mamba env create --file environment.yaml`
2. `mamba activate semlaflow`

For development (and to run the notebooks) you will also need to install the extra requirements:

3. `pip install -r extra_requirements.txt`

## Datasets

For ease of use we have provided the processed data files in a Google Drive folder [here](https://drive.google.com/drive/folders/1rHi5JzN05bsGRGQUcWRmDu-Ilfoa9EAT?usp=sharing). Copy the folder called `smol` from the QM9 or GEOM Drugs folders and point to that `smol` folder when running the scripts. For example, pass `--data_path path/to/data/qm9/smol` to the script you wish to run.

### Data Prep

We copied the code from MiDi (https://github.com/cvignac/MiDi) to download the QM9 dataset and create the data splits. The `notebooks/qm9.ipynb` notebook contains this code, along with the code that creates the internal _Smol_ dataset representation used for training.
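Before launching a script it can help to confirm that the `smol` folder is where the `--data_path` argument will point. The following is a minimal sketch only: the helper name `resolve_smol_dir` is hypothetical and not part of this repository, and it assumes the `path/to/data/<dataset>/smol` layout from the example above.

```python
from pathlib import Path


def resolve_smol_dir(data_root: str, dataset: str) -> Path:
    """Return the `smol` folder for a dataset, raising if it does not exist.

    Hypothetical helper: assumes the layout <data_root>/<dataset>/smol,
    mirroring the `path/to/data/qm9/smol` example in this README.
    """
    smol_dir = Path(data_root) / dataset / "smol"
    if not smol_dir.is_dir():
        raise FileNotFoundError(f"Expected a 'smol' folder at {smol_dir}")
    return smol_dir
```

The returned path can then be passed as the value of `--data_path`, for example `str(resolve_smol_dir("path/to/data", "qm9"))`.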
For GEOM Drugs we also follow the URLs provided in the MiDi repo. GEOM Drugs is preprocessed using the `preprocess.py` script. GEOM Drugs URLs from MiDi are as follows:

* train: https://drive.switch.ch/index.php/s/UauSNgSMUPQdZ9v
* validation: https://drive.switch.ch/index.php/s/YNW5UriYEeVCDnL
* test: https://drive.switch.ch/index.php/s/GQW9ok7mPInPcIo

## Running

Once you have created and activated the environment successfully, you can run the code.

### Scripts

We provide 4 scripts in the repository:

* `preprocess` - Used for preprocessing larger datasets into the internal representation used by the model for training
* `train` - Trains a MolFlow model on preprocessed data
* `evaluate` - Evaluates a trained model and prints the results
* `predict` - Runs the sampling for a trained model and saves the generated molecules

Each script can be run as follows (where `