# rfi_ml

**Repository Path**: mirrors_ICRAR/rfi_ml

## Basic Information

- **Project Name**: rfi_ml
- **Description**: Machine learning code for RFI
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2020-08-08
- **Last Updated**: 2026-02-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# rfi_ml

Machine learning code for RFI.

All of the commands below must be run from the root of the repository.

## Installation

Installation into a virtual environment:

```bash
# Create venv and install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r scripts/requirements.txt

# Install pyvex for reading .vex files
cd pyvex/pyvex
python setup.py install
```

## Preprocessing

```bash
# Preprocess power line data
bash data/preprocess_india_txt.sh

# Preprocess VLBA data
bash data/preprocess_vlba.sh
```

## Documentation Generation

```bash
source venv/bin/activate
pdoc3 --html src --force
```

The generated documentation will be available in the `html` directory for browsing. Docstrings are written in [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html#overview) format.

## Configuration

To create a default config file:

```bash
source venv/bin/activate
python -m src.config
```

The config file will be created in the root of the repository.

## Training

```bash
source venv/bin/activate

# Train using the config file in the root
python -m src.train gan_config.settings
```

**Configuration Options**

- `USE_CUDA` - True to train using the GPU, false to use the CPU (defaults to true).
- `FILENAME` - Path to the HDF5 file to load data from, relative to the repository root (see the note on inspecting this file at the end of this README).
- `MAX_EPOCHS` - Maximum number of epochs to train the GAN for (defaults to 60).
- `MAX_GENERATOR_AUTOENCODER_EPOCHS` - Maximum number of epochs to train the generator autoencoder for (defaults to 60).
- `MAX_SAMPLES` - Maximum number of inputs to train on. Set to 0 for unlimited (defaults to 0).
- `BATCH_SIZE` - Number of samples to train on per batch (defaults to 4096).
- `POLARISATIONS` - Comma-separated list of polarisations to use (defaults to 0, 1).
- `FREQUENCIES` - Comma-separated list of frequencies to use (defaults to 0, 1, 2, 3).
- `NORMALISE` - Set to true to normalise inputs (defaults to true).
- `ADD_DROPOUT` - If true, add dropout to the inputs before passing them into the network (defaults to true).
- `ADD_NOISE` - If true, add noise to the inputs before passing them into the network (defaults to false).
- `REQUEUE_EPOCHS` - If > 0, perform REQUEUE_EPOCHS epochs of training, stop, then run the REQUEUE_SCRIPT (defaults to 0).
- `REQUEUE_SCRIPT` - If REQUEUE_EPOCHS > 0, this script will be called to requeue the training script (a sketch of such a script is given at the end of this README).
- `CHECKPOINT_DIRECTORY` - The directory to write checkpoints to, relative to the repository root.
- `RESULT_DIRECTORY` - The directory to write results to, relative to the repository root.

**Example**

```text
USE_CUDA = True
FILENAME = data/processed/C148700001_fft.hdf5
MAX_EPOCHS = 60
MAX_AUTOENCODER_EPOCHS = 60
MAX_SAMPLES = 0
BATCH_SIZE = 128
NORMALISE = True
ADD_DROPOUT = False
ADD_NOISE = False
REQUEUE_EPOCHS = 0
REQUEUE_SCRIPT = ""
CHECKPOINT_DIRECTORY = data/checkpoints
RESULT_DIRECTORY = data/results/
```
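
## Inspecting the HDF5 Data

The layout of the preprocessed HDF5 files is not documented here. If the HDF5 command-line tools are available (they are not installed by the steps above), a recursive listing of the file named in `FILENAME` can help confirm which polarisations and frequencies are present before setting `POLARISATIONS` and `FREQUENCIES`. The path below is the one used in the example configuration.

```bash
# Recursively list the groups and datasets in the preprocessed file
# (requires the HDF5 command-line tools, e.g. an hdf5-tools package)
h5ls -r data/processed/C148700001_fft.hdf5
```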
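
## Requeue Script Sketch

The repository does not ship a requeue script, so the following is only a minimal sketch of what `REQUEUE_SCRIPT` might point to on a SLURM cluster. It assumes a hypothetical `train_gan.sh` batch script that activates the venv and runs `python -m src.train gan_config.settings`; adapt the paths and scheduler commands to your own environment.

```bash
#!/bin/bash
# Hypothetical requeue script (not part of this repository).
# Called by the trainer after REQUEUE_EPOCHS epochs of training
# to resubmit the training job.
cd /path/to/rfi_ml      # repository root; adjust to your installation
sbatch train_gan.sh     # train_gan.sh is an assumed SLURM batch script
```

Point `REQUEUE_SCRIPT` at a script like this (relative to the repository root) and choose `REQUEUE_EPOCHS` so that each run fits within the job's wall-time limit.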