# NanoFreeLunch.jl

**Repository Path**: zhixingfeng/NanoFreeLunch.jl

## Basic Information

- **Project Name**: NanoFreeLunch.jl
- **Description**: Detecting DNA methylation from ONT data without raw signals
- **Primary Language**: Julia
- **License**: GPL-3.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2022-05-09
- **Last Updated**: 2025-11-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# NanoFreeLunch

Detecting DNA modifications quantitatively from Nanopore data without using raw signals.

# Installation

1. Install Julia from https://julialang.org/. Do not run Julia inside containers such as Docker or Singularity.
2. Enter the NanoFreeLunch folder and type `julia setup.jl`.
3. The executable is built in `build/bin/`. Add that folder to your `PATH`, or create a symbolic link to the executable in a folder that is already on your `PATH`.

**Warning**: package downloads may be slow, or may fail with an error like `Exception: RequestError: HTTP/1.1 200 Connection established`, if the connection to the Julia package server is unstable in your region. In that case, switch to another package server by typing `export JULIA_PKG_SERVER=https://mirrors.pku.edu.cn/julia` and try again. Alternative Julia package servers:

```
https://mirrors.pku.edu.cn/julia
https://mirrors.sjtug.sjtu.edu.cn/julia
https://mirrors.nju.edu.cn/julia
https://releases.tongyuan.cc/juliapkg/original
```

# Demo

A quick demo is available at https://gitee.com/zhixingfeng/nfl-demo/tree/main/demo.

# Supporting data and code

The supporting data and code to reproduce the results are available at https://gitee.com/zhixingfeng/nfl-demo.

# Usage

## Extracting features from aligned reads

### `prepdata [options] [flags] bamfile reffile locifile outdir`

---

### Args

- `bamfile` Sorted and indexed BAM file.
- `reffile` FASTA file of the reference genome.
- `locifile` Loci of the putative modified bases, e.g. all loci of a methylation motif such as CG.
  In `locifile`:
  - **Col 1** = chromosome
  - **Col 2** = 1-based genomic locus
  - **Col 3** = type of modification (currently only 5mC)
  - **Col 4** = strand
  - **Col 5** = methylation proportion (0 to 1); only used for model training, so it can be filled with any number for prediction
- `outdir` Output directory containing multiple `.jld2` files, each storing the sparse matrix of one batch.

### Options

- `-c`, `--chr <"">` Specify the chromosome.
- `--context-size <10>` Context size.
- `-b`, `--batch-size <100>` Number of loci in each batch. Data is split into multiple batches to save RAM.
- `--config-file <::String>` (for internal use only) Config file for feature selection.
- `--contexteffect-model-file <::String>` (for internal use only) Context effect model file for base QV.

### Flags

- `-f`, `--force` Overwrite the output directory.
- `--full` Use full coskewness and cokurtosis instead of pairwise.
- `-p`, `--parallel` Use multi-threading. Set `JULIA_NUM_THREADS` in the environment to the number of cores to use.
- `-g`, `--genomic` Split loci by genomic loci instead of `locifile`.
- `-r`, `--run-level-normalize` Normalize QV by the run average QV and standard deviation.
- `--read-level-normalize` Normalize QV by the read average QV.
- `-h`, `--help` Print this help message.

### Tips for input BAM files

The input BAM file should be sorted and indexed. It should also be cleaned to keep only primary alignments using `samtools view -h -b -F 4079`.

### Tips for parameters

- Do not leave `-c` empty; always give it a chromosome name such as `chr1` or `chr6` for better speed.
- A larger `--batch-size` increases speed but also increases memory usage.
- `--run-level-normalize` is recommended.
- Type `export JULIA_NUM_THREADS=` followed by the number of cores in the shell before using `--parallel`; otherwise only 1 thread is used, whether or not `--parallel` is set.

## Train the model

Type `nfl train` for usage. This is intended for developers; use the pre-trained models in `model` instead.
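The BAM filter recommended under *Tips for input BAM files* can be decoded with a little flag arithmetic: in the SAM specification the twelve FLAG bits sum to 4095 (`0xFFF`), and 4079 is every bit except `0x10` (reverse strand). Because `-F` drops any read with at least one of those bits set, only mapped, primary, non-supplementary, non-duplicate, QC-passed reads on either strand survive. A minimal shell sketch, assuming `samtools` is on your `PATH`; `filter_primary_bam` and the file names in the usage comment are hypothetical.

```shell
# All 12 SAM FLAG bits set = 0xFFF = 4095; clear only 0x10 (reverse strand).
echo $(( 0xFFF & ~0x10 ))   # prints 4079

# Hypothetical wrapper (not part of NanoFreeLunch): keep primary alignments,
# then index the filtered BAM so it can be passed to `prepdata`.
filter_primary_bam() {
  local in_bam=$1 out_bam=$2
  samtools view -h -b -F 4079 -o "$out_bam" "$in_bam" &&
    samtools index "$out_bam"
}

# Usage (illustrative file names):
#   filter_primary_bam aligned.sorted.bam aligned.primary.bam
```

Since `samtools view` preserves record order, a sorted input BAM stays sorted after filtering and can be indexed directly.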
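For prediction over every CpG site, a `locifile` in the five-column format listed under `prepdata` can be derived from the reference FASTA. The shell/awk sketch below is a hypothetical helper, not part of NanoFreeLunch, and it rests on assumptions this README does not confirm: tab-separated columns, the literal string `5mC` in column 3, strand written as `+`/`-`, the `-` strand locus placed on the G of each CG, and a placeholder `0` in column 5 (which is ignored for prediction).

```shell
# Hypothetical helper: print every CG site of a FASTA file as locifile rows.
# Assumed format: chr <TAB> 1-based locus <TAB> 5mC <TAB> strand <TAB> 0
make_cg_locifile() {
  awk '
    # The C of a CG is reported on "+", its G (next base) on "-".
    function emit(p) {
      print chr "\t" p "\t5mC\t+\t0"
      print chr "\t" (p + 1) "\t5mC\t-\t0"
    }
    # Header line: new chromosome named by the first word after ">".
    /^>/ { chr = substr($1, 2); pos = 0; prev = ""; next }
    {
      line = toupper($0)          # soft-masked (lowercase) bases count too
      n = length(line)
      if (n == 0) next
      # A CG split across two FASTA lines: the C is the previous base.
      if (prev == "C" && substr(line, 1, 1) == "G") emit(pos)
      # Every CG inside this line; line[j] sits at 1-based locus pos + j.
      start = 1
      while ((k = index(substr(line, start), "CG")) > 0) {
        emit(pos + start + k - 1)
        start += k
      }
      pos += n
      prev = substr(line, n, 1)
    }
  ' "$1"
}

# Usage (illustrative file names):
#   make_cg_locifile ref.fa > cg_loci.tsv
```

The scan is linear in genome length and handles CG dinucleotides that straddle a FASTA line break; check the strand and column conventions against the demo data before relying on the output.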
## Predict

### Usage: `predict` [options]

---

### Args

- The model file, which is the output file of `nfl train`.
- The `Xdata` folder in the output directory of `nfl prepdata`.
- The output file (predicted modification level for quantitative detection, or probability for qualitative detection).

### Options

- `-e`, `--esp <0.001>` Precision threshold. Modification levels <= `esp` are set to 0, and modification levels >= `1-esp` are set to 1.
- `-s`, `--subset <"">` Use a subset of features (default: use all features).
- `--objective` Default is `reg:squarederror`. Use `reg:logistic` for direct regression instead of a logit transformation.

### Flags

- `-h`, `--help` Print this help message.

# License

The software is licensed under GPL-3.0, and the data in `test/data` and `test/results` are released under the CC0 public domain waiver.