# Kraken-bracken-pipeline

**Repository Path**: bioinfoFungi/kraken-bracken-pipeline

## Basic Information

- **Project Name**: Kraken-bracken-pipeline
- **Description**: Kraken-Bracken Pipeline
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-03-09
- **Last Updated**: 2023-03-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Kraken-Bracken Pipeline

Metatranscriptomics (RNA) data analysis pipeline that uses Kraken for OTU detection and Bracken to correct the detected abundance values.

## Input

Some inputs are mandatory, while other inputs and options have default values that can be changed. You can list all of these parameters with the pipeline's `--help` option.

All inputs and options can be modified either on the command line or by changing their default values in the `nextflow.config` file. To set a parameter on the command line, append `--<parameter_name> <value>` after the execution command (for example, `--confidence 0.1`).

### reads

Location of your input FASTQ files. Example:

```
--reads '/path/to/reads/*.{1,2}.fastq.gz'
```

The path must be **enclosed in quotes** (so that the shell does not expand the wildcard before Nextflow reads it) and must contain at least **one `*`** wildcard character, so that all the required samples are captured. Whether the reads are paired-end is set with the `pairedEnd` parameter. If they are paired, the **{1,2}** notation is required to specify the read pairs.

### krakendb

Path to the Kraken database. The database must be downloaded and built before running this pipeline, but no specific database is required. However, enough memory must be reserved to load the chosen database. The default memory reserved for loading the database is 5 GB, and it can be tuned with the `kraken_mem` parameter.
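To illustrate the `{1,2}` notation described under **reads**, the Python sketch below groups a list of file names matching `*.{1,2}.fastq.gz` into pairs keyed by the part matched by `*`. It is only an illustration of the idea, not this pipeline's code: Nextflow pipelines usually delegate this step to `Channel.fromFilePairs`, and the helper `pair_reads` and the example file names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical helper: mimics how a '*.{1,2}.fastq.gz' glob groups mates by sample.
PAIR_RE = re.compile(r"^(?P<sample>.+)\.(?P<mate>[12])\.fastq\.gz$")


def pair_reads(filenames):
    """Return {sample: (read1, read2)} for files named <sample>.1/2.fastq.gz."""
    mates = defaultdict(dict)
    for name in filenames:
        match = PAIR_RE.match(name)
        if match is None:
            continue  # file does not follow the paired-end naming pattern
        mates[match["sample"]][match["mate"]] = name

    pairs = {}
    for sample, found in sorted(mates.items()):
        if set(found) != {"1", "2"}:
            raise ValueError(f"sample {sample!r} is missing a mate: {sorted(found)}")
        pairs[sample] = (found["1"], found["2"])
    return pairs


if __name__ == "__main__":
    files = ["GSMXXXX1.1.fastq.gz", "GSMXXXX1.2.fastq.gz",
             "GSMXXXX2.1.fastq.gz", "GSMXXXX2.2.fastq.gz"]
    for sample, (r1, r2) in pair_reads(files).items():
        print(sample, r1, r2)
```

A sample with only one mate raises an error here instead of being silently dropped, which is usually the safer behavior when a pipeline expects paired-end input.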
### targets

A tab-separated values file that contains the name and condition of every sample in the study. An example of the format of this file:

| Filename | Name         | Type | Covariable |
|:--------:|:------------:|:----:|:----------:|
| GSMXXXX1 | ConditionA 1 | A    | Male       |
| GSMXXXX2 | ConditionA 2 | A    | Female     |
| GSMXXXX3 | ConditionB 1 | B    | Female     |

The third column must contain the condition that will be compared in the differential abundance analysis. Optionally, a fourth column can be added with a covariable to be fitted in the model (in the example, sex).

### contrast

A text file in which every line corresponds to a contrast to perform in the differential abundance analysis. Each contrast is written in the following format:

**Name_of_contrast = (CaseCondition - ControlCondition)**

Note that the first line must not contain a contrast; instead, it should contain a header word such as "Name" or "Contrasts".

## Output

The main outputs of this pipeline are a file with the `.tab` extension, which contains the abundance of each OTU detected in each sample, and a file with the `.tsv` extension, which contains the names of the detected OTUs. Krona plots are also generated (if `skip_krona=false`) in a dedicated directory called `krona_reports`.

Additionally, if the differential abundance analysis is performed (`skip_diff=false`), the pipeline produces a file with the differentially abundant OTUs and relevant statistics (logFC, p-value, etc.), as well as clustering plots of the samples in the experiment (in a single PDF file).
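Before launching a long run, the **targets** and **contrast** files described above can be checked against each other. The Python sketch below encodes the rules stated in those sections: a tab-separated targets file with the condition in the third column, and contrast lines of the form `Name = (Case - Control)` after a header line. It assumes the targets file starts with a header row like the example table; the helper names `read_conditions` and `parse_contrasts` are hypothetical, and the pipeline's own parsing may be more or less permissive (for instance, this sketch only accepts word characters in names).

```python
import csv
import io
import re

# Hypothetical pre-flight checks for the files passed to --targets and --contrast.
CONTRAST_RE = re.compile(r"^\s*(\w+)\s*=\s*\(\s*(\w+)\s*-\s*(\w+)\s*\)\s*$")


def read_conditions(targets_text):
    """Return the set of conditions (3rd column) from a tab-separated targets file."""
    rows = list(csv.reader(io.StringIO(targets_text), delimiter="\t"))
    if not rows:
        raise ValueError("targets file is empty")
    header, samples = rows[0], rows[1:]  # assumes a header row, as in the example table
    if len(header) < 3:
        raise ValueError("targets needs at least 3 tab-separated columns")
    conditions = set()
    for line_no, row in enumerate(samples, start=2):
        if not row:
            continue  # tolerate blank lines
        if len(row) != len(header):
            raise ValueError(f"targets line {line_no}: expected {len(header)} columns")
        conditions.add(row[2])
    return conditions


def parse_contrasts(contrast_text, conditions):
    """Return [(name, case, control)], skipping the mandatory header line."""
    lines = contrast_text.splitlines()[1:]  # first line is a header such as "Contrasts"
    contrasts = []
    for line_no, line in enumerate(lines, start=2):
        if not line.strip():
            continue
        match = CONTRAST_RE.match(line)
        if match is None:
            raise ValueError(f"contrast line {line_no}: expected 'Name = (Case - Control)'")
        name, case, control = match.groups()
        for cond in (case, control):
            if cond not in conditions:
                raise ValueError(f"contrast {name!r}: unknown condition {cond!r}")
        contrasts.append((name, case, control))
    return contrasts


if __name__ == "__main__":
    targets = ("Filename\tName\tType\tCovariable\n"
               "GSMXXXX1\tConditionA 1\tA\tMale\n"
               "GSMXXXX2\tConditionA 2\tA\tFemale\n"
               "GSMXXXX3\tConditionB 1\tB\tFemale\n")
    contrast = "Contrasts\nBvsA = (B - A)\n"
    print(parse_contrasts(contrast, read_conditions(targets)))  # [('BvsA', 'B', 'A')]
```

Checking that every condition named in a contrast actually appears in the third column of the targets file catches the most common typo before any reads are classified.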
## Help

```
$ nextflow run kraken_braken.nf --help
N E X T F L O W  ~  version 22.04.5

Name: nf-differential-abundance-kraken2
Author: Bioinformatica IPPBLN
=========================================

Mandatory arguments:
  --reads                 Path to input data (if paired end sequences must be a regular expression such as *{1,2}.fastq.gz)
  --krakendb              Path to kraken database

Settings:
  --kraken_mem            Necesary memory to load kraken database. Default = 5 GB
  --confidence            Confidence score threshold (0-1). Default = 0
  --pairedEnd             Specifies if reads are paired-end (true | false). Default = true
  --skip_bracken_build    Skip building bracken database (true | false). Default = true
  --skip_krona            Skip generating krona reports (true | false). Default = true
  --krona_dir             Path to krona directory. Default = /usr/bin/Krona-master/KronaTools/bin/
  --taxonomy_filter       Specifies the taxonomic level to filter by bracken. Defaults to S
  --kmer_len              Specifies the kmer length. Default = 35
  --read_len              Specifies the read length of the input data (needed for Bracken). Default = 75
  --b_threshold           Specifies threshold for bracken filter. Default = 10
  --skip_diff             Skip metagenome-Seq differential abundance analysis (true | false). Default = true
  --targets               Metadata file that contains a tab-delimited table with filenames and contrast condition in the 3rd column.
  --contrast              File with contrasts to perform in the abundance analysis.

Options:
  --outdir                The output directory where the results will be saved. Defaults to ./
  --help --h              Shows this help page

Usage example:
  nextflow run main.nf --reads '/path/to/paired_end_reads_*.{1,2}.fastq.gz' --krakendb '/path/to/krakendb/' --krona_dir '/path/to/ktImportTaxonomy' --targets '/path/to/targets.txt' --contrast '/path/to/contrast.txt'
```
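The Bracken-related settings in the help text line up with options of the Bracken command-line tools. The Python sketch below shows one plausible mapping from `--kmer_len`, `--read_len`, `--taxonomy_filter` and `--b_threshold` to the argument lists of `bracken-build` (`-d`, `-k`, `-l`) and `bracken` (`-d`, `-i`, `-o`, `-r`, `-l`, `-t`). The flags are Bracken's own, but how this pipeline actually wires them is an assumption; the functions `bracken_build_cmd` and `bracken_cmd` and the report file names are hypothetical.

```python
# Hypothetical sketch: one plausible mapping from this pipeline's --help settings
# to Bracken's command-line flags. Bracken's flags are real; the wiring is assumed.

DEFAULTS = {
    "kmer_len": 35,          # --kmer_len
    "read_len": 75,          # --read_len
    "taxonomy_filter": "S",  # --taxonomy_filter (taxonomic level, S = species)
    "b_threshold": 10,       # --b_threshold (minimum reads at the chosen level)
}


def bracken_build_cmd(krakendb, **overrides):
    """Argument list for building Bracken's k-mer distribution (skipped by default)."""
    p = {**DEFAULTS, **overrides}
    return ["bracken-build", "-d", krakendb,
            "-k", str(p["kmer_len"]),
            "-l", str(p["read_len"])]


def bracken_cmd(krakendb, kraken_report, output, **overrides):
    """Argument list for re-estimating abundances from one sample's Kraken report."""
    p = {**DEFAULTS, **overrides}
    return ["bracken", "-d", krakendb,
            "-i", kraken_report,
            "-o", output,
            "-r", str(p["read_len"]),
            "-l", p["taxonomy_filter"],
            "-t", str(p["b_threshold"])]


if __name__ == "__main__":
    # Only print the commands; running them would need Bracken installed
    # (e.g. via subprocess.run(cmd, check=True)).
    print(" ".join(bracken_build_cmd("/path/to/krakendb/")))
    print(" ".join(bracken_cmd("/path/to/krakendb/", "sample.kreport", "sample.bracken",
                               read_len=150)))
```

Passing `read_len=150` overrides the default of 75, mirroring `--read_len 150` on the command line. Keeping the read length consistent between the two commands matters because `bracken` looks up the k-mer distribution file that `bracken-build` generated for that read length.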