# LiveKraken

**Repository Path**: Ork_admin/LiveKraken

## Basic Information

- **Project Name**: LiveKraken
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2018-10-29
- **Last Updated**: 2024-06-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

LiveKraken - real-time taxonomic classification of Illumina sequence data.
=========================================================================

LiveKraken is an extension of the [Kraken] taxonomic sequence classification
tool for classifying Illumina sequence data as it is being generated.

Installation
-------
To install LiveKraken, run:

`./install_kraken.sh $target_directory`

Alternatively, you may install LiveKraken via bioconda:

[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square)](http://bioconda.github.io/recipes/livekraken/README.html)

Usage
-------
LiveKraken is largely compatible with the original Kraken implementation
and keeps its parameters and options. A few new parameters are needed
to control the incremental classification process; they are detailed below.

For LiveKraken, the `classify` program has been extended. The following input formats are now supported:

`-f`    if input is in FASTQ format

`-b`    if input is in BCL format

By default, the input is assumed to be a FASTA file, as in the original Kraken.

If the `-b` flag is used for BCL files, the input path must be the `BaseCalls` folder generated by
the Illumina sequencer, not a single file as with FASTA/FASTQ input.

For the BCL analysis, a few additional parameters controlling the incremental classification are necessary:

`-l`    final number of cycles so that the software knows when it can stop waiting for additional BCL files to be generated

`-s`    wait for this many cycles to accumulate before starting classification. Note that this should
        be greater than 31, which is the k-mer size used by Kraken

`-k`    the number of cycles to wait before another incremental classification occurs

`-x`    which tiles on the flowcell to analyze (default is all).
        Example: `-x 1101 -x 1315 -x 2108`

`-y`    which of the lanes (usually 8 lanes) to analyze (default is all). Example: `-y 1 -y 6`

`-a`    whether to output intermediate results at the incremental classification steps. This
        is useful if you want to monitor the sequence composition during sequencing.
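
To make the interplay of `-s`, `-k` and `-l` concrete, here is a minimal Python sketch. It rests on assumptions that the text above does not confirm: that a classification pass runs at cycle `-s`, then every `-k` cycles, with a last pass at cycle `-l`; that `-a` is a switch without a value; and that the `BaseCalls` path is given as the last argument. The function names are hypothetical, and the executable name and database options are deliberately left out.

```python
from typing import List, Sequence


def classification_cycles(start: int, step: int, final: int) -> List[int]:
    """Cycles at which a pass would run under the assumed schedule."""
    if start <= 31:
        # The README asks for -s to be greater than Kraken's k-mer size of 31.
        raise ValueError("-s should be greater than 31")
    if step < 1 or final < start:
        raise ValueError("need -k >= 1 and -l >= -s")
    cycles = list(range(start, final + 1, step))
    if cycles[-1] != final:
        # Assumption: a last pass runs once all cycles have been written.
        cycles.append(final)
    return cycles


def bcl_flags(basecalls_dir: str, final: int, start: int, step: int,
              tiles: Sequence[int] = (), lanes: Sequence[int] = (),
              intermediate: bool = False) -> List[str]:
    """Assemble only the BCL-related flags described above."""
    classification_cycles(start, step, final)  # reuse the sanity checks
    args = ["-b", "-l", str(final), "-s", str(start), "-k", str(step)]
    for tile in tiles:
        args += ["-x", str(tile)]  # one -x per tile
    for lane in lanes:
        args += ["-y", str(lane)]  # one -y per lane
    if intermediate:
        args.append("-a")  # assumption: -a takes no value
    args.append(basecalls_dir)  # assumption: input path comes last
    return args
```

With these assumptions, `classification_cycles(40, 20, 100)` gives `[40, 60, 80, 100]`, and leaving `tiles` and `lanes` empty keeps the documented default of analysing all tiles and lanes.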

The remaining Kraken options retain their semantics. If FASTA or FASTQ files are analysed, the new options do not have to be used and everything should be backward compatible with the original code.

For the original Kraken code and description, please see the [Kraken webpage].

Visualisation 
-----
Visualisation of the results can be done using the `livekraken_sankey_diagram.py` script in the `Visualisation` folder.
Starting the program as follows will list all available options:

`./livekraken_sankey_diagram.py -h`

The script has only one required parameter, `-i`, which specifies a LiveKraken output file to use.
It can be given several times to add several cycles of output (ideally in chronological order).

Additional options, for which sensible defaults are defined already, include:

`-c`    Used to switch from a red-green to a red-blue color scheme for the flows between nodes, default is false  
`-s`    Used to "compress" unclassified nodes by only keeping a number of reads corresponding to the sum of flows from/to nodes other than unclassified, default is false  
`-r`    Used to set on which level to bin the classified reads, valid choices are 'species', 'genus', 'family', and 'order', default is family  
`-t`    Used to determine the top x nodes to display for every cycle (plus one node serving as bin for everything else), default is 10  
`-o`    Used to set the output directory path for the html and json file, default is "sankey"  
`-m`    Used to set the path to the names.dmp for taxonomic resolution, default is "names.dmp"  
`-n`    Used to set the path to the nodes.dmp for taxonomic resolution, default is "nodes.dmp"  
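
As an illustration of how the repeated `-i` option combines with the other settings, the following Python sketch assembles an argument list for the script. The helper name `sankey_command` and the input file names in the usage note are hypothetical; only the flags and defaults listed above are taken from this README.

```python
from typing import List, Sequence

# Ranks accepted by -r according to the option list above.
VALID_RANKS = ("species", "genus", "family", "order")


def sankey_command(outputs: Sequence[str], rank: str = "family", top: int = 10,
                   out_dir: str = "sankey", names: str = "names.dmp",
                   nodes: str = "nodes.dmp") -> List[str]:
    """Build the argument list; `outputs` should be in chronological order."""
    if not outputs:
        raise ValueError("at least one LiveKraken output file is required")
    if rank not in VALID_RANKS:
        raise ValueError("rank must be one of: " + ", ".join(VALID_RANKS))
    cmd = ["./livekraken_sankey_diagram.py"]
    for path in outputs:
        cmd += ["-i", path]  # -i is repeated once per cycle output
    cmd += ["-r", rank, "-t", str(top), "-o", out_dir, "-m", names, "-n", nodes]
    return cmd
```

For example, `sankey_command(["cycle_040.out", "cycle_060.out"], rank="genus")` yields a list that can be passed to `subprocess.run(cmd, check=True)` from within the `Visualisation` folder.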

[Kraken webpage]:   http://ccb.jhu.edu/software/kraken/
[Kraken]:   http://ccb.jhu.edu/software/kraken/

Supported Sequencers
-------------
Due to differences in the structure and compression of raw sequencing data, LiveKraken currently only supports sequencers producing gz-compressed BCL (not CBCL) files. We are working on extending our approach to further devices.

Supported sequencers:
* MiSeq
* HiSeq 1500
* HiSeq 2000/2500/3000/4000/X (untested)

Currently not supported:
* NovaSeq
* MiniSeq
* NextSeq500/550
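
A quick way to check a run folder before starting is to look at the file suffixes it contains. The sketch below is a heuristic only and is not part of LiveKraken: it assumes that gz-compressed BCL files end in `.bcl.gz` and CBCL files in `.cbcl`, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict


def bcl_formats(basecalls_dir: str) -> Dict[str, int]:
    """Count base call files by suffix anywhere below the given folder."""
    counts = {"bcl.gz": 0, "cbcl": 0}
    for path in Path(basecalls_dir).rglob("*"):
        if not path.is_file():
            continue
        name = path.name.lower()
        if name.endswith(".bcl.gz"):
            counts["bcl.gz"] += 1
        elif name.endswith(".cbcl"):
            counts["cbcl"] += 1
    return counts


def looks_supported(basecalls_dir: str) -> bool:
    """Heuristic: some .bcl.gz files are present and no .cbcl files."""
    counts = bcl_formats(basecalls_dir)
    return counts["bcl.gz"] > 0 and counts["cbcl"] == 0
```

A `False` result early in a run may only mean that no cycle has been written yet, so the check is most informative once the first BCL files exist.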

Contact
-------

Please consult the [LiveKraken project website] for questions!

If this does not help, please feel free to contact:
* [Simon Tausch](mailto:simon.tausch@bfr.bund.de) and [Benjamin Strauch](mailto:b.strauch@fu-berlin.de) for technical questions
* [Bernhard Renard](mailto:renardb@rki.de) as project head

[LiveKraken project website]: https://gitlab.com/SimonHTausch/LiveKraken