# analysis-TAGADA

**Repository Path**: zhangtong-swu/analysis-TAGADA

## Basic Information

- **Project Name**: analysis-TAGADA
- **Description**: TAGADA is an RNA-seq pipeline for transcript and gene assembly, deconvolution, and analysis. Given a genome sequence, a reference annotation, and RNA-seq reads, TAGADA enhances existing gene models by generating an improved annotation. It can also compute expression values for both the reference and novel annotations, identify long non-coding transcripts (lncRNAs), and provide a comprehensive quality control report. TAGADA is developed with Nextflow DSL2, offers user-friendly features, and ensures reproducibility across computing platforms through its containerized environment.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-12-15
- **Last Updated**: 2023-12-15

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# TAGADA: Transcript And Gene Assembly, Deconvolution, Analysis

TAGADA is a Nextflow pipeline that processes RNA-Seq data. It parallelizes multiple tasks to check read quality, align reads to a reference genome, assemble new transcripts into a novel annotation, and quantify genes and transcripts.

## Table of contents

- [Dependencies](#dependencies)
- [Usage](#usage)
- [Nextflow options](#nextflow-options)
- [Input and output options](#input-and-output-options)
- [Merge options](#merge-options)
- [Assembly options](#assembly-options)
- [Skip options](#skip-options)
- [Resources options](#resources-options)
- [Custom resources](#custom-resources)
- [Example configuration](#example-configuration)
- [Metadata](#metadata)
- [Example metadata](#example-metadata)
- [Merging inputs](#merging-inputs)
- [Merging inputs by a single factor](#merging-inputs-by-a-single-factor)
- [Merging inputs by an intersection of factors](#merging-inputs-by-an-intersection-of-factors)
- [Workflow and results](#workflow-and-results)
- [Novel annotation](#novel-annotation)
- [Funding](#funding)

## Dependencies

To use this pipeline you will need:

- [Nextflow](https://www.nextflow.io/docs/latest/getstarted.html) >= 21.04.1
- [Docker](https://docs.docker.com/engine/install/) >= 19.03.2 or [Singularity](https://sylabs.io/guides/3.5/user-guide/quick_start.html) >= 3.7.3

## Usage

A small dataset is provided to test this pipeline. To try it out, use this command:

    nextflow run FAANG/analysis-TAGADA -profile test,docker -revision 2.1.2 --output directory

### Nextflow options

The pipeline is written in Nextflow, which provides the following default options:

| Option | Example | Description | Required |
|---|---|---|---|
| `-profile` | `profile1,profile2,etc.` | Profile(s) to use when running the pipeline. Specify the profiles that fit your infrastructure among `singularity`, `docker`, `kubernetes`, `slurm`. | Required |
| `-config` | `custom.config` | Configuration file tailored to your infrastructure and dataset. To find a configuration file for your infrastructure, browse nf-core configs. Some large datasets require more computing resources than the pipeline defaults. To specify custom resources for specific processes, see the [custom resources](#custom-resources) section. | Optional |
| `-revision` | `version` | Version of the pipeline to launch. | Optional |
| `-work-dir` | `directory` | Work directory where all temporary files are written. | Optional |
| `-resume` | | Resume the pipeline from the last completed process. | Optional |
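
As an illustration of how these options combine on one command line, here is a minimal Python sketch that assembles a `nextflow run` invocation as a string. The helper name `build_command` and the file names are hypothetical. The `-profile`, `-revision` and `-resume` options come from the table above, and `--output`, `--reads`, `--annotation` and `--genome` are the pipeline options documented in this README. The sketch only builds the command; it does not launch anything.

```python
import shlex


def build_command(profiles, output, reads, annotation, genome,
                  revision=None, resume=False):
    """Return a shell-ready `nextflow run` command line (hypothetical helper)."""
    args = ["nextflow", "run", "FAANG/analysis-TAGADA",
            "-profile", ",".join(profiles)]
    if revision:
        args += ["-revision", revision]
    if resume:
        args.append("-resume")
    args += ["--output", output, "--reads", reads,
             "--annotation", annotation, "--genome", genome]
    # shlex.join quotes any argument containing shell metacharacters such as *
    return shlex.join(args)


if __name__ == "__main__":
    print(build_command(["docker"], "results", "reads/*.fq.gz",
                        "annotation.gtf", "genome.fa", revision="2.1.2"))
```

Because the reads path contains a wildcard, `shlex.join` emits it wrapped in single quotes, which matches the quoting requirement for `--reads`.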

### Input and output options

| Option | Example | Description | Required |
|---|---|---|---|
| `--output` | `directory` | Output directory where all results are written. | Required |
| `--reads` | `'path/to/reads/*'` | Input fastq file(s) and/or bam file(s).<br>For single-end reads, your files must end with: `.fq[.gz]`<br>For paired-end reads, your files must end with: `_[R]{1,2}.fq[.gz]`<br>For aligned reads, your files must end with: `.bam`<br>If the provided path includes a wildcard character like `*`, you must enclose it in quotes to prevent Bash glob expansion, as per Nextflow's requirements.<br>If the files are numerous, you may provide a `.txt` sheet with one path or URL per line. | Required |
| `--annotation` | `annotation.gtf` | Input reference annotation file or URL. | Required |
| `--genome` | `genome.fa` | Input genome sequence file or URL. | Required |
| `--index` | `directory` | Input genome index directory or URL. | Optional, to skip genome indexing |
| `--metadata` | `metadata.tsv` | Input tabulated metadata file or URL. | Required if `--assemble-by` or `--quantify-by` are provided |
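
To make the file naming rules for `--reads` concrete, the following Python sketch classifies a file name by its suffix. It is not the pipeline's own code. The function name `classify_reads` is hypothetical, and the regular expression assumes that `_[R]{1,2}` means an underscore, an optional `R`, then the digit `1` or `2`.

```python
import re

# Assumed reading of the documented suffixes (see the note above).
PAIRED = re.compile(r"_R?[12]\.fq(\.gz)?$")
SINGLE = re.compile(r"\.fq(\.gz)?$")


def classify_reads(filename):
    """Return 'paired', 'single', 'aligned', or None for an unrecognised name."""
    # Paired-end is tested first because those names also end in .fq[.gz].
    if PAIRED.search(filename):
        return "paired"
    if SINGLE.search(filename):
        return "single"
    if filename.endswith(".bam"):
        return "aligned"
    return None


if __name__ == "__main__":
    for name in ["liver_R1.fq.gz", "liver_2.fq", "liver.fq.gz", "liver.bam"]:
        print(name, classify_reads(name))
```

Under this reading, a single-end file whose name happens to end in `_1.fq` or `_2.fq` would be treated as paired-end, so such names are best avoided for single-end data.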

### Merge options

| Option | Example | Description | Required |
|---|---|---|---|
| `--assemble-by` | `factor1,factor2,etc.` | Factor(s) defining groups in which transcripts are assembled. Aligned reads of identical factors are merged and each resulting merge group is processed individually. See the [merging inputs](#merging-inputs) section for details. | Optional |
| `--quantify-by` | `factor1,factor2,etc.` | Factor(s) defining groups in which transcripts are quantified. Aligned reads of identical factors are merged and each resulting merge group is processed individually. See the [merging inputs](#merging-inputs) section for details. | Optional |
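
The grouping rule described above ("aligned reads of identical factors are merged") can be sketched in a few lines of Python. This is an illustration of the rule, not the pipeline's implementation. The function name `merge_groups` and the in-memory representation of the metadata are assumptions, while the rows reuse the inputs A to D from this README's merging examples.

```python
from collections import defaultdict

# Metadata rows mirroring the merging examples in this README.
METADATA = [
    {"input": "A", "tissue": "liver", "stage": "30 days"},
    {"input": "B", "tissue": "liver", "stage": "30 days"},
    {"input": "C", "tissue": "liver", "stage": "60 days"},
    {"input": "D", "tissue": "muscle", "stage": "60 days"},
]


def merge_groups(rows, factors):
    """Group input names by the values they share for every listed factor."""
    groups = defaultdict(list)
    for row in rows:
        # One factor gives a 1-tuple key; several factors give their intersection.
        key = tuple(row[factor] for factor in factors)
        groups[key].append(row["input"])
    return dict(groups)


if __name__ == "__main__":
    print(merge_groups(METADATA, ["tissue"]))
    print(merge_groups(METADATA, ["tissue", "stage"]))
```

Grouping by `tissue` alone puts A, B and C in one liver group and D in a muscle group, whereas grouping by `tissue` and `stage` splits liver into a 30 days group (A, B) and a 60 days group (C).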

### Assembly options

| Option | Example | Description | Required |
|---|---|---|---|
| `--min-transcript-occurrence` | `2` | After transcript assembly, rare novel transcripts that appear in few assembly groups are removed from the final novel annotation. By default, a transcript that occurs in fewer than 2 assembly groups is removed. If there is only one assembly group, this option defaults to 1. | Optional |
| `--min-monoexonic-occurrence` | `2` | If specified, rare novel monoexonic transcripts are filtered according to the provided threshold. Otherwise, this option takes the value of `--min-transcript-occurrence`. | Optional |
| `--min-transcript-tpm` | `0.1` | After transcript assembly, novel transcripts with low TPM values in every assembly group are removed from the final novel annotation. By default, a transcript whose TPM value is lower than 0.1 in every assembly group is removed. | Optional |
| `--min-monoexonic-tpm` | `1` | If specified, novel monoexonic transcripts with low TPM values are filtered according to the provided threshold. Otherwise, this option takes ten times the value of `--min-transcript-tpm`. | Optional |
| `--coalesce-transcripts-with` | `tmerge` | Tool used to coalesce transcript assemblies into a non-redundant set of transcripts for the novel annotation. Can be `tmerge` or `stringtie`. Defaults to `tmerge`. | Optional |
| `--tmerge-args` | `'--endFuzz 10000'` | Custom arguments to pass to tmerge when coalescing transcripts. | Optional |
| `--feelnc-filter-args` | `'--size 200'` | Custom arguments to pass to FEELnc's filter script when detecting long non-coding transcripts. | Optional |
| `--feelnc-codpot-args` | `'--mode shuffle'` | Custom arguments to pass to FEELnc's coding potential script when detecting long non-coding transcripts. | Optional |
| `--feelnc-classifier-args` | `'--window 10000'` | Custom arguments to pass to FEELnc's classifier script when detecting long non-coding transcripts. | Optional |
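
The way the four filtering thresholds fall back on each other is easier to follow as code. The Python sketch below encodes only the defaults and removal rules stated in the table. The function names and the representation of a transcript (an exon count plus a mapping from assembly group to TPM) are hypothetical and do not reflect how the pipeline stores its data.

```python
def resolve_thresholds(n_groups, min_occ=None, min_mono_occ=None,
                       min_tpm=0.1, min_mono_tpm=None):
    """Apply the documented defaults to the four filtering thresholds."""
    if min_occ is None:
        # Defaults to 2, or to 1 when there is a single assembly group.
        min_occ = 1 if n_groups == 1 else 2
    if min_mono_occ is None:
        min_mono_occ = min_occ
    if min_mono_tpm is None:
        min_mono_tpm = min_tpm * 10
    return min_occ, min_mono_occ, min_tpm, min_mono_tpm


def keep_transcript(exon_count, tpm_by_group, thresholds):
    """Return True when a novel transcript passes both filters.

    tpm_by_group maps each assembly group in which the transcript was
    assembled to its TPM in that group (assumed representation).
    """
    min_occ, min_mono_occ, min_tpm, min_mono_tpm = thresholds
    monoexonic = exon_count == 1
    occ_limit = min_mono_occ if monoexonic else min_occ
    tpm_limit = min_mono_tpm if monoexonic else min_tpm
    if len(tpm_by_group) < occ_limit:
        return False
    # A transcript is removed only if its TPM is below the limit in every group.
    return any(tpm >= tpm_limit for tpm in tpm_by_group.values())


if __name__ == "__main__":
    limits = resolve_thresholds(n_groups=3)
    print(keep_transcript(3, {"liver": 0.5, "muscle": 0.05}, limits))
    print(keep_transcript(1, {"liver": 0.5, "muscle": 0.4}, limits))
```

With the defaults and three assembly groups, a multi-exonic transcript seen in two groups with a TPM of 0.5 in one of them is kept, while a monoexonic transcript with the same occurrence but no TPM of at least 1 is removed.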

### Skip options

| Option | Example | Description | Required |
|---|---|---|---|
| `--skip-assembly` | | Skip transcript assembly with StringTie and skip all subsequent processes working with a novel annotation. | Optional |
| `--skip-lnc-detection` | | Skip detection of long non-coding transcripts in the novel annotation with FEELnc. | Optional |

### Resources options

| Option | Example | Description | Required |
|---|---|---|---|
| `--max-cpus` | `16` | Maximum number of CPU cores that can be used for each process. This is a limit, not the actual number of requested CPU cores. | Optional |
| `--max-memory` | `64GB` | Maximum memory that can be used for each process. This is a limit, not the actual amount of allotted memory. | Optional |
| `--max-time` | `24h` | Maximum time that can be spent on each process. This is a limit and has no effect on the duration of each process. | Optional |
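
The distinction between a limit and a request can be illustrated with a short Python sketch. How the pipeline enforces these limits internally is not described here, so this only shows the general behaviour of a cap. The function name `cap_request` and the dictionary keys are hypothetical, and the limit values are the examples from the table.

```python
# Example limits from the table: 16 cores, 64 GB of memory, 24 hours.
LIMITS = {"cpus": 16, "memory_gb": 64, "time_h": 24}


def cap_request(request, limits=LIMITS):
    """Return the request with each value lowered to its limit if it exceeds it."""
    return {name: min(value, limits[name]) for name, value in request.items()}


if __name__ == "__main__":
    # A request below the limits is left unchanged; only the memory is capped here.
    print(cap_request({"cpus": 4, "memory_gb": 128, "time_h": 2}))
```

In this sketch, a process asking for 4 cores still gets 4 cores, not 16, and only the 128 GB memory request is reduced to 64 GB.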
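
### Merging inputs by a single factor

The `input`, `tissue` and `stage` columns are the metadata.

| input | tissue | stage | Transcripts assembly by tissue | Annotation | Quantification by stage |
|---|---|---|---|---|---|
| A | liver | 30 days | A, B, C → liver | liver, muscle → novel annotation | A, B → 30 days |
| B | liver | 30 days | A, B, C → liver | liver, muscle → novel annotation | A, B → 30 days |
| C | liver | 60 days | A, B, C → liver | liver, muscle → novel annotation | C, D → 60 days |
| D | muscle | 60 days | D → muscle | liver, muscle → novel annotation | C, D → 60 days |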
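
### Merging inputs by an intersection of factors

The `input`, `tissue` and `stage` columns are the metadata.

| input | tissue | stage | Transcripts assembly by tissue and stage | Annotation | Quantification by input |
|---|---|---|---|---|---|
| A | liver | 30 days | A, B → liver at 30 days | liver at 30 days, liver at 60 days, muscle at 60 days → novel annotation | A |
| B | liver | 30 days | A, B → liver at 30 days | liver at 30 days, liver at 60 days, muscle at 60 days → novel annotation | B |
| C | liver | 60 days | C → liver at 60 days | liver at 30 days, liver at 60 days, muscle at 60 days → novel annotation | C |
| D | muscle | 60 days | D → muscle at 60 days | liver at 30 days, liver at 60 days, muscle at 60 days → novel annotation | D |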