BXXu / deNOPA


deNOPA

Introduction

As the basic building blocks of chromatin, nucleosomes shape its higher-order architecture through their dynamics and arrangement, thereby affecting almost all nuclear processes. Thanks to its rather simple protocol, ATAC-seq has been rapidly adopted as a major tool for chromatin accessibility profiling at both the bulk and single-cell levels; however, picturing the arrangement of nucleosomes per se remains a challenge with ATAC-seq. We introduce a novel ATAC-seq analysis toolkit, named deNOPA, to predict nucleosome positions. Assessments showed that deNOPA not only outperformed state-of-the-art tools, but was also the only tool able to predict nucleosome positions precisely from ultrasparse ATAC-seq data. This performance is fueled by reads from short fragments, which make up nearly half of the sequenced reads in an ATAC-seq library and are normally discarded. We found that short-fragment reads are enriched in information on nucleosome positions, and that linker regions can be predicted from the reads of both short and long fragments using Gaussian smoothing.

Install

Pre-requirements

The current version of deNOPA was developed with Python 2.7; Python 3 is not yet supported. Besides a Python environment, you also need to install the following prerequisites:

  • numpy
  • scipy
  • h5py
  • pysam
  • sklearn (the scikit-learn package)
  • statsmodels
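
Before installing, it can help to confirm that the prerequisites listed above are importable in the Python environment that will run deNOPA. The following is a minimal sketch, not part of deNOPA itself; the helper name `find_missing` is hypothetical, and the module names are simply the import names of the packages in the list.

```python
import importlib

# Import names of the prerequisites listed in this README.
REQUIRED_MODULES = ["numpy", "scipy", "h5py", "pysam", "sklearn", "statsmodels"]


def find_missing(modules):
    """Return the names in `modules` that cannot be imported (hypothetical helper)."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Usage: run inside the environment intended for deNOPA, e.g.
#   missing = find_missing(REQUIRED_MODULES)
#   print("Missing prerequisites: " + (", ".join(missing) or "none"))
```

An empty result only shows that the modules import; it does not check package versions.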

Installation

Use the following commands to get deNOPA installed.

git clone https://gitee.com/bxxu/denopa.git

cd denopa

python setup.py install

Usage

usage: denopa [-h] -i INPUT [-o OUTPUT] [-b BUFFERSIZE] [-s CHROMSKIP] [-n NAME] [-m MAXLEN] [--proc PROC] [-p PARER] [-q QNFR] [-r]

Decoding the nucleosome positions with ATAC-seq data at single cell level

optional arguments:

  -h, --help
      Show this help message and exit.

  -i INPUT, --input INPUT
      The input BAM files. The files should be sorted. This argument can be given multiple times for multiple input files.

  -o OUTPUT, --output OUTPUT
      The directory where the output files are written. It is created if it does not exist (default .).

  -b BUFFERSIZE, --bufferSize BUFFERSIZE
      Number of reads buffered when reading the BAM file (default 1000000).

  -s CHROMSKIP, --chromSkip CHROMSKIP
      Names of chromosomes skipped during processing. Multiple values should be separated by ',' (default chrY,chrM).

  -n NAME, --name NAME
      The name of the project (default deNOPA).

  -m MAXLEN, --maxLen MAXLEN
      The maximum fragment length in the input files (default 2000).

  --proc PROC
      Number of processors used in the analysis (default 1).

  -p PARER, --pARER PARER
      The p-value threshold used in determining the ATAC-seq reads enriched regions (ARERs, default 0.1).

  -q QNFR, --qNFR QNFR
      The q-value threshold used in determining the nucleosome free regions (NFRs, default 0.1).

  -r, --removeIntermediateFiles
      Remove the intermediate files if this flag is set.
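
When deNOPA is run from a script over many samples, the options above can be assembled into an argument list programmatically. The sketch below uses only the flags documented in this section; the function name `build_denopa_command` and the file names in the usage comment are hypothetical, and it assumes the `denopa` executable is on the PATH after installation.

```python
def build_denopa_command(bam_files, output_dir=".", name="deNOPA",
                         chrom_skip=("chrY", "chrM"), max_len=2000,
                         proc=1, p_arer=0.1, q_nfr=0.1,
                         remove_intermediate=False):
    """Build an argument list for the denopa command line (hypothetical helper)."""
    if not bam_files:
        raise ValueError("at least one input BAM file is required")
    cmd = ["denopa"]
    # -i may be repeated, once per sorted BAM file.
    for bam in bam_files:
        cmd += ["-i", bam]
    cmd += [
        "-o", output_dir,
        "-n", name,
        "-s", ",".join(chrom_skip),  # skipped chromosomes are comma-separated
        "-m", str(max_len),
        "--proc", str(proc),
        "-p", str(p_arer),
        "-q", str(q_nfr),
    ]
    if remove_intermediate:
        cmd.append("-r")
    return cmd


# Usage (file names are placeholders):
#   import subprocess
#   subprocess.check_call(build_denopa_command(["sample1.bam"], output_dir="out"))
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.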

Output files

Results
  • {NAME}_ARERs.txt: A BED-like file of the detected ARERs. For each ARER, the chromosome name, start position, end position, local maximum point, maximum signal and p-value are recorded.
  • {NAME}_NFR.txt: A BED file in standard broadPeak format listing the detected NFRs.
  • {NAME}_nucleosomes.txt: A BED-like file of all detected nucleosomes. For each nucleosome, the columns list, in order: (1) the chromosome name, (2) start position, (3) end position, (4) left inflection point, (5) center position, (6) right inflection point, (7) number of sequencing fragments intersecting it, (8) number of sequencing fragments covering it, (9) number of Tn5 cutting sites inside it, (10) p-value indicating whether the number of fragments covering the nucleosome is large enough, (11) p-value indicating whether the number of Tn5 cutting events inside it is small enough, (12) the combination of the above two p-values, and (13) whether the nucleosome is dynamic.
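
The nucleosome file can be loaded for downstream analysis with a short parser. This is a sketch under stated assumptions: it assumes the file is tab-separated, has no header line, and holds the thirteen columns in the order described above. The field names and the helper `read_nucleosomes` are hypothetical, and the last column is kept as raw text because its encoding is not specified here.

```python
import csv
from collections import namedtuple

# Field names are illustrative; the order follows the column description above.
Nucleosome = namedtuple("Nucleosome", [
    "chrom", "start", "end", "left_inflection", "center", "right_inflection",
    "n_frag_intersect", "n_frag_cover", "n_cuts_inside",
    "p_cover", "p_cuts", "p_combined", "dynamic",
])


def _number(text):
    """Parse text as int when possible, otherwise as float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def read_nucleosomes(handle):
    """Read nucleosome records from an open text handle (assumed tab-separated)."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank or comment lines, if any
        if len(row) != len(Nucleosome._fields):
            raise ValueError("expected %d columns, got %d"
                             % (len(Nucleosome._fields), len(row)))
        numeric = [_number(value) for value in row[1:12]]
        records.append(Nucleosome(*([row[0]] + numeric + [row[12]])))
    return records


# Usage (the path is a placeholder):
#   with open("deNOPA_nucleosomes.txt") as handle:
#       nucleosomes = read_nucleosomes(handle)
```

Raising on an unexpected column count makes a mismatch between these assumptions and the real file visible immediately.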

Intermediate files
  • {NAME}_candidates.pkl: All nucleosome candidates detected before the DBSCAN outlier detection is applied.
  • {NAME}_frag_len.pkl: The fragment length distribution of the ATAC-seq library.
  • {NAME}_pileup_signal.hdf: The raw coverage and cutting site signal profiles.
  • {NAME}_smooth.hdf: The smoothed signal profiles and their derivatives.
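
To illustrate what a smoothed signal profile and its derivative look like, here is a generic, pure-Python sketch of Gaussian smoothing applied to a toy coverage track. It is not deNOPA's implementation: the function names, the kernel truncation, the edge handling and the bandwidth `sigma` are all assumptions chosen for illustration.

```python
import math


def gaussian_kernel(sigma, truncate=4.0):
    """Normalized Gaussian weights, truncated at `truncate` standard deviations."""
    radius = int(truncate * sigma + 0.5)
    weights = [math.exp(-0.5 * (i / float(sigma)) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve `signal` with a Gaussian kernel, repeating edge values as padding."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the edges
            acc += w * signal[j]
        out.append(acc)
    return out


def derivative(signal):
    """First derivative: central differences inside, one-sided at the ends."""
    n = len(signal)
    if n < 2:
        return [0.0] * n
    d = [signal[1] - signal[0]]
    for i in range(1, n - 1):
        d.append((signal[i + 1] - signal[i - 1]) / 2.0)
    d.append(signal[-1] - signal[-2])
    return d


def local_maxima(signal):
    """Indices where the signal rises into a point and does not rise after it."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]
```

On a toy track with two flat blocks of coverage, the maxima of the smoothed profile fall at the block centers, and the extrema of `derivative(smooth(...))` mark where the smoothed curve changes curvature, which is the generic notion of an inflection point.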

Citation
