xubx committed on 2023-12-17 09:44 (v1.0.3)

deNOPA

Introduction

As the basic building blocks of chromatin, nucleosomes, through their dynamics and arrangement, fundamentally orchestrate the higher-order architecture of chromatin and thereby affect almost all nuclear processes. Thanks to its relatively simple protocol, ATAC-seq has rapidly been adopted as a major tool for profiling chromatin accessibility at both the bulk and single-cell levels; however, mapping the arrangement of nucleosomes per se with ATAC-seq remains a challenge. We introduce deNOPA, a novel ATAC-seq analysis toolkit for predicting nucleosome positions. Assessments showed that deNOPA not only outperformed state-of-the-art tools but was also the only tool able to predict nucleosome positions precisely from ultrasparse ATAC-seq data. This performance was driven by reads from short fragments, which make up nearly half of the sequenced reads and are normally discarded from an ATAC-seq library. We found that short-fragment reads are rich in information on nucleosome positions, and that linker regions can be predicted from reads of both short and long fragments using Gaussian smoothing. The basic workflow of deNOPA is shown below:

Fig. 1: The basic workflow of deNOPA.
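To make the Gaussian-smoothing idea concrete, here is a minimal sketch, assuming a per-base array of Tn5 cutting-site counts as input. It is not deNOPA's implementation (which also works with coverage signals, their derivatives and inflection points); it only smooths a toy signal with a normalized Gaussian kernel and reports the local maxima, where Tn5 cuts, and hence candidate linkers, concentrate. The kernel width `sigma`, the 4-sigma truncation, the function names and the toy data are all arbitrary, hypothetical choices.

```python
import numpy as np

def gaussian_smooth(signal, sigma):
    """Convolve a 1-D signal with a normalized Gaussian kernel truncated at 4 sigma."""
    radius = int(4 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()  # normalize so a constant signal stays constant
    return np.convolve(np.asarray(signal, dtype=float), kernel, mode="same")

def local_maxima(values):
    """Indices i where values[i] > values[i-1] and values[i] >= values[i+1]."""
    v = np.asarray(values)
    return np.where((v[1:-1] > v[:-2]) & (v[1:-1] >= v[2:]))[0] + 1

# Toy cutting-site profile: a few cuts clustered around three hypothetical linkers
# spaced 200 bp apart, roughly the nucleosome repeat length.
cuts = np.zeros(500)
for center in (50, 250, 450):
    cuts[center - 10:center + 11:2] = 1

smoothed = gaussian_smooth(cuts, sigma=20)
print(local_maxima(smoothed))  # candidate linker centers
```

The sparse, scattered cuts become one smooth peak per cluster, which is why smoothing helps when reads are scarce: no single base needs many reads for the peak to emerge.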

Install

Pre-requirements

The deNOPA package was initially developed with Python 2.7; support for Python 3 was added in version 1.0.2 (tested with Python 3.7). The package was tested under the default environment of Anaconda-5.3.1, in both its Python 2.7 and Python 3.7 versions. Besides a Python environment, the following dependencies are also needed:

  • numpy
  • scipy
  • h5py
  • pysam
  • sklearn (scikit-learn)
  • statsmodels

Please make sure they are properly installed before installing the deNOPA package itself. Please also use the Python 3 version whenever possible, as it is the only version maintained now.
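One quick way to confirm that the dependencies listed above are in place is to try importing each of them. The helper below is a hypothetical convenience, not part of deNOPA; note that `sklearn` is the import name of the scikit-learn package.

```python
import importlib

# Import names of the dependencies listed above.
DEPENDENCIES = ["numpy", "scipy", "h5py", "pysam", "sklearn", "statsmodels"]

def missing_dependencies(modules=DEPENDENCIES):
    """Return the names of modules that cannot be imported in this environment."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:  # also catches ModuleNotFoundError
            missing.append(name)
    return missing

if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All deNOPA dependencies are importable.")
```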

We also offer a tested list of prerequisites in requirements.txt. Users can quickly build an environment with

pip install -r requirements.txt

Install from source code

Use the following commands to get deNOPA installed.

git clone https://gitee.com/bxxu/denopa.git

cd denopa

python setup.py install

Install from pre-built files

Download the wheel file compatible with your Python version from the dist directory of this repo, then install it with the following command:

pip install deNOPA-x.y.z-pyX-none-any.whl
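The pyX tag in the wheel file name (X being the Python major version) is what decides compatibility. As a hedged illustration, this helper reads that tag following the PEP 427 file-name convention (name-version[-build]-pythontag-abitag-platformtag.whl) and compares it with the running interpreter; the function names are hypothetical, and pip performs an equivalent check itself when installing.

```python
import sys

def wheel_python_tags(filename):
    """Return the python tag(s) of a wheel file name, e.g. ['py3']."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # The python tag is always the third field from the end; it may be
    # compressed, as in 'py2.py3', so split it on dots.
    return parts[-3].split(".")

def is_compatible(filename, major=sys.version_info[0]):
    """True if the wheel declares support for the given Python major version."""
    return "py%d" % major in wheel_python_tags(filename)

if __name__ == "__main__":
    print(is_compatible("deNOPA-x.y.z-py3-none-any.whl"))
```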

Usage

The BAM files should be sorted and indexed with samtools before running the package. The package has only been tested on alignments from bowtie2; compatibility with other aligners is not guaranteed.
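If you prefer to prepare inputs from Python, pysam (already a deNOPA dependency) exposes samtools commands such as `pysam.index`. The sketch below, with hypothetical helper names, looks for an existing `.bai`/`.csi` index and creates one only when missing. The sort check relies on the `SO` tag in the `@HD` header line, which `samtools sort` writes; a missing tag does not prove the file is unsorted.

```python
import os

def find_bam_index(bam_path):
    """Return the path of an existing .bai/.csi index for bam_path, or None."""
    candidates = [bam_path + ".bai", bam_path + ".csi"]
    root, ext = os.path.splitext(bam_path)
    if ext == ".bam":
        candidates.append(root + ".bai")  # e.g. sample.bai next to sample.bam
    for path in candidates:
        if os.path.exists(path):
            return path
    return None

def is_coordinate_sorted(bam_path):
    """True if the @HD SO tag of the BAM header is 'coordinate'."""
    import pysam  # imported lazily so the helpers above work without pysam
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return bam.header.to_dict().get("HD", {}).get("SO") == "coordinate"

def ensure_bam_index(bam_path):
    """Create a .bai index (same as `samtools index`) if none exists yet."""
    index = find_bam_index(bam_path)
    if index is None:
        import pysam
        pysam.index(bam_path)
        index = bam_path + ".bai"
    return index
```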

usage: denopa [-h] -i INPUT [-o OUTPUT] [-b BUFFERSIZE] [-s CHROMSKIP]
              [-c CHROMINCLUDE] [-n NAME] [-m MAXLEN] [--proc PROC] [-p PARER]
              [-q QNFR] [-r]

Decoding the nucleosome positions with ATAC-seq data at single cell level

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        The input bam files. The files should be sorted. This
                        argument could be given multiple times for multiple
                        input files.
  -o OUTPUT, --output OUTPUT
                        The directory where the output files will go. It
                        will be created if it does not exist (default .).
  -b BUFFERSIZE, --bufferSize BUFFERSIZE
                        Number of reads buffered in reading the bam file
                        (default 1000000).
  -s CHROMSKIP, --chromSkip CHROMSKIP
                        Names of chromosomes skipped from the processing.
                        Multiple values should be separated by ',' (default
                        chrY,chrM).
  -c CHROMINCLUDE, --chromInclude CHROMINCLUDE
                        The regular expression of chromosome names included in
                        the analysis, for human genome
                        'chr[1-9][0-9]{,1}|chrX' should be enough. It can be
                        combined with -s.
  -n NAME, --name NAME  The name of the project (default deNOPA).
  -m MAXLEN, --maxLen MAXLEN
                        The maximum fragment length in the input files
                        (default 2000).
  --proc PROC           Number of processors used in the analysis (default 1).
  -p PARER, --pARER PARER
                        The p-value threshold used in determining the ATAC-seq
                        reads enriched regions (ARERs, default 0.1)
  -q QNFR, --qNFR QNFR  The q-value threshold used in determining the
                        nucleosome free regions (NFRs, default 0.1).
  -r, --removeIntermediateFiles
                        The intermediate files will be removed if this flag is
                        set.
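When deNOPA is run from a Python pipeline, a small helper can assemble the command line from the options documented above. This is a sketch: the function name, the file names and the chosen values are hypothetical, and only the `-i`, `-o`, `-n`, `--proc`, `-s`, `-c` and `-r` flags from the help text are used.

```python
import shlex

def build_denopa_command(inputs, output=".", name="deNOPA", proc=1,
                         chrom_include=None, chrom_skip=None,
                         remove_intermediate=False):
    """Return a denopa argument list built from the documented options."""
    if not inputs:
        raise ValueError("at least one input BAM file is required")
    cmd = ["denopa"]
    for bam in inputs:
        cmd += ["-i", bam]  # -i is repeated once per input file
    cmd += ["-o", output, "-n", name, "--proc", str(proc)]
    if chrom_skip is not None:
        cmd += ["-s", ",".join(chrom_skip)]  # comma-separated chromosome names
    if chrom_include is not None:
        cmd += ["-c", chrom_include]  # regular expression of included names
    if remove_intermediate:
        cmd.append("-r")
    return cmd

if __name__ == "__main__":
    cmd = build_denopa_command(["rep1.bam", "rep2.bam"], output="denopa_out",
                               name="sample", proc=4,
                               chrom_include="chr[1-9][0-9]{,1}|chrX",
                               chrom_skip=["chrY", "chrM"])
    # Print a shell-safe version; pass `cmd` to subprocess.run(cmd, check=True) to run it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building an argument list rather than a single string avoids shell quoting problems with the `-c` regular expression, whose `|` and braces would otherwise be interpreted by the shell.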

Output files

Results
  • {NAME}_ARERs.txt: A BED-like file of the detected ARERs. The chromosome name, start position, end position, local maximum point, maximum signal and p-value of each ARER are recorded.
  • {NAME}_NFR.txt: A standard broadPeak formatted bed file for detected NFRs.
  • {NAME}_nucleosomes.txt: A BED-like file of all detected nucleosomes. For each nucleosome, the following columns are listed in order: chromosome name; start position; end position; left inflection point; center position; right inflection point; number of sequencing fragments intersecting it; number of sequencing fragments covering it; number of Tn5 cutting sites inside it; a p-value indicating whether the number of fragments covering the nucleosome is large enough; a p-value indicating whether the number of Tn5 cutting events inside it is small enough; the combination of the above two p-values; and whether the nucleosome is dynamic.
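The nucleosome file can be loaded into named records following the column order described above. This is a sketch under stated assumptions: whitespace-separated columns with no header line, and a dynamic flag written as 1/0 or True/False. The field names and helpers are hypothetical labels, not deNOPA's own.

```python
from collections import namedtuple

# Column order as described above; the field names are hypothetical labels.
FIELDS = ["chrom", "start", "end", "left_inflection", "center", "right_inflection",
          "n_frag_intersect", "n_frag_cover", "n_cut_inner",
          "p_cover", "p_cut", "p_combined", "dynamic"]
Nucleosome = namedtuple("Nucleosome", FIELDS)

def parse_nucleosome_line(line):
    """Convert one line of {NAME}_nucleosomes.txt into a Nucleosome record."""
    f = line.split()  # assumes whitespace-separated columns
    if len(f) < len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(f)))
    return Nucleosome(
        chrom=f[0],
        start=int(float(f[1])), end=int(float(f[2])),
        left_inflection=float(f[3]), center=float(f[4]), right_inflection=float(f[5]),
        n_frag_intersect=int(float(f[6])), n_frag_cover=int(float(f[7])),
        n_cut_inner=int(float(f[8])),
        p_cover=float(f[9]), p_cut=float(f[10]), p_combined=float(f[11]),
        # The encoding of the dynamic flag is an assumption.
        dynamic=f[12].strip().lower() in ("1", "true", "t", "yes"),
    )

def read_nucleosomes(path):
    """Yield Nucleosome records, skipping blank lines and '#' comment lines."""
    with open(path) as fh:
        for line in fh:
            if line.strip() and not line.startswith("#"):
                yield parse_nucleosome_line(line)
```

For example, with the default project name, `[n for n in read_nucleosomes("deNOPA_nucleosomes.txt") if n.dynamic]` would collect the nucleosomes flagged as dynamic.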
Intermediate files
  • {NAME}_candidates.pkl: All the detected nucleosome candidates before the DBSCAN outlier detection was applied.
  • {NAME}_frag_len.pkl: The fragment length distribution of the ATAC-seq library.
  • {NAME}_pileup_signal.hdf: The raw coverage and cutting sites signal profiles.
  • {NAME}_smooth.hdf: The smoothed signal profiles and their derivatives.
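The intermediate files can be inspected from Python. A minimal sketch, assuming the .pkl files are standard pickles and the .hdf files are HDF5 files readable by h5py (a deNOPA dependency). Their internal layout is not documented here, so the code only lists what it finds. `encoding="latin1"` lets Python 3 read pickles written under Python 2; unpickling deNOPA's own objects may additionally require the deNOPA package to be importable. The helper names are hypothetical.

```python
import pickle

def load_pickle(path):
    """Load a .pkl intermediate, tolerating pickles written by Python 2."""
    with open(path, "rb") as fh:
        return pickle.load(fh, encoding="latin1")

def list_hdf_datasets(path):
    """Return {dataset path: (shape, dtype)} for every dataset in an HDF5 file."""
    import h5py  # imported lazily so load_pickle works without h5py
    found = {}

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as fh:
        fh.visititems(visit)
    return found
```

For example, `list_hdf_datasets("deNOPA_smooth.hdf")` would show which smoothed profiles were stored for the default project name.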

Test data

Sparse data does not mean no data: when reads are too sparse, deNOPA cannot detect any ARERs and will fail. The package has been tested to work on data with at least 600K read pairs for Saccharomyces cerevisiae, or at least 10M read pairs for human or mouse. A test dataset is provided in the "test" directory; it contains about 25% (604K) of the aligned fragments in SRR1822145, together with the results.
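To check whether a library reaches these depths before running deNOPA, read pairs can be counted with pysam. This sketch assumes that counting read 1 of each properly paired primary alignment is a fair proxy for the "read pairs" quoted above; the thresholds are the ones stated in this section, while the function names and organism keys are hypothetical.

```python
# Minimum read-pair counts quoted above.
MIN_READ_PAIRS = {"yeast": 600_000, "human": 10_000_000, "mouse": 10_000_000}

def count_proper_pairs(bam_path):
    """Count properly paired primary read-1 alignments (one per fragment)."""
    import pysam  # imported lazily; pysam is a deNOPA dependency
    n = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            if (read.is_paired and read.is_read1 and read.is_proper_pair
                    and not read.is_secondary and not read.is_supplementary):
                n += 1
    return n

def has_enough_reads(n_pairs, organism):
    """True if n_pairs meets the tested minimum for the given organism."""
    return n_pairs >= MIN_READ_PAIRS[organism]
```

By this rule the bundled yeast test set (604K fragments) sits just above the 600K minimum.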

Citation

Please cite the following paper if you use deNOPA in your study:

Xu B, Li X, Gao X, et al. DeNOPA: decoding nucleosome positions sensitively with sparse ATAC-seq data. Briefings in Bioinformatics, 2022, 23(1): bbab469.

Contacts

You can send questions, discussions, bug reports and other useful information to Zhihua Zhang (zangzhihua@big.ac.cn) or Bingxiang Xu (xubingxiang@sus.edu.cn).
