# deNOPA

**Repository Path**: bxxu/denopa

## Basic Information

- **Project Name**: deNOPA
- **Description**: deNOPA, a sensitive nucleosome positioning algorithm for sparse ATAC-seq data.
- **Primary Language**: Python
- **License**: GPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2021-01-16
- **Last Updated**: 2025-08-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# deNOPA

### Introduction

As the basic building blocks of chromatin, nucleosomes orchestrate its higher-order architecture through their dynamics and arrangement, thereby affecting almost all nuclear biological processes. Thanks to its rather simple protocol, ATAC-seq has been rapidly adopted as a major tool for profiling chromatin accessibility at both the bulk and single-cell levels; however, picturing the arrangement of nucleosomes *per se* with ATAC-seq remains a challenge.

We introduce a novel ATAC-seq analysis toolkit, named **deNOPA**, to predict nucleosome positions. Assessments showed that deNOPA not only outperformed state-of-the-art tools but was also the only tool able to predict nucleosome positions precisely with ultrasparse ATAC-seq data. The remarkable performance of deNOPA was fueled by reads from short fragments, which make up nearly half of the sequenced reads and are normally discarded from an ATAC-seq library. However, we found that these short-fragment reads are rich in information on nucleosome positions. Linker regions were predicted from reads of both short and long fragments using Gaussian smoothing.

The basic workflow of deNOPA is shown below:

![Fig.1]()

### Install

##### Pre-requirements

The deNOPA package was initially developed with Python 2.7. Support for Python 3 was added in version 1.0.2 (tested with Python 3.7). The package was tested under the default environment of Anaconda 5.3.1, in both its Python 2.7 and Python 3.7 versions.
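Before installing, a quick self-check can confirm that the interpreter is Python 3 (the only maintained version) and that the dependencies listed in this section can be found. The snippet below is a minimal sketch and is not part of deNOPA; the helper names `is_python3` and `missing_modules` are hypothetical.

```python
import importlib.util
import sys

# Import names of the dependencies listed in this README.
# Note: scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ("numpy", "scipy", "h5py", "pysam", "sklearn", "statsmodels")


def is_python3():
    """Return True when running under Python 3, the only maintained version."""
    return sys.version_info[0] >= 3


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that the current interpreter cannot find."""
    # find_spec locates a top-level module without importing it; None means not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    print("Python %s: %s" % (version, "OK" if is_python3() else "Python 3 recommended"))
    missing = missing_modules()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All listed dependencies can be found.")
```

Run it with the same interpreter that will be used to install deNOPA, so that the check reflects the environment the package will actually see.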
Besides a Python environment, the following dependencies are also needed:

* numpy
* scipy
* h5py
* pysam
* sklearn
* statsmodels

Please make sure they are properly installed before installing the deNOPA package itself. Please also use the Python 3 version wherever possible; only this version is maintained now. We also provide a tested list of pre-requirements in requirements.txt, so users can quickly build an environment using `pip install -r requirements.txt`.

##### Install from source code

Use the following commands to install deNOPA:

```
git clone https://gitee.com/bxxu/denopa.git
cd denopa
python setup.py install
```

##### Using docker image

To unify the user experience, the deNOPA package has been containerized as a Docker image. Users can pull it from Docker Hub using the following command:

```
docker pull hepingshiming2007/denopa:latest
```

After pulling successfully, you can start a container from the pulled image with the following command:

```
docker run -it --name denopa hepingshiming2007/denopa:latest bash
```

After that, the `denopa` command is immediately available inside the container.

### Usage

The BAM files should be indexed using samtools before running the package. The package was only tested with alignments from bowtie2; compatibility with other aligners is not guaranteed.

```
usage: denopa [-h] -i INPUT [-o OUTPUT] [-b BUFFERSIZE] [-s CHROMSKIP]
              [-c CHROMINCLUDE] [-n NAME] [-m MAXLEN] [--proc PROC]
              [-p PARER] [-q QNFR] [-r]

Decoding the nucleosome positions with ATAC-seq data at single cell level

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        The input bam files. The files should be sorted. This argument could be given multiple times for multiple input files.
  -o OUTPUT, --output OUTPUT
                        The directory where the output files will go to. It will be created if not exists (default .).
  -b BUFFERSIZE, --bufferSize BUFFERSIZE
                        Number of reads buffered in reading the bam file (default 1000000).
  -s CHROMSKIP, --chromSkip CHROMSKIP
                        Names of chromosomes skiped from the processing. Multiple values should be sepaated by ',' (default chrY,chrM).
  -c CHROMINCLUDE, --chromInclude CHROMINCLUDE
                        The regular expression of chromosome names included in the analysis, for human genome 'chr[1-9][0-9]{,1}|chrX' should be enough. It can be combined with -s.
  -n NAME, --name NAME  The name of the project (default deNOPA).
  -m MAXLEN, --maxLen MAXLEN
                        The maximun fragment length in the input files (default 2000).
  --proc PROC           Number of processors used in the analysis (default 1).
  -p PARER, --pARER PARER
                        The p-value threshold used in determining the ATAC-seq reads enriched regions (ARERs, default 0.1)
  -q QNFR, --qNFR QNFR  The q-value threshold used in determining the nucleosome free regions (NFRs, default 0.1).
  -r, --removeIntermediateFiles
                        The intermediate files will be removed if this flag is set.
```

### Output files

##### Results

* {NAME}_ARERs.txt: A BED-like file of the detected ARERs. The chromosome name, start position, end position, local maximum point, maximum signal, and p-value of each ARER are recorded.
* {NAME}_NFR.txt: A standard broadPeak-formatted BED file of the detected NFRs.
* {NAME}_nucleosomes.txt: A BED-like file of all detected nucleosomes. For each nucleosome, the following columns are listed in order: chromosome name, start position, end position, left inflection point, center position, right inflection point, number of sequencing fragments intersecting it, number of sequencing fragments covering it, number of Tn5 cutting sites inside it, the p-value indicating whether the number of sequencing fragments covering the nucleosome is large enough, the p-value indicating whether the number of Tn5 cutting events inside it is small enough, the combination of the above two p-values, and whether the nucleosome is dynamic.

##### Intermediate files

* {NAME}_candidates.pkl: All the detected nucleosome candidates before DBSCAN outlier detection was applied.
* {NAME}_frag_len.pkl: The fragment length distribution of the ATAC-seq library.
* {NAME}_pileup_signal.hdf: The raw coverage and cutting-site signal profiles.
* {NAME}_smooth.hdf: The smoothed signal profiles and their derivatives.

### Test data

Sparse data does not mean no data. When reads are too sparse, deNOPA cannot detect any ARER and will certainly fail. The package has been tested to work with data containing 600K or more read pairs for *Saccharomyces cerevisiae*, or 10M or more read pairs for human or mouse. A test dataset is provided in the "test" directory; it contains about 25% (604K) of the aligned fragments in SRR1822145, together with the results.

### Citation

Please cite the following paper if you use deNOPA in your study:

> Xu B, Li X, Gao X, et al. DeNOPA: decoding nucleosome positions sensitively with sparse ATAC-seq data[J]. Briefings in Bioinformatics, 2022, 23(1): bbab469.

### Contacts

You can send questions, discussions, bug reports and other useful information to Zhihua Zhang (zangzhihua@big.ac.cn) or Bingxiang Xu (xubingxiang@sus.edu.cn).