# Tag-seq **Repository Path**: zhoujj2013/Tag-seq ## Basic Information - **Project Name**: Tag-seq - **Description**: No description available - **Primary Language**: Perl - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-09-22 - **Last Updated**: 2020-12-19 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Tag-seq ## Description of Tag-seq data analysis For Tag-seq data analysis, we retained fragments that contain an intact Tag at the beginning of read2 (second of pair). Then, reads were mapped to the reference genome (hg19) using STAR2 after quality filtering, then PCR duplications were removed using UMI-tools. To identify candidate DSBs, the start mapping positions were grouped if the distance among them is less than ten bps, resulting in editing hotspots induced by RGNs. Then, the peaks with sufficient reads were detected in RGNs hotspot. Furthermore, the peaks with reads mapping to both + and - strands, or the same strand but amplified with both forward and reverse tag-specific primers, are flagged as sites of potential DSBs. The flanking regions of potential DSBs match gRNA identified as on-target sites using a Smith-Waterman local-alignment algorithm. Identified off-targets sorted by Tag-seq read count are annotated in a final output table and visualize as a pdf file. ## System requirements Tag-seq runs under the Linux (i.e., Centos, see also https://www.centos.org/ for further details) on a 64-bit machine with at least 32 GB RAM. Tag-seq requires PERL v5, R, Python 2.7, [pip](https://bootstrap.pypa.io/get-pip.py) and several python packages listed in [python.package.requirement.txt](https://github.com/zhoujj2013/Tag-seq/blob/master/python.package.requirement.txt).; Tag-seq also requires some third-party packages: [STAR aligner](https://github.com/alexdobin/STAR) [FASTQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) [AdapterRemoval](https://github.com/MikkelSchubert/adapterremoval) [BEDTOOLS](https://bedtools.readthedocs.io/en/latest/) [SAMTOOLS](http://samtools.sourceforge.net/) [PICARD](https://broadinstitute.github.io/picard/) [umi_tools](https://github.com/CGATOxford/UMI-tools) [bedops](https://bedops.readthedocs.io/en/latest/) water in [EMBOSS](http://emboss.sourceforge.net/download/) [RIdeogram](https://github.com/TickingClock1992/RIdeogram) Tag-seq have been tested in CentOS release 7.4 (Linux OS 64 bit). ## Run by Docker ### Install docker ``` # update source sudo apt-get update # apt through https sudo apt-get install \ apt-transport-https \ ca-certificates \ curl \ software-properties-common # add GPG key curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - # get the stable version sudo add-apt-repository \ "deb [arch=amd64] https://download.docker.com/linux/ubuntu \ $(lsb_release -cs) \ stable" # update source sudo apt-get update # install docker-ce sudo apt-get install docker-ce # set user information, so that we can run docker without sudo sudo usermod -a -G docker $USER # exit and login again ``` ### create docker image ``` git clone https://github.com/zhoujj2013/Tag-seq.git --depth 1 cd Tag-seq/docker/ docker build -t ubuntu:tagseq . docker run -i -t ubuntu:tagseq echo "hello world!" cd - ``` ### run Tag-seq Prepare reference ``` # prepare reference mkdir ref && cd ref rsync -avzP rsync://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz . gunzip hg19.fa.gz samtools faidx hg19.fa rsync -avzP rsync://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes . mkdir star_index /path_to_star/STAR --runMode genomeGenerate --genomeDir ./ --genomeFastaFiles ../hg19.fa --runThreadN 8 cd ../../ ``` Prepare fastq dataset ``` # prepare fastq dataset as follow: $tree data/ data/ ├── 4P_L_R1.fq ├── 4P_L_R2.fq ├── 4P_R_R1.fq ├── 4P_R_R2.fq └── README.txt 0 directories, 5 files ``` Run Tag-seq ``` cd Tag-seq/test gunzip data/*.fq.gz docker run -v /path_to/Tag-seq/test:/mnt/tagseq -v /path_to/ref:/mnt/tagseq/ref -w /mnt/tagseq -i -t ubuntu:tagseq perl /docker_main/software/Tag-seq/bin/run_guideseq.pl ./config.docker.txt all #check the result ``` ## Installation ### Get Tag-seq pipeline ``` git clone https://github.com/zhoujj2013/Tag-seq.git --depth 1 ``` ### Install require python packages ``` cd ./Tag-seq pip install -r python.package.requirement.txt --user ``` ### Preparation Download reference genome and build index. ``` # download genome mkdir hg19 cd hg19 wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz wget http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/hg19.chrom.sizes gunzip hg19.fa.gz # build genome index /path_to/STAR --runMode genomeGenerate --genomeDir ./ --genomeFastaFiles ./hg19.fa --runThreadN 32 ``` ### Run test If you have obtained the reference genome, STAR index, you can run test to examine whether the package works well (the test dataset is placed in ./test directory within Tag-seq). Tag-seq requires a [sgrna.lst](https://github.com/zhoujj2013/Tag-seq/blob/master/test/sgrna.lst) and a configure file containing paths of input files, sgRNA, Tag primers and genome etc. (See [config.TEST.txt](https://github.com/zhoujj2013/Tag-seq/blob/master/test/config.TEST.txt) for more details.) ``` cd test # run ########## the content of work.sh ######### # gunzip data/*.fq.gz # perl ../bin/run_guideseq.pl config.TEST.txt all > config.TEST.log 2>config.TEST.err ########################################### sh work.sh # around 30 mins. # you can check the report in out.XXX/sgrna_id.find.target/. # you can identify off-targets for multiple sgRNA simultaneously. ``` ### Result #### 1. QC statistics you can check [stat.txt](https://github.com/zhoujj2013/Tag-seq/blob/master/stat.txt). #### 2. Information of potential targets in bed format ``` chr1 10111 10112 AAVS1.E_minus_minus_2_9,AAVS1.E_plus_minus_1_13 0 29 0 12 chr1 55903742 55903743 AAVS1.E_minus_minus_3669_8,AAVS1.E_plus_minus_5324_6 0 11 0 17 chr1 68164302 68164303 AAVS1.E_minus_minus_4802_6,AAVS1.E_plus_plus_6944_6 8 0 0 7 chr1 111700139 111700140 AAVS1.E_minus_minus_7763_6,AAVS1.E_plus_plus_11377_6 9 0 0 5 chr1 121478642 121478643 AAVS1.E_minus_plus_8420_6,AAVS1.E_plus_plus_12435_6 9 0 7 0 ``` Column 1: chromosome Column 2: start Column 3: end Column 4: id Column 5: read count for plus strand in plus library Column 6: read count for minus strand in plus library Column 7: read count for plus strand in minus library Column 8: read count for minus strand in minus library #### 3. Potential off-targets Illustrate of off-targets sites and read count. ![off-targets](https://upload-images.jianshu.io/upload_images/4180410-e4d77af6060e2f8a.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240) #### 4. Read counts across sgRNA in target and off-target sites ![sites](https://upload-images.jianshu.io/upload_images/4180410-35ecdde6cda3c9c4.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240) #### 5. Global view of target and off-target sites ![global](https://upload-images.jianshu.io/upload_images/4180410-5303467568095e98.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240) ## Tag-seq Runtime The running time of Tag-seq depends on the size of sequencing depth (For 30M flagments, it takes 30mins). ## Please cite 1. xxxx Tag-seq (submitted)