# SVision **Repository Path**: kylintu/SVision ## Basic Information - **Project Name**: SVision - **Description**: No description available - **Primary Language**: Unknown - **License**: GPL-3.0 - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-06-25 - **Last Updated**: 2021-06-25 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README svision_logo SVision is a deep learning-based structural variants caller that takes aligned reads or contigs as input. Especially, SVision implements a targeted multi-objects recognition framework, detecting and characterizing both simple and complex structural variants from three-channel similarity images. SVision workflow ## License SVision is free for non-commercial use by academic, government, and non-profit/not-for-profit institutions. A commercial version of the software is available and licensed through Xi’an Jiaotong University. For more information, please contact with Jiadong Lin (jiadong324@stu.xjtu.edu.cn) or Kai Ye (kaiye@xjtu.edu.cn). ## Install Step1: Create a python environment with conda ``` conda create -n svision-env python=3.6 ``` Step2: Install required packages of specific versions ``` conda install -c bioconda pysam conda install -c conda-forge opencv==4.5.1 conda install -c conda-forge tensorflow==1.14.0 ``` Step3: Install SVision from source ``` python setup.py install ``` The Pip and Conda install would be available after the **-beta* version. ## Usage ``` SVision [parameters] -o -b -g -m ``` Check all parameters with ``` SVision -h ``` Please check the [wiki](https://github.com/xjtu-omics/SVision/wiki) page for more usage details. #### Input/output parameters ``` -o OUT_PATH Absolute path to output -b BAM_PATH Absolute path to bam file -m MODEL_PATH Absolute path to CNN predict model -g GENOME Absolute path to your reference genome (.fai required in the directory) -n SAMPLE Name of the BAM sample name ``` ```-g``` path to the reference genome, the index file should under the same directory. ```-m``` path to the pre-trained deep learning model *svision-cnn-model.ckpt* ([download link](https://drive.google.com/drive/folders/1j74IN6kPKEx9hy3aENx3zHYPUnyYWGvj?usp=sharing)). Please download all files and save them to your local directory. #### General parameters ``` -t THREAD_NUM Thread numbers [1] -s MIN_SUPPORT Min support read number for an SV [1] -c CHROM Specific region to detect, format: chr1:xxx-xxx or 1:xxx-xxx --hash Activate hash table to align unmapped sequences --qnames Report support names for each events --cluster Cluster calls that might occur together --graph Report graph for events --contig Activate contig mode ``` ```--hash``` enables kmer based alignment for unmapped sequences. ```--graph``` enables the program to create the CSV graph in GFA format. This might increase the running time for high-coverage sequencing (>50X). ```--contig``` is used for calling from assemblies, which currently uses minimap2 aligned BAM file as input. ## SVision output ### VCF The SV ```ID``` column is given in the format of ```a_b```, where ```b``` indicates site ```a``` contains other type of SVs. Filters used in the output. ```Covered```: The entire SV is spanned by long-reads, producing the most confident calls. ```Uncovered```: SV is partially spanned by long-reads, i.e. reads spanning one of the breakpoints. ```Clustered```: SV is partially spanned by long-reads, but can be spanned through reads clusters. We add extra attributes in the ```INFO``` column of VCF format for SVision detected structural variants. ```BRPKS```: The CNN recognized breakpoint junctions through tMOR. ```GraphID```: The graph index used to indicate the graph structure, which requires ```--report_graph``` and is obtained by calculating isomorphic graphs. The ID for simple SVs is -1. ```VAF```: The estimated variant allele fraction, which is calculated by DV/DR. Note that SVision does not calculate the genotypes in the current version. ### CSV graph #### CSV graph compare SVision classify the graph of each CSV instances by comparing their graph topologies. It will create two text (.txt) file along with the VCF output. 1. ```sample.graph_exactly_match.txt```: Unique graphs for all CSV instances. 2. ```sample.graph_symmetry_match.txt```: Topological similar graph found from unique graphs. #### Graph format The graph output requires ```--graph``` activated. The below example is an CSV in rGFA format (node sequence is omitted for display purpose), which is detected by SVision at chr11:99,819,283-99,820,576 in HG00733. The graph output is saved in separated files for each CSV events. ``` S S1 SN:Z:chr11 SO:i:99819338 SR:i:0 LN:i:2990 S I0 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:15813 SR:i:0 LN:i:1113 S I1 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:16927 SR:i:0 LN:i:466 S I2 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:17400 SR:i:0 LN:i:377 DP:S:S1:99820198 S I3 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:17778 SR:i:0 LN:i:838 S I4 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:18617 SR:i:0 LN:i:61 DP:S:S0:99819276 L S0 + I0 + 0M SR:i:0 L I0 + I1 + 0M SR:i:0 L I1 + I2 - 0M SR:i:0 L I2 - I3 + 0M SR:i:0 L I3 + I4 + 0M SR:i:0 L I4 + S1 + 0M SR:i:0 ``` Besides the information included in standard [rGFA](https://github.com/lh3/gfatools/blob/master/doc/rGFA.md) format, we add another ```DP:S``` column to indicate sequence with detected origins via local realignment, such as node ```I2``` is duplicated from node ```S1```. #### Graph genotyping **Note**: This is a post-processing step that tries to validate the detected CSVs. **Step1: Extract HiFi raw reads** ``` samtools view -b HG00733.ngmlr.sorted.bam chr11:99810000-99830000 > tmp.bam samtools fasta tmp.bam > tmp.fasta ``` **Step2: Align with GraphAligner** Please check [GraphAligner](https://github.com/maickrau/GraphAligner) for the detailed usage. ``` GraphAligner -g chr11-99819283-99820576.gfa -f tmp.fasta -a aln.gaf -x vg ``` Example of CSV path supporting reads ``` m54329U_190827_173812/140708091/ccs     21668   0       21668   +       >S0>I0>I1I3>I4>S1 m54329U_190617_231905/88145984/ccs      13612   0       13612   +       >S0>I0>I1I3>I4>S1 m54329U_190617_231905/88145984/ccs      13612   0       13612   +       >S0>I0>I1I3>I4>S1 ``` ## Contact If you have any questions, please feel free to contact: jiadong66@stu.xjtu.edu.cn, songbowang125@163.com