# polyswitch

**Repository Path**: cx100/polyswitch

## Basic Information

- **Project Name**: polyswitch
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-03-17
- **Last Updated**: 2022-07-01

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# polyswitch
## Introduction
Polyswitch is a program to evaluate switch errors in haplotype-resolved genome assemblies, including heterozygous diploid and polyploid genomes, based on ONT ultra long data. A number of ONT ultra-long reads were selected and subsequently split into 5-kb non-overlapping windows. The 5-kb windows were BLASTn against the haplotype-resolved genome assembly using default parameters, and only windows that meet the following criteria were considered as validate windows and thus kept for further analysis: 1) For each window, we allow the bit score in the best match should be 1.5 times larger than the secondary match; 2) the best bit score should be larger than 1000. In addition, we also restricted that these reads with at least 5 valid windows were retained for assessment of switch errors. We next used the majority rule to determine the original location of the ONT reads. For instance, if more than 50% of validated windows in one ultra-long read mapped to Chr01A, we considered this read was originally from Chr01A rather than other alternative haplotypes. Any validate window against the original location was considered as a switch error.
## Installation
```
$ git clone https://github.com/eden00Chen/polyswitch.git
$ cd polyswitch
$ chmod +x bin/*
$ chmod +x scripts/*
$ export PATH=/your/path/to/polyswitch/scripts/:/your/path/to/polyswitch/bin/:$PATH
```
*example : export PATH=/home/software/polyswitch/scripts/:/home/software/polyswitch/bin:$PATH*  
*check : ployswitch -h*
## Dependencies
Following is a list of thirty-party programs that will be used in polyswitch pipeline.
* seqkit v2.1.0 +
* blast v2.3.0 +
* python v3.9.5 + (Python Toolkit : pandas、numpy、pysam、pyfaidx、Bio)
## Usage
```
polyswitch -1 reads.fasta -2 genome.fasta -o outdir -g genome_size [-r reads_number]
-1  :               reads file in FASTA format
-2  :               genome file in FASTA format
-o  :               outdir
-g  :               genome size(bp)
-r  :               select the number of reads,default:1000,recommend:4000
```

## Test
* test_data location:(data szie:1.3G)
```
wget ftp://43.138.130.14/pub/test_data.tar.gz
```
* command line:
```
polyswitch -1 reads_absolute_path -2 genome_absolute_path -o out_dir -g 146298676 -r 1000 -t 20
```
* result:
```
#version is: 1.1.0
#reads_file  : sample_reads.fasta
#genome_file : sample_genome.fasta
genome_size  : 146298676
reads_number : 1000
                  
  ***** Results: *****
                  
		switch_num : 562
		total_window_num : 5088
		total_reads_length : 85.93
		swtich_error(switch_num/total_window_num) : 11.05%
		swtich_error(switch_num/total_reads_length) : 6.54 /MB
Dependencies and versions:
  seqkit : 2.1.0
  blast  : 2.3.0+
```
## Polyswitch note
* Only supports ONT ultra-long reads.
* Support the input of data in GZ format, but the file naming rule must be filename.gz
* The input path and output path of the file must be absolute                  
  not: reads.fasta
  
  ex: /public/home/polyswitch_project/Data/reads.fasta