# OcBSA
**Repository Path**: Bioinformaticslab/OcBSA
## Basic Information
- **Project Name**: OcBSA
- **Description**: OcBSA is a tool designed specifically for QTL mapping in F1 populations.
Developed by: zhanglk960127@163.com
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 11
- **Forks**: 1
- **Created**: 2023-11-05
- **Last Updated**: 2025-01-20
## Categories & Tags
**Categories**: Uncategorized
**Tags**: QTL, BSA, F1, outcross
## README
# OcBSA: an NGS-based Bulk Segregant Analysis Tool for Outcross Populations
OcBSA is a BSA analysis tool for F1 segregating populations.
### Please cite: "OcBSA: an NGS-based Bulk Segregant Analysis Tool for Outcross Populations", Molecular Plant, 2024. https://doi.org/10.1016/j.molp.2024.02.011
## Windows version
> OcBSA.exe can be run directly on a Windows computer: download OcBSA.exe and run it.
>
> If you want to run it from the command line on Windows, download the 'windows' folder and run the OcBSA.py script, taking care to install the packages listed in the environment.yaml file.
## Linux Version
### QTL analysis
> optional arguments:

- `-h, --help`: show this help message and exit
- `-p1 PARENT1`: column number of the dominant parent in the VCF (counting from 0)
- `-p2 PARENT2`: column number of the other (recessive) parent in the VCF (counting from 0)
- `-b1 POOL1`: column number of the pool with the dominant trait in the VCF (counting from 0)
- `-b2 POOL2`: column number of the pool with the recessive trait in the VCF (counting from 0)
- `-d1 PARENTDEP1`: minimum coverage of the parents
- `-d2 POOLDEP1`: minimum coverage of the pools
- `-d3 PARENTDEP2`: maximum coverage of the parents
- `-d4 POOLDEP2`: maximum coverage of the pools
- `-w WIN`: size of the sliding window
- `-vcf INPUT_VCF`: path to the VCF file
- `-table INPUT_FORMAT`: path to a simplified table file in this program's own format, used in place of the VCF input
- `-OcValue INPUT_OcValue`: intermediate file (.OcValue) generated by an earlier run; it can be reused when only the window size needs to change
- `-o OUT`: name of the output file
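The four depth options bound the read coverage accepted for parents (`-d1`, `-d3`) and pools (`-d2`, `-d4`). The sketch below illustrates that kind of filter on made-up depths. It is not taken from OC_BSA.py: the function name, the example values, and the choice of inclusive bounds are all assumptions.

```python
def passes_depth_filter(parent_depths, pool_depths,
                        min_parent, max_parent, min_pool, max_pool):
    """Return True if every parent and pool depth lies within its bounds.

    Bounds are treated as inclusive here; that is an assumption.
    """
    parents_ok = all(min_parent <= d <= max_parent for d in parent_depths)
    pools_ok = all(min_pool <= d <= max_pool for d in pool_depths)
    return parents_ok and pools_ok


# Hypothetical read depths at two sites: (parent1, parent2), (pool1, pool2)
sites = [
    ((25, 31), (60, 72)),  # all depths in range
    ((4, 31), (60, 72)),   # parent1 is too shallow
]

# Thresholds play the role of -d1/-d3 (parents) and -d2/-d4 (pools).
kept = [
    s for s in sites
    if passes_depth_filter(s[0], s[1],
                           min_parent=10, max_parent=100,
                           min_pool=20, max_pool=200)
]
print(f"{len(kept)} of {len(sites)} sites kept")
```

Sites with very low depth give noisy allele frequencies, and sites with very high depth often come from repeats, which is the usual motivation for setting both a minimum and a maximum.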
### Run OC-BSA
```
# run with VCF file
python OC_BSA.py -vcf ./potato_example.vcf -p1 -2 -p2 -1 -b1 -3 -b2 -4 -w 200000 -o ./potato_example.vcf_200k_OCBSA.txt
python OC_BSA.py -vcf ./potato_example.vcf -p1 11 -p2 12 -b1 10 -b2 9 -w 200000 -o ./potato_example.vcf_200k_OCBSA.txt
# run with one simple table file
python OC_BSA.py -table potato_example.vcf.table -w 200000 -o ./potato_example.vcf_200k_OCBSA.txt
# only change the window size
python OC_BSA.py -OcValue potato_example.vcf_200k_OCBSA.txt.OcValue -w 100000 -o ./potato_example.vcf_100k_OCBSA.txt
```
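The two VCF commands above pass different numbers for the same samples (`-p1 -2` versus `-p1 11`). These agree if negative values are read as positions counted from the last column, which the examples suggest but the option help does not state. The sketch below lists both forms for each sample column of a VCF header line; the function name and sample names are hypothetical.

```python
def sample_columns(header_line):
    """Map each sample name to its (0-based, negative) column index."""
    fields = header_line.rstrip("\n").split("\t")
    n = len(fields)
    # The first nine VCF columns (#CHROM ... FORMAT) are fixed; samples follow.
    return {name: (i, i - n) for i, name in enumerate(fields) if i >= 9}


# Hypothetical header with four samples; real names come from your own VCF.
header = "\t".join([
    "#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
    "pool2", "pool1", "parent1", "parent2",
])

for name, (from_start, from_end) in sample_columns(header).items():
    print(f"{name}: column {from_start} (or {from_end} from the end)")
```

With this layout, `parent1` sits in column 11 (or -2), matching `-p1 11` and `-p1 -2` in the commands above.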
***
## Drawing the results
> optional arguments:

- `-h, --help`: show this help message and exit
- `--version`: show program's version number and exit
- `-f INPUTFILE, --inputfile`: input file for plotting; the output file of OcBSA or F2_BSA
- `-OcValue, --OcValue`: plot OcValue
- `-snpindex, --snpindex`: plot snpindex
- `-ED, --ED`: plot ED
- `-p POSITION, --position`: select a coordinate range to plot only part of the genome, e.g. chr10,1,10000
- `-c COLOR, --color COLOR`: choose a gradient color (heatmap colormap) for the dot plot
- `-o OUTPUT, --output OUTPUT`: name of the output figure, ending with .png or .pdf
> The color of the BSA figure can be set with the -c parameter. For the list of available colormaps, see
https://matplotlib.org/stable/gallery/color/colormap_reference.html
```
#Plotting the whole genome
python bsa_fig.py -f potato_example.vcf_200k_OCBSA.txt -OcValue -o test.png
#Plotting 50 Mb to 60 Mb on chromosome 10
python bsa_fig.py -f potato_example.vcf_200k_OCBSA.txt -p chr10,50000000,60000000 -OcValue -o test.png
```
***
## Primer design
> NOTE: BLAST and the Python package primer3-py must be installed beforehand.
```
conda install primer3-py
```
> Optional arguments for primer design:

- `-h, --help`: show this help message and exit
- `-g G`: path to the reference genome
- `-OcValue OcValue`: path to the OcValue file output by OcBSA
- `-i I`: target genome region, e.g. chr11,0,10000
- `-f F`: output folder
- `-o, --output`: path to the output file, default = output.primer.extracted
- `-n, --number`: number of candidate primer pairs to pick, default = 10
- `-k, --flank`: flanking length around the indel, default = 200
- `-s, --short`: shortest acceptable primer length, default = 18
- `-O, --OPPORTUNE`: optimal primer length, default = 20
- `-l, --long`: longest acceptable primer length, default = 24
- `-S, --SHORT`: shortest product, default = 70
- `-L, --LONG`: longest product, default = 200
- `-m, --mintemp`: minimum Tm in degrees Celsius, default = 50
- `-x, --maxtemp`: maximum Tm in degrees Celsius, default = 65
- `-M, --mingc`: minimum GC percentage, default = 35
- `-X, --maxgc`: maximum GC percentage, default = 65
- `-D, --tmdiff`: accepted Tm difference between the two primers of a pair, default = 0.5
```
python primer_design.py -g ./reference_genome.fa -OcValue ./output.vcf.OcValue -i chr10,56000000,57000000 -f ./primer/
```
# BSA in F2 or other biparental populations
> To make BSA available for populations other than F1 (RILs, F2, F3), we provide a tool that implements the snp-index and ED algorithms.
> optional arguments:

- `-h, --help`: show this help message and exit
- `-snpindex, --snpindex`: run the snp-index algorithm
- `-ED, --ED`: run the ED algorithm
- `-p1 PARENT1, --parent1 PARENT1`: column number of parent1 in the VCF
- `-p2 PARENT2, --parent2 PARENT2`: column number of parent2 in the VCF
- `-b1 POOL1, --pool1 POOL1`: column number of the pool with the parent1 trait in the VCF
- `-b2 POOL2, --pool2 POOL2`: column number of the pool with the parent2 trait in the VCF
- `-d1 PARENTDEP1, --parentdep1 PARENTDEP1`: minimum coverage of the parents
- `-d2 POOLDEP1, --pooldep1 POOLDEP1`: minimum coverage of the pools
- `-d3 PARENTDEP2, --parentdep2 PARENTDEP2`: maximum coverage of the parents
- `-d4 POOLDEP2, --pooldep2 POOLDEP2`: maximum coverage of the pools
- `-w WIN, --win WIN`: size of the sliding window
- `-vcf INPUT_VCF, --input_vcf INPUT_VCF`: path to the VCF file
- `-table INPUT_FORMAT, --input_format INPUT_FORMAT`: path to a simplified table file in this program's own format, used in place of the VCF input
- `-infile INPUT_INFILE, --input_infile INPUT_INFILE`: intermediate file (.ED/.snpindex) generated by an earlier run; it can be reused when only the window size needs to change
- `-o OUT, --out OUT`: name of the output file
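For orientation, the per-site statistics behind the `-snpindex` and `-ED` options can be sketched as follows. This uses the common textbook definitions (SNP-index as the fraction of reads carrying the alternative allele, ED as the Euclidean distance between the allele-frequency vectors of the two pools) and assumes biallelic sites. Whether F2_BSA.py computes them in exactly this way is not stated in this README, and the function names and read counts are illustrative only.

```python
import math


def snp_index(ref_depth, alt_depth):
    """Fraction of reads that carry the alternative allele."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_depth / total


def delta_snp_index(pool1, pool2):
    """Difference in SNP-index between two pools, each given as (ref, alt)."""
    return snp_index(*pool1) - snp_index(*pool2)


def euclidean_distance(pool1, pool2):
    """ED between the (ref, alt) allele-frequency vectors of two pools."""
    alt1, alt2 = snp_index(*pool1), snp_index(*pool2)
    freqs1 = (1 - alt1, alt1)
    freqs2 = (1 - alt2, alt2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(freqs1, freqs2)))


# Hypothetical (ref, alt) read counts at one site in each pool.
pool1, pool2 = (10, 30), (30, 10)
print(delta_snp_index(pool1, pool2))
print(euclidean_distance(pool1, pool2))
```

Both statistics are near zero where the pools have similar allele frequencies and grow near a locus linked to the trait; the window size then controls how the per-site values are smoothed along the chromosome.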
## Run with ED
```
#Run with vcf
python F2_BSA.py -ED -p1 -4 -p2 -3 -b1 -2 -b2 -1 -vcf test.vcf -o test.vcf_1M_ED.txt
#Run with table
python F2_BSA.py -ED -table example_file/F2_test.table -o test.table_1M_ED.txt
#plot
python bsa_fig.py -f test.vcf_1M_ED.txt -ED -o test.png
python bsa_fig.py -f test.vcf_1M_ED.txt -ED -o test.png -p Chr01,1000000,16000000
```
## Run with snp-index
```
#Run with vcf
python F2_BSA.py -snpindex -p1 -4 -p2 -3 -b1 -2 -b2 -1 -vcf test.vcf -o test.vcf_1M_snpindex.txt
#Run with table
python F2_BSA.py -snpindex -table example_file/F2_test.table -o test.table_1M_snpindex.txt
#plot
python bsa_fig.py -f test.table_1M_snpindex.txt -snpindex -o test.png
python bsa_fig.py -f test.table_1M_snpindex.txt -snpindex -o test.png -p Chr01,1000000,16000000
```
# Building mixed pools from a VCF file
> If you have a VCF file of individually sequenced samples and want to combine them into pooled samples yourself, you can follow this process.
```
python cluster_vcf.py -c individual_sequencing.vcf -f pool_config.txt -o out_table.txt
```
### pool_config.txt
`sampleID_N` is a sample ID in the VCF file.
B1: sample IDs of Pool1; B2: sample IDs of Pool2; P1: sample ID of Parent1; P2: sample ID of Parent2.
```
B1:sampleID1,sampleID2,sampleID3,sampleID4,sampleID5,sampleID6,sampleID7,sampleID8,sampleID9,sampleID10,
B2:sampleID11,sampleID12,sampleID13,sampleID14,sampleID15,sampleID16,sampleID17,sampleID18,sampleID19,sampleID20,
P1:sampleID_P1
P2:sampleID_P2
```
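Before running cluster_vcf.py, it can help to check that the config file is well formed. The sketch below parses the `KEY:id1,id2,...` layout shown above and reports samples assigned to both pools. It is a standalone helper written for this README, not part of cluster_vcf.py, and the function name and sample IDs are hypothetical.

```python
def parse_pool_config(text):
    """Parse 'KEY:id1,id2,...' lines into a dict of sample-ID lists."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, ids = line.partition(":")
        # Drop empty entries left by a trailing comma.
        groups[key.strip()] = [s.strip() for s in ids.split(",") if s.strip()]
    return groups


# Hypothetical config in the same layout as pool_config.txt.
example = """\
B1:sampleID1,sampleID2,sampleID3,
B2:sampleID11,sampleID12,sampleID13,
P1:sampleID_P1
P2:sampleID_P2
"""

groups = parse_pool_config(example)
missing = [k for k in ("B1", "B2", "P1", "P2") if k not in groups]
shared = set(groups.get("B1", [])) & set(groups.get("B2", []))
print("missing keys:", missing)
print("samples in both pools:", sorted(shared))
```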