# OcBSA

**Repository Path**: Bioinformaticslab/OcBSA

## Basic Information

- **Project Name**: OcBSA
- **Description**: OcBSA specifically for QTL mapping in F1 populations. Developed by: zhanglk960127@163.com
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 11
- **Forks**: 1
- **Created**: 2023-11-05
- **Last Updated**: 2025-01-20

## Categories & Tags

**Categories**: Uncategorized

**Tags**: QTL, BSA, F1, outcross

## README

# OcBSA: an NGS-based Bulk Segregant Analysis Tool for Outcross Populations.

# OcBSA: BSA analysis software for F1 segregating populations

### Please cite "OcBSA: an NGS-based Bulk Segregant Analysis Tool for Outcross Populations, Molecular Plant, 2024. https://doi.org/10.1016/j.molp.2024.02.011"

## Windows version

> Download `OcBSA.exe` and run it directly on a Windows computer.

> To run it from the command line on a Windows computer, download the `windows` folder and run the `OcBSA.py` script, after installing the packages listed in the `environment.yaml` file.

## Linux Version

### QTL analysis

> Optional arguments:

- `-h`, `--help`: show this help message and exit
- `-p1 PARENT1`: column number of the dominant parent in the VCF (counting from 0)
- `-p2 PARENT2`: column number of the other (recessive) parent in the VCF (counting from 0)
- `-b1 POOL1`: column number of the pool with the dominant trait in the VCF (counting from 0)
- `-b2 POOL2`: column number of the pool with the recessive trait in the VCF (counting from 0)
- `-d1 PARENTDEP1`: minimum coverage of the parents
- `-d2 POOLDEP1`: minimum coverage of the pools
- `-d3 PARENTDEP2`: maximum coverage of the parents
- `-d4 POOLDEP2`: maximum coverage of the pools
- `-w WIN`: size of the sliding window
- `-vcf INPUT_VCF`: path to the VCF file
- `-table INPUT_FORMAT`: path to a table file in the program's simplified VCF format, used in place of the VCF input
- `-OcValue INPUT_OcValue`: path to an intermediate `.OcValue` file generated by an earlier run; use it when you only want to change the window size
- `-o OUT`: name of the output file

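
Because `-p1`, `-p2`, `-b1` and `-b2` take column numbers counted from 0, it can help to list the columns of the VCF header before running. The sketch below is not part of OcBSA: `list_vcf_columns` is a hypothetical helper and the sample names are made up. It relies only on the VCF convention that the `#CHROM` header line is tab-separated, with nine fixed columns followed by one column per sample. It also reports the matching negative index, on the assumption (suggested by the example commands in this README, which pass both `-p1 -2` and `-p1 11` for the same file) that negative values count from the last column, as in Python indexing.

```python
def list_vcf_columns(header_line):
    """Return (index, negative_index, name) for every column of a #CHROM header line."""
    names = header_line.rstrip("\n").lstrip("#").split("\t")
    total = len(names)
    return [(i, i - total, name) for i, name in enumerate(names)]


# Hypothetical header with four samples; real sample names come from your own VCF.
header = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tpool_R\tpool_D\tparent_D\tparent_R"
)

for index, negative, name in list_vcf_columns(header):
    print(f"{index:>3} {negative:>4}  {name}")
```

For this made-up header, `parent_D` sits at column 11 (or -2) and `pool_R` at column 9 (or -4).
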
### Run OC-BSA

```
# run with a VCF file
python OC_BSA.py -vcf ./potato_example.vcf -p1 -2 -p2 -1 -b1 -3 -b2 -4 -w 200000 -o ./potato_example.vcf_200k_OCBSA.txt
python OC_BSA.py -vcf ./potato_example.vcf -p1 11 -p2 12 -b1 10 -b2 9 -w 200000 -o ./potato_example.vcf_200k_OCBSA.txt
# run with a simple table file
python OC_BSA.py -table potato_example.vcf.table -w 200000 -o ./potato_example.vcf_200k_OCBSA.txt
# only change the window size
python OC_BSA.py -OcValue potato_example.vcf_200k_OCBSA.txt.OcValue -w 100000 -o ./potato_example.vcf_100k_OCBSA.txt
```

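
Since the `.OcValue` intermediate file can be reused, several window sizes can be tried without reading the VCF again. The sketch below only assembles the command lines. `resize_commands` is a hypothetical helper and the output naming scheme is an assumption that mirrors the file names used in this README; the flags `-OcValue`, `-w` and `-o` are the ones documented for `OC_BSA.py`.

```python
import shlex


def resize_commands(ocvalue_path, window_sizes, out_prefix, script="OC_BSA.py"):
    """Build one command per window size, reusing an existing .OcValue file."""
    commands = []
    for win in window_sizes:
        # Assumed naming scheme: <prefix>_<window in kb>k_OCBSA.txt
        out = f"{out_prefix}_{win // 1000}k_OCBSA.txt"
        commands.append(
            ["python", script, "-OcValue", ocvalue_path, "-w", str(win), "-o", out]
        )
    return commands


commands = resize_commands(
    "potato_example.vcf_200k_OCBSA.txt.OcValue",
    [50000, 100000, 500000],
    out_prefix="./potato_example.vcf",
)
for argv in commands:
    print(shlex.join(argv))
```

Each entry is an argument list, so it could be passed to `subprocess.run(argv, check=True)` to execute the runs one after another.
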
***

## Drawing the results

> Optional arguments:

- `-h`, `--help`: show this help message and exit
- `--version`: show the program's version number and exit
- `-f INPUTFILE`, `--inputfile`: input file for the plot; the output file of OcBSA or F2_BSA
- `-OcValue`, `--OcValue`: plot the OcValue
- `-snpindex`, `--snpindex`: plot the snp-index
- `-ED`, `--ED`: plot the ED
- `-p POSITION`, `--position`: plot only the selected region, e.g. `chr10,1,10000`
- `-c COLOR`, `--color COLOR`: gradient color (heatmap colormap) for the dot plot
- `-o OUTPUT`, `--output OUTPUT`: name of the output figure, ending with `.png` or `.pdf`

> The color of the BSA figure can be specified with the `-c` parameter. For the list of available colors, see https://matplotlib.org/stable/gallery/color/colormap_reference.html

```
# Plotting the whole genome
python bsa_fig.py -f potato_example.vcf_200k_OCBSA.txt -OcValue -o test.png
# Plotting 50 Mb to 60 Mb on chromosome 10
python bsa_fig.py -f potato_example.vcf_200k_OCBSA.txt -p chr10,50000000,60000000 -OcValue -o test.png
```

***

## Primer design

> NOTE: The blast software and the primer3-py Python package need to be installed in advance.

```
conda install primer3-py
```

> Optional arguments of primer design:

- `-h`, `--help`: show this help message and exit
- `-g G`: path to the reference genome
- `-OcValue OcValue`: path to the OcValue file output by OcBSA
- `-i I`: target genome region, e.g. `chr11,0,10000`
- `-f F`: output folder
- `-o`, `--output`: path to the output file, default = output.primer.extracted
- `-n`, `--number`: number of candidate primer pairs to pick, default = 10
- `-k`, `--flank`: flanking length of the indel, default = 200
- `-s`, `--short`: shortest acceptable primer, default = 18
- `-O`, `--OPPORTUNE`: optimal (most acceptable) primer length, default = 20
- `-l`, `--long`: longest acceptable primer, default = 24
- `-S`, `--SHORT`: shortest product, default = 70
- `-L`, `--LONG`: longest product, default = 200
- `-m`, `--mintemp`: minimum Tm in Celsius, default = 50
- `-x`, `--maxtemp`: maximum Tm in Celsius, default = 65
- `-M`, `--mingc`: minimum GC percentage, default = 35
- `-X`, `--maxgc`: maximum GC percentage, default = 65
- `-D`, `--tmdiff`: accepted Tm difference to form a primer pair, default = 0.5

```
python primer_design.py -g ./reference_genome.fa -OcValue ./output.vcf.OcValue -i chr10,56000000,57000000 -f ./primer/
```

# BSA in F2 or other biparental populations

> To make the BSA algorithm usable in populations other than F1 populations (RILs, F2, F3), we provide a tool that implements the snp-index and ED algorithms.

> Optional arguments:

- `-h`, `--help`: show this help message and exit
- `-snpindex`, `--snpindex`: run the snp-index algorithm
- `-ED`, `--ED`: run the ED algorithm
- `-p1 PARENT1`, `--parent1 PARENT1`: column number of parent1 in the VCF
- `-p2 PARENT2`, `--parent2 PARENT2`: column number of parent2 in the VCF
- `-b1 POOL1`, `--pool1 POOL1`: column number of the pool with the parent1 trait in the VCF
- `-b2 POOL2`, `--pool2 POOL2`: column number of the pool with the parent2 trait in the VCF
- `-d1 PARENTDEP1`, `--parentdep1 PARENTDEP1`: minimum coverage of the parents
- `-d2 POOLDEP1`, `--pooldep1 POOLDEP1`: minimum coverage of the pools
- `-d3 PARENTDEP2`, `--parentdep2 PARENTDEP2`: maximum coverage of the parents
- `-d4 POOLDEP2`, `--pooldep2 POOLDEP2`: maximum coverage of the pools
- `-w WIN`, `--win WIN`: size of the sliding window
- `-vcf INPUT_VCF`, `--input_vcf INPUT_VCF`: path to the VCF file
- `-table INPUT_FORMAT`, `--input_format INPUT_FORMAT`: path to a table file in the program's simplified VCF format, used in place of the VCF input
- `-infile INPUT_INFILE`, `--input_infile INPUT_INFILE`: path to an intermediate file (`.ED` or `.snpindex`) generated by an earlier run; use it when you only want to change the window size
- `-o OUT`, `--out OUT`: name of the output file

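
The two statistics selected by `-snpindex` and `-ED` can be illustrated with their commonly used definitions: the snp-index is the fraction of reads at a site that carry the alternate allele, and the ED is the Euclidean distance between the allele-frequency vectors of the two pools. The sketch below is a minimal illustration under those textbook definitions, not the code of `F2_BSA.py`, which may differ in details such as which allele is counted or how values are smoothed over windows. All function names and read depths are hypothetical.

```python
import math


def snp_index(ref_depth, alt_depth):
    """Fraction of reads carrying the alternate allele at one site."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("site has no coverage")
    return alt_depth / total


def delta_snp_index(pool1, pool2):
    """Difference in snp-index between two pools; each pool is (ref_depth, alt_depth)."""
    return snp_index(*pool1) - snp_index(*pool2)


def euclidean_distance(pool1, pool2):
    """ED between the allele-frequency vectors of two pools at a biallelic site."""
    alt1, alt2 = snp_index(*pool1), snp_index(*pool2)
    ref1, ref2 = 1 - alt1, 1 - alt2
    return math.sqrt((ref1 - ref2) ** 2 + (alt1 - alt2) ** 2)


# Hypothetical (ref, alt) read depths for the two pools at one site.
pool1 = (5, 15)  # snp-index 0.75
pool2 = (15, 5)  # snp-index 0.25
delta = delta_snp_index(pool1, pool2)
ed = euclidean_distance(pool1, pool2)
print(delta, ed)
```

At a biallelic site the two measures carry the same information, since ED equals the square root of 2 times the absolute snp-index difference; they differ once more than two alleles are considered.
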
## Run with ED

```
# Run with vcf
python F2_BSA.py -ED -p1 -4 -p2 -3 -b1 -2 -b2 -1 -vcf test.vcf -o test.vcf_1M_ED.txt
# Run with table
python F2_BSA.py -ED -table example_file/F2_test.table -o test.table_1M_ED.txt
# plot
python bsa_fig.py -f test.vcf_1M_ED.txt -ED -o test.png
python bsa_fig.py -f test.vcf_1M_ED.txt -ED -o test.png -p Chr01,1000000,16000000
```

## Run with snp-index

```
# Run with vcf
python F2_BSA.py -snpindex -p1 -4 -p2 -3 -b1 -2 -b2 -1 -vcf test.vcf -o test.vcf_1M_snpindex.txt
# Run with table
python F2_BSA.py -snpindex -table example_file/F2_test.table -o test.table_1M_snpindex.txt
# plot
python bsa_fig.py -f test.table_1M_snpindex.txt -snpindex -o test.png
python bsa_fig.py -f test.table_1M_snpindex.txt -snpindex -o test.png -p Chr01,1000000,16000000
```

# Building mixed pools from a VCF file

> If you have a VCF file of individually sequenced samples and want to combine them into pooled samples yourself, you can follow this process.

```
python cluster_vcf.py -c individual_sequencing.vcf -f pool_config.txt -o out_table.txt
```

### pool_config.txt

- `sampleID_N` is the sample ID in the VCF file.
- `B1`: sample IDs of Pool1; `B2`: sample IDs of Pool2; `P1`: sample ID of Parent1; `P2`: sample ID of Parent2.

```
B1:sampleID1,sampleID2,sampleID3,sampleID4,sampleID5,sampleID6,sampleID7,sampleID8,sampleID9,sampleID10,
B2:sampleID11,sampleID12,sampleID13,sampleID14,sampleID15,sampleID16,sampleID17,sampleID18,sampleID19,sampleID20,
P1:sampleID_P1
P2:sampleID_P2
```
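
Before running `cluster_vcf.py`, it may be worth checking that the config file defines all four groups. The sketch below parses the `KEY:id1,id2,...` layout shown for `pool_config.txt`. `parse_pool_config` is a hypothetical helper, not part of OcBSA, and skipping lines that start with `#` is an assumption about how comments would be handled.

```python
def parse_pool_config(text):
    """Parse 'KEY:id1,id2,...' lines into a dict mapping each key to a list of sample IDs."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' lines carry no sample IDs
        key, _, ids = line.partition(":")
        # A trailing comma produces an empty field, which is dropped here.
        groups[key.strip()] = [s.strip() for s in ids.split(",") if s.strip()]
    return groups


example = """B1:sampleID1,sampleID2,sampleID3,
B2:sampleID11,sampleID12,sampleID13,
P1:sampleID_P1
P2:sampleID_P2
"""

config = parse_pool_config(example)
missing = {"B1", "B2", "P1", "P2"} - config.keys()
print(config)
print("missing groups:", sorted(missing))
```

The sample IDs collected this way can then be compared against the sample columns of the VCF header to catch misspelled names early.
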