# Solanaceae Pangenome

**Repository Path**: Bioinformaticslab/solanaceae-pangenome

## Basic Information

- **Project Name**: Solanaceae Pangenome
- **Description**: Pipeline code for Solanaceae Pangenome
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2024-09-10
- **Last Updated**: 2025-01-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Solanaceae Pangenome

### Introduction

Pipeline code for the Solanaceae pangenome.

### Usage

#### Construction of Solanaceae Pangenome

```
# Install mSynOrths; installation instructions are at https://gitee.com/zhanglingkui/msynorths

# Identify the syntenic gene pairs among the 30 species
python mSynOrths.py -f ./genome_pos.txt -m 0.3 -x 10 -n 6 -e 1e-20 -v 0.5 -i 50 -t 50 -o sola_30species_0224

# Identify the transposed orthologs among the 30 species
python find_tranpos_gene.py -m sola_30species_0224

# Construct the pan-genomic gene set
python bulit_pangenome.py -m sola_30species_0224 -f mSynF1 -o ./sola_30_species_pan.txt

# Constructing a phylogenetic tree
```

#### Gene Transposition and Gene Loss

```
# Identify the syntenic gene pairs among the 10 species
python mSynOrths.py -f ./genome_pos.txt -m 0.7 -x 10 -n 6 -e 1e-20 -v 0.6 -i 60 -t 50 -o sola_10_species_hight_1119/

# Construct the pangenome
python bulit_pangenome.py -m ./sola_10_species_hight_1119/ -f mSynF1 -o sola_10_species_hight_1119_pangenome.txt

# Identify the transposed orthologs among the 10 species
python find_tranpos_gene.py -m ./sola_10_species_hight_1119/

# Filter out gene groups with fewer than five genes
python filter_pangenome.py sola_10_species_hight_1119_pangenome.txt > sola_10_species_hight_1119_pangenome_filter.txt

# Determine the size of the lost fragments
python loop_find_pav.py > pav_size.txt

# Count the number of transposed genes
python check_trans_seg.py ./ > trans_gene_num.txt

## Identifying pseudogenes
# First, extract a representative sequence for each putatively lost gene
# (the CDS of the longest gene in its gene family is used to represent the lost gene)
python find_pseud_gene_fasta.py ./sola_10_species_hight_1119/ ./
mkdir gene_region && cd gene_region/
# Copy the required scripts into this directory

## Identifying pseudogenes by aligning lost gene sequences to intergenic regions
nohup python ../loop_inter_region.py ../sola_10_species_hight_1119/ &

# Get the upstream and downstream regions of the identified pseudogenes, as well as the gene regions
python get_pseud_gene_bed.py

## Viewing gene densities upstream and downstream of transposed and lost genes
mkdir TE && cd TE && mkdir gene_bed
ln -s ../../TE3/annot/ ./ && cd gene_bed

# Obtain the upstream and downstream sequences of transposed genes
python get_trans_gene_bed.py ../../sola_10_species_hight_1119/
mv ../../gene_region/*loss*bed ./

## Calculating upstream and downstream densities for lost genes, transposed genes, and syntenic genes
python ../loop_te_density.py &
```

### Contributing

1. Fork this repository
2. Create a Feat_xxx branch
3. Commit your code
4. Open a Pull Request
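
### Example: filtering small gene groups

The filtering step in the gene transposition and loss workflow drops gene groups with fewer than five genes. The sketch below illustrates that idea only; it is not the code of `filter_pangenome.py`, whose behavior may differ. It assumes a hypothetical file layout in which each line is one gene group, columns are tab-separated, and an absent gene is written as `-`, `NA`, or an empty field. The function names and the `min_genes` parameter are made up for this illustration.

```python
# Illustrative sketch only: the real filter_pangenome.py may behave differently.
# Assumed (hypothetical) layout: one gene group per line, tab-separated columns,
# with "-", "NA", or an empty field marking a gene that is absent.

MISSING = {"", "-", "NA"}  # assumed placeholders for an absent gene


def count_genes(fields):
    """Count the fields that hold a gene ID rather than a missing-gene placeholder."""
    return sum(1 for field in fields if field.strip() not in MISSING)


def filter_groups(lines, min_genes=5, delimiter="\t"):
    """Yield the gene-group lines that contain at least `min_genes` genes."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (assumed to start with "#").
        if not line or line.startswith("#"):
            continue
        if count_genes(line.split(delimiter)) >= min_genes:
            yield line


# Small in-memory demonstration with made-up gene IDs.
demo = [
    "g1\tg2\tg3\tg4\tg5\n",         # 5 genes: kept
    "g6\t-\t-\tg7\t-\n",            # 2 genes: dropped
    "g8\tg9\tg10\t-\tg11\tg12\n",   # 5 genes: kept
]
kept = list(filter_groups(demo))
for group in kept:
    print(group)
```

To apply the same logic to a file, the lines of an opened file object can be passed to `filter_groups` in place of the `demo` list. Whether the threshold of five is inclusive in the actual pipeline is an assumption here.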