# Solanaceae Pangenome

**Repository Path**: Bioinformaticslab/solanaceae-pangenome

## Basic Information

- **Project Name**: Solanaceae Pangenome
- **Description**: Pipeline code for Solanaceae Pangenome
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2024-09-10
- **Last Updated**: 2025-01-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


### Introduction
Pipeline code for constructing and analyzing the Solanaceae pangenome.


### Usage
#### Construction of Solanaceae Pangenome
```
# Install mSynOrths first; installation instructions: https://gitee.com/zhanglingkui/msynorths
# Identifying the syntenic gene pairs among 30 species
python mSynOrths.py -f ./genome_pos.txt -m 0.3 -x 10 -n 6 -e 1e-20 -v 0.5 -i 50 -t 50 -o sola_30species_0224 

# Identifying the transposed orthologs between 30 species
python find_tranpos_gene.py -m sola_30species_0224

# Constructing the pan-genomic gene set
python bulit_pangenome.py -m sola_30species_0224 -f mSynF1 -o ./sola_30_species_pan.txt

# Constructing a phylogenetic tree


```
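The steps above turn pairwise syntenic orthologs into a pan-gene set, but the README does not describe how `bulit_pangenome.py` groups them. As a rough illustration only, the sketch below shows one common approach: treat every gene as a node and every syntenic pair as an edge, then take connected components with union-find so that each component becomes one pan-gene family. The input format (pairs of `(species, gene)` tuples), the species codes, and the function name `build_pan_families` are hypothetical and may not match mSynOrths output or the repository's own logic.

```python
from collections import defaultdict


def build_pan_families(pairs):
    """Group syntenic gene pairs into pan-gene families (connected components).

    Hypothetical input: an iterable of ((species, gene), (species, gene)) tuples.
    Returns a list of dicts mapping species -> sorted list of gene IDs,
    largest families first.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root; path halving keeps trees shallow
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)

    # Collect the genes of each component, keyed by its root node
    components = defaultdict(lambda: defaultdict(list))
    for node in list(parent):
        species, gene = node
        components[find(node)][species].append(gene)

    families = [{sp: sorted(genes) for sp, genes in comp.items()}
                for comp in components.values()]
    # Deterministic order: most genes first, then by species codes
    families.sort(key=lambda fam: (-sum(len(g) for g in fam.values()), sorted(fam)))
    return families


if __name__ == "__main__":
    # Hypothetical species codes and gene IDs, for illustration only
    pairs = [
        (("Sly", "Sly01g001"), ("Spe", "Spe01g001")),
        (("Spe", "Spe01g001"), ("Can", "Can01g010")),
        (("Sly", "Sly02g005"), ("Stu", "Stu02g007")),
    ]
    for fam in build_pan_families(pairs):
        print(len(fam), fam)
```

In this sketch `len(family)` is the number of species in which a family is present, which is the kind of count a pangenome presence/absence table is built from. Connected components are transitive, so a chain of pairwise matches can merge distant genes; real pipelines often add extra filtering for that reason.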
#### Gene Transposition and Gene Loss 
```
# Identifying the syntenic gene pairs among 10 species
python mSynOrths.py -f ./genome_pos.txt -m 0.7 -x 10 -n 6 -e 1e-20 -v 0.6 -i 60 -t 50 -o sola_10_species_hight_1119/
# Constructing the pangenome 
python bulit_pangenome.py -m ./sola_10_species_hight_1119/ -f mSynF1 -o sola_10_species_hight_1119_pangenome.txt
# Identifying the transposed orthologs among the 10 species
python find_tranpos_gene.py -m ./sola_10_species_hight_1119/
# Filter out gene groups containing fewer than five genes
python filter_pangenome.py sola_10_species_hight_1119_pangenome.txt > sola_10_species_hight_1119_pangenome_filter.txt
# Identifying the size of lost fragments
python loop_find_pav.py > pav_size.txt

# Counting the number of transposed genes 
python check_trans_seg.py ./ > trans_gene_num.txt

## Identifying pseudogenes
# First, extract a representative sequence for each suspected missing gene (the CDS of the longest gene in the family containing the missing gene is used to represent it)
python find_pseud_gene_fasta.py ./sola_10_species_hight_1119/ ./
mkdir gene_region && cd gene_region/
# Copy the dependent scripts into this directory

## Identifying pseudogenes by aligning lost-gene sequences to intergenic regions
nohup python ../loop_inter_region.py ../sola_10_species_hight_1119/ &
# Get BED coordinates of the upstream and downstream regions of the identified pseudogenes, as well as the gene regions
python get_pseud_gene_bed.py

## Examining TE densities upstream and downstream of transposed and lost genes
mkdir TE && cd TE && mkdir gene_bed 
ln -s ../../TE3/annot/ ./ && cd gene_bed
# Obtain the upstream and downstream regions (BED) of transposed genes
python get_trans_gene_bed.py ../../sola_10_species_hight_1119/

mv ../../gene_region/*loss*bed ./

## Calculating upstream and downstream densities for lost genes, transposed genes, and collinear genes
python ../loop_te_density.py &


```
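How `loop_te_density.py` computes the upstream and downstream densities is not documented here. The sketch below is a minimal illustration, assuming that "density" means the fraction of each flanking window covered by TE annotations (overlapping TEs are merged so that no base is counted twice), BED-style 0-based half-open coordinates, and a hypothetical 10 kb flank. The function names and the tuple formats are hypothetical; the repository's script may instead count TEs per window or use a different flank size.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(window_start, window_end, merged):
    """Number of bases in [window_start, window_end) covered by merged intervals."""
    total = 0
    for start, end in merged:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            total += overlap
    return total


def flank_te_density(gene, tes, flank=10_000, chrom_len=None):
    """Fraction of the upstream and downstream flanks covered by TEs.

    gene: (chrom, start, end, strand), 0-based half-open (BED-style).
    tes:  iterable of (chrom, start, end) TE annotations (hypothetical format).
    Returns (upstream_density, downstream_density).
    """
    chrom, start, end, strand = gene
    # Only TEs on the gene's chromosome matter; merge them to avoid double counting
    merged = merge_intervals((s, e) for c, s, e in tes if c == chrom)

    # Clip the flanks at the chromosome start (and end, if its length is known)
    left = (max(0, start - flank), start)
    right_end = end + flank if chrom_len is None else min(chrom_len, end + flank)
    right = (end, right_end)

    def density(win):
        length = win[1] - win[0]
        return covered_bp(win[0], win[1], merged) / length if length > 0 else 0.0

    left_d, right_d = density(left), density(right)
    # On the minus strand the 5' (upstream) side lies to the right of the gene
    return (left_d, right_d) if strand != "-" else (right_d, left_d)
```

For example, a plus-strand gene at `chr1:20000-21000` with TEs covering 4 kb of its left 10 kb flank and 1 kb of its right flank yields `(0.4, 0.1)`. Merging intervals first matters because nested or overlapping TE annotations are common and would otherwise inflate coverage above 1.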

### Contributing

1.  Fork this repository
2.  Create a Feat_xxx branch
3.  Commit your code
4.  Create a Pull Request