The database was built for the GFAP software.
GFAP was developed in Python.
The GFAP software consists of two parts. The databases contain annotation information (including Pfam, GO, and KEGG information) for 200 plant species. The data in GFAP_database, GFAP_GO, and GFAP_KEGG can be used directly by selecting the species in
Other information can be used after entering the species name in the following dialog boxes.
To perform DNA alignment, users should put these files into the database folder of GFAP. The species names should be written into
GFAP contains seven modules. The first is the "Introduction" module:
It contains brief instructions for the other modules. Users can click a module name to view its instructions.
The second module is the "Alignment" module. It is responsible for sequence alignment.
Protein alignment
Detail: ① input the protein FASTA file --->② select a species name --->③ set the e-value if you want a value other than the default (1e-5) --->if the species of interest is not in the species list, enter its name in ④ --->⑤ select the save location and name your file.
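This manual does not describe GFAP's alignment engine. As a sketch only, assuming it wraps NCBI BLAST+ `blastp`, the steps above correspond roughly to the command built below. The function `build_blastp_command` and the `database/<species>` path are hypothetical; the `-query`, `-db`, `-evalue`, `-outfmt`, and `-out` flags are standard BLAST+ options.

```python
import shlex


def build_blastp_command(query_fasta, species_db, out_path, evalue=1e-5):
    """Build (but do not run) a BLAST+ blastp command line.

    Assumption: the selected species corresponds to a protein BLAST database
    at `species_db` (hypothetical layout, not GFAP's documented one).
    """
    return [
        "blastp",
        "-query", query_fasta,   # step 1: input protein FASTA file
        "-db", species_db,       # steps 2/4: selected or user-supplied species
        "-evalue", str(evalue),  # step 3: e-value cutoff, default 1e-5
        "-outfmt", "6",          # tabular output, easy to parse downstream
        "-out", out_path,        # step 5: save location and file name
    ]


if __name__ == "__main__":
    cmd = build_blastp_command("my_proteins.fa", "database/Arabidopsis_thaliana", "hits.tsv")
    # Print a shell-safe version; pass `cmd` to subprocess.run(cmd, check=True)
    # to execute it on a machine with BLAST+ installed.
    print(shlex.join(cmd))
```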
Translation
Because some users may not have protein-sequence files, GFAP provides the following functions for batch-translating DNA sequences into protein sequences.
Please note! These functions require the length of each DNA sequence to be a multiple of three.
Detail: ① input the DNA FASTA file --->② format it --->③ select the save location and name your file
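Batch translation can be sketched in plain Python with the standard genetic code. This is an illustration, not GFAP's code; the function names are hypothetical. The reader joins FASTA sequences that are wrapped over several lines, and `translate` enforces the multiple-of-three precondition stated above.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining sequences wrapped over several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def translate(dna):
    """Translate one DNA sequence; '*' marks stop codons, 'X' unknown codons."""
    seq = dna.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not a multiple of three")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq), 3))


def translate_fasta(lines):
    """Batch-translate every record of a DNA FASTA file."""
    return [(header, translate(seq)) for header, seq in read_fasta(lines)]


if __name__ == "__main__":
    records = [">gene1", "ATGGCC", "TAA", ">gene2", "ATGTGG"]
    for header, protein in translate_fasta(records):
        print(f">{header}\n{protein}")
```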
DNA alignment
We recommend performing alignments with protein sequences. However, protein sequences may be hard for some users to obtain, so GFAP also supports DNA alignment with the following functions:
Detail: ① open the website for downloading the DNA alignment file (GFAP opens it automatically) --->② input the DNA FASTA file --->③ format the file --->④ input the species name (such as Arabidopsis_thaliana) --->⑤ set the e-value --->⑥ select the save location and name your file.
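Step ④ expects names in the `Genus_species` form. A minimal sketch of normalizing a typed name and, assuming GFAP wraps NCBI BLAST+ `blastn` (an assumption, not stated in this manual), building the matching command. The helper names and the `database/<Genus_species>` layout are hypothetical.

```python
import re


def normalize_species_name(name):
    """Turn 'arabidopsis thaliana' or 'Arabidopsis_thaliana' into 'Arabidopsis_thaliana'."""
    parts = [p for p in re.split(r"[\s_]+", name.strip()) if p]
    if len(parts) < 2:
        raise ValueError(f"expected 'Genus species', got {name!r}")
    # Genus capitalized, remaining epithets lower-case.
    return "_".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def build_blastn_command(query_fasta, species, out_path, evalue=1e-5, db_dir="database"):
    """Build (but do not run) a BLAST+ blastn command; the db_dir layout is hypothetical."""
    db = f"{db_dir}/{normalize_species_name(species)}"
    return ["blastn", "-query", query_fasta, "-db", db,
            "-evalue", str(evalue), "-outfmt", "6", "-out", out_path]


if __name__ == "__main__":
    print(build_blastn_command("my_genes.fa", "arabidopsis thaliana", "hits.tsv"))
```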
The third module is the "Pfam analysis" module. It is responsible for annotating sequences with protein-domain information.
Detail: ① input the alignment result --->② select a species name --->if the species is not in the species list, enter its information in ③ --->④ select the save location and name your file. The results look similar to the following:
In addition, ⑤ shows the species names in your own database, and clicking ⑥ computes statistics on the gene families of the input genes.
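Conceptually, Pfam annotation transfers domains from each gene's best database hit, and the family statistics count genes per domain. This sketch assumes the alignment result is BLAST tabular output (bit score in column 12) and that the database supplies a subject-to-Pfam mapping; GFAP's real file formats may differ, and all names here are hypothetical.

```python
from collections import Counter


def best_hits(blast_lines):
    """Keep the highest-bit-score subject per query from BLAST tabular (outfmt 6) lines."""
    best = {}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip malformed or non-tabular lines
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def annotate_pfam(blast_lines, subject_to_pfam):
    """Map each query gene to the Pfam domains of its best hit (empty list if unknown)."""
    return {q: subject_to_pfam.get(s, []) for q, s in best_hits(blast_lines).items()}


def family_statistics(annotation):
    """Count how many input genes carry each Pfam domain (each gene counted once per domain)."""
    counts = Counter()
    for domains in annotation.values():
        counts.update(set(domains))
    return counts
```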
Because users may be interested in particular genes, the following functions let them extract the annotation information for selected genes:
Detail: ① input the IDs of the genes of interest --->② input the annotation result file --->③ select the save location and name your file. In addition, ④ shows the species names in the website databases.
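Extracting the annotation of selected genes is essentially a filter on the gene-ID column. A minimal sketch, assuming a tab-separated annotation file with the gene ID in the first column; `extract_genes` and its parameters are hypothetical.

```python
def extract_genes(gene_ids, annotation_lines, id_column=0, sep="\t"):
    """Return annotation lines whose gene ID (in id_column) is in gene_ids, in file order."""
    wanted = {g.strip() for g in gene_ids if g.strip()}  # ignore blank lines in the ID list
    selected = []
    for line in annotation_lines:
        cols = line.rstrip("\n").split(sep)
        if len(cols) > id_column and cols[id_column] in wanted:
            selected.append(line.rstrip("\n"))
    return selected
```

Using a set for the wanted IDs keeps each lookup constant-time, which matters when the annotation file covers a whole genome.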
The fourth module is the "GO analysis" module. It is responsible for annotating sequences with GO IDs and functional information.
GO ID
Detail: ① input the alignment result --->② select a species name --->if the species is not in the species list, enter its information in ③ --->④ select the save location and name your file. Then press the "search" button to run the function. The results look similar to the following:
In addition, the "press, if you want to go analysis" button extracts the GO IDs from the annotation result, making them easy to use in downstream software such as REVIGO:
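Because GO identifiers have the fixed form `GO:` followed by seven digits, the extraction can be sketched with a regular expression. This assumes the annotation result is plain text containing GO IDs; `extract_go_ids` is a hypothetical name, not part of GFAP.

```python
import re

# A GO identifier is "GO:" followed by exactly seven digits.
GO_ID = re.compile(r"GO:\d{7}(?!\d)")


def extract_go_ids(annotation_text):
    """Return the unique GO IDs in the text, in first-seen order."""
    return list(dict.fromkeys(GO_ID.findall(annotation_text)))


if __name__ == "__main__":
    text = "gene1\tGO:0005634;GO:0003677\ngene2\tGO:0005634\n"
    # One GO ID per line gives a list that can be pasted into REVIGO.
    print("\n".join(extract_go_ids(text)))
```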
GO function
Detail: ① input the GO-ID file --->④ select the save location and name your file. Then press the "search(1,3)" button to annotate the GO IDs with their functions. The results look similar to the following:
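GO term names and namespaces are published by the Gene Ontology in OBO files such as `go-basic.obo`. Assuming GFAP's function lookup works from a similar table (not confirmed here), annotation reduces to parsing `[Term]` stanzas and looking up each ID. Function names are hypothetical.

```python
def parse_obo_terms(lines):
    """Parse [Term] stanzas of a GO OBO file into {GO ID: (name, namespace)}."""
    terms = {}
    stanza, fields = None, {}
    for raw in list(lines) + ["[End]"]:  # sentinel flushes the last stanza
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            if stanza == "[Term]" and "id" in fields:
                terms[fields["id"]] = (fields.get("name", ""), fields.get("namespace", ""))
            stanza, fields = line, {}
        elif stanza == "[Term]" and ": " in line:
            key, value = line.split(": ", 1)
            fields.setdefault(key, value)  # keep the first value of each tag
    return terms


def annotate_go_ids(go_ids, terms):
    """Attach name and namespace to each GO ID; unknown IDs are marked 'unknown'."""
    return [(g, *terms.get(g, ("unknown", ""))) for g in go_ids]
```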
Extracting GO IDs from a transcriptome result (as shown in the following picture) can be difficult for users with limited bioinformatics and programming skills.
The following functions solve this problem.
Detail: ② input the transcriptome result --->③ select an extraction model --->press the "extract(2,3)" button to run the function. The result looks similar to the following:
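A sketch of pulling gene-to-GO pairs out of a tab-separated transcriptome annotation table, where the GO column mixes IDs and descriptions (for example `GO:0005634(nucleus);GO:0003677(DNA binding)`). The column names and the two output modes, "pairs" and "per_gene", are hypothetical stand-ins; they are not GFAP's actual extraction models.

```python
import csv
import io
import re

GO_ID = re.compile(r"GO:\d{7}(?!\d)")


def extract_gene_go(table_text, gene_col, go_col, mode="pairs"):
    """Extract GO IDs per gene from a tab-separated table with a header row."""
    out = []
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        ids = list(dict.fromkeys(GO_ID.findall(row.get(go_col) or "")))
        if not ids:
            continue  # gene without any GO annotation
        if mode == "pairs":
            out.extend((row[gene_col], go) for go in ids)  # one (gene, GO ID) per row
        elif mode == "per_gene":
            out.append((row[gene_col], ",".join(ids)))      # all IDs of a gene in one row
        else:
            raise ValueError(f"unknown mode {mode!r}")
    return out
```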
GO visualization
GFAP provides the following functions for GO visualization.
Detail: ① GFAP automatically opens the REVIGO website, where users analyze their GO annotation results --->② input the CSV-format file --->③ select the save location and name your file --->④ format your file. Then ⑤ input the REVIGO result --->⑥ choose a color model --->⑦ set a title --->press the "draw" button to finish the process. The result looks similar to the following:
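REVIGO takes a plain-text list of GO IDs, each optionally followed by a value such as a p-value. A sketch of converting a CSV of GO terms into that form, assuming this is roughly what the "format your file" step prepares; the column names `GO_ID` and `pvalue` are hypothetical.

```python
import csv
import io


def csv_to_revigo_input(csv_text, id_col="GO_ID", value_col="pvalue"):
    """Convert a CSV of GO terms into 'GO:ID<TAB>value' lines (ID alone if no value)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        go_id = row[id_col].strip()
        value = (row.get(value_col) or "").strip()
        lines.append(f"{go_id}\t{value}" if value else go_id)
    return "\n".join(lines)
```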
The fifth module is the "KEGG analysis" module. It is responsible for annotating sequences with KEGG IDs and functional information.
KEGG ID and KO ID
Detail: ① input the alignment result --->② select a species name --->if the species is not in the species list, enter its information in ③ --->④ select the save location and name your file. Then click the "search" button to finish the process. These functions produce two files. One contains gene IDs and KEGG IDs, as follows:
The other contains KO IDs and their functions, as follows:
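The two files are linked through their IDs: gene to KEGG ID, KEGG ID to KO ID, and KO ID to function. A minimal sketch of joining them into one gene/KEGG/KO/function table, assuming the links are available as Python mappings; the function name and inputs are hypothetical.

```python
def merge_kegg_ko(gene_to_kegg, kegg_to_ko, ko_function):
    """Combine (gene, KEGG ID) pairs with KEGG->KO and KO->function lookups.

    Missing links are left as empty strings so every input gene keeps a row.
    """
    rows = []
    for gene, kegg_id in gene_to_kegg:
        ko_id = kegg_to_ko.get(kegg_id, "")
        rows.append((gene, kegg_id, ko_id, ko_function.get(ko_id, "")))
    return rows
```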
Because this result contains gene IDs, KEGG IDs, KO IDs, and KO functions, it may be too large to open. To solve this problem, the following function extracts part of the annotation result.
Detail: After obtaining the annotation result, select the relevant content in ① and press the "extract" button to finish the process. The extracted result is saved in the same folder as the annotation result.
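Extracting part of a large result without opening it can be done by streaming the file row by row. A sketch, assuming a tab-separated file with a header row; the column names and the `_extract` file suffix are hypothetical. As in the text, the output is written to the same folder as the input.

```python
import csv
from pathlib import Path


def extract_columns(annotation_path, columns, suffix="_extract"):
    """Keep only the chosen columns of a tab-separated file, streaming row by row."""
    src = Path(annotation_path)
    dst = src.with_name(src.stem + suffix + src.suffix)  # same folder as the input
    with src.open(newline="") as fin:
        reader = csv.DictReader(fin, delimiter="\t")
        missing = [c for c in columns if c not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"columns not found: {missing}")
        with dst.open("w", newline="") as fout:
            writer = csv.DictWriter(fout, fieldnames=columns, delimiter="\t",
                                    extrasaction="ignore")
            writer.writeheader()
            for row in reader:  # one row at a time; the whole file is never loaded
                writer.writerow(row)
    return dst
```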
KEGG functions
Detail: ① input the file containing KEGG IDs --->② select the save location and name your file. The result looks similar to the following:
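Assuming the IDs in the input file are KEGG Orthology identifiers (`K` followed by five digits), the function lookup can be sketched as below. The description table `ko_functions` is a hypothetical input; IDs without a description are reported separately rather than dropped silently.

```python
import re

# KO identifiers: "K" followed by exactly five digits.
KO_ID = re.compile(r"\bK\d{5}\b")


def annotate_ko_ids(text, ko_functions):
    """Find unique KO IDs in text and split them into (annotated, unmatched)."""
    annotated, unmatched = [], []
    for ko in dict.fromkeys(KO_ID.findall(text)):  # unique, first-seen order
        if ko in ko_functions:
            annotated.append((ko, ko_functions[ko]))
        else:
            unmatched.append(ko)
    return annotated, unmatched
```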
KEGG visualization
GFAP provides the following functions for KEGG visualization.
network
Detail: ④ input the KEGG-function file --->⑤ format your file --->⑥ select a color model --->press the "draw---network" button to draw the network.
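One plausible shape for such a network, assumed here rather than confirmed, is a bipartite graph linking genes to the KEGG pathways they belong to. This sketch builds the deduplicated edge list and node degrees that a drawing step could map to node size or color; it does not draw anything.

```python
from collections import Counter


def build_pathway_network(rows):
    """Build a gene-pathway edge list from (gene, pathway) rows, with node degrees."""
    edges = sorted(set(rows))  # drop duplicate gene-pathway pairs
    gene_degree, pathway_degree = Counter(), Counter()
    for gene, pathway in edges:
        gene_degree[gene] += 1
        pathway_degree[pathway] += 1
    return edges, gene_degree, pathway_degree
```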
heatmap
Detail: ① input the KEGG functions --->② format your file --->③ select the save location and name your file --->then press the "statistics" button.
Next, input the statistics result into ⑦ --->select a color model --->press the "draw---heatmap" button to display the statistics result.
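A sketch of the two pieces behind a heatmap: counting distinct genes per pathway (assumed to resemble the "statistics" step) and a simple two-color "color model" that maps each count to a hex color by linear interpolation. The default colors and both function names are hypothetical choices.

```python
from collections import Counter


def pathway_counts(rows):
    """Count distinct genes per pathway from (gene, pathway) rows."""
    return Counter(pathway for _, pathway in set(rows))


def value_to_hex(value, vmin, vmax, low=(255, 255, 255), high=(178, 24, 43)):
    """Linearly interpolate between two RGB colors and return a '#rrggbb' string."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [vmin, vmax]
    rgb = [round(lo + (hi - lo) * t) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

Clamping keeps outliers at the end colors instead of producing invalid RGB values.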
The remaining two modules work in a similar way, so only the "Transcription factor" module is introduced here.
This module annotates your sequences with transcription-factor information.
Detail: ① input the protein FASTA file --->② format your file --->③ select a transcription factor model --->④ select the save location and name your file --->press ⑤ to finish the process. The result is also shown in ⑥, as follows:
Then select the save location and name your file in ⑦, and press the "extract the relevant sequences(1,2,3,4)" button to extract the relevant sequences from your protein file automatically.
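Extracting the relevant sequences amounts to copying the FASTA records whose IDs appear in the annotation result. A minimal sketch, assuming the IDs match the first word of each FASTA header; `extract_sequences` is a hypothetical name.

```python
def extract_sequences(fasta_lines, wanted_ids):
    """Copy FASTA records whose ID (first word of the header) is in wanted_ids."""
    wanted = set(wanted_ids)
    out, keep = [], False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted  # decide once per record
        if keep and line:
            out.append(line)
    return out
```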
How can users compute statistics for all transcription factors collected in the GFAP database?
Detail: ① input your protein-sequence file --->② format your file --->③ select the save location and name your file --->press ④ to finish the process. The result is also shown in ⑤, as follows:
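The statistics can be sketched as counting genes per transcription-factor family and expressing each count as a percentage of all gene-family assignments. This assumes the annotation result can be read as (gene ID, family) pairs; the function name is hypothetical.

```python
from collections import Counter


def tf_family_statistics(rows):
    """Return (family, gene count, percentage) sorted by count, then family name.

    Duplicate (gene, family) pairs are counted once.
    """
    counts = Counter(family for _, family in set(rows))
    total = sum(counts.values())
    return [(family, n, round(100 * n / total, 2))
            for family, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
```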
Then, users can press the ⑥ button to draw the result: