
simon1989/GFAP-database

Apache-2.0

GFAP-database

Description

The database was built for the GFAP software.

Software Architecture

GFAP was developed in Python.

Installation

  1. GFAPv2.0.exe can be installed on Windows, in either 32-bit or 64-bit versions.
  2. GFAPv2.0.pkg

Instructions

GFAP includes two parts in total (see figure). The databases contain annotation information (including Pfam, GO, and KEGG information) for 200 plant species. Among them, the data in GFAP_database, GFAP_GO, and GFAP_KEGG can be used directly by selecting the species in the species list (see figure). The other data can be used after entering the species name in the dialog boxes shown in the figure.

To perform DNA alignment, put these files into the database folder of GFAP and write the species names into the location shown in the figure.

GFAP contains seven modules in total. The first module is the "Introduction" module (see figure), which gives brief instructions for the other modules. Click a module name to view its information.

Alignment

The second module is the "Alignment" module, which performs sequence alignment.

Protein alignment

Detail: ① input a protein FASTA file --->② select a species name --->③ set the E-value if you want a value other than the default (1e-5) ---> if the species of interest is not in the species list, write its name in ④ --->⑤ select the save location and name your file.

Translation

Since some users may not have protein-sequence files, we provide the following functions for translating DNA sequences into protein sequences in batch.
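As an illustration only (this is not GFAP's own code), batch translation can be sketched in Python with the standard genetic code. The helper names `translate`, `read_fasta`, and `translate_fasta` are hypothetical; the sketch assumes `*` marks stop codons and `X` marks codons with ambiguous bases, and it enforces the multiple-of-three precondition by raising an error.

```python
# Hypothetical sketch of batch DNA -> protein translation (not GFAP's code).
# Assumptions: standard genetic code, '*' for stop codons, 'X' for ambiguous codons.
from itertools import product

BASES = "TCAG"
# Standard codon table in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate one DNA sequence; its length must be a multiple of three."""
    seq = dna.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not a multiple of three")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq), 3))


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record ID: sequence}; the ID is the first header word."""
    records: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line  # sequences may span several lines
    return records


def translate_fasta(text: str) -> str:
    """Translate every record and return protein FASTA text."""
    out = [f">{seq_id}\n{translate(dna)}" for seq_id, dna in read_fasta(text).items()]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = ">gene1\nATGGCCTAA\n>gene2\natgtgg\n"
    print(translate_fasta(example), end="")
    # prints:
    # >gene1
    # MA*
    # >gene2
    # MW
```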

Please note: these functions require the length of each DNA sequence to be a multiple of three. Detail: ① input a DNA FASTA file --->② format it --->③ select the save location and name your file.

DNA alignment

We encourage users to perform alignment with protein sequences. However, some users may find it hard to obtain protein sequences, so GFAP also supports DNA alignment with the following functions. Detail: ① automatically open the website for downloading the DNA alignment file --->② input a DNA FASTA file --->③ format the file --->④ input the species name (such as Arabidopsis_thaliana) --->⑤ set the E-value --->⑥ select the save location and name your file.
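The alignment engine and output format that GFAP uses are not described here. Assuming the alignment result is BLAST+ tabular output (`-outfmt 6`, 12 default columns), the E-value cutoff used in both protein and DNA alignment can be sketched as follows. `Hit`, `parse_tabular`, `filter_by_evalue`, and `best_hit_per_query` are hypothetical names, and keeping only the best hit per query is an extra assumption, not a documented GFAP step.

```python
# Hypothetical sketch: E-value filtering of alignment hits.
# Assumption: input is BLAST+ -outfmt 6 text (12 tab-separated default columns).
import csv
import io
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_tabular(text: str) -> list[Hit]:
    """Parse tabular hits; column 11 is the E-value, column 12 the bit score."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hits.append(Hit(row[0], row[1], float(row[10]), float(row[11])))
    return hits


def filter_by_evalue(hits: list[Hit], max_evalue: float = 1e-5) -> list[Hit]:
    """Keep hits at or below the cutoff (1e-5 mirrors GFAP's stated default)."""
    return [h for h in hits if h.evalue <= max_evalue]


def best_hit_per_query(hits: list[Hit]) -> dict[str, Hit]:
    """One hit per query: lowest E-value, ties broken by the higher bit score."""
    best: dict[str, Hit] = {}
    for h in hits:
        current = best.get(h.query)
        if current is None or (h.evalue, -h.bitscore) < (current.evalue, -current.bitscore):
            best[h.query] = h
    return best
```

Filtering before picking the best hit means a query whose every hit exceeds the cutoff simply drops out of the annotation.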

Pfam analysis

The third module is the "Pfam analysis" module, which annotates sequences with protein-domain information. Detail: ① input the alignment result --->② select a species name ---> if the species name is not in the species list, enter the information in ③ --->④ select the save location and name your file. Example results are shown in the figure. Furthermore, ⑤ shows the species names in your own database, and clicking ⑥ computes statistics on the gene families of the input genes.
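GFAP's statistics format is not documented here, so the following is only a sketch of what gene-family statistics could look like. It assumes a hypothetical annotation layout of tab-separated `gene_id`, `pfam_id` lines and counts each gene once per Pfam family, even if the domain occurs several times in that gene. The function name `family_statistics` is illustrative.

```python
# Hypothetical sketch: count distinct genes per Pfam family.
# Assumed input: tab-separated "gene_id<TAB>pfam_id[<TAB>...]" lines; '#' starts a comment.
from collections import defaultdict


def family_statistics(annotation_text: str) -> list[tuple[str, int]]:
    genes_per_family: dict[str, set[str]] = defaultdict(set)
    for line in annotation_text.splitlines():
        if line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        gene_id, pfam_id = fields[0].strip(), fields[1].strip()
        genes_per_family[pfam_id].add(gene_id)  # a set counts each gene once per family
    # Largest families first; ties ordered by family ID for stable output.
    return sorted(
        ((family, len(genes)) for family, genes in genes_per_family.items()),
        key=lambda item: (-item[1], item[0]),
    )
```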

Because users may be interested in particular genes, we provide the following functions for extracting the annotation information of selected genes. Detail: ① input the IDs of the genes of interest --->② input the annotation result file --->③ select the save location and name your file. Furthermore, ④ shows the species names in the website databases.
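A minimal sketch of this extraction step, assuming (not confirmed by GFAP) one gene ID per line in the ID file and the gene ID in the first tab-separated column of the annotation result. `read_gene_ids` and `extract_annotations` are hypothetical helpers.

```python
# Hypothetical sketch: keep only the annotation lines for genes of interest.
# Assumptions: one gene ID per line; gene ID in the first tab-separated column.


def read_gene_ids(id_text: str) -> set[str]:
    """Collect gene IDs, ignoring blank lines and surrounding whitespace."""
    return {line.strip() for line in id_text.splitlines() if line.strip()}


def extract_annotations(annotation_text: str, gene_ids: set[str]) -> list[str]:
    """Return annotation lines whose first column is a wanted gene ID, in file order."""
    return [
        line
        for line in annotation_text.splitlines()
        if line.split("\t", 1)[0].strip() in gene_ids
    ]
```

Using a set for the IDs keeps each lookup constant-time, which matters when the annotation result covers a whole genome.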

GO analysis

The fourth module is the "GO analysis" module, which annotates sequences with GO IDs and functional information.

GO ID

Detail: ① input the alignment result --->② select a species name ---> if the species name is not in the species list, enter the information in ③ --->④ select the save location and name your file. Then press the "search" button to run the function. Example results are shown in the figure. In addition, the "press, if you want to go analysis" button extracts the GO IDs from the annotation result, which is convenient for downstream software such as REVIGO.

GO function

Detail: ① input the GO-ID file --->④ select the save location and name your file. Then press the "search(1,3)" button to annotate the GO IDs with their functions. Example results are shown in the figure.

It is difficult for users with limited bioinformatics and programming skills to extract GO IDs from transcriptome results (as shown in the figure). The following functions solve this problem. Detail: ② input the transcriptome result --->③ select an extraction model --->press the "extract(2,3)" button to run the function. Example results are shown in the figure.

GO visualization

We provide the following functions for GO visualization. Detail: ① automatically open the REVIGO website, where users analyze their GO annotation results --->② input the CSV-format file --->③ select the save location and name your file --->④ format your file. After that, ⑤ input the REVIGO result --->⑥ choose a color model --->⑦ set a title --->press the "draw" button to finish the process. Example results are shown in the figure.
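Preparing REVIGO input starts with a list of GO IDs. The sketch below is not GFAP's implementation; it relies only on the fact that GO identifiers are `GO:` followed by exactly seven digits. The per-gene layout (gene ID in the first tab-separated column) and the plain one-ID-per-line output for pasting into REVIGO are assumptions, and `extract_go_ids`, `go_ids_per_gene`, and `to_revigo_list` are hypothetical names.

```python
# Hypothetical sketch: pull GO IDs out of annotation or transcriptome text.
# Assumption: gene ID in the first tab-separated column; GO IDs anywhere after it.
import re

# GO identifiers are "GO:" followed by exactly seven digits.
GO_ID = re.compile(r"GO:\d{7}(?!\d)")


def extract_go_ids(text: str) -> list[str]:
    """All distinct GO IDs in order of first appearance."""
    return list(dict.fromkeys(GO_ID.findall(text)))


def go_ids_per_gene(table_text: str) -> dict[str, list[str]]:
    """Map each gene ID to its distinct GO IDs."""
    result: dict[str, list[str]] = {}
    for line in table_text.splitlines():
        gene_id, _, rest = line.partition("\t")
        ids = extract_go_ids(rest)
        if gene_id and ids:
            known = result.setdefault(gene_id, [])
            known.extend(go_id for go_id in ids if go_id not in known)
    return result


def to_revigo_list(go_ids: list[str]) -> str:
    """One GO ID per line, a simple list format (assumed acceptable to REVIGO)."""
    return "\n".join(go_ids) + "\n"
```

The negative lookahead `(?!\d)` prevents a longer digit run from being truncated into a false seven-digit match.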

KEGG analysis

The fifth module is the "KEGG analysis" module, which annotates sequences with KEGG IDs and functional information.

KEGG ID and KO ID

Detail: ① input the alignment result --->② select a species name ---> if the species name is not in the species list, enter the information in ③ --->④ select the save location and name your file. Then click the "search" button to finish the process. These functions produce two files: one contains gene IDs and KEGG IDs, and the other contains KO IDs and their functions (see figures). The result contains gene IDs, KEGG IDs, KO IDs, and KO functions, so it may be too large to open. To solve this problem, we provide the following function for extracting part of the annotation result. Detail: after obtaining the annotation result, select the relevant content in ① and press the "extract" button. The extraction result is saved in the same folder as the annotation result.

KEGG functions

Detail: ① input the file containing KEGG IDs --->② select the save location and name your file. Example results are shown in the figure.

KEGG visualization

We provide the following functions for KEGG visualization.

Network

Detail: ④ input the KEGG-function file --->⑤ format your file --->⑥ select a color model --->press the "draw---network" button to draw the network.

Heatmap

Detail: ① input the KEGG functions --->② format your file --->③ select the save location and name your file ---> then press the "statistics" button; input the statistics result into ⑦ --->select a color model --->press the "draw---heatmap" button to display the statistics result.
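The exact output of the "statistics" button is not documented, so this is only one plausible shape for heatmap input: a function-by-sample count table. The sketch assumes hypothetical annotation files with the gene ID in the first tab-separated column and the KEGG function in the last, counts distinct genes per function, and emits TSV. `count_functions` and `count_matrix` are illustrative names.

```python
# Hypothetical sketch: build a KEGG function x sample count table for a heatmap.
# Assumed layout per file: gene ID in the first column, KEGG function in the last.
from collections import defaultdict


def count_functions(annotation_text: str) -> dict[str, int]:
    """Count distinct genes per KEGG function in one annotation file."""
    genes: dict[str, set[str]] = defaultdict(set)
    for line in annotation_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            genes[fields[-1].strip()].add(fields[0].strip())
    return {function: len(ids) for function, ids in genes.items()}


def count_matrix(samples: dict[str, str]) -> str:
    """Rows are functions (sorted), columns are samples; missing counts become 0."""
    counts = {name: count_functions(text) for name, text in samples.items()}
    names = list(samples)
    functions = sorted({f for per_sample in counts.values() for f in per_sample})
    rows = ["function\t" + "\t".join(names)]
    for function in functions:
        cells = [str(counts[name].get(function, 0)) for name in names]
        rows.append(function + "\t" + "\t".join(cells))
    return "\n".join(rows) + "\n"
```

Filling absent functions with 0 keeps the table rectangular, which most heatmap tools require.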

Transcription factor and gene family

The remaining two modules work in a similar way, so only the "Transcription factor" module is introduced here. This module annotates your sequences with transcription-factor information. Detail: ① input a protein FASTA file --->② format your file --->③ select a transcription factor model --->④ select the save location and name your file --->press ⑤ to finish the process. The result is also shown in ⑥ (see figure). After that, use ⑦ to select the save location and name your file, and press the "extract the relevant sequences(1,2,3,4)" button to automatically extract the relevant sequences from your protein file.

How to compute statistics for all transcription factors collected in the GFAP database?

Detail: ① input your protein-sequence file --->② format your file --->③ select the save location and name your file --->press ④ to finish the process. The result is also shown in ⑤ (see figure). Then press the ⑥ button to plot the result.
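The "extract the relevant sequences(1,2,3,4)" step in the Transcription factor module amounts to copying selected records out of a protein FASTA file. A minimal sketch follows, assuming (not confirmed by GFAP) a result file of tab-separated `gene_id`, `family` lines and FASTA record IDs equal to the first word of each header. `ids_for_family` and `extract_sequences` are hypothetical names.

```python
# Hypothetical sketch: extract protein sequences of one transcription-factor family.
# Assumptions: result lines are "gene_id<TAB>family"; FASTA ID = first header word.


def ids_for_family(result_text: str, family: str) -> set[str]:
    """Gene IDs assigned to the given TF family."""
    ids = set()
    for line in result_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1].strip() == family:
            ids.add(fields[0].strip())
    return ids


def extract_sequences(fasta_text: str, wanted_ids: set[str]) -> str:
    """Copy FASTA records (header plus sequence lines) whose ID is wanted."""
    kept = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            keep = bool(words) and words[0] in wanted_ids
        if keep:
            kept.append(line)  # header and sequence lines of a wanted record
    return "\n".join(kept) + ("\n" if kept else "")
```

Streaming line by line with a `keep` flag avoids loading the whole proteome into a dictionary and preserves the original header text and line wrapping.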

Copyright [Dong Xu] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

https://gitee.com/SimonX19891216/gfap-database.git
git@gitee.com:SimonX19891216/gfap-database.git