# KEGG-Annotation

**Repository Path**: leosfan/KEGG-Annotation

## Basic Information

- **Project Name**: KEGG-Annotation
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-04-07
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

#### KEGG-Annotation
#### The purpose of this repository is to describe a simple script for KEGG annotation of amino acid (AA) sequence files

- First, deploy the KOBAS and Uniprot databases for blast, usearch, and diamond

##### Uniprot/Swissprot
```
cd /data/DATABASES
mkdir UNIPROT
cd UNIPROT
wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
gunzip uniprot_sprot.fasta.gz
```
- MAKE USEARCH DATABASE
```
usearch -makeudb_usearch uniprot_sprot.fasta -output uniprot.udb
```
- MAKE BLASTABLE DATABASE
```
makeblastdb -in uniprot_sprot.fasta -input_type fasta -out uniprot_sprot -dbtype prot
```
- MAKE DIAMOND DATABASE
```
diamond makedb --in uniprot_sprot.fasta -d uniprot_sprot.diamond.db
```
- Give everyone read and execute permission
```
sudo chmod 775 *
```

##### KOBAS
- Download KOBAS files and unzip
```
mkdir -p /data/KOBAS/sqlite3
mkdir -p /data/KOBAS/seq_pep
cd /data/KOBAS/sqlite3
wget http://kobas.cbi.pku.edu.cn/download/sqlite3/ko.db.gz
gunzip ko.db.gz
cd /data/KOBAS/seq_pep
wget http://kobas.cbi.pku.edu.cn/download/seq_pep/ko.pep.fasta.gz
gunzip ko.pep.fasta.gz
```
- Make a blastable database
```
makeblastdb -in ko.pep.fasta -dbtype prot
```
- Make a Diamond searchable database
```
diamond makedb --in ko.pep.fasta -d ko.pep
```
- Note: the KOBAS fasta file is too big for the free version of Usearch. You will need the 64-bit version to use usearch.
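  This is about $1k/
- Give everyone read and execute permission
```
sudo chmod 775 *
```

### KEGG annotation procedure

- First, run a diamond search against the KOBAS KO database
```
# -d must point at the DIAMOND database built from ko.pep.fasta (ko.pep.dmnd); adjust the path if your KOBAS directory lives elsewhere
diamond blastp -d /data/KOBAS/seq_pep/ko.pep -q sequence_files/SDB_ONE.faa -o SDB_ONE.faa.dmd -e 1e-10 -k 1
# diamond writes plain text by default, so decompress only if a .gz file was produced
[ -f SDB_ONE.faa.dmd.gz ] && gunzip SDB_ONE.faa.dmd.gz
```
- Extract the ORF-to-gene-ID hits from the diamond output and get the ORF IDs from the faa file
```
# Columns 1 and 2 of the diamond tabular output are the query (ORF) ID and the matched gene ID
cut -f 1,2 SDB_ONE.faa.dmd > SDB_ONE.ORF_G_IDs
# Truncate headers at the first whitespace so they match the query IDs reported by diamond
grep '^>' sequence_files/SDB_ONE.faa | sed 's/^>//; s/[[:space:]].*//' > SDB_ONE.ORF_IDs
cut -f 2 SDB_ONE.ORF_G_IDs > SDB_ONE.G_IDs
```
- Make an SQLite database and add the tables
```
sqlite3 annotate.sqlite
.separator "\t"
create table ORF_G_IDs (ORF, GID);
.import SDB_ONE.ORF_G_IDs ORF_G_IDs
-- KoGenes is a space-separated file of KO and gene ID pairs (its creation is not covered here)
.separator " "
create table KoGenes (KO, GID);
.import KoGenes KoGenes
.separator "\t"
create table ORFS (ORF);
.import SDB_ONE.ORF_IDs ORFS
-- ORFs with a hit whose gene ID has a KO assignment
CREATE TABLE IF NOT EXISTS out as SELECT ORF_G_IDs.ORF, ORF_G_IDs.GID, KoGenes.KO FROM ORF_G_IDs JOIN KoGenes ON ORF_G_IDs.GID = KoGenes.GID;
-- All ORFs, with empty columns where no KO was found
CREATE TABLE IF NOT EXISTS allout as SELECT * FROM ORFS LEFT JOIN out ON ORFS.ORF = out.ORF;
.separator "\t"
.output SDB_ONE.G_ID_KO_ORF_ID
SELECT * FROM allout;
```

SELECT ORF_G_IDs.ORF, KoGenes.KO FROM ORF_G_IDs LEFT JOIN KoGenes ON ORF_G_IDs.GID = KoGenes.GID;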
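
The join logic of the SQLite step above can be sanity-checked on toy data with plain awk. The sketch below is only an illustration and not part of the original workflow: the directory `kegg_join_demo`, the `toy.*` file names, and the ORF, gene, and KO identifiers are all made up, and it assumes KoGenes holds space-separated KO and gene ID pairs, as the `.separator " "` import suggests. `NA` is an arbitrary placeholder for ORFs without a KO (SQLite would print empty fields instead).

```shell
#!/bin/sh
# Toy re-creation of the ORF -> gene ID -> KO join using awk.
# Every file name and identifier below is a hypothetical example.
set -eu

demo=kegg_join_demo
mkdir -p "$demo"

# Hypothetical diamond hits: ORF ID <tab> gene ID (columns 1-2 of the tabular output)
printf 'orf1\teco:b0001\norf3\teco:b0003\n' > "$demo/toy.ORF_G_IDs"

# Hypothetical KoGenes table: KO <space> gene ID
printf 'K00001 eco:b0001\nK00003 eco:b0003\n' > "$demo/toy.KoGenes"

# Hypothetical ORF list taken from the FASTA headers; orf2 has no diamond hit
printf 'orf1\norf2\norf3\n' > "$demo/toy.ORF_IDs"

# Mirror the two SQL steps: keep gene ID and KO only when the hit has a KO
# (the inner join), then print one row per ORF (the left join).
awk '
    FILENAME == ARGV[1] { ko[$2] = $1; next }      # KoGenes: gene ID -> KO
    FILENAME == ARGV[2] { gid[$1] = $2; next }     # hits: ORF ID -> gene ID
    {
        g = "NA"; k = "NA"
        if (($1 in gid) && (gid[$1] in ko)) { g = gid[$1]; k = ko[g] }
        printf "%s\t%s\t%s\n", $1, g, k
    }
' "$demo/toy.KoGenes" "$demo/toy.ORF_G_IDs" "$demo/toy.ORF_IDs" > "$demo/toy.ORF_GID_KO"

cat "$demo/toy.ORF_GID_KO"
```

Running it prints one tab-separated row per ORF: `orf1` and `orf3` carry their gene ID and KO, while `orf2` is reported with `NA` in both columns. Comparing such a table against the SQLite output on the same small input is a quick way to confirm that the separators and column order of the imports are what the queries expect.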