Mining a database of single amplified genomes from Red Sea brine pool extremophiles -- improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA).
Name:
Article - Frontiers in Microbiology - Mining a database of single amplified genomes - 2013.pdf
Size:
1.947Mb
Format:
PDF
Description:
Article - Full Text
Name:
Data Sheet 1.DOCX
Size:
6.647Mb
Format:
Microsoft Word 2007
Description:
Supplemental Data Sheet 1
Name:
Data Sheet 2.DOCX
Size:
85.91Kb
Format:
Microsoft Word 2007
Description:
Supplemental Data Sheet 2
Name:
Data Sheet 3.DOCX
Size:
138.8Kb
Format:
Microsoft Word 2007
Description:
Supplemental Data Sheet 3
Name:
Data Sheet 4.DOCX
Size:
151.5Kb
Format:
Microsoft Word 2007
Description:
Supplemental Data Sheet 4
Name:
Data Sheet 5.DOCX
Size:
139.8Kb
Format:
Microsoft Word 2007
Description:
Supplemental Data Sheet 5
Type
Article
Authors
Grötzinger, Stefan W.
Alam, Intikhab
Ba Alawi, Wail
Bajic, Vladimir B.
Stingl, Ulrich
Eppinger, Jörg

KAUST Department
Applied Mathematics and Computational Science Program
Biological & Organometallic Catalysis Laboratories
Biological and Environmental Sciences and Engineering (BESE) Division
Bioscience Program
Chemical Science Program
Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
KAUST Catalysis Center (KCC)
Marine Microbial Ecology Research Group
Marine Science Program
Physical Science and Engineering (PSE) Division
Red Sea Research Center (RSRC)
Date
2014-04-07
Permanent link to this record
http://hdl.handle.net/10754/323510
Abstract
Reliable functional annotation of genomic data is a key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and the poor homology of novel extremophiles' genomes pose significant challenges for attributing functions to the identified coding sequences. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptations. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO). Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It uses InterPro-derived Gene Ontology (GO) terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 Enzyme Commission (E.C.) numbers was translated into 171 GO terms and 49 consensus patterns. A subset of INDIGO sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO terms and/or consensus patterns) are present.
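The "at least two relevant descriptors" rule can be illustrated with a minimal Python sketch. This is not the authors' script (those are distributed through the INDIGO website). The GO identifiers, PROSITE identifiers, consensus patterns, and sequences below are invented placeholders, the function names are hypothetical, and the pattern converter handles only the most common PROSITE syntax elements. The stringent reliability filtering mentioned in the abstract is not reproduced here.

```python
import re
from dataclasses import dataclass


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Simplified: handles x, [..], {..}, (n), (n,m), and the < and > anchors
    on the first/last element; anchors inside brackets are not supported.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        core = element.strip("<>")
        repeat = ""
        if "(" in core:  # repetition such as x(2,4) or A(3)
            core, count = core[:-1].split("(")
            repeat = "{" + count + "}"
        if core == "x":  # any amino acid
            core = "."
        elif core.startswith("{"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


@dataclass
class Gene:
    gene_id: str
    sequence: str   # translated amino acid sequence
    go_terms: set   # GO terms assigned by the annotation pipeline


def descriptor_hits(gene, relevant_go, relevant_patterns):
    """Return (profile hits, pattern hits) for one gene."""
    profile = gene.go_terms & relevant_go
    pattern = {
        pid for pid, pat in relevant_patterns.items()
        if re.search(prosite_to_regex(pat), gene.sequence)
    }
    return profile, pattern


def is_reliable(gene, relevant_go, relevant_patterns, min_descriptors=2):
    """Assumed rule: reliable if >= 2 relevant descriptors of any kind."""
    profile, pattern = descriptor_hits(gene, relevant_go, relevant_patterns)
    return len(profile) + len(pattern) >= min_descriptors


# Placeholder descriptors for one hypothetical enzyme family.
RELEVANT_GO = {"GO:demo-1", "GO:demo-2"}
RELEVANT_PATTERNS = {"PS_DEMO_1": "C-x(2,4)-[ST]-{P}-H"}

GENES = [
    Gene("gene_a", "MKCAASTAHLL", {"GO:demo-1"}),               # 1 GO + 1 pattern
    Gene("gene_b", "MKLLPPGG", {"GO:demo-1"}),                  # 1 GO only
    Gene("gene_c", "MKLLPPGG", {"GO:demo-1", "GO:demo-2"}),     # 2 GO terms
]

RELIABLE = [g.gene_id for g in GENES
            if is_reliable(g, RELEVANT_GO, RELEVANT_PATTERNS)]
print(RELIABLE)
```

Counting profile and pattern hits in a single total follows the "GO terms and/or consensus patterns" wording: a gene can qualify with two GO terms, two patterns, or one of each.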
Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
Citation
Grötzinger SW, Alam I, Ba Alawi W, Bajic VB, Stingl U, et al. (2014) Mining a database of single amplified genomes from Red Sea brine pool extremophiles -- improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA). Frontiers in Microbiology 5. doi:10.3389/fmicb.2014.00134.
Publisher
Frontiers Media SA
Journal
Frontiers in Microbiology
PubMed ID
24778629
PubMed Central ID
PMC3985023
Additional Links
http://journal.frontiersin.org/Journal/10.3389/fmicb.2014.00134/abstract
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3985023/
DOI
10.3389/fmicb.2014.00134
Collections
Articles; Biological and Environmental Science and Engineering (BESE) Division; Red Sea Research Center (RSRC); Bioscience Program; Marine Science Program; Applied Mathematics and Computational Science Program; Physical Science and Engineering (PSE) Division; Computer Science Program; Chemical Science Program; KAUST Catalysis Center (KCC); Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division