    SECOM: A novel hash seed and community detection based-approach for genome-scale protein domain identification

    Files
    Name: Article-PLoS_ONE-SECOM_A_no-2012.pdf; Size: 2.628Mb; Format: PDF; Description: Article - Full Text
    Name: Supplement_1_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s001.pdf; Size: 8.723Kb; Format: PDF; Description: Supplemental File 1
    Name: Supplement_2_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s002.pdf; Size: 8.708Kb; Format: PDF; Description: Supplemental File 2
    Name: Supplement_3_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s003.pdf; Size: 8.667Kb; Format: PDF; Description: Supplemental File 3
    Name: Supplement_4_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s004.pdf; Size: 8.610Kb; Format: PDF; Description: Supplemental File 4
    Name: Supplement_5_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s005.pdf; Size: 8.724Kb; Format: PDF; Description: Supplemental File 5
    Name: Supplement_6_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s006.pdf; Size: 8.704Kb; Format: PDF; Description: Supplemental File 6
    Name: Supplement_7_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s007.pdf; Size: 8.692Kb; Format: PDF; Description: Supplemental File 7
    Name: Supplement_8_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s008.pdf; Size: 8.657Kb; Format: PDF; Description: Supplemental File 8
    Name: Supplement_9_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s009.pdf; Size: 8.649Kb; Format: PDF; Description: Supplemental File 9
    Name: Supplement_10_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s010.pdf; Size: 8.667Kb; Format: PDF; Description: Supplemental File 10
    Name: Supplement_11_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s011.png; Size: 29.15Kb; Format: PNG image; Description: Supplemental File 11
    Name: Supplement_12_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s012.png; Size: 9.864Kb; Format: PNG image; Description: Supplemental File 12
    Name: Supplement_13_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s013.tex; Size: 1.725Kb; Format: TeX; Description: Supplemental File 13
    Name: Supplement_14_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s014.tex; Size: 4.661Kb; Format: TeX; Description: Supplemental File 14
    Name: Supplement_15_-_PLoS_ONE-SECOM_A_no-2012.pone.0039475.s015.pdf; Size: 153.0Kb; Format: PDF; Description: Supplemental File 15
    Type
    Article
    Authors
    Fan, Ming
    Wong, Ka-Chun
    Ryu, Tae Woo
    Ravasi, Timothy
    Gao, Xin
    KAUST Department
    Biological and Environmental Sciences and Engineering (BESE) Division
    Bioscience Program
    Computational Bioscience Research Center (CBRC)
    Computer Science Program
    Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
    Integrative Systems Biology Lab
    Structural and Functional Bioinformatics Group
    Date
    2012-06-28
    Permanent link to this record
    http://hdl.handle.net/10754/325305
    
    Abstract
    With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatase (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx. © 2012 Fan et al.
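The indexing and graph-construction steps described in the abstract can be illustrated with a minimal sketch. It is not the SECOM implementation: the abstract does not specify the hash seed function, so contiguous k-mers are used here as an assumed stand-in, and the names `kmer_seeds` and `build_seed_graph`, the value `k=3`, and the toy sequences are all hypothetical. The sketch only shows how an inverted index avoids all-against-all alignment while still yielding a graph whose edge weights count shared seeds; the community-finding and backward percolation steps are not covered.

```python
from collections import defaultdict
from itertools import combinations


def kmer_seeds(sequence, k=3):
    """Return the set of contiguous k-mers in a sequence.

    Assumption: a plain k-mer stands in for SECOM's hash seed function.
    """
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_seed_graph(proteins, k=3):
    """Map each protein pair to the number of distinct seeds they share."""
    # Inverted index: seed -> names of the proteins that contain it.
    index = defaultdict(set)
    for name, seq in proteins.items():
        for seed in kmer_seeds(seq, k):
            index[seed].add(name)

    # Every seed adds one unit of weight to each pair of proteins sharing it,
    # so pairs with no common seed never appear in the graph.
    weights = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy input (hypothetical sequences, not data from the paper).
proteins = {
    "P1": "MKTAYIAK",
    "P2": "GGTAYIAQ",
    "P3": "WWWWWWWW",
}
graph = build_seed_graph(proteins, k=3)
print(graph)
```

In this toy input, P1 and P2 share the local segment TAYIA and end up connected, while P3 shares no seed with either and stays isolated. Pairs are only enumerated among proteins that appear under the same seed in the index, which is what keeps the approach cheaper than comparing every pair of sequences directly.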
    Citation
    Fan M, Wong K-C, Ryu T, Ravasi T, Gao X (2012) SECOM: A Novel Hash Seed and Community Detection Based-Approach for Genome-Scale Protein Domain Identification. PLoS ONE 7: e39475. doi:10.1371/journal.pone.0039475.
    Publisher
    Public Library of Science (PLoS)
    Journal
    PLoS ONE
    DOI
    10.1371/journal.pone.0039475
    PubMed ID
    22761802
    PubMed Central ID
    PMC3386278
    Collections
    Articles; Biological and Environmental Science and Engineering (BESE) Division; Bioscience Program; Structural and Functional Bioinformatics Group; Integrative Systems Biology Lab; Computer Science Program; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
