For more information visit: https://sfb.kaust.edu.sa/Pages/Home.aspx

Recent Submissions

  • Self-normalizing learning on biomedical ontologies using a deep Siamese neural network

    Smaili, Fatima Z.; Gao, Xin; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2020-04-25) [Preprint]
    Motivation: Ontologies are widely used in biomedicine for the annotation and standardization of data. One of the main roles of ontologies is to provide structured background knowledge within a domain as well as a set of labels, synonyms, and definitions for the classes within a domain. The two types of information provided by ontologies have been extensively exploited in natural language processing and machine learning applications. However, they are commonly used separately, and thus it is unknown if joining the two sources of information can further benefit data analysis tasks. Results: We developed a novel method that applies named entity recognition and normalization methods on texts to connect the structured information in biomedical ontologies with the information contained in natural language. We apply this normalization both to literature and to the natural language information contained within ontologies themselves. The normalized ontologies and text are then used to generate embeddings, and relations between entities are predicted using a deep Siamese neural network model that takes these embeddings as input. We demonstrate that our novel embedding and prediction method using self-normalized biomedical ontologies significantly outperforms the state-of-the-art methods in embedding ontologies on two benchmark tasks: prediction of interactions between proteins and prediction of gene-disease associations. Our method also allows us to apply ontology-based annotations and axioms to the prediction of toxicological effects of chemicals, where our method shows superior performance. Our method is generic and can be applied in scenarios where ontologies consisting of both structured information and natural language labels or synonyms are used.
  • Comparative genomics study reveals Red Sea Bacillus with characteristics associated with potential microbial cell factories (MCFs)

    Othoum, Ghofran K.; Prigent, S.; Derouiche, A.; Shi, L.; Bokhari, Ameerah; Alamoudi, S.; Bougouffa, Salim; Gao, Xin; Hoehndorf, Robert; Arold, Stefan T.; Gojobori, Takashi; Hirt, Heribert; Lafi, Feras Fawzi; Nielsen, J.; Bajic, Vladimir B.; Mijakovic, I.; Essack, Magbubah (Scientific Reports, Springer Science and Business Media LLC, 2019-12-17) [Article]
    Recent advancements in the use of microbial cells for scalable production of industrial enzymes encourage exploring new environments for efficient microbial cell factories (MCFs). Here, through a comparison study, ten newly sequenced Bacillus species, isolated from the Rabigh Harbor Lagoon on the Red Sea shoreline, were evaluated for their potential use as MCFs. Phylogenetic analysis of 40 representative genomes with phylogenetic relevance, including the ten Red Sea species, showed that the Red Sea species come from several colonization events and are not the result of a single colonization followed by speciation. Moreover, clustering reactions in reconstructed metabolic networks of these Bacillus species revealed that three metabolic clades do not fit the phylogenetic tree, a sign of convergent evolution of the metabolism of these species in response to special environmental adaptation. We further showed that Red Sea strains Bacillus paralicheniformis (Bac48) and B. halosaccharovorans (Bac94) had twice as many secreted proteins as the model strain B. subtilis 168. Also, Bac94 was enriched with genes associated with the Tat and Sec protein secretion systems, and Bac48 has a hybrid PKS/NRPS cluster that is part of a horizontally transferred genomic region. These properties collectively hint towards the potential use of Red Sea Bacillus as efficient protein-secreting microbial hosts, and suggest that this characteristic of these strains may be a consequence of the unique ecological features of the isolation environment.
  • Formal axioms in biomedical ontologies improve analysis and interpretation of associated data.

    Smaili, Fatima Z.; Gao, Xin; Hoehndorf, Robert (Bioinformatics (Oxford, England), Oxford University Press (OUP), 2019-12-10) [Article]
    Over the past years, significant resources have been invested into formalizing biomedical ontologies. Formal axioms in ontologies have been developed and used to detect and ensure ontology consistency, find unsatisfiable classes, improve interoperability, guide ontology extension through the application of axiom-based design patterns, and encode domain background knowledge. The domain knowledge in biomedical ontologies may also have the potential to provide background knowledge for machine learning and predictive modelling. We use ontology-based machine learning methods to evaluate the contribution of formal axioms and ontology meta-data to the prediction of protein-protein interactions and gene-disease associations. We find that the background knowledge provided by the Gene Ontology and other ontologies significantly improves the performance of ontology-based prediction models through provision of domain-specific background knowledge. Furthermore, we find that the labels, synonyms and definitions in ontologies can also provide background knowledge that may be exploited for prediction. The axioms and meta-data of different ontologies contribute to improving data analysis in a context-specific manner. Our results have implications on the further development of formal knowledge bases and ontologies in the life sciences, in particular as machine learning methods are more frequently being applied. Our findings motivate the need for further development, and the systematic, application-driven evaluation and improvement, of formal axioms in ontologies. https://github.com/bio-ontology-research-group/tsoe.
  • Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs.

    Alam, Tanvir; Alazmi, Meshari; Naser, Rayan Mohammad Mahmoud; Huser, Franceline; Momin, Afaque Ahmad Imtiyaz; Astro, Veronica; Hong, Seungbeom; Walkiewicz, Katarzyna Wiktoria; Canlas, Christian G; Huser, Raphaël; Ali, Amal J.; Merzaban, Jasmeen; Adamo, Antonio; Jaremko, Mariusz; Jaremko, Lukasz; Bajic, Vladimir B.; Gao, Xin; Arold, Stefan T. (Bioinformatics (Oxford, England), Oxford University Press (OUP), 2019-10-04) [Article]
    Motivation: Leucine-aspartic acid (LD) motifs are short linear interaction motifs (SLiMs) that link paxillin family proteins to factors controlling cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family are poorly understood. Results: To enable a proteome-wide assessment of LD motifs, we developed an active-learning based framework (LDmotif finder; LDMF) that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome revealed a dozen new proteins containing LD motifs. We found that LD motif signalling evolved in unicellular eukaryotes more than 800 Myr ago, with paxillin and vinculin as core constituents, and nuclear export signal (NES) as a likely source of de novo LD motifs. We show that LD motif proteins form a functionally homogeneous group, all being involved in cell morphogenesis and adhesion. This functional focus is recapitulated in cells by GFP-fused LD motifs, suggesting that it is intrinsic to the LD motif sequence, possibly through their effect on binding partners. Our approach elucidated the origin and dynamic adaptations of an ancestral SLiM, and can serve as a guide for the identification of other SLiMs for which only few representatives are known. Availability: LDMF is freely available online at www.cbrc.kaust.edu.sa/ldmf; source code is available at https://github.com/tanviralambd/LD/. Supplementary information: Supplementary data are available at Bioinformatics online.
  • Machine Learning to Predict Standard Enthalpy of Formation of Hydrocarbons

    Yalamanchi, Kiran K.; Van Oudenhoven, Vincent C.O.; Tutino, Francesco; Monge Palacios, Manuel; Alshehri, Abdulelah; Gao, Xin; Sarathy, Mani (Journal of Physical Chemistry A, American Chemical Society (ACS), 2019-08-29) [Article]
    Thermodynamic properties of molecules are used widely in the study of reactive processes. Such properties are typically measured via experiments or calculated by a variety of computational chemistry methods. In this work, machine learning (ML) models for estimation of standard enthalpy of formation at 298.15 K are developed for three classes of acyclic and closed-shell hydrocarbons, viz. alkanes, alkenes, and alkynes. Initially, an extensive literature survey is performed to collect standard enthalpy data for training ML models. Commercial software (Dragon) is used to obtain a wide set of molecular descriptors from SMILES strings. The molecular descriptors are used as input features for the ML models. Support vector regression (SVR) and artificial neural networks are used with a two-level K-fold cross-validation (K-fold CV) workflow. The first level is for estimation of the accuracy of both ML models, and the second level is for generation of the final models. The SVR model is selected as the best model based on error estimates over 10-fold CV. The final SVR model is compared against conventional Benson's group additivity for a set of octene isomers from the database, illustrating the advantages of the proposed ML modeling approach.
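The model-selection step described in this abstract (comparing models by their K-fold CV error) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy data, the function names, and the two stand-in models (a mean predictor and a least-squares line) that take the place of the SVR and neural network models, which are not reproduced. Folds are contiguous and unshuffled to keep the sketch deterministic.

```python
from statistics import mean

def kfold_indices(n, k):
    """Split range(n) into k contiguous, near-equal folds of test indices."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_mean(xs, ys):
    """Baseline stand-in model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Stand-in model: ordinary least-squares line y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda x: a * x + b

def cv_mae(fit, xs, ys, k=10):
    """Mean absolute error averaged over k held-out folds."""
    errors = []
    for test in kfold_indices(len(xs), k):
        held = set(test)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        errors.append(mean(abs(model(xs[i]) - ys[i]) for i in test))
    return mean(errors)

# Hypothetical toy data: one descriptor per molecule and a linear target.
xs = [float(i) for i in range(30)]
ys = [-20.0 * x - 75.0 for x in xs]

# Estimate the 10-fold CV error of each candidate and keep the lowest.
scores = {name: cv_mae(fit, xs, ys)
          for name, fit in [("mean", fit_mean), ("line", fit_line)]}
best = min(scores, key=scores.get)
```

In practice the folds would usually be shuffled, and the selected model would then be refit for final use; how the paper's second CV level does this is not specified in the abstract.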
  • Accelerating flash calculation through deep learning methods

    Li, Yu; Zhang, Tao; Sun, Shuyu; Gao, Xin (Journal of Computational Physics, Elsevier BV, 2019-05-29) [Article]
    In the past two decades, researchers have made remarkable progress in accelerating flash calculation, which is very useful in a variety of engineering processes. In this paper, general phase splitting problem statements and flash calculation procedures using the Successive Substitution Method are reviewed, and their main shortcomings are pointed out. Two acceleration methods, Newton's method and the Sparse Grids Method, are then presented for comparison with the deep learning model proposed in this paper. A detailed introduction from artificial neural networks to deep learning methods is provided here with the authors' own remarks. Factors in the deep learning model are investigated to show their effect on the final result. A model selected on that basis is used in a flash calculation predictor and compared with the other methods mentioned above. It is shown that results from the optimized deep learning model agree well with the experimental data while requiring the shortest CPU time. Further comparisons with experimental data have been conducted to show the robustness of our model.
  • Intrinsic cleavage of RNA polymerase II adopts a nucleobase-independent mechanism assisted by transcript phosphate

    Tse, Carmen Ka Man; Xu, Jun; Xu, Liang; Sheong, Fu Kit; Wang, Shenglong; Chow, Hoi Yee; Gao, Xin; Li, Xuechen; Cheung, Peter Pak-Hang; Wang, Dong; Zhang, Yingkai; Huang, Xuhui (Nature Catalysis, Springer Nature, 2019-02-11) [Article]
    RNA polymerase II (Pol II) utilizes the same active site for polymerization and intrinsic cleavage. Pol II proofreads the nascent transcript via its intrinsic nuclease activity to maintain high transcriptional fidelity critical for cell growth and viability. The detailed catalytic mechanism of intrinsic cleavage remains unknown. Here, we combined ab initio quantum mechanics/molecular mechanics studies and biochemical cleavage assays to show that Pol II utilizes downstream phosphate oxygen to activate the attacking nucleophile in hydrolysis, while the newly formed 3′-end is protonated through active-site water without a defined general acid. Experimentally, alteration of downstream phosphate oxygen either by 2′-5′ sugar linkage or stereo-specific thio-substitution of phosphate oxygen drastically reduced cleavage rate. We showed by N7-modification that guanine nucleobase is not directly involved as an acid–base catalyst. Our proposed mechanism provides important insights into the intrinsic transcriptional cleavage reaction, an essential step in transcriptional fidelity control.
  • A Novel One-Pot Three-Component Reaction for Rapid Access of Arylidene 2-Aminoimidazolone Derivatives

    Hanif, Aansa; Sardar, Aniqa; Alazmi, Meshari; Tariq, Haniya; Emwas, Abdul-Hamid M.; Gao, Xin; Chotana, Ghayoor Abbas; Zaib Saleem, Rahman Shah (ChemistrySelect, Wiley, 2019-02-05) [Article]
    A simple and convenient one-pot reaction for the synthesis of arylidene 2-aminoimidazolone derivatives from structurally diverse benzaldehydes, amines and 2-(methylthio)-1H-imidazol-4(5H)-one is described. The reaction accommodates electron-rich and electron-deficient benzaldehydes as well as mono- and di-substituted amines, allowing the rapid development of combinatorial libraries.
  • Formal axioms in biomedical ontologies improve analysis and interpretation of associated data

    Smaili, Fatima Z.; Gao, Xin; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2019-02-02) [Preprint]
    Motivation: There are now over 500 ontologies in the life sciences. Over the past years, significant resources have been invested into formalizing these biomedical ontologies. Formal axioms in ontologies have been developed and used to detect and ensure ontology consistency, find unsatisfiable classes, improve interoperability, guide ontology extension through the application of axiom-based design patterns, and encode domain background knowledge. At the same time, ontologies have extended their amount of human-readable information such as labels and definitions as well as other meta-data. As a consequence, biomedical ontologies now form large formalized domain knowledge bases and have a potential to improve ontology-based data analysis by providing background knowledge and relations between biological entities that are not otherwise connected. Results: We evaluate the contribution of formal axioms and ontology meta-data to the ontology-based prediction of protein-protein interactions and gene-disease associations. We find that the formal axioms that have been created for the Gene Ontology and several other ontologies significantly improve ontology-based prediction models through provision of domain-specific background knowledge. Furthermore, we find that the labels, synonyms and definitions in ontologies can also provide background knowledge that may be exploited for prediction. The axioms and meta-data of different ontologies contribute in varying degrees to improving data analysis. Our results have major implications on the further development of formal knowledge bases and ontologies in the life sciences, in particular as machine learning methods are more frequently being applied. Our findings clearly motivate the need for further development, and the systematic, application-driven evaluation and improvement, of formal axioms in ontologies.
  • H-NS uses an autoinhibitory conformational switch for environment-controlled gene silencing

    Shahul Hameed, Umar F; Liao, Chenyi; Radhakrishnan, Anand K; Huser, Franceline; Aljedani, Safia Salim Eid; Zhao, Xiaochuan; Momin, Afaque Ahmad Imtiyaz; Melo, Fernando A; Guo, Xianrong; Brooks, Claire; Li, Yu; Cui, Xuefeng; Gao, Xin; Ladbury, John E; Jaremko, Lukasz; Jaremko, Mariusz; Li, Jianing; Arold, Stefan T. (Nucleic Acids Research, Oxford University Press (OUP), 2018-12-28) [Article]
    As an environment-dependent pleiotropic gene regulator in Gram-negative bacteria, the H-NS protein is crucial for adaptation and toxicity control of human pathogens such as Salmonella, Vibrio cholerae or enterohaemorrhagic Escherichia coli. Changes in temperature affect the capacity of H-NS to form multimers that condense DNA and restrict gene expression. However, the molecular mechanism through which H-NS senses temperature and other physicochemical parameters remains unclear and controversial. Combining structural, biophysical and computational analyses, we show that human body temperature promotes unfolding of the central dimerization domain, breaking up H-NS multimers. This unfolding event enables an autoinhibitory compact H-NS conformation that blocks DNA binding. Our integrative approach provides the molecular basis for H-NS-mediated environment-sensing and may open new avenues for the control of pathogenic multi-drug resistant bacteria.
  • OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction

    Smaili, Fatima Z.; Gao, Xin; Hoehndorf, Robert (Bioinformatics, Oxford University Press (OUP), 2018-11-08) [Article]
    Motivation: Ontologies are widely used in biology for data annotation, integration, and analysis. In addition to formally structured axioms, ontologies contain meta-data in the form of annotation axioms which provide valuable pieces of information that characterize ontology classes. Annotation axioms commonly used in ontologies include class labels, descriptions, or synonyms. Despite being a rich source of semantic information, the ontology meta-data are generally unexploited by ontology-based analysis methods. Results: We propose a novel method, OPA2Vec, to generate vector representations of biological entities in ontologies by combining formal ontology axioms and annotation axioms from the ontology meta-data. We apply a Word2Vec model that has been pre-trained on either a corpus of abstracts or full-text articles to produce feature vectors from our collected data. We validate our method in two different ways: first, we use the obtained vector representations of proteins in a similarity measure to predict protein-protein interaction on two different datasets. Second, we evaluate our method on predicting gene-disease associations based on phenotype similarity by generating vector representations of genes and diseases using a phenotype ontology, and applying the obtained vectors to predict gene-disease associations using mouse model phenotypes. We demonstrate that OPA2Vec significantly outperforms existing methods for predicting gene-disease associations. Using evidence from mouse models, we apply OPA2Vec to identify candidate genes for several thousand rare and orphan diseases. OPA2Vec can be used to produce vector representations of any biomedical entity given any type of biomedical ontology. Availability: https://github.com/bio-ontology-research-group/opa2vec.
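The similarity-based prediction step mentioned in this abstract can be sketched as follows, under stated assumptions: cosine similarity is used as the similarity measure (the abstract does not name one), and the three-dimensional vectors and the identifiers `P1` to `P3` are hypothetical placeholders rather than embeddings produced by OPA2Vec.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_partners(query, embeddings):
    """Rank all other entities by decreasing similarity to the query."""
    q = embeddings[query]
    scored = [(name, cosine(q, vec))
              for name, vec in embeddings.items() if name != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings; real vectors would come from a trained model.
embeddings = {
    "P1": [1.0, 0.0, 0.0],
    "P2": [0.9, 0.1, 0.0],
    "P3": [0.0, 1.0, 0.0],
}

# Entities near the top of the ranking are the predicted candidates.
ranking = rank_partners("P1", embeddings)
```

Ranking every candidate by similarity, rather than applying a fixed cutoff, leaves the choice of threshold to the evaluation stage.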
  • Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations

    Smaili, Fatima Z.; Gao, Xin; Hoehndorf, Robert (Bioinformatics, Oxford University Press (OUP), 2018-06-27) [Article]
    Motivation: Biological knowledge is widely represented in the form of ontology-based annotations: ontologies describe the phenomena assumed to exist within a domain, and the annotations associate a (kind of) biological entity with a set of phenomena within the domain. The structure and information contained in ontologies and their annotations make them valuable for developing machine learning, data analysis and knowledge extraction algorithms; notably, semantic similarity is widely used to identify relations between biological entities, and ontology-based annotations are frequently used as features in machine learning applications. Results: We propose the Onto2Vec method, an approach to learn feature vectors for biological entities based on their annotations to biomedical ontologies. Our method can be applied to a wide range of bioinformatics research problems such as similarity-based prediction of interactions between proteins, classification of interaction types using supervised learning, or clustering. To evaluate Onto2Vec, we use the Gene Ontology (GO) and jointly produce dense vector representations of proteins, the GO classes to which they are annotated, and the axioms in GO that constrain these classes. First, we demonstrate that Onto2Vec-generated feature vectors can significantly improve prediction of protein–protein interactions in human and yeast. We then illustrate how Onto2Vec representations provide the means for constructing data-driven, trainable semantic similarity measures that can be used to identify particular relations between proteins. Finally, we use an unsupervised clustering approach to identify protein families based on their Enzyme Commission numbers. Our results demonstrate that Onto2Vec can generate high quality feature vectors from biological entities and ontologies. Onto2Vec has the potential to significantly outperform the state-of-the-art in several predictive applications in which ontologies are involved.
  • In silico exploration of Red Sea Bacillus genomes for natural product biosynthetic gene clusters

    Othoum, Ghofran K.; Bougouffa, Salim; Mohamad Razali, Rozaimi; Bokhari, Ameerah; Alamoudi, Soha; Antunes, André; Gao, Xin; Hoehndorf, Robert; Arold, Stefan T.; Gojobori, Takashi; Hirt, Heribert; Mijakovic, Ivan; Bajic, Vladimir B.; Lafi, Feras Fawzi; Essack, Magbubah (BMC Genomics, Springer Nature, 2018-05-22) [Article]
    Background: The increasing spectrum of multidrug-resistant bacteria is a major global public health concern, necessitating discovery of novel antimicrobial agents. Here, members of the genus Bacillus are investigated as a potentially attractive source of novel antibiotics due to their broad spectrum of antimicrobial activities. We specifically focus on a computational analysis of the distinctive biosynthetic potential of Bacillus paralicheniformis strains isolated from the Red Sea, an ecosystem exposed to adverse, highly saline and hot conditions. Results: We report the complete circular and annotated genomes of two Red Sea strains, B. paralicheniformis Bac48 isolated from mangrove mud and B. paralicheniformis Bac84 isolated from microbial mat collected from Rabigh Harbor Lagoon in Saudi Arabia. Comparing the genomes of B. paralicheniformis Bac48 and B. paralicheniformis Bac84 with nine publicly available complete genomes of B. licheniformis and three genomes of B. paralicheniformis revealed that all of the B. paralicheniformis strains in this study are more enriched in nonribosomal peptides (NRPs). We further report the first computationally identified trans-acyltransferase (trans-AT) nonribosomal peptide synthetase/polyketide synthase (PKS/NRPS) cluster in strains of this species. Conclusions: B. paralicheniformis species have more genes associated with biosynthesis of antimicrobial bioactive compounds than other previously characterized species of B. licheniformis, which suggests that these species are better potential sources for novel antibiotics. Moreover, the genome of the Red Sea strain B. paralicheniformis Bac48 is more enriched in modular PKS genes compared to B. licheniformis strains and other B. paralicheniformis strains. This may be linked to adaptations that strains surviving in the Red Sea underwent to survive in the relatively hot and saline ecosystems.
  • In silico screening for candidate chassis strains of free fatty acid-producing cyanobacteria

    Motwalli, Olaa Amin; Essack, Magbubah; Jankovic, Boris R.; Ji, Boyang; Liu, Xinyao; Ansari, Hifzur Rahman; Hoehndorf, Robert; Gao, Xin; Arold, Stefan T.; Mineta, Katsuhiko; Archer, John A.C.; Gojobori, Takashi; Mijakovic, Ivan; Bajic, Vladimir B. (BMC Genomics, Springer Nature, 2017-01-05) [Article]
    Background: Finding a source from which high-energy-density biofuels can be derived at an industrial scale has become an urgent challenge for renewable energy production. Some microorganisms can produce free fatty acids (FFA) as precursors towards such high-energy-density biofuels. In particular, photosynthetic cyanobacteria are capable of directly converting carbon dioxide into FFA. However, current engineered strains need several rounds of engineering to reach the level of production of FFA to be commercially viable; thus new chassis strains that require less engineering are needed. Although more than 120 cyanobacterial genomes are sequenced, the natural potential of these strains for FFA production and excretion has not been systematically estimated. Results: Here we present the FFA SC (FFASC), an in silico screening method that evaluates the potential for FFA production and excretion of cyanobacterial strains based on their proteomes. A literature search allowed for the compilation of 64 proteins, most of which influence FFA production and a few of which affect FFA excretion. The proteins are classified into 49 orthologous groups (OGs) that helped create rules used in the scoring/ranking of algorithms developed to estimate the potential for FFA production and excretion of an organism. Among 125 cyanobacterial strains, FFASC identified 20 candidate chassis strains that rank in their FFA producing and excreting potential above the specifically engineered reference strain, Synechococcus sp. PCC 7002. We further show that the top ranked cyanobacterial strains are unicellular and primarily include Prochlorococcus (order Prochlorales) and marine Synechococcus (order Chroococcales) that cluster phylogenetically. Moreover, two principal categories of enzymes were shown to influence FFA production the most: those ensuring precursor availability for the biosynthesis of lipids, and those involved in handling the oxidative stress associated with FFA synthesis. Conclusion: To our knowledge FFASC is the first in silico method to screen cyanobacteria proteomes for their potential to produce and excrete FFA, as well as the first attempt to parameterize the criteria derived from genetic characteristics that are favorable/non-favorable for this purpose. Thus, FFASC helps focus experimental evaluation only on the most promising cyanobacteria.
  • Supplementary Material for: In silico screening for candidate chassis strains of free fatty acid-producing cyanobacteria

    Motwalli, Olaa Amin; Essack, Magbubah; Jankovic, Boris R.; Ji, Boyang; Liu, Xinyao; Ansari, Hifzur Rahman; Hoehndorf, Robert; Gao, Xin; Arold, Stefan T.; Mineta, Katsuhiko; Archer, John A.C.; Gojobori, Takashi; Mijakovic, Ivan; Bajic, Vladimir B. (figshare, 2017) [Dataset]
    Background: Finding a source from which high-energy-density biofuels can be derived at an industrial scale has become an urgent challenge for renewable energy production. Some microorganisms can produce free fatty acids (FFA) as precursors towards such high-energy-density biofuels. In particular, photosynthetic cyanobacteria are capable of directly converting carbon dioxide into FFA. However, current engineered strains need several rounds of engineering to reach the level of production of FFA to be commercially viable; thus new chassis strains that require less engineering are needed. Although more than 120 cyanobacterial genomes are sequenced, the natural potential of these strains for FFA production and excretion has not been systematically estimated. Results: Here we present the FFA SC (FFASC), an in silico screening method that evaluates the potential for FFA production and excretion of cyanobacterial strains based on their proteomes. A literature search allowed for the compilation of 64 proteins, most of which influence FFA production and a few of which affect FFA excretion. The proteins are classified into 49 orthologous groups (OGs) that helped create rules used in the scoring/ranking of algorithms developed to estimate the potential for FFA production and excretion of an organism. Among 125 cyanobacterial strains, FFASC identified 20 candidate chassis strains that rank in their FFA producing and excreting potential above the specifically engineered reference strain, Synechococcus sp. PCC 7002. We further show that the top ranked cyanobacterial strains are unicellular and primarily include Prochlorococcus (order Prochlorales) and marine Synechococcus (order Chroococcales) that cluster phylogenetically. Moreover, two principal categories of enzymes were shown to influence FFA production the most: those ensuring precursor availability for the biosynthesis of lipids, and those involved in handling the oxidative stress associated with FFA synthesis. Conclusion: To our knowledge FFASC is the first in silico method to screen cyanobacteria proteomes for their potential to produce and excrete FFA, as well as the first attempt to parameterize the criteria derived from genetic characteristics that are favorable/non-favorable for this purpose. Thus, FFASC helps focus experimental evaluation only on the most promising cyanobacteria.
  • Supplementary Material for: Hologenome analysis of two marine sponges with different microbiomes

    Ryu, Tae Woo; Seridi, Loqmane; Moitinho-Silva, Lucas; Oates, Matthew; Liew, Yi Jin; Mavromatis, Charalampos Harris; Wang, Xiaolei; Haywood, Annika; Lafi, Feras; Kupresanin, Marija; Sougrat, Rachid; Alzahrani, Majed A.; Giles, Emily; Ghosheh, Yanal; Schunter, Celia Marei; Baumgarten, Sebastian; Berumen, Michael L.; Gao, Xin; Aranda, Manuel; Foret, Sylvain; Gough, Julian; Voolstra, Christian R.; Hentschel, Ute; Ravasi, Timothy (figshare, 2016) [Dataset]
    Background: Sponges (Porifera) harbor distinct microbial consortia within their mesohyl interior. We herein analysed the hologenomes of Stylissa carteri and Xestospongia testudinaria, which notably differ in their microbiome content. Results: Our analysis revealed that S. carteri has an expanded repertoire of immunological domains, specifically Scavenger Receptor Cysteine-Rich (SRCR)-like domains, compared to X. testudinaria. On the microbial side, metatranscriptome analyses revealed an overrepresentation of potential symbiosis-related domains in X. testudinaria. Conclusions: Our findings provide genomic insights into the molecular mechanisms underlying host-symbiont coevolution and may serve as a roadmap for future hologenome analyses.
  • DESM: portal for microbial knowledge exploration systems

    Salhi, Adil; Essack, Magbubah; Radovanovic, Aleksandar; Marchand, Benoit; Bougouffa, Salim; Antunes, Andre; Simoes, Marta; Lafi, Feras Fawzi; Motwalli, Olaa Amin; Bokhari, Ameerah; Malas, Tareq Majed Yasin; Al Amoudi, Soha; Othum, Ghofran; Alam, Intikhab; Mineta, Katsuhiko; Gao, Xin; Hoehndorf, Robert; Archer, John A.C.; Gojobori, Takashi; Bajic, Vladimir B. (Nucleic Acids Research, Oxford University Press (OUP), 2015-11-05) [Article]
    Microorganisms produce an enormous variety of chemical compounds. It is of general interest for microbiology and biotechnology researchers to have the means to explore information about the molecular and genetic basis of the functioning of different microorganisms and their ability for bioproduction. To enable such exploration, we compiled 45 topic-specific knowledgebases (KBs) accessible through the DESM portal (www.cbrc.kaust.edu.sa/desm). The KBs contain information derived through text-mining of PubMed information and complemented by information data-mined from various other resources (e.g. ChEBI, Entrez Gene, GO, KOBAS, KEGG, UniPathways, BioGrid). All PubMed records were indexed using 4 538 278 concepts from 29 dictionaries, with 1 638 986 records utilized in KBs. Concepts used are normalized whenever possible. Most of the KBs focus on a particular type of microbial activity, such as production of biocatalysts or nutraceuticals. Others are focused on specific categories of microorganisms, e.g. Streptomyces or cyanobacteria. KBs are all structured in a uniform manner and have a standardized user interface. Information exploration is enabled through various searches. Users can explore the statistically most significant concepts or pairs of concepts, generate hypotheses, create interactive networks of associated concepts and export results. We believe DESM will be a useful complement to the existing resources to benefit microbiology and biotechnology research.
  • Synthesis of Fluoroalkoxy Substituted Arylboronic Esters by Iridium-Catalyzed Aromatic C–H Borylation

    Batool, Farhat; Parveen, Shehla; Emwas, Abdul-Hamid M.; Sioud, Salim; Gao, Xin; Munawar, Munawar A.; Chotana, Ghayoor A. (Organic Letters, American Chemical Society (ACS), 2015-08-17) [Article]
    The preparation of fluoroalkoxy arylboronic esters by iridium-catalyzed aromatic C–H borylation is described. The fluoroalkoxy groups employed include trifluoromethoxy, difluoromethoxy, 1,1,2,2-tetrafluoroethoxy, and 2,2-difluoro-1,3-benzodioxole. The borylation reactions were carried out neat without the use of a glovebox or Schlenk line. The regioselectivities available through the iridium-catalyzed C–H borylation are complementary to those obtained by the electrophilic aromatic substitution reactions of fluoroalkoxy arenes. Fluoroalkoxy arylboronic esters can serve as versatile building blocks.
  • PERK silence inhibits glioma cell growth under low glucose stress by blockage of p-AKT and subsequent HK2's mitochondria translocation

    Hou, Xu; Liu, Yaohua; Liu, Huailei; Chen, Xin; Liu, Min; Che, Hui; Guo, Fei; Wang, Chunlei; Zhang, Daming; Wu, Jianing; Chen, Xiaofeng; Shen, Chen; Li, Chenguang; Peng, Fei; Bi, Yunke; Yang, Zhuowen; Yang, Guang; Ai, Jing; Gao, Xin; Zhao, Shiguang (Scientific Reports, Springer Nature, 2015-03-12) [Article]
    Glioma relies on glycolysis to obtain energy and sustain its survival in the low-glucose microenvironment in vivo. The mechanisms that regulate glioma cell glycolysis are still unclear. Signaling mediated by double-stranded RNA-activated protein kinase (PKR)-like ER kinase (PERK) is one of the important pathways of the unfolded protein response (UPR), which is comprehensively activated in cancer cells under hypoxic and low-glucose stress. Here we show that PERK is significantly activated in human glioma tissues. PERK silencing results in decreased glioma cell viability and ATP/lactate production under low-glucose stress, which is mediated by partially blocked AKT activation and subsequent inhibition of the mitochondrial translocation of hexokinase II (HK2). More importantly, PERK-silenced glioma cells show decreased tumor formation capacity. Our results reveal that PERK activation is involved in the regulation of glioma glycolysis and may be a potential molecular target for glioma treatment.
  • Median Modified Wiener Filter for nonlinear adaptive spatial denoising of protein NMR multidimensional spectra

    Cannistraci, Carlo Vittorio; Abbas, Ahmed; Gao, Xin (Scientific Reports, Springer Nature, 2015-01-26) [Article]
    Denoising multidimensional NMR-spectra is a fundamental step in NMR protein structure determination. The state-of-the-art method uses wavelet-denoising, which may suffer when applied to non-stationary signals affected by Gaussian-white-noise mixed with strong impulsive artifacts, like those in multi-dimensional NMR-spectra. Regrettably, the performance of wavelet-denoising depends on a combinatorial search of wavelet shapes and parameters, and its multi-dimensional extension is highly non-trivial, which hampers its application to multidimensional NMR-spectra. Here, we endorse a different philosophy of denoising NMR-spectra: less is more! We consider spatial filters that have only one parameter to tune: the window-size. We propose, for the first time, the 3D extension of the median-modified-Wiener-filter (MMWF), an adaptive variant of the median-filter, and also its novel variation named MMWF*. We test the proposed filters and the Wiener-filter, an adaptive variant of the mean-filter, on a benchmark set that contains 16 two-dimensional and three-dimensional NMR-spectra extracted from eight proteins. Our results demonstrate that the adaptive spatial filters significantly outperform their non-adaptive versions. The performance of the new MMWF* on 2D/3D-spectra is even better than that of wavelet-denoising. Notably, MMWF* delivers stable, high performance that is almost invariant across diverse window-size settings: this signifies a consistent advantage in the implementation of automatic pipelines for protein NMR-spectra analysis.
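
    The single-parameter idea in the abstract above can be illustrated with a minimal 2D sketch. The abstract does not give the filter's formula, so this assumes the usual adaptive Wiener update with the local mean replaced by the local median, and estimates the noise variance as the average local variance when none is supplied. The function name `mmwf_2d` and its arguments are hypothetical; the 3D extension and the MMWF* variant are not covered.

```python
from statistics import fmean, median, pvariance


def mmwf_2d(image, window=3, noise_var=None):
    """Sketch of a median-modified Wiener filter on a 2D list of floats.

    Assumption (not taken from the abstract): each output pixel is
    median + gain * (pixel - median), with gain = (var - noise) / var,
    and falls back to the plain local median where var <= noise.
    """
    rows, cols = len(image), len(image[0])
    half = window // 2

    def neighborhood(r, c):
        # Collect the window around (r, c), clipped at the image borders.
        return [
            image[i][j]
            for i in range(max(0, r - half), min(rows, r + half + 1))
            for j in range(max(0, c - half), min(cols, c + half + 1))
        ]

    # Local statistics for every pixel.
    medians = [[0.0] * cols for _ in range(rows)]
    variances = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = neighborhood(r, c)
            medians[r][c] = median(values)
            variances[r][c] = pvariance(values)

    # Assumed noise estimate: the mean of all local variances.
    if noise_var is None:
        noise_var = fmean(v for row in variances for v in row)

    filtered = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            var, med = variances[r][c], medians[r][c]
            if var <= noise_var:
                # Flat region: behave like a plain median filter.
                filtered[r][c] = med
            else:
                # Structured region: keep part of the deviation from the median.
                gain = (var - noise_var) / var
                filtered[r][c] = med + gain * (image[r][c] - med)
    return filtered
```

    The only tuning knob is `window`, which mirrors the window-size parameter mentioned in the abstract; everything else in the sketch is an illustrative assumption.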
