Recent Submissions

  • Modern Deep Learning in Bioinformatics.

    Li, Haoyang; Tian, Shuye; Li, Yu; Fang, Qiming; Tan, Renbo; Pan, Yijie; Huang, Chao; Xu, Ying; Gao, Xin (Journal of molecular cell biology, Oxford University Press (OUP), 2020-06-24) [Article]
    Deep learning (DL) has shown explosive growth in its application to bioinformatics and has demonstrated promising power to mine the complex relationships hidden in large-scale biological and biomedical data. A number of comprehensive reviews have been published on such applications, ranging from high-level reviews with future perspectives to those mainly serving as tutorials. These reviews have provided an excellent introduction to and guideline for applications of DL in bioinformatics, covering multiple types of machine learning (ML) problems, different DL architectures, and ranges of biological/biomedical problems. However, most of these reviews have focused on previous research, whereas current trends in the principled DL field and perspectives on their future developments and potential new applications to biology and biomedicine are still scarce. We will focus on modern DL, the ongoing trends and future directions of the principled DL field, and postulate new and major applications in bioinformatics.
  • Modeling quantitative traits for COVID-19 case reports

    Queralt-Rosinach, Núria; Bello, Susan; Hoehndorf, Robert; Weiland, Claus; Rocca-Serra, Philippe; Schofield, Paul N. (Cold Spring Harbor Laboratory, 2020-06-21) [Preprint]
    Medical practitioners record the condition status of a patient through qualitative and quantitative observations. The measurement of vital signs and molecular parameters in the clinics gives a complementary description of abnormal phenotypes associated with the progression of a disease. The Clinical Measurement Ontology (CMO) is used to standardize annotations of these measurable traits. However, researchers have no way to describe how these quantitative traits relate to phenotype concepts in a machine-readable manner. Using the WHO clinical case report form standard for the COVID-19 pandemic, we modeled quantitative traits and developed OWL axioms to formally relate clinical measurement terms with anatomical, biomolecular entities and phenotypes annotated with the Uber-anatomy ontology (Uberon), Chemical Entities of Biological Interest (ChEBI) and the Phenotype and Trait Ontology (PATO) biomedical ontologies. The formal description of these relations allows interoperability between clinical and biological descriptions, and facilitates automated reasoning for analysis of patterns over quantitative and qualitative biomedical observations.
  • Behavioral and brain-transcriptomic synchronization between the two opponents of a fighting pair of the fish Betta splendens.

    Vu, Trieu-Duc; Iwasaki, Yuki; Shigenobu, Shuji; Maruko, Akiko; Oshima, Kenshiro; Iioka, Erica; Huang, Chao-Li; Abe, Takashi; Tamaki, Satoshi; Lin, Yi-Wen; Chen, Chih-Kuan; Lu, Mei-Yeh; Hojo, Masaru; Wang, Hao-Ven; Tzeng, Shun-Fen; Huang, Hao-Jen; Kanai, Akio; Gojobori, Takashi; Chiang, Tzen-Yuh; Sun, H Sunny; Li, Wen-Hsiung; Okada, Norihiro (PLoS genetics, Public Library of Science (PLoS), 2020-06-20) [Article]
    Conspecific male animals fight for resources such as food and mating opportunities but typically stop fighting after assessing their relative fighting abilities to avoid serious injuries. Physiologically, how the fighting behavior is controlled remains unknown. Using the fighting fish Betta splendens, we studied behavioral and brain-transcriptomic changes during the fight between the two opponents. At the behavioral level, surface-breathing, and biting/striking occurred only during intervals between mouth-locking. Eventually, the behaviors of the two opponents became synchronized, with each pair showing a unique behavioral pattern. At the physiological level, we examined the expression patterns of 23,306 brain transcripts using RNA-sequencing data from brains of fighting pairs after a 20-min (D20) and a 60-min (D60) fight. The two opponents in each D60 fighting pair showed a strong gene expression correlation, whereas those in D20 fighting pairs showed a weak correlation. Moreover, each fighting pair in the D60 group showed pair-specific gene expression patterns in a grade of membership analysis (GoM) and were grouped as a pair in the heatmap clustering. The observed pair-specific individualization in brain-transcriptomic synchronization (PIBS) suggested that this synchronization provides a physiological basis for the behavioral synchronization. An analysis using the synchronized genes in fighting pairs of the D60 group found genes enriched for ion transport, synaptic function, and learning and memory. Brain-transcriptomic synchronization could be a general phenomenon and may provide a new cornerstone with which to investigate coordinating and sustaining social interactions between two interacting partners of vertebrates.
  • Analysis of transcript-deleterious variants in Mendelian disorders: implications for RNA-based diagnostics.

    Maddirevula, Sateesh; Kuwahara, Hiroyuki; Ewida, Nour; Shamseldin, Hanan E; Patel, Nisha; AlZahrani, Fatema; AlSheddi, Tarfa; AlObeid, Eman; Alenazi, Mona; Alsaif, Hessa S; Alqahtani, Maha; AlAli, Maha; Al Ali, Hatoon; Helaby, Rana; Ibrahim, Niema; Abdulwahab, Firdous; Hashem, Mais; Hanna, Nadine; Monies, Dorota; Derar, Nada; Alsagheir, Afaf; Alhashem, Amal; Alsaleem, Badr; Alhebbi, Hamoud; Wali, Sami; Umarov, Ramzan; Gao, Xin; Alkuraya, Fowzan S. (Genome biology, Springer Science and Business Media LLC, 2020-06-20) [Article]
    BACKGROUND: At least 50% of patients with suspected Mendelian disorders remain undiagnosed after whole-exome sequencing (WES), and the extent to which non-coding variants that are not captured by WES contribute to this fraction is unclear. Whole transcriptome sequencing is a promising supplement to WES, although empirical data on the contribution of RNA analysis to the diagnosis of Mendelian diseases on a large scale are scarce. RESULTS: Here, we describe our experience with transcript-deleterious variants (TDVs) based on a cohort of 5647 families with suspected Mendelian diseases. We first interrogate all families for which the respective Mendelian phenotype could be mapped to a single locus to obtain an unbiased estimate of the contribution of TDVs at 18.9%. We examine the entire cohort and find that TDVs account for 15% of all "solved" cases. We compare the results of RT-PCR to in silico prediction. Definitive results from RT-PCR are obtained from blood-derived RNA for the overwhelming majority of variants (84.1%), and only a small minority (2.6%) fail analysis on all available RNA sources (blood-, skin fibroblast-, and urine renal epithelial cells-derived), which has important implications for the clinical application of RNA-seq. We also show that RNA analysis can establish the diagnosis in 13.5% of 155 patients who had received "negative" clinical WES reports. Finally, our data suggest a role for TDVs in modulating penetrance even in otherwise highly penetrant Mendelian disorders. CONCLUSIONS: Our results provide much needed empirical data for the impending implementation of diagnostic RNA-seq in conjunction with genome sequencing.
  • A self-adaptive deep learning algorithm for accelerating multi-component flash calculation

    Zhang, Tao; Li, Yu; Li, Yiteng; Sun, Shuyu; Gao, Xin (Computer Methods in Applied Mechanics and Engineering, Elsevier BV, 2020-06-11) [Article]
    In this paper, the first self-adaptive deep learning algorithm for accelerating flash calculations is proposed in detail. It can quantitatively predict the total number of phases in the mixture and the related thermodynamic properties at equilibrium for realistic reservoir fluids with a large number of components under various environmental conditions. A thermodynamically consistent scheme for phase equilibrium calculation is adopted and implemented at specified moles, volume and temperature, and the flash results are used as the ground truth for training and testing the deep neural network. The critical properties of each component are the input features of the neural network, and the final output is the total number of phases at equilibrium and the molar compositions in each phase. Two network structures are designed, one of which transforms the input of various numbers of components in the training and the objective fluid mixture into a unified space before entering the productive neural network. “Ghost components” are defined and introduced to pad the data so that the dimension of the input flash calculation data meets the training and testing requirements of the target fluid mixture. Hyperparameters of both neural networks are carefully tuned to ensure that the physical correlations underlying the input parameters are properly preserved through the learning process. This combined structure makes the deep learning algorithm self-adaptive to changes in input components and dimensions. Furthermore, two Softmax functions are used in the last layer to enforce the constraint that the mole fractions in each phase sum to 1. In an example, the flash calculation results of an 8-component Eagle Ford oil are used as input to estimate the phase equilibrium state of a 14-component Eagle Ford oil. The results are satisfactory, with very small estimation errors. The proposed algorithm is also verified to complete the phase stability test and the phase splitting calculation simultaneously. Concluding remarks provide some guidance for further research in this direction, especially the potential application of newly developed neural network models.
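    The two mechanisms named in the abstract above, padding with "ghost components" and a Softmax output per phase, can be illustrated with a minimal sketch. It is not the authors' implementation: encoding a ghost component as an all-zero feature tuple, the feature values, and the raw network outputs are all assumptions made for illustration.

```python
import math

# Assumption: a "ghost component" is encoded as an all-zero feature tuple.
GHOST = (0.0, 0.0, 0.0)


def pad_with_ghosts(components, target_size):
    """Pad a list of per-component feature tuples up to target_size entries."""
    if len(components) > target_size:
        raise ValueError("mixture has more components than the target size")
    return list(components) + [GHOST] * (target_size - len(components))


def softmax(logits):
    """Numerically stable softmax; outputs are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 3-component mixture padded to 5 slots (feature values are illustrative).
mixture = [(190.6, 4.60, 0.011), (305.3, 4.87, 0.099), (369.8, 4.25, 0.152)]
padded = pad_with_ghosts(mixture, 5)

# One softmax head per phase, applied to hypothetical raw network outputs,
# so that the predicted mole fractions of each phase sum to 1.
vapour_fractions = softmax([2.1, 0.3, -1.0])
liquid_fractions = softmax([-0.5, 0.8, 1.7])
```

    Applying a separate softmax to each phase builds the sum-to-one constraint into the output layer, so no post-hoc renormalization is needed.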
  • Generative adversarial network-based super-resolution of diffusion-weighted imaging: Application to tumour radiomics in breast cancer.

    Fan, Ming; Liu, Zuhui; Xu, Maosheng; Wang, Shiwei; Zeng, Tieyong; Gao, Xin; Li, Lihua (NMR in biomedicine, Wiley, 2020-06-11) [Article]
    Diffusion-weighted imaging (DWI) is increasingly used to guide the clinical management of patients with breast tumours. However, accurate tumour characterization with DWI and the corresponding apparent diffusion coefficient (ADC) maps is challenging due to their limited resolution. This study aimed to produce super-resolution (SR) ADC images and to assess the clinical utility of these SR images by performing a radiomic analysis for predicting the histologic grade and Ki-67 expression status of breast cancer. To this end, 322 samples of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) and the corresponding DWI data were collected. An SR generative adversarial network (SRGAN) and an enhanced deep SR (EDSR) network, along with bicubic interpolation, were used to generate SR-ADC images from which radiomic features were extracted. The dataset was randomly separated into a development dataset (n = 222) to establish a deep SR model using DCE-MRI and a validation dataset (n = 100) to improve the resolution of ADC images. This random separation of datasets was performed 10 times, and the results were averaged. The EDSR method was significantly better than the SRGAN and bicubic methods in terms of objective quality criteria. Univariate and multivariate predictive models of radiomic features were established to determine the area under the receiver operating characteristic curve (AUC). Individual features from the tumour SR-ADC images showed a higher performance with the EDSR and SRGAN methods than with the bicubic method and the original images. Multivariate analysis of the collective radiomics showed that the EDSR- and SRGAN-based SR-ADC images performed better than the bicubic method and original images in predicting either Ki-67 expression levels (AUCs of 0.818 and 0.801, respectively) or the tumour grade (AUCs of 0.826 and 0.828, respectively). This work demonstrates that in addition to improving the resolution of ADC images, deep SR networks can also improve tumour image-based diagnosis in breast cancer.
  • Malaria parasites regulate intra-erythrocytic development duration via serpentine receptor 10 to coordinate with host rhythms

    Subudhi, Amit; O’Donnell, Aidan J.; Ramaprasad, Abhinay; Abkallo, Hussein M.; Kaushik, Abhinav; Ansari, Hifzur Rahman; Abdel-Haleem, Alyaa M.; Rached, Fathia Ben; Kaneko, Osamu; Culleton, Richard; Reece, Sarah E.; Pain, Arnab (Nature Communications, Springer Science and Business Media LLC, 2020-06-02) [Article]
    Malaria parasites complete their intra-erythrocytic developmental cycle (IDC) in multiples of 24 h suggesting a circadian basis, but the mechanism controlling this periodicity is unknown. Combining in vivo and in vitro approaches utilizing rodent and human malaria parasites, we reveal that: (i) 57% of Plasmodium chabaudi genes exhibit daily rhythms in transcription; (ii) 58% of these genes lose transcriptional rhythmicity when the IDC is out-of-synchrony with host rhythms; (iii) 6% of Plasmodium falciparum genes show 24 h rhythms in expression under free-running conditions; (iv) Serpentine receptor 10 (SR10) has a 24 h transcriptional rhythm and disrupting it in rodent malaria parasites shortens the IDC by 2-3 h; (v) Multiple processes including DNA replication, and the ubiquitin and proteasome pathways, are affected by loss of coordination with host rhythms and by disruption of SR10. Our results reveal malaria parasites are at least partly responsible for scheduling the IDC and coordinating their development with host daily rhythms.
  • Novel Missense Variant in Heterozygous State in the BRPF1 Gene Leading to Intellectual Developmental Disorder With Dysmorphic Facies and Ptosis.

    Naseer, Muhammad Imran; Abdulkareem, Angham Abdulrahman; Guzmán-Vega, Francisco J.; Arold, Stefan T.; Pushparaj, Peter Natesan; Chaudhary, Adeel G; AlQahtani, Mohammad H (Frontiers in genetics, Frontiers Media SA, 2020-05-28) [Article]
    Intellectual developmental disorder with dysmorphic facies and ptosis is an autosomal dominant condition characterized by delayed psychomotor development, intellectual disability, delayed speech, and dysmorphic facial features, mostly ptosis. Heterozygous mutations in the bromodomain and plant homeodomain (PHD) finger containing one (BRPF1) gene have been reported. In this study, whole exome sequencing (WES) was performed as a molecular diagnostic test. Bioinformatic analysis of WES data and candidate gene prioritization identified a novel variant in heterozygous state in exon 3 of the BRPF1 gene (ENST383829: c.1054G > C and p.Val352Leu). Sanger sequencing was used to confirm autosomal dominant inheritance in the affected family members and to screen ethnically matched healthy controls (n = 100). To the best of our knowledge, this is the first evidence of a BRPF1 variant in a Saudi family. Whole exome sequencing analysis has proven to be a valuable tool in molecular diagnostics. Our findings further expand the role of WES in efficient disease diagnosis in Arab families and indicate that the mutation in the BRPF1 gene plays an important role in the development of IDDFP syndrome.
  • Mg2+ Is a Missing Link in Plant Cell Ca2+ Signalling and Homeostasis—A Study on Vicia faba Guard Cells

    Lemtiri-Chlieh, Fouad; Arold, Stefan T.; Gehring, Christoph A (International Journal of Molecular Sciences, MDPI AG, 2020-05-27) [Article]
    Hyperpolarization-activated calcium channels (HACCs) are found in the plasma membrane and tonoplast of many plant cell types, where they have an important role in Ca2+-dependent signalling. The unusual gating properties of HACCs in plants, i.e., activation by membrane hyperpolarization rather than depolarization, dictates that HACCs are normally open in the physiological hyperpolarized resting membrane potential state (the so-called pump or P-state); thus, if not regulated, they would continuously leak Ca2+ into cells. HACCs are permeable to Ca2+, Ba2+, and Mg2+; activated by H2O2 and the plant hormone abscisic acid (ABA); and their activity in guard cells is greatly reduced by increasing amounts of free cytosolic Ca2+ ([Ca2+]Cyt), and hence closes during [Ca2+]Cyt surges. Here, we demonstrate that the presence of the commonly used Mg-ATP inside the guard cell greatly reduces HACC activity, especially at voltages ≤ −200 mV, and that Mg2+ causes this block. Therefore, we firstly conclude that physiological cytosolic Mg2+ levels affect HACC gating and that channel opening requires either high negative voltages (≥ −200 mV) or displacement of Mg2+ away from the immediate vicinity of the channel. Secondly, based on structural comparisons with a Mg2+-sensitive animal inward-rectifying K+ channel, we propose that the likely candidate HACCs described here are cyclic nucleotide gated channels (CNGCs), many of which also contain a conserved diacidic Mg2+ binding motif within their pores. This conclusion is consistent with the electrophysiological data. Finally, we propose that Mg2+, much like in animal cells, is an important component in Ca2+ signalling and homeostasis in plants.
  • Computational Drug-target Interaction Prediction based on Graph Embedding and Graph Mining

    Thafar, Maha A.; Albaradie, Somayah; Olayan, Rawan S.; Ashoor, Haitham; Essack, Magbubah; Bajic, Vladimir B. (ACM, 2020-05-22) [Conference Paper]
    Identification of interactions of drugs and proteins is an essential step in the early stages of drug discovery and in finding new drug uses. Traditional experimental identification and validation of these interactions are still time-consuming and expensive, and do not have a high success rate. To improve this identification process, the development of computational methods that predict and rank likely drug-target interactions (DTIs) with a minimum error rate would be of great help. In this work, we propose DTiGEM (Drug-Target interaction prediction using Graph Embedding and graph Mining), a computational method that models the identification of novel DTIs as a link prediction problem in a heterogeneous graph constructed by integrating three networks, namely: drug-drug similarity, target-target similarity, and known DTIs. DTiGEM combines different techniques, including graph embeddings (e.g., node2vec), graph mining (e.g., path scores between drugs and targets), and machine learning (e.g., different classifiers). DTiGEM improves prediction performance compared to other state-of-the-art methods for computational prediction of DTIs on four benchmark datasets in terms of area under the precision-recall curve (AUPR). Specifically, we demonstrate that based on the average AUPR score across all benchmark datasets, DTiGEM achieves the highest average AUPR value (0.831), thus reducing the prediction error by 22.4% relative to the second-best performing method in the comparison.
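    To make the idea of path scores over a heterogeneous drug-target graph concrete, the sketch below builds a toy graph from the three networks named in the abstract and derives two path-based features per drug-target pair. The scoring rule (take the best similarity along a two-hop path), the node names, and the weights are assumptions for illustration and are not taken from DTiGEM.

```python
# Toy heterogeneous graph; every name and weight below is hypothetical.
drug_similarity = {("d1", "d2"): 0.9, ("d1", "d3"): 0.2}
target_similarity = {("t1", "t2"): 0.8}
known_interactions = {("d2", "t1"), ("d3", "t2")}


def sim(table, a, b):
    """Symmetric lookup in a similarity table; 0.0 if the pair is absent."""
    return table.get((a, b), table.get((b, a), 0.0))


def path_score_ddt(drug, target, drugs):
    """Best drug -> similar drug -> target path, weighted by drug similarity."""
    return max(
        (sim(drug_similarity, drug, other)
         for other in drugs
         if other != drug and (other, target) in known_interactions),
        default=0.0,
    )


def path_score_dtt(drug, target, targets):
    """Best drug -> known target -> similar target path."""
    return max(
        (sim(target_similarity, other, target)
         for other in targets
         if other != target and (drug, other) in known_interactions),
        default=0.0,
    )


drugs = ["d1", "d2", "d3"]
targets = ["t1", "t2"]

# One feature vector per candidate pair; a classifier would consume these.
features = {
    (d, t): (path_score_ddt(d, t, drugs), path_score_dtt(d, t, targets))
    for d in drugs
    for t in targets
}
```

    In a link prediction setting, such features would be computed for known and unknown pairs alike and passed to a classifier, possibly alongside node embeddings.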
  • PUB-SalNet: A Pre-Trained Unsupervised Self-Aware Backpropagation Network for Biomedical Salient Segmentation

    Chen, Feiyang; Jiang, Ying; Zeng, Xiangrui; Zhang, Jing; Gao, Xin; Xu, Min (Algorithms, MDPI AG, 2020-05-20) [Article]
    Salient segmentation is a critical step in biomedical image analysis, aiming to cut out regions that are most interesting to humans. Recently, supervised methods have achieved promising results in biomedical areas, but they depend on annotated training data sets, which require labor and proficiency in related background knowledge. In contrast, unsupervised learning makes data-driven decisions by obtaining insights directly from the data themselves. In this paper, we propose a completely unsupervised self-aware network based on pre-training and attentional backpropagation for biomedical salient segmentation, named PUB-SalNet. Firstly, we aggregate a new biomedical data set, called SalSeg-CECT, from several simulated Cellular Electron Cryo-Tomography (CECT) data sets featuring rich salient objects, different SNR settings, and various resolutions. Based on the SalSeg-CECT data set, we then pre-train a model specially designed for biomedical tasks as a backbone module to initialize network parameters. Next, we present a U-SalNet network to learn to selectively attend to salient objects. It includes two types of attention modules to facilitate learning saliency through global contrast and local similarity. Lastly, we jointly refine the salient regions together with feature representations from U-SalNet, with the parameters updated by self-aware attentional backpropagation. We apply PUB-SalNet for analysis of 2D simulated and real images and achieve state-of-the-art performance on simulated biomedical data sets. Furthermore, our proposed PUB-SalNet can be easily extended to 3D images. The experimental results on the 2D and 3D data sets also demonstrate the generalization ability and robustness of our method.
  • Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies

    Slater, Luke T; Gkoutos, Georgios V; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2020-05-17) [Preprint]
    Background: Ontologies are widely used throughout the biomedical domain. These ontologies formally represent the classes and relations assumed to exist within a domain. As scientific domains are deeply interlinked, so too are their representations. While individual ontologies can be tested for consistency and coherency using automated reasoning methods, systematically combining ontologies of multiple domains together may reveal previously hidden contradictions. Results: We developed a method that tests for hidden unsatisfiabilities in an ontology that arise when combined with other ontologies. For this purpose, we combine sets of ontologies and use automated reasoning to determine whether unsatisfiable classes are present. We test the mutual consistency of the OBO Foundry and the OBO ontologies and find that the combined OBO Foundry gives rise to at least 636 unsatisfiable classes, while the OBO ontologies give rise to more than 300,000 unsatisfiable classes. We design and implement a novel algorithm that can determine justifications for contradictions across extremely large and complicated ontologies, and use these justifications to semi-automatically repair ontologies by identifying the minimal set of axioms that, when removed, result in a consistent and coherent set of ontologies. We applied our algorithm to each combination of OBO ontologies that resulted in unsatisfiable classes. Conclusions: We identified a large set of hidden unsatisfiability across a broad range of biomedical ontologies, and we find that this large set of unsatisfiable classes is the result of a relatively small number of axiomatic disagreements. Our results show that hidden unsatisfiability is a serious problem in ontology interoperability; however, our results also provide a way towards more consistent ontologies by addressing the issues we identified.
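    The notion of a contradiction that stays hidden until ontologies are combined can be shown with a deliberately simplified sketch. It handles only subclass and disjointness axioms and is a stand-in for a full OWL reasoner, not the algorithm described in the abstract; the class names and both toy ontologies are hypothetical.

```python
def ancestors(cls, subclass_of):
    """All superclasses of cls (including cls itself) under the given axioms."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(subclass_of, disjoint_pairs):
    """Classes that inherit from two classes declared disjoint."""
    bad = set()
    for cls in subclass_of:
        ups = ancestors(cls, subclass_of)
        if any(a in ups and b in ups for a, b in disjoint_pairs):
            bad.add(cls)
    return bad


def merge(*ontologies):
    """Union the subclass and disjointness axioms of several ontologies."""
    subclass_of, disjoint = {}, set()
    for onto in ontologies:
        for cls, parents in onto["subclass_of"].items():
            subclass_of.setdefault(cls, set()).update(parents)
        disjoint |= onto["disjoint"]
    return subclass_of, disjoint


# Two hypothetical ontologies that are each coherent on their own.
ontology_a = {
    "subclass_of": {"Enzyme": {"Protein"}, "Protein": {"MaterialEntity"}},
    "disjoint": {("MaterialEntity", "Process")},
}
ontology_b = {"subclass_of": {"Enzyme": {"Process"}}, "disjoint": set()}

alone_a = unsatisfiable_classes(*merge(ontology_a))
alone_b = unsatisfiable_classes(*merge(ontology_b))
combined = unsatisfiable_classes(*merge(ontology_a, ontology_b))
```

    Here neither ontology has an unsatisfiable class in isolation, but the merged axiom set makes `Enzyme` a subclass of two disjoint classes. Removing either the disjointness axiom or one subclass axiom would repair it, which is the kind of minimal repair set the abstract refers to.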
  • Splice2Deep: An ensemble of deep convolutional neural networks for improved splice site prediction in genomic DNA

    Albaradei, Somayah; Magana-Mora, Arturo; Thafar, Maha; Uludag, Mahmut; Bajic, Vladimir B.; Gojobori, Takashi; Essack, Magbubah; Jankovic, Boris R. (Gene: X, Elsevier BV, 2020-05-13) [Article]
    Background: The accurate identification of the exon/intron boundaries is critical for the correct annotation of genes with multiple exons. Donor and acceptor splice sites (SS) demarcate these boundaries. Therefore, deriving accurate computational models to predict the SS is useful for functional annotation of genes and genomes, and for finding alternative SS associated with different diseases. Although various models have been proposed for the in silico prediction of SS, improving their accuracy is required for reliable annotation. Moreover, models are often derived and tested using the same genome, providing no evidence of broad application, i.e. to other poorly studied genomes. Results: With this in mind, we developed the Splice2Deep models for SS detection. Each model is an ensemble of deep convolutional neural networks. We evaluated the performance of the models based on the ability to detect SS in Homo sapiens, Oryza sativa japonica, Arabidopsis thaliana, Drosophila melanogaster, and Caenorhabditis elegans. Results demonstrate that the models efficiently detect SS in other organisms not considered during the training of the models. Compared to the state-of-the-art tools, Splice2Deep models achieved significantly reduced average error rates of 41.97% and 28.51% for acceptor and donor SS, respectively. Moreover, the Splice2Deep cross-organism validation demonstrates that models correctly identify conserved genomic elements enabling annotation of SS in new genomes by choosing the taxonomically closest model. Conclusions: The results of our study demonstrated that Splice2Deep both achieved a considerably reduced error rate compared to other state-of-the-art models and accurately recognized SS in other organisms for which the model was not trained, enabling annotation of poorly studied or newly sequenced genomes. Splice2Deep models are implemented in Python using Keras API; the models and the data are available at https://github.com/SomayahAlbaradei/Splice_Deep.git.
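    As a rough illustration of how an ensemble scores a candidate splice site, the sketch below one-hot encodes a DNA window and averages the outputs of several models. It does not reproduce the Splice2Deep networks: the two "models" are hypothetical stand-in functions, and simple averaging is an assumed way of combining ensemble members.

```python
ALPHABET = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as rows of 4 floats; unknown bases become all zeros."""
    return [
        [1.0 if base == letter else 0.0 for letter in ALPHABET]
        for base in sequence.upper()
    ]


def ensemble_score(window, models):
    """Average the splice-site scores returned by several models."""
    encoded = one_hot(window)
    scores = [model(encoded) for model in models]
    return sum(scores) / len(scores)


# Stand-in "models": hypothetical callables, not trained networks.
def has_gt_at_centre(encoded):
    """Return 1.0 if the canonical donor dinucleotide GT sits at the centre."""
    mid = len(encoded) // 2
    is_g = encoded[mid - 1][ALPHABET.index("G")] == 1.0
    is_t = encoded[mid][ALPHABET.index("T")] == 1.0
    return 1.0 if is_g and is_t else 0.0


def constant_prior(encoded):
    """Ignore the input and return a fixed score."""
    return 0.5


score = ensemble_score("CAGGTAAG", [has_gt_at_centre, constant_prior])
```

    In a real pipeline each stand-in would be replaced by a trained convolutional network that takes the same one-hot matrix as input.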
  • Machine learning with biomedical ontologies

    Kulmanov, Maxat; Smaili, Fatima Z.; Gao, Xin; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2020-05-08) [Preprint]
    Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge, and they are employed in almost every major biological database. Recently, ontologies are increasingly being used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in biomedical ontologies, and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies. Key points: Ontologies provide background knowledge that can be exploited in machine learning models. Ontology embeddings are structure-preserving maps from ontologies into vector spaces and provide an important method for utilizing ontologies in machine learning. Embeddings can preserve different structures in ontologies, including their graph structures, syntactic regularities, or their model-theoretic semantics. Axioms in ontologies, in particular those involving negation, can be used as constraints in optimization and machine learning to reduce the search space.
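    One simple family of semantic similarity measures compares the sets of superclasses of two terms. The sketch below computes a Jaccard index over ancestor sets in a toy hierarchy. It is one illustrative measure chosen here, not necessarily one covered in the review, and the miniature hierarchy is hypothetical.

```python
def ancestors(term, parents):
    """All superclasses of term (including term itself) in the hierarchy."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard_similarity(a, b, parents):
    """Shared ancestors divided by all ancestors of the two terms."""
    up_a, up_b = ancestors(a, parents), ancestors(b, parents)
    return len(up_a & up_b) / len(up_a | up_b)


# Hypothetical miniature hierarchy (child -> list of direct parents).
parents = {
    "kinase activity": ["catalytic activity"],
    "phosphatase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "binding": ["molecular function"],
}

close_pair = jaccard_similarity("kinase activity", "phosphatase activity", parents)
distant_pair = jaccard_similarity("kinase activity", "binding", parents)
```

    Terms that share a specific parent score higher than terms that meet only at the root, which is the background knowledge such measures pass on to downstream models.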
  • Facile Synthesis of NH-Free 5-(Hetero)Aryl-Pyrrole-2-Carboxylates by Catalytic C–H Borylation and Suzuki Coupling

    Kanwal, Saba; Ann, Noor-ul-; Fatima, Saman; Emwas, Abdul-Hamid M.; Alazmi, Meshari; Gao, Xin; Ibrar, Maha; Zaib Saleem, Rahman Shah; Chotana, Ghayoor Abbas (Molecules, MDPI AG, 2020-05-05) [Article]
    A convenient two-step preparation of NH-free 5-aryl-pyrrole-2-carboxylates is described. The synthetic route consists of catalytic borylation of commercially available pyrrole-2-carboxylate ester followed by Suzuki coupling without going through pyrrole N–H protection and deprotection steps. The resulting 5-aryl substituted pyrrole-2-carboxylates were synthesized in good to excellent yields. This synthetic route can tolerate a variety of functional groups including those with acidic protons on the aryl bromide coupling partner. This methodology is also applicable for cross-coupling with heteroaryl bromides to yield pyrrole-thiophene, pyrrole-pyridine, and 2,3’-bi-pyrrole based bi-heteroaryls.
  • Role of 1’-Ribose Cyano Substitution for Remdesivir to Effectively Inhibit both Nucleotide Addition and Proofreading in SARS-CoV-2 Viral RNA Replication

    Zhang, Lu; Zhang, Dong; Yuan, Congmin; Wang, Xiaowei; Li, Yongfang; Jia, Xilin; Gao, Xin; Yen, Hui-Ling; Cheung, Peter Pak-Hang; Huang, Xuhui (Cold Spring Harbor Laboratory, 2020-04-28) [Preprint]
    COVID-19 has recently caused a global health crisis and an effective interventional therapy is urgently needed. SARS-CoV-2 RNA-dependent RNA polymerase (RdRp) provides a promising but challenging drug target due to its intrinsic proofreading exoribonuclease (ExoN) function. Nucleoside triphosphate (NTP) analogues added to the growing RNA chain should supposedly terminate viral RNA replication, but ExoN can cleave the incorporated compounds and counteract their efficacy. Remdesivir targeting SARS-CoV-2 RdRp exerts high drug efficacy in vitro and in vivo. However, its underlying inhibitory mechanisms remain elusive. Here, we performed all-atom molecular dynamics (MD) simulations with an accumulated simulation time of 12.6 microseconds to elucidate the molecular mechanisms underlying the inhibitory effects of remdesivir in nucleotide addition (RdRp complex: nsp12-nsp7-nsp8) and proofreading (ExoN complex: nsp14-nsp10). We found that the 1’-cyano group of remdesivir possesses the dual role of inhibiting both nucleotide addition and proofreading. For nucleotide addition, we showed that incorporation of one remdesivir is not sufficient to terminate RNA synthesis. Instead, the presence of the polar 1’-cyano group of remdesivir at an upstream site causes instability via its electrostatic interactions with a salt bridge formed by Asp865 and Lys593, rendering translocation unfavourable. This may eventually lead to a delayed chain termination of RNA extension by three nucleotides. For proofreading, remdesivir can inhibit cleavage via the steric clash between the 1’-cyano group and Asn104. To further examine the role of 1’-cyano group in remdesivir’s inhibitory effects, we studied three additional NTP analogues with other types of modifications: favipiravir, vidarabine, and fludarabine. Our simulations suggest that all three of them are prone to ExoN cleavage. Our computational findings were further supported by an in vitro assay in Vero E6 cells using live SARS-CoV-2. The dose-response curves suggest that among tested NTP analogues, only remdesivir exerts significant inhibitory effects on viral replication. Our work provides plausible mechanisms at molecular level on how remdesivir inhibits viral RNA replication, and our findings may guide rational design for new treatments of COVID-19 targeting viral replication.
  • Self-normalizing learning on biomedical ontologies using a deep Siamese neural network

    Smaili, Fatima Z.; Gao, Xin; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2020-04-25) [Preprint]
    Motivation: Ontologies are widely used in biomedicine for the annotation and standardization of data. One of the main roles of ontologies is to provide structured background knowledge within a domain as well as a set of labels, synonyms, and definitions for the classes within a domain. The two types of information provided by ontologies have been extensively exploited in natural language processing and machine learning applications. However, they are commonly used separately, and thus it is unknown if joining the two sources of information can further benefit data analysis tasks. Results: We developed a novel method that applies named entity recognition and normalization methods on texts to connect the structured information in biomedical ontologies with the information contained in natural language. We apply this normalization both to literature and to the natural language information contained within ontologies themselves. The normalized ontologies and text are then used to generate embeddings, and relations between entities are predicted using a deep Siamese neural network model that takes these embeddings as input. We demonstrate that our novel embedding and prediction method using self-normalized biomedical ontologies significantly outperforms the state-of-the-art methods in embedding ontologies on two benchmark tasks: prediction of interactions between proteins and prediction of gene-disease associations. Our method also allows us to apply ontology-based annotations and axioms to the prediction of toxicological effects of chemicals, where our method shows superior performance. Our method is generic and can be applied in scenarios where ontologies consisting of both structured information and natural language labels or synonyms are used.
  • Prediction of novel virus-host interactions by integrating clinical symptoms and protein sequences

    Liu-Wei, Wang; Kafkas, Senay; Chen, Jun; Tegner, Jesper; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2020-04-25) [Preprint]
    Motivation: Infectious diseases from novel viruses are becoming a major public health concern. Fast identification of virus-host interactions can reveal mechanistic insights into infectious diseases and shed light on potential treatments and drug discoveries. Current computational prediction methods for novel viruses are based only on protein sequences. Yet, it is not clear to what extent other important features, such as the symptoms caused by the viruses, could contribute to a predictor. Disease phenotypes (i.e., symptoms) are readily accessible from clinical diagnosis and we hypothesize that they may act as a potential proxy and an additional source of information for the underlying molecular interactions between the pathogens and hosts. Results: We developed DeepViral, a deep learning method that predicts potential protein-protein interactions between humans and viruses. First, human proteins and viruses were embedded in a shared space using their associated phenotypes, functions, taxonomic classification, as well as formalized background knowledge from biomedical ontologies. By extending a sequence learning model with phenotype features, our model can not only significantly improve over previous sequence-based approaches for inter-species interaction prediction, but also identify pathways of viral targets under a realistic experimental setup for novel viruses. Availability: https://github.com/bio-ontology-research-group/DeepViral
  • Molecular Basis for the Adaptive Evolution of Environment Sensing by H-NS Proteins

    Zhao, Xiaochuan; Kharchenko, Vladlena; Shahul Hameed, Umar Farook; Liao, Chenyi; Huser, Franceline; Remington, Jacob M; Radhakrishnan, Anand; Jaremko, Mariusz; Jaremko, Lukasz; Arold, Stefan T.; Li, Jianing (Cold Spring Harbor Laboratory, 2020-04-24) [Preprint]
    The DNA-binding protein H-NS is a pleiotropic gene regulator in gram-negative bacteria. Through its capacity to sense temperature and other environmental factors, H-NS allows pathogens like Salmonella to adapt their gene expression, and hence toxicity and biological responses, to their presence inside or outside warm-blooded hosts. To investigate how this sensing mechanism may have evolved to fit different bacterial lifestyles, we compared H-NS orthologs from bacteria that infect humans, plants, and insects, and from bacteria that live on a deep-sea hydrothermal vent. The combination of biophysical characterization, high-resolution proton-less NMR spectroscopy and molecular simulations revealed, at an atomistic level, how the same general mechanism was adapted to specific habitats and lifestyles. In particular, we demonstrate how environment-sensing characteristics arise from specifically positioned intra- or intermolecular electrostatic interactions. Our integrative approach clarified the mechanism for H-NS–mediated environmental sensing and suggests that it resulted from the exaptation of an ancestral protein feature.
