### Recent Submissions

• #### Identifying Novel Drug Targets by iDTPnd: A Case Study of Kinase Inhibitors.

(Genomics, proteomics & bioinformatics, Elsevier BV, 2021-04-01) [Article]
Current FDA-approved kinase inhibitors cause diverse adverse effects, some of which are due to the mechanism-independent effects of these drugs. Identifying these mechanism-independent interactions could improve drug safety and support drug repurposing. We have developed iDTPnd (integrated Drug Target Predictor with negative dataset), a computational approach for large-scale discovery of novel targets for known drugs. For a given drug, we construct a positive and a negative structural signature that captures the weakly conserved structural features of drug binding sites. To facilitate assessment of unintended targets, iDTPnd also provides a docking-based interaction score and its statistical significance. We were able to confirm the interaction of sorafenib, imatinib, dasatinib, sunitinib, and pazopanib with their known targets at a sensitivity and specificity of 52% and 55%, respectively. We have validated 10 predicted novel targets by using in vitro experiments. Our results suggest that proteins other than kinases, such as nuclear receptors, cytochrome P450, or MHC Class I molecules can also be physiologically relevant targets of kinase inhibitors. Our method is general and broadly applicable for the identification of protein-small molecule interactions, when sufficient drug-target 3D data are available. The code for constructing the structural signature is available at https://sfb.kaust.edu.sa/Documents/iDTP.zip.
• #### Towards Similarity-based Differential Diagnostics For Common Diseases

(Computers in Biology and Medicine, Elsevier BV, 2021-04-01) [Article]
Ontology-based phenotype profiles have been utilised for the purpose of differential diagnosis of rare genetic diseases, and for decision support in specific disease domains. Particularly, semantic similarity facilitates diagnostic hypothesis generation through comparison with disease phenotype profiles. However, the approach has not been applied for differential diagnosis of common diseases, or generalised clinical diagnostics from uncurated text-derived phenotypes. In this work, we describe the development of an approach for deriving patient phenotype profiles from clinical narrative text, and apply this to text associated with MIMIC-III patient visits. We then explore the use of semantic similarity with those text-derived phenotypes to classify primary patient diagnosis, comparing the use of patient-patient similarity and patient-disease similarity using phenotype-disease profiles previously mined from literature. We also consider a combined approach, in which literature-derived phenotypes are extended with the content of text-derived phenotypes we mined from 500 patients. The results reveal a powerful approach, showing that in one setting, uncurated text phenotypes can be used for differential diagnosis of common diseases, making use of information both inside and outside the setting. While the methods themselves should be explored for further optimisation, they could be applied to a variety of clinical tasks, such as differential diagnosis, cohort discovery, document and text classification, and outcome prediction.
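As a rough illustration of similarity-based ranking of candidate diagnoses, the sketch below scores a patient phenotype profile against disease profiles. The abstract does not specify the semantic similarity measure, so a plain Jaccard index over ancestor-closed term sets is used here as a stand-in. The tiny ontology, the term names and the disease profiles are all hypothetical and are not drawn from MIMIC-III or from any real ontology.

```python
# Hypothetical toy ontology: each term maps to its parent terms (is-a edges).
ONTOLOGY = {
    "fever": {"abnormal_temperature"},
    "abnormal_temperature": {"phenotype"},
    "cough": {"respiratory_sign"},
    "dyspnea": {"respiratory_sign"},
    "respiratory_sign": {"phenotype"},
    "chest_pain": {"cardiovascular_sign"},
    "cardiovascular_sign": {"phenotype"},
    "phenotype": set(),
}


def with_ancestors(terms):
    """Return the given terms plus all of their ancestors in ONTOLOGY."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(ONTOLOGY.get(term, ()))
    return closed


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_diagnoses(patient_terms, disease_profiles):
    """Rank diseases by similarity between ancestor-closed phenotype sets."""
    patient = with_ancestors(patient_terms)
    scores = {name: jaccard(patient, with_ancestors(terms))
              for name, terms in disease_profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical disease profiles and a hypothetical text-derived patient profile.
profiles = {
    "pneumonia": {"fever", "cough", "dyspnea"},
    "angina": {"chest_pain", "dyspnea"},
}
print(rank_diagnoses({"fever", "cough"}, profiles))  # pneumonia (5/6) ranks above angina (1/4)
```

Closing each profile over its ancestors lets related but non-identical terms contribute partial credit. Information-content-weighted measures would weight rare shared terms more heavily than generic ones.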
• #### Synergy and allostery in ligand binding by HIV-1 Nef

(Biochemical Journal, Portland Press Ltd., 2021-03-31) [Article]
The Nef protein of human and simian immunodeficiency viruses boosts viral pathogenicity through its interactions with host cell proteins. By combining the polyvalency of its large unstructured regions with the binding selectivity and strength of its folded core domain, Nef can associate with many different host cell proteins, thereby disrupting their functions. For example, the combination of a linear proline-rich motif and hydrophobic core domain surface allows Nef to bind tightly and specifically to SH3 domains of Src family kinases. We investigated whether the interplay between Nef’s flexible regions and its core domain could allosterically influence ligand selection. We found that the flexible regions can associate with the core domain in different ways, producing distinct conformational states that alter the way in which Nef selects for SH3 domains and exposes some of its binding motifs. The ensuing crosstalk between ligands might promote functionally coherent Nef-bound protein ensembles by synergizing certain subsets of ligands while excluding others. We also combined proteomic and bioinformatics analyses to identify human proteins that select SH3 domains in the same way as Nef. We found that only 3% of clones from a whole-human fetal library displayed Nef-like SH3 selectivity. However, in most cases, this selectivity appears to be achieved by a canonical linear interaction rather than by a Nef-like “tertiary” interaction. Our analysis supports the contention that Nef’s mode of hijacking SH3 domains is a virus-specific adaptation with no or very few cellular counterparts. Thus, the Nef tertiary binding surface is a promising virus-specific drug target.
• #### Time scale observability and constructibility of linear dynamic equations

(International Journal of Control, Informa UK Limited, 2021-03-25) [Article]
This paper investigates the observability and constructibility problems of time-varying linear dynamic equations using time scale theory. First, we define observability, reachability, and constructibility operators on time scales. Necessary and sufficient conditions for observability on non-uniform time domains are proposed based on linear algebra tools. Constructibility is then examined using the same approach. Moreover, the link between the observability and constructibility concepts on arbitrary time sets is discussed. Further, the duality relationship between observability and reachability for time-varying linear systems on time scales is established. The current work unifies and extends some existing results given for the standard cases (i.e., the continuous line and the discrete time domain) to non-uniform time domains. Finally, the results are illustrated with an example.
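For orientation, the standard uniform discrete-time, time-invariant case that this work generalizes has a familiar test. The pair $(A, C)$ of $x_{k+1} = A x_k$, $y_k = C x_k$ is observable exactly when the observability matrix $[C; CA; \dots; CA^{n-1}]$ has rank $n$. The same Kalman rank condition holds for continuous-time LTI systems. The sketch below checks that condition with NumPy. It is not the paper's time-scale operator construction, and the example matrices are hypothetical.

```python
import numpy as np


def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) for an n-state system."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)


def is_observable(A, C):
    """Kalman rank test: observable iff the observability matrix has full rank n."""
    return bool(np.linalg.matrix_rank(observability_matrix(A, C)) == A.shape[0])


# Hypothetical example: a discrete-time double integrator (position, velocity).
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
C_position = np.array([[1.0, 0.0]])  # measure position only
C_velocity = np.array([[0.0, 1.0]])  # measure velocity only

print(is_observable(A, C_position))  # True
print(is_observable(A, C_velocity))  # False
```

Measuring velocity alone never reveals the initial position, so the second pair fails the rank test. On a non-uniform time scale, the constant $A$ is replaced by time-dependent transition maps, which is the setting the paper addresses.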
• #### Bi-allelic variants in HOPS complex subunit VPS41 cause cerebellar ataxia and abnormal membrane trafficking.

(Brain : a journal of neurology, Oxford University Press (OUP), 2021-03-25) [Article]
Membrane trafficking is a complex, essential process in eukaryotic cells responsible for protein transport and processing. Deficiencies in vacuolar protein sorting (VPS) proteins, key regulators of trafficking, cause abnormal intracellular segregation of macromolecules and organelles and are linked to human disease. VPS proteins function as part of complexes such as the homotypic fusion and vacuole protein sorting (HOPS) tethering complex, composed of VPS11, VPS16, VPS18, VPS33A, VPS39 and VPS41. The HOPS-specific subunit VPS41 has been reported to promote viability of dopaminergic neurons in Parkinson's disease but to date has not been linked to human disease. Here, we describe five unrelated families with nine affected individuals, all carrying homozygous variants in VPS41 that we show impact protein function. All affected individuals presented with a progressive neurodevelopmental disorder consisting of cognitive impairment, cerebellar atrophy/hypoplasia, motor dysfunction with ataxia and dystonia, and nystagmus. Zebrafish disease modelling supports the involvement of VPS41 dysfunction in the disorder, indicating lysosomal dysregulation throughout the brain and providing support for cerebellar and microglial abnormalities when vps41 was mutated. This provides the first example of human disease linked to the HOPS-specific subunit VPS41 and suggests the importance of HOPS complex activity for cerebellar function.
• #### Fractional-order model representations of apparent vascular compliance as an alternative in the analysis of arterial stiffness: an in-silico study

(Physiological measurement, IOP Publishing, 2021-03-24) [Article]
Recent studies have demonstrated the advantages of fractional-order calculus tools for probing the viscoelastic properties of collagenous tissue, characterizing arterial blood flow and red cell membrane mechanics, and modeling the aortic valve cusp. In this article, we present novel lumped-parameter equivalent circuit models of the apparent arterial compliance using a fractional-order capacitor (FOC). The FOC, which generalizes capacitors and resistors, displays fractional-order behavior that can capture both elastic and viscous properties through a power-law formulation. The proposed framework describes the dynamic relationship between the blood pressure input and blood volume using linear fractional-order differential equations. The results show that the proposed models fit in-silico data from more than 4,000 subjects reasonably well. Additionally, strong correlations were identified between the fractional-order parameter estimates and the central hemodynamic determinants, as well as pulse wave velocity indexes. Therefore, the fractional-order paradigm of arterial compliance shows strong potential as an alternative tool in the analysis of arterial stiffness.
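In the usual hydraulic-electrical analogy, pressure plays the role of voltage, flow of current, and compliance of capacitance. A fractional-order capacitor with pseudo-capacitance $C_\alpha$ and order $0 \le \alpha \le 1$ has impedance $Z(\omega) = 1 / (C_\alpha (j\omega)^\alpha)$. Setting $\alpha = 1$ recovers an ideal capacitor, and $\alpha = 0$ gives a resistor of value $1/C_\alpha$; this is the sense in which the FOC generalizes both. The sketch below evaluates this textbook expression. The parameter values are arbitrary and are not fitted to any data from the study.

```python
import cmath
import math


def foc_impedance(omega, c_alpha, alpha):
    """Impedance Z = 1 / (C_alpha * (j*omega)**alpha) of a fractional-order capacitor."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return 1.0 / (c_alpha * (1j * omega) ** alpha)


# Arbitrary illustrative values (not fitted to any vascular data).
omega, c_alpha = 2.0 * math.pi * 1.2, 1.5
for alpha in (0.0, 0.5, 0.8, 1.0):
    z = foc_impedance(omega, c_alpha, alpha)
    # The phase is constant at -alpha * 90 degrees, independent of omega.
    print(f"alpha={alpha:.1f}  |Z|={abs(z):.4f}  phase={math.degrees(cmath.phase(z)):.1f} deg")
```

The frequency-independent phase of $-\alpha \cdot 90^\circ$ is the power-law signature that lets a single element blend elastic and viscous behavior.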
• #### MYH1 is a candidate gene for recurrent rhabdomyolysis in humans

(American Journal of Medical Genetics Part A, Wiley, 2021-03-23) [Article]
Rhabdomyolysis is a serious medical condition characterized by muscle injury, and there are recognized genetic causes especially in recurrent forms. The majority of these cases, however, remain unexplained. Here, we describe a patient with recurrent rhabdomyolysis in whom extensive clinical testing failed to identify a likely etiology. Whole-exome sequencing revealed a novel missense variant in MYH1, which encodes a major adult muscle fiber protein. Structural biology analysis revealed that the mutated residue is extremely well conserved and is located in the actin binding cleft. Furthermore, immediately adjacent mutations in that cleft in other myosins are pathogenic in humans. Our results are consistent with the finding that MYH1 is mutated in rhabdomyolysis in horses and suggest that this gene should be investigated in cases with recurrent rhabdomyolysis.
• #### Analysis of the effects of related fingerprints on molecular similarity using an eigenvalue entropy approach

(Journal of Cheminformatics, Springer Science and Business Media LLC, 2021-03-23) [Article]
Two-dimensional (2D) chemical fingerprints are widely used as binary features for quantifying the structural similarity of chemical compounds, an important step in similarity-based virtual screening (VS). Here, using an eigenvalue-based entropy approach, we identified as "related" those 2D fingerprints that contribute little or nothing to shaping the eigenvalue distribution of the feature matrix. We then examined the degree to which these related fingerprints influence molecular similarity scores calculated with the Tanimoto coefficient. Our analysis identified many related fingerprints in publicly available fingerprint schemes. It also showed that their presence in the feature set can substantially affect similarity scores and bias the outcome of molecular similarity analysis. Our results have implications for the optimal selection of 2D fingerprints for compound similarity analysis and for the identification of potential hits for compounds with target biological activity in VS.
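The Tanimoto coefficient of two binary fingerprints is $|A \cap B| / |A \cup B|$. The sketch below computes it, together with one plausible form of an eigenvalue entropy: the Shannon entropy of the normalized eigenvalues of $X^\top X$ for a compound-by-fingerprint matrix $X$, and the change in that entropy when each fingerprint column is removed. The abstract does not give the paper's exact entropy formulation, so treat `eigenvalue_entropy` and `entropy_contributions` as assumptions. The usage lines show how a redundant bit (here, a copy of the first bit) shifts a Tanimoto score.

```python
import numpy as np


def tanimoto(a, b):
    """Tanimoto coefficient of two binary fingerprints: |a AND b| / |a OR b|."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def eigenvalue_entropy(X):
    """Shannon entropy (nats) of the normalized eigenvalues of X^T X (assumed formulation)."""
    X = np.asarray(X, dtype=float)
    eig = np.clip(np.linalg.eigvalsh(X.T @ X), 0.0, None)  # clip tiny negative round-off
    p = eig / eig.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def entropy_contributions(X):
    """Entropy change when each fingerprint column is dropped; near zero hints at a related bit."""
    X = np.asarray(X, dtype=float)
    base = eigenvalue_entropy(X)
    return [base - eigenvalue_entropy(np.delete(X, j, axis=1)) for j in range(X.shape[1])]


# Toy fingerprints (hypothetical).
a, b = [1, 0, 1], [1, 1, 0]
print(tanimoto(a, b))                     # 1 shared bit of 3 set bits -> 0.333...
print(tanimoto(a + [a[0]], b + [b[0]]))   # duplicating bit 0 inflates the score to 0.5
```

A duplicated column adds nothing new to the eigenvalue spectrum, yet it still counts twice in the Tanimoto numerator and denominator. That is the kind of bias the abstract describes.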
• #### Biomedical computing in the Arab world

(Communications of the ACM, Association for Computing Machinery (ACM), 2021-03-22) [Article]
Health challenges represent one of the longstanding issues in the Arab region that hinder its ability to develop. The prevalence of diseases such as cardiovascular disease, liver cirrhosis, and cancer, among many others, has contributed to the deteriorating health status across the region, leading to lower life expectancy than in other regions. For instance, the average life expectancy in the Arab world is approximately 70 years, at least 10 years lower than in most high-income countries.
• #### Chromatin phosphoproteomics unravels a function for AT-hook motif nuclear localized protein AHL13 in PAMP-triggered immunity

(NCBI, 2021-03-22) [Bioproject, Dataset]
We report the transcriptome composition of ahl13-1 compared with wild-type (Col-0) plants, both without treatment and after application of Pst hrcC. Overall design: An Illumina high-throughput sequencing platform was used to analyse the transcriptomes of Col-0 and ahl13-1 under treated and untreated conditions. Col-0 samples are in GEO Series GSE118854.
• #### Fish Growth Trajectory Tracking via Reinforcement Learning in Precision Aquaculture

(arXiv, 2021-03-12) [Preprint]
This paper studies fish growth trajectory tracking via reinforcement learning under a representative bioenergetic growth model. Precision aquaculture involves complex conditions, uncertain environmental factors such as temperature, dissolved oxygen, and un-ionized ammonia, and strong nonlinear couplings, including the multiple inputs of the fish growth model. For these reasons, model-based control approaches cannot solve the growth trajectory tracking problem efficiently. To this end, we formulate growth trajectory tracking as a sampled-data optimal control problem using a Markov decision process with discrete state-action pairs. We propose two Q-learning algorithms that learn the optimal control policy from sampled data of fish growth trajectories at every stage of the fish life cycle, from juveniles to the desired market weight, in the aquaculture environment. The first learns the optimal feeding control policy for the growth rate of fish cultured in cages. The second learns the optimal feeding rate control policy, together with an optimal temperature profile, for the growth rate of fish cultured in tanks. The simulation results demonstrate that both Q-learning strategies achieve high trajectory tracking performance with lower feeding rates.
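To make the Q-learning formulation concrete, here is a minimal tabular Q-learning sketch on a deliberately simple, hypothetical growth model. The state is (stage, weight), the action is a discrete feeding level, and the reward penalizes deviation from a reference trajectory plus a small feeding cost. The growth dynamics, the `TARGET` trajectory, `FEED_COST` and the hyperparameters are all illustrative assumptions, not the bioenergetic model or the algorithms from the paper.

```python
import random

# Illustrative assumptions, not taken from the paper.
ACTIONS = (0, 1, 2)                # discrete feeding levels = growth units added per stage
STAGES = 5                         # number of decision stages in the life cycle
TARGET = list(range(STAGES + 1))   # reference weight trajectory: +1 unit per stage
FEED_COST = 0.1                    # penalty per unit of feed


def step(stage, weight, action):
    """Toy deterministic growth model: reward = -tracking error - feeding cost."""
    new_weight = weight + action
    reward = -abs(new_weight - TARGET[stage + 1]) - FEED_COST * action
    return new_weight, reward


def greedy(q, state):
    """Action with the highest Q-value in this state (unseen pairs default to 0.0)."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def train(episodes=5000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        weight = TARGET[0]
        for stage in range(STAGES):
            state = (stage, weight)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state)
            weight, reward = step(stage, weight, action)
            if stage + 1 == STAGES:
                best_next = 0.0  # terminal: final stage reached
            else:
                best_next = max(q.get(((stage + 1, weight), b), 0.0) for b in ACTIONS)
            old = q.get((state, action), 0.0)
            # Q-learning update toward the one-step bootstrapped target.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q


def rollout(q):
    """Follow the greedy policy from the initial weight and return the weight path."""
    weight, path = TARGET[0], [TARGET[0]]
    for stage in range(STAGES):
        weight, _ = step(stage, weight, greedy(q, (stage, weight)))
        path.append(weight)
    return path


q_table = train()
print(rollout(q_table))  # compare against TARGET
```

The feeding cost term is what makes the learned policy feed only as much as tracking requires. This mirrors, in toy form, the abstract's observation that good tracking can coincide with lower feeding rates.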
• #### 1'-Ribose cyano substitution allows Remdesivir to effectively inhibit nucleotide addition and proofreading during SARS-CoV-2 viral RNA replication.

(Physical chemistry chemical physics : PCCP, Royal Society of Chemistry (RSC), 2021-03-10) [Article]
COVID-19 has recently caused a global health crisis, and an effective interventional therapy is urgently needed. Remdesivir is an effective inhibitor of SARS-CoV-2 viral RNA replication. It supersedes other NTP analogues because it not only terminates the polymerization activity of the RNA-dependent RNA polymerase (RdRp) but also inhibits the proofreading activity of the intrinsic exoribonuclease (ExoN). The static structure of Remdesivir bound to RdRp has been solved, and biochemical experiments have suggested that it acts as a "delayed chain terminator". Even so, the underlying molecular mechanisms are not fully understood. Here, we performed all-atom molecular dynamics (MD) simulations with an accumulated simulation time of 24 microseconds to elucidate the inhibitory mechanism of Remdesivir on nucleotide addition and proofreading. We found that when Remdesivir is located at an upstream site in RdRp, its 1'-cyano group experiences electrostatic interactions with a salt bridge (Asp865-Lys593), which subsequently halts translocation. Our findings supplement the current understanding of the delayed chain termination exerted by Remdesivir and provide an alternative molecular explanation of its inhibitory mechanism. Such inhibition also reduces the likelihood of Remdesivir being cleaved by ExoN, which acts on 3'-terminal nucleotides. Furthermore, our study suggests that Remdesivir's 1'-cyano group can disrupt the cleavage site of ExoN through steric interactions, further reducing cleavage efficiency. Our work provides plausible and novel molecular-level mechanisms for how Remdesivir inhibits viral RNA replication, and our findings may guide the rational design of new COVID-19 treatments targeting viral replication.
• #### Complete Genome Sequence of Cellulomonas sp. JZ18, a Root Endophytic Bacterium Isolated from the Perennial Desert Tussock-Grass Panicum turgidum

(Current Microbiology, Springer Science and Business Media LLC, 2021-03-08) [Article]
Cellulomonas sp. JZ18 is a Gram-positive, rod-shaped bacterium that was previously isolated from the root endosphere of the perennial desert tussock-grass Panicum turgidum. The genome coverage of PacBio sequencing was approximately 199×. Genome assembly generated a single chromosome of 7,421,843 base pairs with a guanine-cytosine (GC) content of 75.60%. The chromosome contains 3,240 protein-coding sequences, 361 pseudogenes, three ribosomal RNA operons, three non-coding RNAs, and 45 transfer RNAs. Comparison of JZ18's genome with type strains from the same genus, using digital DNA–DNA hybridization and average nucleotide identity calculations, revealed that JZ18 might belong to a new species. Functional analysis revealed the presence of genes that may complement previously observed biochemical and plant phenotypes. Furthermore, several of the identified enzymes could be useful as biocatalysts in industrial processes. Genome sequencing and analysis of endophytic bacteria, coupled with comparative genomics to assess their potential plant growth-promoting activities under different soil conditions, will accelerate the knowledge and application of biostimulants in sustainable agriculture.
• #### DeepViral: prediction of novel virus-host interactions from protein sequences and infectious disease phenotypes.

(Bioinformatics (Oxford, England), Oxford University Press (OUP), 2021-03-08) [Article]
Motivation: Infectious diseases caused by novel viruses have become a major public health concern. Rapid identification of virus-host interactions can reveal mechanistic insights into infectious diseases and shed light on potential treatments. Current computational prediction methods for novel viruses are based mainly on protein sequences. However, it is not clear to what extent other important features, such as the symptoms caused by the viruses, could contribute to a predictor. Disease phenotypes (i.e., signs and symptoms) are readily accessible from clinical diagnosis, and we hypothesize that they may act as a potential proxy and an additional source of information for the underlying molecular interactions between the pathogens and hosts. Results: We developed DeepViral, a deep learning based method that predicts protein-protein interactions (PPI) between humans and viruses. Motivated by the potential utility of infectious disease phenotypes, we first embedded human proteins and viruses in a shared space using their associated phenotypes and functions, supported by formalized background knowledge from biomedical ontologies. By jointly learning from protein sequences and phenotype features, DeepViral significantly improves over existing sequence-based methods for intra- and inter-species PPI prediction. Availability: Code and datasets for reproduction and customization are available at https://github.com/bio-ontology-research-group/DeepViral. Prediction results for 14 virus families are available at https://doi.org/10.5281/zenodo.4429824.
• #### DeepMOCCA: A pan-cancer prognostic model identifies personalized prognostic markers through graph attention and multi-omics data integration

(Cold Spring Harbor Laboratory, 2021-03-03) [Preprint]
Combining multiple types of genomic, transcriptional, proteomic, and epigenetic datasets has the potential to reveal biological mechanisms across multiple scales and may lead to more accurate models for clinical decision support. Developing efficient models that can derive clinical outcomes from high-dimensional data remains difficult. The challenges include integrating multiple types of omics data, incorporating biological background knowledge, and developing machine learning models that can handle this high dimensionality when only a few samples are available for training. We developed DeepMOCCA, a framework for multi-omics cancer analysis. We combine different types of omics data using biological relations between genes, transcripts, and proteins, and we combine the multi-omics data with background knowledge in the form of protein-protein interaction networks. We then use graph convolutional neural networks to exploit this combination of multi-omics data and background knowledge. DeepMOCCA predicts survival time for individual patient samples across 33 cancer types and outperforms most existing survival prediction methods. Moreover, DeepMOCCA includes a graph attention mechanism that prioritizes driver genes and prognostic markers in a patient-specific manner; the attention mechanism can be used to identify drivers and prognostic markers within cohorts and individual patients.
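As a generic illustration of graph convolution over a protein-protein interaction network with per-gene multi-omics features, the sketch below implements one layer of the widely used propagation rule $H' = \mathrm{ReLU}(\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} H W)$. It is not DeepMOCCA's architecture: the graph attention mechanism and the survival prediction head are omitted. The three-gene graph, the feature values and the random weights are hypothetical.

```python
import numpy as np


def normalized_adjacency(A):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_norm, H, W):
    """One graph-convolution layer: each gene aggregates its neighbors' features, then ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)


# Hypothetical 3-gene interaction graph: gene0 - gene1 - gene2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
# Hypothetical per-gene features, e.g. [expression, copy number, methylation].
H = np.array([[1.2, 0.0, 0.3],
              [0.5, 1.0, 0.1],
              [2.0, -1.0, 0.7]])
W = np.random.default_rng(0).normal(size=(3, 4))  # untrained weights, for shape only

H_next = gcn_layer(normalized_adjacency(A), H, W)
print(H_next.shape)  # (3, 4)
```

The self-loops keep each gene's own omics values in its updated representation. The symmetric normalization prevents high-degree hub proteins from dominating the aggregated features.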
• #### DeeReCT-APA: Prediction of Alternative Polyadenylation Site Usage Through Deep Learning

(Genomics, Proteomics & Bioinformatics, Elsevier BV, 2021-03-02) [Article]
Alternative polyadenylation (APA) is a crucial step in post-transcriptional regulation. Previous bioinformatics work has mainly focused on recognizing polyadenylation sites (PASs) in a given genomic sequence, which is a binary classification problem. Recently, computational methods for predicting the usage level of alternative PASs in the same gene have been proposed. However, all of them cast the problem as a non-quantitative pairwise comparison task and do not take the competition among multiple PASs into account. To address this, we propose a deep learning architecture, DeeReCT-APA, to quantitatively predict the usage of all alternative PASs of a given gene. To accommodate genes with different numbers of PASs, DeeReCT-APA treats the problem as a regression task with a variable-length target. DeeReCT-APA is based on a CNN-LSTM architecture. It extracts sequence features with CNN layers, uses a bidirectional LSTM to explicitly model the interactions among competing PASs, and outputs percentage scores representing the usage levels of all PASs of a gene. Our method is the only one that quantitatively predicts the usage of all PASs within a gene. Beyond that, it consistently outperforms existing methods on the three tasks those methods were trained for: pairwise comparison, highest-usage prediction, and ranking. Finally, we demonstrate that our method can be used to predict the effect of genetic variation on APA patterns and can shed light on the mechanisms of APA regulation. Our code and data are available at https://github.com/lzx325/DeeReCT-APA-repo.
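One common way to produce usage percentages for a variable number of competing sites is a softmax taken over however many PASs a gene has, as shown below. Whether DeeReCT-APA normalizes its outputs exactly this way is an assumption here. The per-PAS scores, which the CNN and bidirectional LSTM would produce, are hypothetical inputs.

```python
import math


def pas_usage(scores):
    """Map one score per PAS to usage percentages summing to 100 (stable softmax)."""
    if not scores:
        return []
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]  # subtract max to avoid overflow
    total = sum(weights)
    return [100.0 * w / total for w in weights]


# Hypothetical scores for genes with different numbers of PASs.
print(pas_usage([2.0, 0.5]))               # two PASs
print(pas_usage([0.1, 1.3, -0.4, 0.9]))    # four PASs
```

Because the normalization runs over the gene's own list of sites, raising one PAS's score necessarily lowers the others' shares. That is one simple way to encode the competition among PASs.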
• #### Application and evaluation of knowledge graph embeddings in biomedical data

(PeerJ Computer Science, PeerJ, 2021-02-18) [Article]
Linked data and bio-ontologies, which enable knowledge representation, standardization, and dissemination, are an integral part of developing biological and biomedical databases. That is, linked data and bio-ontologies are employed in databases to maintain data integrity, organize data, and empower search capabilities. More recently, however, linked data and bio-ontologies have been used to represent information as multi-relational heterogeneous graphs, or “knowledge graphs”. This is because the entities and relations in a knowledge graph can be represented as embedding vectors in a semantic space, and these embedding vectors can be used to predict relationships between entities. Such knowledge graph embedding methods provide a practical approach to data analytics. They also increase the chance of building machine learning models with high prediction accuracy that can enhance decision support systems. Here, we present a comparative assessment and a standard benchmark for knowledge graph-based representation learning methods, focused on the link prediction task for biological relations. We systematically investigated and compared state-of-the-art embedding methods based on the design settings used for training and evaluation. We further tested various strategies for controlling the amount of information related to each relation in the knowledge graph, and their effects on final performance. We also assessed the quality of the knowledge graph features through clustering and visualization and employed several evaluation metrics to examine their uses and differences. Based on this systematic comparison and assessment, we identify and discuss the limitations of knowledge graph-based representation learning methods and suggest guidelines for developing improved methods.
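For readers unfamiliar with link-prediction evaluation, the sketch below illustrates the basic workflow. It scores triples with TransE, one well-known embedding method chosen here for illustration (the abstract does not single out any method). It then ranks candidate tail entities and computes mean reciprocal rank (MRR) and Hits@k. The two-dimensional embeddings are hypothetical toy values, not trained vectors.

```python
import numpy as np


def transe_score(h, r, t):
    """TransE plausibility of (h, r, t): negative L2 distance ||h + r - t||."""
    return -float(np.linalg.norm(h + r - t))


def tail_rank(h, r, candidates, true_index):
    """1-based rank of the true tail among candidates (ties resolved in its favor)."""
    scores = np.array([transe_score(h, r, c) for c in candidates])
    return int((scores > scores[true_index]).sum()) + 1


def mrr_and_hits(ranks, k=10):
    """Mean reciprocal rank and Hits@k over a list of ranks."""
    ranks = np.asarray(ranks, dtype=float)
    return float((1.0 / ranks).mean()), float((ranks <= k).mean())


# Hypothetical 2-D embeddings, e.g. a gene head, an "associated_with" relation, disease tails.
h = np.array([0.0, 0.0])
r = np.array([1.0, 0.0])
tails = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 0.0])]

ranks = [tail_rank(h, r, tails, 0), tail_rank(h, r, tails, 2)]
print(ranks)                       # [1, 2]
print(mrr_and_hits(ranks, k=1))    # (0.75, 0.5)
```

Benchmarks typically also filter out other known true triples before ranking. That choice is one of the evaluation design settings that can shift reported metrics between studies.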
• #### Co-occurrence of mcr-1 and mcr-8 genes in a multidrug-resistant Klebsiella pneumoniae from a 2015 clinical isolate.

(International journal of antimicrobial agents, Elsevier BV, 2021-02-16) [Article]
Polymyxin E (colistin) is among the last-resort antibiotics used to treat infections caused by carbapenem-resistant Enterobacteriaceae [1]. Colistin was discontinued for human use after it was found to be associated with nephrotoxicity and neurotoxicity. Since the 1990s, however, clinicians have had to reconsider the clinical value of a modified version as a last-resort antibiotic. Currently, a single plasmid-mediated mobile colistin resistance (mcr) gene is recognized as a global threat, but combinations of multiple mcr genes in the same pathogen are rarely identified [2]. Membrane binding domains in MCR proteins are necessary for the membrane charge modifications involved in the colistin resistance mechanism [3]. Wang et al. described the first mcr-8-containing isolate of Klebsiella pneumoniae (KP), which showed a high minimum inhibitory concentration (MIC) of colistin, in a Chinese hospital in 2016. This preceded the discovery of the mcr-8 gene in 2017 on a pig farm in China, in association with colistin use in animals [3]. Typically, mcr variants are investigated with molecular identification tools or next-generation sequencing (NGS), which are not routinely implemented in clinical molecular biology laboratories. Such studies have identified a close evolutionary relationship between mcr and the phosphoethanolamine transferase gene (eptA) of Neisseria, which is known to induce colistin resistance [3]. Multiple mcr gene variants, including co-occurrences in the same clinical bacterial strain, have been identified through epidemiological surveillance of disease-causing pathogens. Continuous monitoring and identification of clinical cases that harbor unique mcr variants are essential for understanding and monitoring colistin-resistant bacteria. In this study, we identified a unique combination of colistin-resistance genes in a multidrug-resistant (MDR) KP clinical isolate and explored the role of the variant mcr genes in colistin resistance.
• #### Transmission dynamics of SARS-CoV-2 on the Diamond Princess uncovered using viral genome sequence analysis.

(Gene, Elsevier BV, 2021-02-15) [Article]
An outbreak of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) occurred aboard the Diamond Princess cruise ship between her January 20 departure and late February 2020. Here, we used phylodynamic analyses to investigate the transmission dynamics of SARS-CoV-2 during the outbreak. Using a Bayesian coalescent-based method, the estimated mean nucleotide substitution rate of 240 SARS-CoV-2 whole-genome sequences was approximately 7.13 × 10$^{-4}$ substitutions per site per year. Population dynamics and the effective reproductive number (R$_{e}$) of SARS-CoV-2 infections were estimated using a Bayesian framework. The estimated origin of the outbreak was January 21, 2020. The infection spread substantially before quarantine on February 5. The R$_{e}$ peaked at 6.06 on February 4 and gradually declined to 1.51, suggesting that transmission continued slowly even after quarantine. These findings highlight the high transmissibility of SARS-CoV-2 and the need for effective measures to control outbreaks in confined settings.
• #### CovMT: an interactive SARS-CoV-2 mutation tracker, with a focus on critical variants.

(The Lancet. Infectious diseases, Elsevier BV, 2021-02-11) [Article]
The number of confirmed SARS-CoV-2 cases worldwide has now reached around 100 million, with 2.1 million reported deaths [1] and more than 450,000 SARS-CoV-2 genomes already sequenced. It is vital to keep track of mutations in the SARS-CoV-2 genome, especially in the receptor binding domain (RBD) region of the spike protein, which could potentially affect disease severity and treatment strategies [2, 3, 4]. In the wake of a recent increase in cases involving a more infective variant featuring an RBD mutation (N501Y, B.1.1.7) in the UK, countries worldwide are concerned about the spread of this or similar variants. Increased sequencing efforts and user-friendly mutation tracking systems are needed for the timely tracking of SARS-CoV-2 variants.