### Recent Submissions

• #### Chromatin phosphoproteomics unravels a function for AT-hook motif nuclear localized protein AHL13 in PAMP-triggered immunity

(Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, 2021-01-08) [Article]
In many eukaryotic systems during immune responses, mitogen-activated protein kinases (MAPKs) link cytoplasmic signaling to chromatin events by targeting transcription factors, chromatin remodeling complexes, and the RNA polymerase machinery. So far, knowledge on these events is scarce in plants and no attempts have been made to focus on phosphorylation events of chromatin-associated proteins. Here we carried out chromatin phosphoproteomics upon elicitor-induced activation of Arabidopsis. The events in WT were compared with those in mpk3, mpk4, and mpk6 mutant plants to decipher specific MAPK targets. Our study highlights distinct signaling networks involving MPK3, MPK4, and MPK6 in chromatin organization and modification, as well as in RNA transcription and processing. Among the chromatin targets, we characterized the AT-hook motif containing nuclear localized (AHL) DNA-binding protein AHL13 as a substrate of immune MAPKs. AHL13 knockout mutant plants are compromised in pathogen-associated molecular pattern (PAMP)-induced reactive oxygen species production, expression of defense genes, and PAMP-triggered immunity. Transcriptome analysis revealed that AHL13 regulates key factors of jasmonic acid biosynthesis and signaling and affects immunity toward Pseudomonas syringae and Botrytis cinerea pathogens. Mutational analysis of the phosphorylation sites of AHL13 demonstrated that phosphorylation regulates AHL13 protein stability and thereby its immune functions.
• #### Transcriptomic analysis identifies organ-specific metastasis genes and pathways across different primary sites.

(Journal of translational medicine, Springer Science and Business Media LLC, 2021-01-08) [Article]
Background: Metastasis is the most devastating stage of cancer progression and often shows a preference for specific organs. Methods: To reveal the mechanisms underlying organ-specific metastasis, we systematically analyzed gene expression profiles for three common metastasis sites across all available primary origins. A rank-based method was used to detect differentially expressed genes between metastatic tumor tissues and corresponding control tissues. For each metastasis site, the common differentially expressed genes across all primary origins were identified as organ-specific metastasis genes. Results: Pathways enriched by these genes reveal an interplay between the molecular characteristics of the cancer cells and those of the target organ. Specifically, the neuroactive ligand-receptor interaction pathway and HIF-1 signaling pathway were found to have prominent roles in adapting to the target organ environment in brain and liver metastases, respectively. Finally, the identified organ-specific metastasis genes and pathways were validated using a primary breast tumor dataset. Survival and cluster analysis showed that organ-specific metastasis genes and pathways tended to be expressed uniquely by a subgroup of patients having metastasis to the target organ, and were associated with the clinical outcome. Conclusions: Elucidating the genes and pathways underlying organ-specific metastasis may help to identify drug targets and develop treatment strategies to benefit patients.
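The selection step described in this abstract (keeping only genes that are differentially expressed for every primary origin at a given metastasis site) amounts to a set intersection. The following is a minimal sketch of that idea only, not the authors' pipeline: the function name `common_de_genes`, the dictionary layout, and the gene labels are hypothetical placeholders, and the rank-based differential expression test itself is not shown.

```python
def common_de_genes(de_by_origin):
    """Return genes that are differentially expressed for every primary origin.

    de_by_origin maps a primary-origin label to the collection of genes
    called differentially expressed for that origin (hypothetical layout).
    """
    gene_sets = [set(genes) for genes in de_by_origin.values()]
    if not gene_sets:
        return set()
    # Genes shared by all origins are the candidate organ-specific genes.
    return set.intersection(*gene_sets)


# Toy input for one metastasis site; gene names are placeholders, not real results.
brain_de = {
    "breast": {"GENE_A", "GENE_B", "GENE_C"},
    "lung": {"GENE_B", "GENE_C", "GENE_D"},
    "skin": {"GENE_C", "GENE_B", "GENE_E"},
}

print(sorted(common_de_genes(brain_de)))  # ['GENE_B', 'GENE_C']
```

A strict intersection is the simplest reading of "common across all primary origins"; a real analysis might relax it if some origins have few samples.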
• #### Molecular basis for the adaptive evolution of environment sensing by H-NS proteins

(eLife, eLife Sciences Publications, Ltd, 2021-01-07) [Article]
The DNA-binding protein H-NS is a pleiotropic gene regulator in gram-negative bacteria. Through its capacity to sense temperature and other environmental factors, H-NS allows pathogens like Salmonella to adapt their gene expression to their presence inside or outside warm-blooded hosts. To investigate how this sensing mechanism may have evolved to fit different bacterial lifestyles, we compared H-NS orthologs from bacteria that infect humans, plants, and insects, and from bacteria that live on a deep-sea hypothermal vent. The combination of biophysical characterization, high-resolution proton-less NMR spectroscopy and molecular simulations revealed, at an atomistic level, how the same general mechanism was adapted to specific habitats and lifestyles. In particular, we demonstrate how environment-sensing characteristics arise from specifically positioned intra- or intermolecular electrostatic interactions. Our integrative approach clarified the exact modus operandi for H-NS–mediated environmental sensing and suggests that this sensing mechanism resulted from the exaptation of an ancestral protein feature.
• #### Engineered Microgels—Their Manufacturing and Biomedical Applications

(Micromachines, MDPI AG, 2021-01-01) [Article]
Microgels are hydrogel particles with diameters in the micrometer scale that can be fabricated in different shapes and sizes. Microgels are increasingly used for biomedical applications and for biofabrication due to their interesting features, such as injectability, modularity, porosity and tunability with respect to size, shape and mechanical properties. Fabrication methods of microgels are divided into two categories, following a top-down or bottom-up approach. Each approach has its own advantages and disadvantages and requires certain sets of materials and equipment. In this review, we discuss fabrication methods of both top-down and bottom-up approaches and point to their advantages as well as their limitations, with more focus on the bottom-up approaches. In addition, the use of microgels for a variety of biomedical applications will be discussed, including microgels for the delivery of therapeutic agents and microgels as cell carriers for the fabrication of 3D bioprinted cell-laden constructs. Microgels made from well-defined synthetic materials with a focus on rationally designed ultrashort peptides are also discussed, because they have been demonstrated to serve as an attractive alternative to much less defined naturally derived materials. Here, we will emphasize the potential and properties of ultrashort self-assembling peptides related to microgels.
• #### A Siamese neural network model for the prioritization of metabolic disorders by integrating real and simulated data.

(Bioinformatics (Oxford, England), Oxford University Press (OUP), 2020-12-31) [Article]
Motivation: Untargeted metabolomic approaches hold great promise as a diagnostic tool for inborn errors of metabolism (IEMs) in the near future. However, the complexity of the involved data makes its application difficult and time consuming. Computational approaches, such as metabolic network simulations and machine learning, could significantly help to exploit metabolomic data to aid the diagnostic process. While the former suffers from limited predictive accuracy, the latter is normally able to generalize only to IEMs for which sufficient data are available. Here, we propose a hybrid approach that exploits the best of both worlds by building a mapping between simulated and real metabolic data through a novel method based on Siamese neural networks (SNN). Results: The proposed SNN model is able to perform disease prioritization for the metabolic profiles of IEM patients even for diseases that it was not trained to identify. To the best of our knowledge, this has not been attempted before. The developed model is able to significantly outperform a baseline model that relies on metabolic simulations only. The prioritization performances demonstrate the feasibility of the method, suggesting that the integration of metabolic models and data could significantly aid the IEM diagnosis process in the near future. Availability and implementation: Metabolic datasets used in this study are publicly available from the cited sources. The original data produced in this study, including the trained models and the simulated metabolic profiles, are also publicly available (Messa et al., 2020).
• #### Quantum-based interval selection of the Semi-classical Signal Analysis method

(IEEE, 2020-12-18) [Conference Paper]
Semi-classical Signal Analysis (SCSA) is a signal representation algorithm utilizing the Schrödinger eigenvalue problem. The algorithm has found many applications, from signal processing to machine learning and denoising due to its adaptive and localized nature. So far, the algorithm’s design parameter was tuned heuristically, without using the knowledge of the quantum mechanical principles residing in the SCSA formulation. In this work, we extend the SCSA framework by calculating the bounds of the reconstruction parameter. The derived bounds are effectively the sampling theorem for SCSA, which is of paramount importance for the application of the theory. Moreover, guidelines towards an optimal choice of the parameter are provided, eliminating the heuristic scanning step.
• #### KAUST Metagenomic Analysis Platform (KMAP), Enabling Access to Massive Analytics of Re-Annotated Metagenomic Data.

(Research Square, 2020-12-14) [Preprint]
The exponential rise of metagenomics sequencing is delivering massive functional environmental genomics data. However, this also generates a procedural bottleneck for ongoing re-analysis: as reference databases grow and methods improve, analyses need to be updated for consistency, which requires access to increasingly demanding bioinformatic and computational resources. Here, we present the KAUST Metagenomic Analysis Platform (KMAP), a new integrated open web-based tool for the comprehensive exploration of shotgun metagenomic data. We illustrate the capacities KMAP provides through the re-assembly of ~27,000 public metagenomic samples captured in ~450 studies sampled across ~77 diverse habitats, resulting in 36 new habitat-specific gene catalogs, all based on full-length (complete) genes. Extensive taxonomic and gene annotations are stored in Gene Information Tables (GITs), a simple tractable data integration format useful for analysis through command line or for database management. KMAP facilitates the exploration and comparison of microbial GITs across different habitats with over 275 million genes.
• #### Exploring binary relations for ontology extension and improved adaptation to clinical text

(Cold Spring Harbor Laboratory, 2020-12-05) [Preprint]
• #### Indigenous Arabs have an intermediate frequency of a Neanderthal-derived COVID-19 risk haplotype compared with other world populations.

(Clinical genetics, Wiley, 2020-11-27) [Article]
SARS-CoV-2 has been identified as the cause of an ongoing pandemic (COVID-19) that has infected more than 25 million individuals and caused more than 1 million deaths worldwide (WHO). The highly variable clinical course despite a relatively stable viral genome strongly implicates host factors, including genetics. Mendelian large-effect variants have recently been identified, although these likely account for a very small number of cases [1]. On the other hand, the contribution of several common variants has been demonstrated [2], particularly one locus on chr3 which was identified in the first major GWAS on genetic predisposition to severe COVID-19 [3]. Interestingly, a very recent study has convincingly shown that the risk haplotype in the chr3 locus was introgressed into modern humans from Neanderthals [4]. The distribution of this risk haplotype was estimated for a wide range of human populations, although Middle Eastern Arabs were missing [4]. Here, we calculate the distribution of the risk haplotype in Arabia and discuss it in the context of the overall Neanderthal ancestry in the local population. Representative samples from the major indigenous tribes in Arabia were chosen for analysis with informed consent and genotyped as described in detail elsewhere (Mineta et al, 2020) [5]. We first confirmed by Haploview that the 13 SNPs that constitute the risk haplotype are in complete LD in indigenous Arabs using previously published WGS data. We then tested the frequency of rs13078854 in 953 samples representing the 28 major indigenous tribes in Arabia. The overall risk allele frequency was 8.6% with 135 heterozygotes and 14 homozygotes (of note, homozygotes had only been documented among South Asians [~10%] and 1 individual in Colombia). As shown in Figure 1, the distribution was largely similar between the different regions.
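The reported allele frequency can be checked from the genotype counts given in the abstract. The sketch below assumes a biallelic autosomal SNP, so each sample carries two alleles and the frequency is (heterozygotes + 2 × homozygotes) / (2 × samples); the function name `risk_allele_frequency` is illustrative and not taken from the study.

```python
def risk_allele_frequency(n_samples, n_het, n_hom):
    """Allele frequency from genotype counts at a biallelic autosomal SNP."""
    risk_alleles = n_het + 2 * n_hom   # one copy per heterozygote, two per homozygote
    total_alleles = 2 * n_samples      # diploid: two alleles per sample
    return risk_alleles / total_alleles


# Counts quoted in the abstract: 953 samples, 135 heterozygotes, 14 homozygotes.
freq = risk_allele_frequency(953, 135, 14)
print(f"{freq:.1%}")  # 8.6%
```

With these counts the calculation is 163 / 1906, which rounds to the 8.6% stated in the abstract.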
• #### Recessive, Deleterious Variants in SMG8 Expand the Role of Nonsense-Mediated Decay in Developmental Disorders in Humans.

(American journal of human genetics, Elsevier BV, 2020-11-25) [Article]
We have previously described a heart-, eye-, and brain-malformation syndrome caused by homozygous loss-of-function variants in SMG9, which encodes a critical component of the nonsense-mediated decay (NMD) machinery. Here, we describe four consanguineous families with four different likely deleterious homozygous variants in SMG8, encoding a binding partner of SMG9. The observed phenotype greatly resembles that linked to SMG9 and comprises severe global developmental delay, microcephaly, facial dysmorphism, and variable congenital heart and eye malformations. RNA-seq analysis revealed a general increase in mRNA expression levels with significant overrepresentation of core NMD substrates. We also identified increased phosphorylation of UPF1, a key SMG1-dependent step in NMD, which most likely represents the loss of SMG8-mediated inhibition of SMG1 kinase activity. Our data show that SMG8 and SMG9 deficiency results in overlapping developmental disorders that most likely converge mechanistically on impaired NMD.
• #### A single neuron subset governs a single coactive neuron circuit in Hydra vulgaris, representing a prototypic feature of neural evolution

(Cold Spring Harbor Laboratory, 2020-11-23) [Preprint]
The last common ancestor of Bilateria and Cnidaria is believed to be one of the first animals to develop a nervous system over 500 million years ago. Many of the genes involved in the neural function of the advanced nervous system in Bilateria are well conserved in Cnidaria. Thus, the representative cnidarian species Hydra is considered to be a living fossil and a good model organism for the study of the putative primitive nervous system in its last common ancestor. The diffuse nervous system of Hydra consists of several peptidergic neuron subsets. However, the specific functions of these subsets remain unclear. Using calcium imaging, here we show that the neuron subsets that express the neuropeptide Hym-176 function as motor neurons to evoke longitudinal contraction. We found that all neurons in a subset defined by expression of the Hym-176 gene (Hym-176A) or its paralog (Hym-176B) are excited simultaneously, which is then followed by longitudinal contraction. This indicates not only that these neuron subsets are motor neurons but also that a single molecularly defined neuron subset forms a single coactive motor circuit. This is in contrast with the Bilaterian nervous system, where a single molecularly defined neuron subset harbors multiple coactive circuits, showing a mixture of neurons firing with different timings. Furthermore, we found that the two motor circuits, one expressing Hym-176B in the body column and the other expressing Hym-176A in the foot, are coordinately regulated to exert region-specific contraction. Our results demonstrate that one neuron subset is likely to form a monofunctional circuit as a minimum functional unit to build a more complex behavior in Hydra. We propose that this simple feature (one subset, one circuit, one function) found in Hydra is a fundamental trait of the primitive nervous system.
• #### DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier

(PLOS Computational Biology, Public Library of Science (PLoS), 2020-11-18) [Article]
Predicting the phenotypes resulting from molecular perturbations is one of the key challenges in genetics. Both forward and reverse genetic screens are employed to identify the molecular mechanisms underlying phenotypes and disease, and these have resulted in a large number of genotype–phenotype associations being available for humans and model organisms. Combined with recent advances in machine learning, it may now be possible to predict human phenotypes resulting from particular molecular aberrations. We developed DeepPheno, a neural network based hierarchical multi-class multi-label classification method for predicting the phenotypes resulting from loss-of-function in single genes. DeepPheno uses the functional annotations of gene products to predict the phenotypes resulting from a loss-of-function; additionally, we employ a two-step procedure in which we predict these functions first and then predict phenotypes. Prediction of phenotypes is ontology-based and we propose a novel ontology-based classifier suitable for very large hierarchical classification tasks. These methods allow us to predict phenotypes associated with any known protein-coding gene. We evaluate our approach using evaluation metrics established by the CAFA challenge and compare with top performing CAFA2 methods as well as several state of the art phenotype prediction approaches, demonstrating the improvement of DeepPheno over established methods. Furthermore, we show that predictions generated by DeepPheno are applicable to predicting gene–disease associations based on comparing phenotypes, and that a large number of new predictions made by DeepPheno have recently been added to phenotype databases.
• #### PATHcre8: A Tool That Facilitates the Searching for Heterologous Biosynthetic Routes

(ACS Synthetic Biology, American Chemical Society (ACS), 2020-11-16) [Article]
Developing computational tools that can facilitate the rational design of cell factories producing desired products at increased yields is challenging, as the tool needs to take into account that the preferred host organism usually has compounds that are consumed by competing reactions that reduce the yield of the desired product. On the other hand, the preferred host organisms may not have the native metabolic reactions needed to produce the compound of interest; thus, the computational tool needs to identify the metabolic reactions that will most efficiently produce the desired product. In this regard, we developed the generic tool PATHcre8 to facilitate an optimized search for heterologous biosynthetic pathway routes. PATHcre8 finds and ranks biosynthesis routes in a large number of organisms, including Cyanobacteria. The tool ranks the pathways based on feature scores that reflect reaction thermodynamics, the potentially toxic products in the pathway (compound toxicity), intermediate products in the pathway consumed by competing reactions (product consumption), and host-specific information such as enzyme copy number. A comparison with several other similar tools shows that PATHcre8 is more efficient in ranking functional pathways. To illustrate the effectiveness of PATHcre8, we further provide case studies focused on isoprene production and the biodegradation of cocaine. PATHcre8 is free for academic and nonprofit users and can be accessed at https://www.cbrc.kaust.edu.sa/pathcre8/.
• #### A nanobody-functionalized organic electrochemical transistor for the rapid detection of SARS-CoV-2 or MERS antigens at the physical limit

(Cold Spring Harbor Laboratory, 2020-11-13) [Preprint]
The COVID-19 pandemic highlights the need for rapid protein detection and quantification at the single-molecule level in a format that is simple and robust enough for widespread point-of-care applications. We here introduce a modular nanobody-organic electrochemical transistor architecture that enables the fast and specific detection and quantification of single-molecule to nanomolar protein antigen concentrations in complex bodily fluids. The sensor combines a new solution-processable organic semiconductor material in the transistor channel with the high-density and orientation-controlled bioconjugation of nanobody fusion proteins on disposable gate electrodes. It provides results after a 10-minute exposure to 5 μL of unprocessed samples, maintains high specificity and single-molecule sensitivity in human saliva or serum, and is rapidly reprogrammed towards any protein target for which nanobodies exist. We demonstrate the use of this highly modular platform for the detection of green fluorescent protein, SARS-CoV-1/2, and MERS-CoV spike proteins and validate the sensor for COVID-19 screening in unprocessed clinical nasopharyngeal swab and saliva samples.
• #### “What Doesn’t Kill You Makes You Stronger”: Future Applications of Amyloid Aggregates in Biomedicine

(Molecules, MDPI AG, 2020-11-11) [Article]
Amyloid proteins are linked to the pathogenesis of several diseases including Alzheimer’s disease, but at the same time a range of functional amyloids are physiologically important in humans. Although disease pathogenesis has been associated with protein aggregation, the mechanisms and factors that lead to protein aggregation are not completely understood. Paradoxically, unique characteristics of amyloids provide new opportunities for engineering innovative materials with biomedical applications. In this review, we discuss not only outstanding advances in biomedical applications of amyloid peptides, but also the mechanism of amyloid aggregation, factors affecting the process, and core sequences driving the aggregation. We aim with this review to provide a useful manual for those who engineer amyloids for innovative medicine solutions.
• #### Succinic semialdehyde dehydrogenase deficiency presenting with central hypothyroidism

(Clinical Case Reports, Wiley, 2020-11-11) [Article]
Central hypothyroidism might be another clinical sign of SSADH deficiency, which should prompt urinary organic acid screening for GHB in patients with central hypothyroidism. Studies on the interaction between GABA and thyroid hormone might point toward a new therapeutic concept.
• #### Few-shot learning for classification of novel macromolecular structures in cryo-electron tomograms

(PLOS Computational Biology, Public Library of Science (PLoS), 2020-11-11) [Article]
Cryo-electron tomography (cryo-ET) provides 3D visualization of subcellular components in the near-native state and at sub-molecular resolutions in single cells, demonstrating an increasingly important role in structural biology in situ. However, systematic recognition and recovery of macromolecular structures in cryo-ET data remain challenging as a result of low signal-to-noise ratio (SNR), small sizes of macromolecules, and high complexity of the cellular environment. Subtomogram structural classification is an essential step for such a task. Although acquisition of large amounts of subtomograms is no longer an obstacle due to advances in automation of data collection, obtaining the same number of structural labels is both computation and labor intensive. On the other hand, existing deep learning based supervised classification approaches are highly demanding on labeled data and have limited ability to learn about new structures rapidly from data containing very few labels of such new structures. In this work, we propose a novel approach for subtomogram classification based on few-shot learning. With our approach, classification of structures unseen in the training data can be conducted given few labeled samples in test data through instance embedding. Experiments were performed on both simulated and real datasets. Our experimental results show that we can make inference on new structures given only five labeled samples for each class with a competitive accuracy (> 0.86 on the simulated dataset with SNR = 0.1), or even one sample with an accuracy of 0.7644. The results on real datasets are also promising with accuracy > 0.9 on both conditions and even up to 1 on one of the real datasets. Our approach achieves significant improvement compared with the baseline method and has strong capabilities of generalizing to other cellular components.
• #### Cover Image: Novel tumour suppressor roles for GZMA and RASGRP1 in Theileria annulata-transformed macrophages and human B lymphoma cells (Cellular Microbiology 12/2020)

(Cellular Microbiology, Wiley, 2020-11-05) [Article]
Theileria annulata is a tick-transmitted apicomplexan parasite that infects and transforms bovine leukocytes into disseminating tumours that cause a disease called tropical theileriosis. Using comparative transcriptomics we identified genes transcriptionally perturbed during Theileria-induced leukocyte transformation. Dataset comparisons highlighted a small set of genes associated with Theileria-transformed leukocyte dissemination. The roles of Granzyme A (GZMA) and RAS guanyl-releasing protein 1 (RASGRP1) were verified by CRISPR/Cas9-mediated knockdown. Knocking down expression of GZMA and RASGRP1 in attenuated macrophages led to a regain in their dissemination in Rag2/γC mice confirming their role as dissemination suppressors in vivo. We further evaluated the roles of GZMA and RASGRP1 in human B lymphomas by comparing the transcriptome of 934 human cancer cell lines to that of Theileria-transformed bovine host cells. We confirmed dampened dissemination potential of human B lymphomas that overexpress GZMA and RASGRP1. Our results provide evidence that GZMA and RASGRP1 have a novel tumour suppressor function in both T. annulata-infected bovine host leukocytes and in human B lymphomas.
• #### Poly(A)-DG: A deep-learning-based domain generalization method to identify cross-species Poly(A) signal without prior knowledge from target species

(PLOS Computational Biology, Public Library of Science (PLoS), 2020-11-05) [Article]
In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of poly(A) signal (PAS) on the DNA sequence is the key to understanding the mechanism of translation regulation and mRNA metabolism. Although machine learning methods have been widely used in computationally identifying PAS, the need for tremendous amounts of annotation data hinders applications of existing methods in species without experimental data on PAS. Therefore, cross-species PAS identification, which makes it possible to predict PAS in untrained species, naturally becomes a promising direction. In this work, we propose a novel deep learning method named Poly(A)-DG for cross-species PAS identification. Poly(A)-DG consists of a Convolution Neural Network-Multilayer Perceptron (CNN-MLP) network and a domain generalization technique. It learns PAS patterns from the training species and identifies PAS in target species without re-training. To test our method, we use three species, build cross-species training sets with two of them, and evaluate the performance on the remaining one. Moreover, we test our method against insufficient data and imbalanced data issues and demonstrate that Poly(A)-DG not only outperforms state-of-the-art methods but also maintains relatively high accuracy when it comes to a smaller or imbalanced training set.
• #### Exponential increase of plastic burial in mangrove sediments as a major plastic sink

(Science Advances, American Association for the Advancement of Science (AAAS), 2020-10-28) [Article]
Sequestration of plastics in sediments is considered the ultimate sink of marine plastic pollution that would justify unexpectedly low loads found in surface waters. Here, we demonstrate that mangroves, generally supporting high sediment accretion rates, efficiently sequester plastics in their sediments. To this end, we extracted microplastics from dated sediment cores of the Red Sea and Arabian Gulf mangrove (Avicennia marina) forests along the Saudi Arabian coast. We found that microplastics <0.5 mm dominated in mangrove sediments, helping explain their scarcity in surface waters. We estimate that 50 ± 30 and 110 ± 80 metric tons of plastic may have been buried since the 1930s in mangrove sediments across the Red Sea and the Arabian Gulf, respectively. We observed an exponential increase in the plastic burial rate (8.5 ± 1.2% year$^{−1}$) since the 1950s in line with the global plastic production increase, confirming mangrove sediments as long-term sinks for plastics.
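To give a sense of scale for the reported growth rate, the sketch below converts 8.5% per year into a doubling time. The abstract does not say whether the rate is a continuous exponential constant or an annually compounded percentage, so both readings are shown as assumptions; the function names are illustrative and not taken from the study.

```python
import math


def doubling_time_continuous(rate_per_year):
    """Doubling time if the rate is a continuous growth constant r in exp(r * t)."""
    return math.log(2) / rate_per_year


def doubling_time_compound(rate_per_year):
    """Doubling time if the rate is an annually compounded fraction, (1 + r) ** t."""
    return math.log(2) / math.log(1 + rate_per_year)


rate = 0.085  # 8.5% per year, the central estimate quoted in the abstract
print(f"continuous reading: {doubling_time_continuous(rate):.1f} years")
print(f"annual compounding: {doubling_time_compound(rate):.1f} years")
```

Under either reading the burial rate doubles in a little over eight years.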