Recent Submissions

  • Submarine optical fiber communication provides an unrealized deep-sea observation network

    Guo, Yujian; Marin, Juan M.; Ashry, Islam; Trichili, Abderrahmen; Havlik, Michelle-Nicole; Ng, Tien Khee; Duarte, Carlos M.; Ooi, Boon S. (Scientific Reports, Springer Science and Business Media LLC, 2023-09-18) [Article]
    Oceans are crucial to human survival, providing natural resources and most of the global oxygen supply, and are responsible for a large portion of worldwide economic development. Although it is widely considered a silent world, the sea is filled with natural sounds generated by marine life and geological processes. Man-made underwater sounds, such as active sonars, maritime traffic, and offshore oil and mineral exploration, have significantly affected underwater soundscapes and species. In this work, we report on a joint optical fiber-based communication and sensing technology aiming to reduce noise pollution in the sea while providing connectivity simultaneously with a variety of underwater applications. The designed multifunctional fiber-based system enables two-way data transfer, monitoring marine life and ship movement near the deployed fiber at the sea bottom and sensing temperature. The deployed fiber is equally harnessed to transfer energy that the internet of underwater things (IoUTs) devices can harvest. The reported approach significantly reduces the costs and effects of monitoring marine ecosystems while ensuring data transfer and ocean monitoring applications and providing continuous power for submerged IoUT devices.
  • Dipeptide-Based Photoreactive Instant Glue for Environmental and Biomedical Applications

    Bilalis, Panagiotis; Alrashoudi, Abdulelah Α.; Susapto, Hepi Hari; Moretti, Manola; Alshehri, Salwa; Abdelrahman, Sherin; Elsakran, Amr; Hauser, Charlotte (Accepted by ACS Applied Materials and Interfaces, 2023-09-13) [Article]
    Nature-inspired smart materials offer numerous advantages in terms of environmental friendliness and efficiency. Emulating the excellent adhesive properties of mussel foot proteins, in which lysine is in close proximity to 3,4-dihydroxy-L-phenylalanine (DOPA), we report the synthesis of a novel photo-curable peptide-based adhesive consisting exclusively of these two amino acids. Our adhesive is a highly concentrated aqueous solution of a monomer, a crosslinker and a photoinitiator. Lap-shear adhesion measurements on plastic and glass surfaces and comparison with different types of commercial adhesives showed that the adhesive strength of our glue is comparable when applied in the air and superior when used underwater. No toxicity of our adhesive was observed when the cytocompatibility on human dermal fibroblast cells was assessed. Preliminary experiments with various tissues and coral fragments showed that our adhesive could be applied to wound healing and coral reef restoration. Given the convenience of the facile synthesis, biocompatibility, ease of application underwater and high adhesive strength, we expect that our adhesive may find applications in, but not limited to, the biomedical and environmental fields.
  • Evaluation of Potential Peptide-Based Inhibitors Against SARS-CoV-2 and Variants of Concern

    Boshah, Hattan; Samkari, Faris; Valle Pérez, Alexander Uriel; Alsawaf, Sarah; Aldoukhi, Ali; Bilalis, Panagiotis; Alshehri, Salwa; Susapto, Hepi Hari; Hauser, Charlotte (Accepted by BioMed Research International, 2023-09-11) [Article]
    The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has greatly affected all aspects of life. Although several vaccines and pharmaceuticals have been developed against SARS-CoV-2, the emergence of mutated variants has raised several concerns. The angiotensin-converting enzyme (ACE2) receptor cell entry mechanism of this virus has not changed despite the vast mutation in emerging variants. Inhibiting the spike protein by which the virus identifies the host ACE2 receptor is a promising therapeutic countermeasure to keep pace with rapidly emerging variants. Here, we synthesized two ACE2-derived peptides, P1 and P25, to target and potentially inhibit SARS-CoV-2 cell entry. These peptides were evaluated in vitro using pseudoviruses that contained the SARS-CoV-2 original spike protein, the Delta mutated spike protein, or the Omicron spike protein. An in silico investigation was also done for these peptides to evaluate the interaction of the synthesized peptides and the SARS-CoV-2 variants. The P25 peptide showed a promising inhibition potency against the tested pseudoviruses and an even higher inhibition against the Omicron variant. The IC50 of the Omicron variant was 60.8 µM, while the IC50s of the SARS-CoV-2 original strain and the Delta variant were 455.2 µM and 546.4 µM, respectively. The in silico experiments also showed that the amino acid composition design and structure of P25 boosted the interaction with the spike protein. These findings suggest that ACE2-derived peptides, such as P25, have the potential to inhibit SARS-CoV-2 cell entry in vitro. However, further in vivo studies are needed to confirm their therapeutic efficacy against emerging variants.
  • Model-based versus model-free feeding control and water-quality monitoring for fish-growth tracking in aquaculture systems

    Aljehani, Fahad; Ndoye, Ibrahima; Laleg-Kirati, Taous-Meriem (IFAC Journal of Systems and Control, Elsevier BV, 2023-09-03) [Article]
    This paper proposes model-based and model-free control approaches to monitor the feeding rate and water quality for fish-growth tracking in aquaculture systems. The representative fish-growth model is revisited, which describes the total biomass change by incorporating the fish population density and mortality. Due to the challenging task of measuring the total fish biomass and population data, the new dynamic population model is validated with individual fish-growth data for tracking control. Ammonia exposure is a significant challenge in the fish-population growth tracking problem, affecting fish health and survival. To address this challenge, traditional and optimal controllers are first designed to track the weight reference within suboptimal temperature and dissolved oxygen (DO) profiles under various un-ionized ammonia (UIA) exposure levels by manipulating relative feeding. Then, a Q-learning approach is proposed to learn an optimal feeding-control policy from simulated data on fish-growth weight trajectories while managing ammonia effects. The proposed Q-learning feeding control prevents fish mortality and achieves good tracking errors for fish weight under UIA levels. However, it maintains a relative food consumption that potentially underfeeds fish. Finally, an optimal predictive algorithm that includes the temperature, DO, and UIA is proposed to optimize the feeding and water quality of the dynamic fish-population growth process, indicating that fish mortality is decreased and food consumption is reduced in all cases of UIA exposure.
  • Top abundant deep ocean heterotrophic bacteria can be retrieved by cultivation

    Sanz-Saez, Isabel; Sanchez, Pablo; Salazar, Guillem; Sunagawa, Shinichi; de Vargas, Colomban; Bowler, Chris; Sullivan, Matthew B.; Wincker, Patrick; Karsenti, Eric; Pedrós-Alió, Carlos; Agusti, Susana; Gojobori, Takashi; Duarte, Carlos M.; Gasol, Josep M.; Sánchez, Olga; Acinas, Silvia G (ISME Communications, Springer Science and Business Media LLC, 2023-09-02) [Article]
    Traditional culture techniques usually retrieve a small fraction of the marine microbial diversity, which mainly belong to the so-called rare biosphere. However, this paradigm has not been fully tested at a broad scale, especially in the deep ocean. Here, we examined the fraction of heterotrophic bacterial communities in photic and deep ocean layers that could be recovered by culture-dependent techniques at a large scale. We compared 16S rRNA gene sequences from a collection of 2003 cultured heterotrophic marine bacteria with global 16S rRNA metabarcoding datasets (16S TAGs) covering surface, mesopelagic and bathypelagic ocean samples that included 16 of the 23 samples used for isolation. These global datasets represent 60 322 unique 16S amplicon sequence variants (ASVs). Our results reveal a significantly higher proportion of isolates identical to ASVs in deeper ocean layers reaching up to 28% of the 16S TAGs of the bathypelagic microbial communities, which included the isolation of 3 of the top 10 most abundant 16S ASVs in the global bathypelagic ocean, related to the genera Sulfitobacter, Halomonas and Erythrobacter. These isolates contributed differently to the prokaryotic communities across different plankton size fractions, recruiting between 38% in the free-living fraction (0.2–0.8 µm) and up to 45% in the largest particles (20–200 µm) in the bathypelagic ocean. Our findings support the hypothesis that sinking particles in the bathypelagic act as resource-rich habitats, suitable for the growth of heterotrophic bacteria with a copiotroph lifestyle that can be cultured, and that these cultivable bacteria can also thrive as free-living bacteria.
  • Cross-parametric generative adversarial network-based magnetic resonance image feature synthesis for breast lesion classification

    Fan, Ming; Huang, Guangyao; Lou, Junhong; Gao, Xin; Zeng, Tieyong; Li, Lihua (IEEE Journal of Biomedical and Health Informatics, Institute of Electrical and Electronics Engineers (IEEE), 2023-09-01) [Article]
    Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) contains information on tumor morphology and physiology for breast cancer diagnosis and treatment. However, this technology requires contrast agent injection with more acquisition time than other parametric images, such as T2-weighted imaging (T2WI). Current image synthesis methods attempt to map the image data from one domain to another, whereas it is challenging or even infeasible to map the images with one sequence into images with multiple sequences. Here, we propose a new approach of cross-parametric generative adversarial network (GAN)-based feature synthesis (CPGANFS) to generate discriminative DCE-MRI features from T2WI with applications in breast cancer diagnosis. The proposed approach decodes the T2W images into latent cross-parameter features to reconstruct the DCE-MRI and T2WI features by balancing the information shared between the two. A Wasserstein GAN with a gradient penalty is employed to differentiate the T2WI-generated features from ground-truth features extracted from DCE-MRI. The synthesized DCE-MRI feature-based model achieved significantly (p = 0.036) higher prediction performance (AUC = 0.866) in breast cancer diagnosis than that based on T2WI (AUC = 0.815). Visualization of the model shows that our CPGANFS method enhances the predictive power by elevating attention to the lesion and the surrounding parenchyma areas, which is driven by the interparametric information learned from T2WI and DCE-MRI. Our proposed CPGANFS provides a framework for cross-parametric MR image feature generation from a single-sequence image guided by an information-rich, time-series image with kinetic information. Extensive experimental results demonstrate its effectiveness with high interpretability and improved performance in breast cancer diagnosis.
  • SAGDTI: self-attention and graph neural network with multiple information representations for the prediction of drug-target interactions

    Li, Xiaokun; Yang, Qiang; Luo, Gongning; Xu, Long; Dong, Weihe; Wang, Wei; Dong, Suyu; Wang, Kuanquan; Xuan, Ping; Gao, Xin (Bioinformatics Advances, Oxford University Press (OUP), 2023-08-26) [Article]
    Motivation: Accurate identification of target proteins that interact with drugs is a vital step in silico, which can significantly foster the development of drug repurposing and drug discovery. In recent years, numerous deep learning-based methods have been introduced to treat drug-target interaction (DTI) prediction as a classification task. The output of this task is binary identification suggesting the absence or presence of interactions. However, existing studies often (i) neglect the unique molecular attributes when embedding drugs and proteins, and (ii) determine the interaction of drug-target pairs without considering biological interaction information. Results: In this study, we propose an end-to-end attention-derived method based on the self-attention mechanism and graph neural network, termed SAGDTI. The aim of this method is to overcome the aforementioned drawbacks in the identification of DTIs. SAGDTI is the first method to sufficiently consider the unique molecular attribute representations for both drugs and targets in the input form of the SMILES sequences and three-dimensional structure graphs. In addition, our method aggregates the feature attributes of biological information between drugs and targets through multi-scale topologies and diverse connections. Experimental results illustrate that SAGDTI outperforms existing prediction models, benefiting from the unique molecular attributes embedded by atom-level attention and the biological interaction information representation aggregated by node-level attention. Moreover, a case study on SARS-CoV-2 shows that our model is a powerful tool for identifying DTIs in real life.
  • MULGA, a unified multi-view graph autoencoder-based approach for identifying drug-protein interaction and drug repositioning

    Ma, Jiani; Li, Chen; Zhang, Yiwen; Wang, Zhikang; Li, Shanshan; Guo, Yuming; Zhang, Lin; Liu, Hui; Gao, Xin; Song, Jiangning (Bioinformatics, Oxford University Press (OUP), 2023-08-23) [Article]
    Motivation: Identifying drug-protein interactions (DPIs) is a critical step in drug repositioning, which allows reuse of approved drugs that may be effective for treating a different disease and thereby alleviates the challenges of new drug development. Despite the fact that a great variety of computational approaches for DPI prediction have been proposed, key challenges, such as extendable and unbiased similarity calculation, heterogeneous information utilization and reliable negative sample selection, remain to be addressed. Results: To address these issues, we propose a novel, unified multi-view graph autoencoder framework, termed MULGA, for both DPI and drug repositioning predictions. MULGA is featured by: (i) a multi-view learning technique to effectively learn authentic drug affinity and target affinity matrices; (ii) a graph autoencoder to infer missing DPIs; and (iii) a new “guilty-by-association”-based negative sampling approach for selecting highly reliable non-DPIs. Benchmark experiments demonstrate that MULGA outperforms state-of-the-art methods in DPI prediction and the ablation studies verify the effectiveness of each proposed component. Importantly, we highlight the top drugs shortlisted by MULGA that target the spike glycoprotein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), offering additional insights into and potentially useful treatment options for COVID-19. Together with the availability of datasets and source codes, we envision that MULGA can be explored as a useful tool for DPI prediction and drug repositioning.
  • AI identifies potent inducers of breast cancer stem cell differentiation based on adversarial learning from gene expression data

    Li, Zhongxiao; Napolitano, Antonella; Fedele, Monica; Gao, Xin; Napolitano, Francesco (Cold Spring Harbor Laboratory, 2023-08-22) [Preprint]
    Cancer stem cells (CSCs) are a subpopulation of cancer cells within tumors that exhibit stem-like properties, and represent a potentially effective therapeutic target towards long-term remission by means of differentiation induction. By leveraging an Artificial Intelligence (AI) approach solely based on transcriptomics data, this study scored a large library of small molecules based on their predicted ability to induce differentiation in stem-like cells. In particular, a deep neural network model was trained using publicly available single-cell RNA-Seq data obtained from untreated human induced pluripotent stem cells at various differentiation stages and subsequently utilized to screen drug-induced gene expression profiles from the LINCS database. The challenge of adapting such different data domains was tackled by devising an adversarial learning approach that was able to effectively identify and remove domain-specific bias during the training phase. Experimental validation in MDA-MB-231 and MCF7 cells demonstrated the efficacy of 5 out of 6 tested molecules among those scored highest by the model. In particular, the efficacy of triptolide, OTS-167, quinacrine, granisetron, and A-443654 offers a potential avenue for targeted therapies against breast CSCs.
  • Deep-Learning–Based Screening and Ancillary Testing for Thyroid Cytopathology

    Dov, David; Elliott Range, Danielle; Cohen, Jonathan; Bell, Jonathan; Rocke, Daniel J.; Kahmke, Russel R.; Weiss-Meilik, Ahuva; Lee, Walter T.; Henao, Ricardo; Carin, Lawrence; Kovalsky, Shahar Z. (The American Journal of Pathology, Elsevier BV, 2023-08-21) [Article]
    Thyroid cancer is the most common malignant endocrine tumor. The key test to assess preoperative risk of malignancy is cytologic evaluation of fine-needle aspiration biopsies (FNABs). The evaluation findings can often be indeterminate, leading to unnecessary surgery for benign post-surgical diagnoses. We have developed a deep-learning algorithm to analyze thyroid FNAB whole-slide images (WSIs). We show, on the largest reported data set of thyroid FNAB WSIs, clinical-grade performance in the screening of determinate cases and indications for its use as an ancillary test to disambiguate indeterminate cases. The algorithm screened and definitively classified 45.1% (130/288) of the WSIs as either benign or malignant with risk of malignancy rates of 2.7% and 94.7%, respectively. It reduced the number of indeterminate cases (N = 108) by reclassifying 21.3% (N = 23) as benign with a resultant risk of malignancy rate of 1.8%. Similar results were reproduced using a data set of consecutive FNABs collected during an entire calendar year, achieving clinically acceptable margins of error for thyroid FNAB classification.
  • Deep Learning Enhanced Tandem Repeat Variation Identification via Multi-Modal Conversion of Nanopore Reads Alignment

    Liao, Xingyu; Zhou, Juexiao; Zhang, Bin; Li, Xingyi; Xu, Xiaopeng; Li, Haoyang; Gao, Xin (Cold Spring Harbor Laboratory, 2023-08-20) [Preprint]
    Identification of tandem repeat (TR) variations plays a crucial role in advancing our understanding of genetic diseases, forensic analysis, evolutionary studies, and crop improvement, thereby contributing to various fields of research and practical applications. However, traditional TR identification methods are often limited to processing genomes obtained through sequence assembly and cannot directly start detection from sequencing reads. Furthermore, the inflexibility of detection mode and parameters hinders the accuracy and completeness of the identification, rendering the results unsatisfactory. These shortcomings result in existing TR variation identification methods being associated with high computational cost and limited detection sensitivity, precision and comprehensiveness. Here, we propose DeepTRs, a novel method for identifying TR variations, which enables direct TR variation identification from raw Nanopore sequencing reads and achieves high sensitivity, accuracy, and completeness through the multi-modal conversion of Nanopore reads alignment and deep learning. Comprehensive evaluations demonstrate that DeepTRs outperforms existing methods.
  • Wiskott-Aldrich Syndrome Protein Regulates Nucleolar Organization and Function in Innate Immune Response

    Zhou, Xuan; Yuan, Baolei; Tian, Yeteng; Zhou, Juexiao; Wang, Mengge; Shakir, Ismail; Zhang, Yingzi; Bi, Chongwei; Aljamal, Bayan Mohammed; Hashem, Mais Omar; Abuyousef, Omar Imad; Abdulwahab, Firdous Mohammed; Ali, Afshan; Dunn, Sarah; Moresco, James; Yates, John Robert; Frassoni, Francesco; Gao, Xin; Alkuraya, Fowzan S.; Belmonte, Juan Carlo Izpisua; Li, Mo (Cold Spring Harbor Laboratory, 2023-08-16) [Preprint]
    Wiskott-Aldrich syndrome (WAS) is a primary immunodeficiency disorder caused by the dysfunction of the WAS protein (WASP). Using an isogenic macrophage model derived from genome edited induced pluripotent stem cells we demonstrated that WASP functions in the nucleolus, which plays important roles in immune regulation. The absence of WASP resulted in smaller and misshapen nucleoli, decreased fibrillar center territory, and impaired ribosomal RNA (rRNA) transcription. The nucleolar and rRNA phenotypes were confirmed in WAS patient samples. Furthermore, WASP interacts with nucleolar proteins, including nucleophosmin 1 (NPM1) and fibrillarin (FBL). NPM1 deficiency is known to cause elevated cytokine expression following lipopolysaccharide (LPS) stimulation. Consistently, WASP deficient cells displayed lower levels of NPM1 and a heightened inflammatory cytokine response to LPS, which was rescued by overexpressing NPM1. Together, our research provides novel insights into the critical role of WASP in nucleolar function and the modulation of inflammatory cytokine production.
  • lzx325/DREDDA:

    Li, Zhongxiao; Napolitano, Antonella; Fedele, Monica; Gao, Xin; Napolitano, Francesco (Github, 2023-08-10) [Software]
  • Improving the classification of cardinality phenotypes using collections.

    Alghamdi, Sarah M.; Hoehndorf, Robert (Journal of biomedical semantics, Springer Science and Business Media LLC, 2023-08-07) [Article]
    Motivation: Phenotypes are observable characteristics of an organism and they can be highly variable. Information about phenotypes is collected in a clinical context to characterize disease, and is also collected in model organisms and stored in model organism databases where they are used to understand gene functions. Phenotype data is also used in computational data analysis and machine learning methods to provide novel insights into disease mechanisms and support personalized diagnosis of disease. For mammalian organisms and in a clinical context, ontologies such as the Human Phenotype Ontology and the Mammalian Phenotype Ontology are widely used to formally and precisely describe phenotypes. We specifically analyze axioms pertaining to phenotypes of collections of entities within a body, and we find that some of the axioms in phenotype ontologies lead to inferences that may not accurately reflect the underlying biological phenomena. Results: We reformulate the phenotypes of collections of entities using an ontological theory of collections. By reformulating phenotypes of collections in phenotype ontologies, we avoid potentially incorrect inferences pertaining to the cardinality of these collections. We apply our method to two phenotype ontologies and show that the reformulation not only removes some problematic inferences but also quantitatively improves biological data analysis.
  • Critical assessment of variant prioritization methods for rare disease diagnosis within the Rare Genomes Project

    Stenton, Sarah L.; O'Leary, Melanie; Lemire, Gabrielle; VanNoy, Grace E.; DiTroia, Stephanie; Ganesh, Vijay S.; Groopman, Emily; O'Heir, Emily; Mangilog, Brian; Osei-Owusu, Ikeoluwa; Pais, Lynn S.; Serrano, Jillian; Singer-Berk, Moriel; Weisburd, Ben; Wilson, Michael; Austin-Tse, Christina; Abdelhakim, Marwa; Althagafi, Azza Th.; Babbi, Giulia; Bellazzi, Riccardo; Bovo, Samuele; Carta, Maria Giulia; Casadio, Rita; Coenen, Pieter-Jan; De Paoli, Federica; Floris, Matteo; Gajapathy, Manavalan; Hoehndorf, Robert; Jacobsen, Julius O.B.; Joseph, Thomas; Kamandula, Akash; Katsonis, Panagiotis; Kint, Cyrielle; Lichtarge, Olivier; Limongelli, Ivan; Lu, Yulan; Magni, Paolo; Mamidi, Tarun Karthik Kumar; Martelli, Pier Luigi; Mulargia, Marta; Nicora, Giovanna; Nykamp, Keith; Pejaver, Vikas; Peng, Yisu; Pham, Thi Hong Cam; Podda, Maurizio S.; Rao, Aditya; Rizzo, Ettore; Saipradeep, Vangala G.; Savojardo, Castrense; Schols, Peter; Shen, Yang; Sivadasan, Naveen; Smedley, Damian; Soru, Dorian; Srinivasan, Rajgopal; Sun, Yuanfei; Sunderam, Uma; Tan, Wuwei; Tiwari, Naina; Wang, Xiao; Wang, Yaqiong; Williams, Amanda; Worthey, Elizabeth A.; Yin, Rujie; You, Yuning; Zeiberg, Daniel; Zucca, Susanna; Bakolitsa, Constantina; Brenner, Steven E.; Fullerton, Stephanie M.; Radivojac, Predrag; Rehm, Heidi L.; O'Donnell-Luria, Anne (Cold Spring Harbor Laboratory, 2023-08-05) [Preprint]
    Background: A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50%. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery. Families are consented for sharing of sequence and phenotype data with researchers, allowing development of a Critical Assessment of Genome Interpretation (CAGI) community challenge, placing variant prioritization models head-to-head in a real-life clinical diagnostic setting. Methods: Predictors were provided a dataset of phenotype terms and variant calls from GS of 175 RGP individuals (65 families), including 35 solved training set families, with causal variants specified, and 30 test set families (14 solved, 16 unsolved). The challenge tasked teams with identifying the causal variants in as many test set families as possible. Ranked variant predictions were submitted with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on rank position of true positive causal variants and maximum F-measure, based on precision and recall of causal variants across EPCR thresholds. Results: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performing teams recalled the causal variants in up to 13 of 14 solved families by prioritizing high quality variant calls that were rare, predicted deleterious, segregating correctly, and consistent with reported phenotype. In unsolved families, newly discovered diagnostic variants were returned to two families following confirmatory RNA sequencing, and two prioritized novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant, in an unsolved proband with phenotype overlap with asparagine synthetase deficiency. Conclusions: By objective assessment of variant predictions, we provide insights into current state-of-the-art algorithms and platforms for genome sequencing analysis for rare disease diagnosis and explore areas for future optimization. Identification of diagnostic variants in unsolved families promotes synergy between researchers with clinical and computational expertise as a means of advancing the field of clinical genome interpretation.
  • Leveraging AI Advances and Online Tools for Structure-Based Variant Analysis.

    Guzmán-Vega, Francisco J.; Alvarez, Ana C. Gonzalez; Pena Guerra, Karla; Cardona-Londoño, Kelly J; Arold, Stefan T. (Current protocols, Wiley, 2023-08-04) [Article]
    Understanding how a gene variant affects protein function is important in life science, as it helps explain traits or dysfunctions in organisms. In a clinical setting, this understanding makes it possible to improve and personalize patient care. Bioinformatic tools often only assign a pathogenicity score, rather than providing information about the molecular basis for phenotypes. Experimental testing can furnish this information, but this is slow and costly and requires expertise and equipment not available in a clinical setting. Conversely, mapping a gene variant onto the three-dimensional (3D) protein structure provides a fast molecular assessment free of charge. Before 2021, this type of analysis was severely limited by the availability of experimentally determined 3D protein structures. Advances in artificial intelligence algorithms now allow confident prediction of protein structural features from sequence alone. The aim of the protocols presented here is to enable non-experts to use databases and online tools to investigate the molecular effect of a genetic variant. The Basic Protocol relies only on the online resources AlphaFold, Protein Structure Database, and UniProt. Alternate Protocols document the usage of the Protein Data Bank, SWISS-MODEL, ColabFold, and PyMOL for structure-based variant analysis.
  • Counterfactual Learning on Heterogeneous Graphs with Greedy Perturbation

    Qiang, Yang; Ma, Changsheng; Zhang, Qiannan; Gao, Xin; Zhang, Chuxu; Zhang, Xiangliang (ACM, 2023-08-04) [Conference Paper]
    Due to the growing importance of using graph neural networks in high-stakes applications, there is a pressing need to interpret the predicted results of these models. Existing methods for explanation have mainly focused on generating sub-graphs comprising important edges for a specific prediction. However, these methods face two issues. Firstly, they lack counterfactual validity as removing the subgraph may not affect the prediction, and generating plausible counterfactual examples has not been adequately explored. Secondly, they cannot be extended to heterogeneous graphs as the complex information involved in such graphs increases the difficulty of generating interpretations. This paper proposes a novel counterfactual learning method, named CF-HGExplainer, for heterogeneous graphs. The method incorporates a semantic-aware attentive pooling strategy for the heterogeneous graph classifier and designs a heterogeneous decision boundaries extraction module to find the common logic for similar graphs based on the extracted graph embeddings from the classifier. Additionally, we propose to greedily perturb nodes and edges based on the distribution of node features and edge plausibility to train a neural network for heterogeneous edge weight learning. Extensive experiments on two public academic datasets demonstrate the effectiveness of CF-HGExplainer compared to state-of-the-art methods on the graph classification task and graph interpretation task.
  • CAGI6 ID-Challenge: Assessment of phenotype and variant predictions in 415 children with Neurodevelopmental Disorders (NDDs)

    Aspromonte, Maria Cristina; Conte, Alessio Del; Zhu, Shaowen; Tan, Wuwei; Shen, Yang; Zhang, Yexian; Li, Qi; Wang, Maggie Haitian; Babbi, Giulia; Bovo, Samuele; Martelli, Pier Luigi; Casadio, Rita; Althagafi, Azza Th.; Toonsi, Sumyyah; Kulmanov, Maxat; Hoehndorf, Robert; Katsonis, Panagiotis; Williams, Amanda; Lichtarge, Olivier; Xian, Su; Surento, Wesley; Pejaver, Vikas; Mooney, Sean D.; Sunderam, Uma; Sriniva, Rajgopal; Murgia, Alessandra; Piovesan, Damiano; Tosatto, Silvio C. E.; Leonardi, Emanuela (Research Square Platform LLC, 2023-08-02) [Preprint]
    In the context of the Critical Assessment of the Genome Interpretation, 6th edition (CAGI6), the Genetics of Neurodevelopmental Disorders Lab in Padua proposed a new ID-challenge offering the opportunity to develop computational methods for predicting patients' phenotypes and causal variants. Eight research teams and 30 models had access to the phenotype details and real genetic data, based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by Neurodevelopmental Disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions, with onset in infancy. In this study we evaluate the ability and accuracy of computational methods to predict comorbid phenotypes based on clinical features described in each patient and causal variants. Finally, we asked participants to develop a method to find new possible genetic causes for patients without a genetic diagnosis. As already done for CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia) and variants (causative, putative pathogenic and contributing factors) were provided. Considering the overall clinical manifestation of our cohort, we provided the variant data and phenotypic traits of the 150 patients from the CAGI5 ID-Challenge as training and validation data for the development of prediction methods.
  • Fusing Peptide Epitopes for Advanced Multiplex Serological Testing for SARS-CoV-2 Antibody Detection

    Aldoukhi, Ali; Bilalis, Panagiotis; Alhattab, Dana Majed; Pérez, Alexander Uriel U. Valle; Pérez, Hepi Hari; Perez Pedroza, Rosario; Backhoff-García, Emiliano; Alsawaf, Sarah; Alshehri, Salwa; Boshah, Hattan; Alrashoudi, Abdulelah; Aljabr, Waleed; Alaamery, Manal; Alrashed, May; Hasanato, Rana; Farzan, Raed; Alsubki, Roua; Moretti, Manola; Abedalthagafi, Malak; Hauser, Charlotte (Accepted by ACS Bio & Med Chem Au, 2023-08-01) [Article]
    The tragic COVID-19 pandemic, which has seen a total of 655 million cases worldwide and a death toll of over 6.6 million, seems finally to be tailing off. Even so, new variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continue to arise, the severity of which cannot be predicted in advance. This is concerning for the maintenance and stability of public health, since immune evasion and increased transmissibility may arise. Therefore, it is crucial to continue monitoring antibody responses to SARS-CoV-2 in the general population. As a complement to polymerase chain reaction (PCR) tests, multiplex immunoassays are elegant tools that use individual protein or peptide antigens simultaneously to provide a high level of sensitivity and specificity. To further improve these aspects of SARS-CoV-2 antibody detection, as well as accuracy, we have developed an advanced serological peptide-based multiplex assay using antigen-fused peptide epitopes derived from both the spike and the nucleocapsid proteins. The significance of the epitopes selected for antibody detection has been verified by in silico molecular docking simulations between the peptide epitopes and reported SARS-CoV-2 antibodies. Peptides can be modified and synthesized more easily and quickly than full-length proteins and can therefore be used in a more cost-effective manner. Three different fusion-epitope peptides (FEPs) were synthesized and tested by enzyme-linked immunosorbent assay (ELISA). A total of 145 blood serum samples were used, comprising 110 serum samples from COVID-19 patients and 35 negative control serum samples taken from COVID-19-free individuals before the outbreak. Interestingly, our data demonstrate that the sensitivity, specificity, and accuracy of the results for the FEP antigens are higher than for single peptide epitopes or mixtures of single peptide epitopes. Our FEP concept can be applied to different multiplex immunoassays, testing not only for SARS-CoV-2 but also for various other pathogens. A significantly improved peptide-based serological assay may support the development of commercial point-of-care tests, such as lateral flow assays (LFAs).
  • Stylized Projected GAN: A Novel Architecture for Fast and Realistic Image Generation

    Muttakin, Md Nurul; Sultan, Malik Shahid; Hoehndorf, Robert; Ombao, Hernando (arXiv, 2023-07-30) [Preprint]
    Generative Adversarial Networks (GANs) generate data using a generator and a discriminator. GANs usually produce high-quality images, but training them in an adversarial setting is a difficult task: they require high computation power and hyper-parameter regularization to converge. Projected GANs tackle this training difficulty by using transfer learning to project the generated and real samples into a pre-trained feature space. Projected GANs improve training time and convergence but produce artifacts in the generated images, which reduce the quality of the generated samples. We propose an optimized architecture called Stylized Projected GANs, which integrates the mapping network of StyleGAN with the Skip Layer Excitation of FastGAN. The integrated modules are incorporated within the generator architecture of FastGAN to mitigate the problem of artifacts in the generated images.
