Authors:
Slater, Luke T
Williams, John A
Schofield, Paul N
Gkoutos, Georgios V
KAUST Department:
Bio-Ontology Research Group (BORG)
Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
KAUST Grant Number: URF/1/3790-01-01
Online Publication Date: 2021-09-27
Print Publication Date: 2021-11
Permanent link to this record: http://hdl.handle.net/10754/669299
Abstract: Identification of ontology concepts in clinical narrative text enables the creation of phenotype profiles that can be associated with clinical entities, such as patients or drugs. Constructing patient phenotype profiles using formal ontologies enables their analysis via semantic similarity, in turn enabling the use of background knowledge in clustering or classification analyses. However, traditional semantic similarity approaches collapse complex relationships between patient phenotypes into a single similarity score for each pair of patients. Moreover, single scores may be based only on matching terms with the greatest information content (IC), ignoring other dimensions of patient similarity. This process necessarily leads to a loss of information in the resulting representation of patient similarity, and is especially apparent when using very large text-derived and highly multi-morbid phenotype profiles. It also makes finding a biological explanation for similarity very difficult: the black box problem. In this article, we explore the generation of multiple semantic similarity scores for patients based on different facets of their phenotypic manifestation, which we define through different sub-graphs in the Human Phenotype Ontology. We further present a new methodology for deriving sets of qualitative class descriptions for groups of entities described by ontology terms. Leveraging this strategy to obtain meaningful explanations for our semantic clusters alongside other evaluation techniques, we show that semantic clustering with ontology-derived facets enables the representation, and thus the identification, of clinically relevant phenotype relationships not easily recoverable using overall clustering alone. In this way, we demonstrate the potential of faceted semantic clustering for gaining a deeper and more nuanced understanding of text-derived patient phenotypes.
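The contrast the abstract draws between one overall score and several facet-specific scores can be illustrated with a small sketch. This is not the authors' implementation: the toy ontology, its term labels, the patient profiles, and the function names are all hypothetical, and the pairing of a Resnik-style measure (IC of the most informative common ancestor) with best-match averaging is an assumption chosen for illustration. A facet is modelled here as the sub-graph under a chosen root class.

```python
import math

# Hypothetical toy ontology (child -> parents); these are not real HPO terms.
PARENTS = {
    "phenotype": set(),
    "cardiac": {"phenotype"},
    "renal": {"phenotype"},
    "arrhythmia": {"cardiac"},
    "cardiomyopathy": {"cardiac"},
    "kidney_stones": {"renal"},
    "renal_failure": {"renal"},
}

# Hypothetical patient phenotype profiles.
PROFILES = {
    "p1": {"arrhythmia", "kidney_stones"},
    "p2": {"arrhythmia", "renal_failure"},
    "p3": {"cardiomyopathy", "kidney_stones"},
}


def ancestors(term):
    """Return the term together with all of its superclasses."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(profiles):
    """IC(t) = log(N / n_t), where n_t counts profiles annotated with t or a subclass."""
    n = len(profiles)
    counts = {t: 0 for t in PARENTS}
    for terms in profiles.values():
        closure = set().union(*(ancestors(t) for t in terms))
        for t in closure:
            counts[t] += 1
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


def resnik(a, b, ic, facet_root):
    """IC of the most informative common ancestor lying inside the facet."""
    common = ancestors(a) & ancestors(b)
    in_facet = [t for t in common if facet_root in ancestors(t)]
    return max((ic.get(t, 0.0) for t in in_facet), default=0.0)


def facet_similarity(p, q, ic, facet_root):
    """Best-match average over the terms of each profile that fall in the facet."""
    p_terms = [t for t in p if facet_root in ancestors(t)]
    q_terms = [t for t in q if facet_root in ancestors(t)]
    if not p_terms or not q_terms:
        return 0.0

    def one_way(xs, ys):
        return sum(max(resnik(x, y, ic, facet_root) for y in ys) for x in xs) / len(xs)

    return (one_way(p_terms, q_terms) + one_way(q_terms, p_terms)) / 2


def faceted_scores(p, q, ic, facets=("phenotype", "cardiac", "renal")):
    """One score per facet; the root class 'phenotype' plays the role of the overall score."""
    return {facet: facet_similarity(p, q, ic, facet) for facet in facets}


if __name__ == "__main__":
    ic = information_content(PROFILES)
    print("p1 vs p2:", faceted_scores(PROFILES["p1"], PROFILES["p2"], ic))
    print("p1 vs p3:", faceted_scores(PROFILES["p1"], PROFILES["p3"], ic))
```

In this toy data the overall score is the same for both pairs, while the facet scores show that p1 resembles p2 only in the cardiac facet and p3 only in the renal facet. That difference is the kind of information a single pairwise score discards.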
Citation: Slater, L. T., Williams, J. A., Karwath, A., Fanning, H., Ball, S., Schofield, P. N., … Gkoutos, G. V. (2021). Multi-faceted semantic clustering with text-derived phenotypes. Computers in Biology and Medicine, 138, 104904. doi:10.1016/j.compbiomed.2021.104904
Sponsors: GVG and LTS acknowledge support from the NIHR Birmingham ECMC, NIHR Birmingham SRMRC, Nanocommons H2020-EU (731032) and the NIHR Birmingham Biomedical Research Centre and the MRC HDR UK (HDRUK/CFC/01), an initiative funded by UK Research and Innovation, Department of Health and Social Care (England) and the devolved administrations, and leading medical research charities. The views expressed in this publication are those of the authors and not necessarily those of the NHS, the National Institute for Health Research, the Medical Research Council or the Department of Health. RH, PNS and GVG were supported by funding from King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/3790-01-01. AK was supported by the Medical Research Council (MR/S003991/1) and the MRC HDR UK (HDRUK/CFC/01). PNS and GVG acknowledge the support of the Alan Turing Institute, UK.
Except where otherwise noted, this item is an open access article under the CC BY-NC-ND license.
- Klarigi: Characteristic explanations for semantic biomedical data.
- Authors: Slater LT, Williams JA, Schofield PN, Russell S, Pendleton SC, Karwath A, Fanning H, Ball S, Hoehndorf R, Gkoutos GV
- Issue date: 2023 Feb
- An Efficient Parallelized Ontology Network-Based Semantic Similarity Measure for Big Biomedical Document Clustering.
- Authors: Li M, Chen T, Ryu KH, Jin CH
- Issue date: 2021
- Evaluating semantic similarity methods for comparison of text-derived phenotype profiles.
- Authors: Slater LT, Russell S, Makepeace S, Carberry A, Karwath A, Williams JA, Fanning H, Ball S, Hoehndorf R, Gkoutos GV
- Issue date: 2022 Feb 5
- Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study.
- Authors: Deng L, Chen L, Yang T, Liu M, Li S, Jiang T
- Issue date: 2021 Jun 15
- HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey.
- Authors: Lastra-Díaz JJ, Lara-Clares A, Garcia-Serrano A
- Issue date: 2022 Jan 6