Evaluating the effect of annotation size on measures of semantic similarity

Handle URI:
http://hdl.handle.net/10754/623323
Title:
Evaluating the effect of annotation size on measures of semantic similarity
Authors:
Kulmanov, Maxat (0000-0003-1710-1820); Hoehndorf, Robert (0000-0001-8149-5890)
Abstract:
Background: Ontologies are widely used as metadata in biological and biomedical datasets. Measures of semantic similarity utilize ontologies to determine how similar two entities annotated with classes from ontologies are, and semantic similarity is increasingly applied in applications ranging from diagnosis of disease to investigation in gene networks and functions of gene products. Results: Here, we analyze a large number of semantic similarity measures and the sensitivity of similarity values to the number of annotations of entities, difference in annotation size and to the depth or specificity of annotation classes. We find that most similarity measures are sensitive to the number of annotations of entities, difference in annotation size as well as to the depth of annotation classes; well-studied and richly annotated entities will usually show higher similarity than entities with only few annotations even in the absence of any biological relation. Conclusions: Our findings may have significant impact on the interpretation of results that rely on measures of semantic similarity, and we demonstrate how the sensitivity to annotation size can lead to a bias when using semantic similarity to predict protein-protein interactions.
KAUST Department:
Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Kulmanov M, Hoehndorf R (2017) Evaluating the effect of annotation size on measures of semantic similarity. Journal of Biomedical Semantics 8. Available: http://dx.doi.org/10.1186/s13326-017-0119-z.
Publisher:
Springer Nature
Journal:
Journal of Biomedical Semantics
Issue Date:
13-Feb-2017
DOI:
10.1186/s13326-017-0119-z
Type:
Article
ISSN:
2041-1480
Sponsors:
This research was supported by funding from the King Abdullah University of Science and Technology.
Additional Links:
http://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0119-z
Appears in Collections:
Articles; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Kulmanov, Maxat | en
dc.contributor.author | Hoehndorf, Robert | en
dc.date.accessioned | 2017-05-04T06:39:20Z | -
dc.date.available | 2017-05-04T06:39:20Z | -
dc.date.issued | 2017-02-13 | en
dc.identifier.citation | Kulmanov M, Hoehndorf R (2017) Evaluating the effect of annotation size on measures of semantic similarity. Journal of Biomedical Semantics 8. Available: http://dx.doi.org/10.1186/s13326-017-0119-z. | en
dc.identifier.issn | 2041-1480 | en
dc.identifier.doi | 10.1186/s13326-017-0119-z | en
dc.identifier.uri | http://hdl.handle.net/10754/623323 | -
dc.description.abstract | Background: Ontologies are widely used as metadata in biological and biomedical datasets. Measures of semantic similarity utilize ontologies to determine how similar two entities annotated with classes from ontologies are, and semantic similarity is increasingly applied in applications ranging from diagnosis of disease to investigation in gene networks and functions of gene products. | en
dc.description.abstract | Results: Here, we analyze a large number of semantic similarity measures and the sensitivity of similarity values to the number of annotations of entities, difference in annotation size and to the depth or specificity of annotation classes. We find that most similarity measures are sensitive to the number of annotations of entities, difference in annotation size as well as to the depth of annotation classes; well-studied and richly annotated entities will usually show higher similarity than entities with only few annotations even in the absence of any biological relation. | en
dc.description.abstract | Conclusions: Our findings may have significant impact on the interpretation of results that rely on measures of semantic similarity, and we demonstrate how the sensitivity to annotation size can lead to a bias when using semantic similarity to predict protein-protein interactions. | en
dc.description.sponsorship | This research was supported by funding from the King Abdullah University of Science and Technology. | en
dc.publisher | Springer Nature | en
dc.relation.url | http://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0119-z | en
dc.rights | This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. | en
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en
dc.subject | Semantic similarity | en
dc.subject | Ontology | en
dc.subject | Gene ontology | en
dc.title | Evaluating the effect of annotation size on measures of semantic similarity | en
dc.type | Article | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | Journal of Biomedical Semantics | en
dc.eprint.version | Publisher's Version/PDF | en
kaust.author | Kulmanov, Maxat | en
kaust.author | Hoehndorf, Robert | en