Analysis of the effects of related fingerprints on molecular similarity using an eigenvalue entropy approach

License
http://creativecommons.org/licenses/by/4.0/

Type
Article

Authors
Kuwahara, Hiroyuki
Gao, Xin

KAUST Department
Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Structural and Functional Bioinformatics Group

KAUST Grant Number
BAS/1/1624-01
FCC/1/1976-18
FCC/1/1976-23
FCC/1/1976-25
FCC/1/1976-26
URF/1/3412-01
URF/1/3450-01

Online Publication Date
2021-03-23

Print Publication Date
2021-12

Date
2021-03-23

Submitted Date
2020-05-08

Abstract
Two-dimensional (2D) chemical fingerprints are widely used as binary features for quantifying the structural similarity of chemical compounds, which is an important step in similarity-based virtual screening (VS). Here, using an eigenvalue-based entropy approach, we identified 2D fingerprints with little to no contribution to shaping the eigenvalue distribution of the feature matrix as related ones and examined the degree to which these related 2D fingerprints influenced molecular similarity scores calculated with the Tanimoto coefficient. Our analysis identified many related fingerprints in publicly available fingerprint schemes and showed that their presence in the feature set could have substantial effects on the similarity scores and bias the outcome of molecular similarity analysis. Our results have implications for the optimal selection of 2D fingerprints for compound similarity analysis and for the identification of potential hits for compounds with target biological activity in VS.
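
As a rough illustration of why related fingerprints can shift Tanimoto scores, the sketch below computes the coefficient for two toy binary fingerprints with and without a duplicated bit. It is not the authors' pipeline: the function name `tanimoto`, the toy compounds, and the all-zero convention are assumptions made for this example, and the eigenvalue entropy step described in the abstract is not reproduced here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Assumed convention: two all-zero fingerprints count as identical.
    return 1.0 if either == 0 else both / either


# Hypothetical toy data: bit 1 is an exact copy of bit 0 in both compounds,
# so it carries no independent structural information.
compound_a = [1, 1, 0, 1]
compound_b = [1, 1, 1, 0]

with_copy = tanimoto(compound_a, compound_b)              # redundant bit kept
without_copy = tanimoto(compound_a[1:], compound_b[1:])   # redundant bit dropped

print(f"with redundant bit:    {with_copy:.3f}")
print(f"without redundant bit: {without_copy:.3f}")
```

In this toy case the score falls from 0.5 to about 0.33 once the duplicated bit is removed, because the shared feature is no longer counted twice.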

Citation
Kuwahara, H., & Gao, X. (2021). Analysis of the effects of related fingerprints on molecular similarity using an eigenvalue entropy approach. Journal of Cheminformatics, 13(1). doi:10.1186/s13321-021-00506-2

Acknowledgements
This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Awards No. BAS/1/1624-01, URF/1/3412-01, URF/1/3450-01, FCC/1/1976-18, FCC/1/1976-23, FCC/1/1976-25, FCC/1/1976-26, and FCS/1/4102-02.

Publisher
Springer Nature

Journal
Journal of Cheminformatics

DOI
10.1186/s13321-021-00506-2
10.1101/853762

Additional Links
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00506-2

Permanent link to this record