DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning

Handle URI:
http://hdl.handle.net/10754/621869
Title:
DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning
Authors:
Soufan, Othman (0000-0002-4410-1853); Ba Alawi, Wail (0000-0002-2747-4703); Afeef, Moataz A.; Essack, Magbubah (0000-0003-2709-5356); Kalnis, Panos (0000-0002-5060-1360); Bajic, Vladimir B. (0000-0001-5435-4750)
Abstract:
Background: Mining high-throughput screening (HTS) assays is key for enhancing decisions in the area of drug repositioning and drug discovery. However, many challenges are encountered in the process of developing suitable and accurate methods for extracting useful information from these assays. Virtual screening and the wide variety of databases, methods and solutions proposed to date have not completely overcome these challenges. This study is based on a multi-label classification (MLC) technique for modeling correlations between several HTS assays, meaning that a single prediction represents a subset of assigned correlated labels instead of one label. Thus, the devised method increases the likelihood of accurate predictions for compounds that were not tested in particular assays.
Results: Here we present DRABAL, a novel MLC solution that incorporates structure learning of a Bayesian network as a step to model dependency between the HTS assays. In this study, DRABAL was used to process more than 1.4 million interactions of over 400,000 compounds and analyze the existing relationships between five large HTS assays from the PubChem BioAssay Database. Compared to different MLC methods, DRABAL significantly improves the F1 score by about 22% on average. We further illustrate the utility of DRABAL by screening FDA-approved drugs and reporting those that have a high probability of interacting with several targets, thus enabling multi-target drug repositioning. Specifically, DRABAL suggests the drug thiabendazole as a common activator of the NPC1 and Rab-9A proteins, both of which are assayed to identify treatment modalities for Niemann–Pick type C disease.
Conclusion: We developed a novel MLC solution based on a Bayesian active learning framework to overcome the lack of fully labeled training data and to exploit actual dependencies between the HTS assays. The solution is motivated by the need to model dependencies between existing experimental confirmatory HTS assays and improve prediction performance. We have conducted extensive experiments over several HTS assays and have shown the advantages of DRABAL. The datasets and programs can be downloaded from https://figshare.com/articles/DRABAL/3309562.
KAUST Department:
Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Soufan O, Ba-Alawi W, Afeef M, Essack M, Kalnis P, et al. (2016) DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning. Journal of Cheminformatics 8. Available: http://dx.doi.org/10.1186/s13321-016-0177-8.
Publisher:
Springer Nature
Journal:
Journal of Cheminformatics
KAUST Grant Number:
URF/1/1976-02
Issue Date:
10-Nov-2016
DOI:
10.1186/s13321-016-0177-8
Type:
Article
ISSN:
1758-2946
Sponsors:
Research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) and KAUST Office of Sponsored Research (OSR) under Award No. URF/1/1976-02. The computational analysis for this study was performed on the Dragon and Snapdragon compute clusters of the Computational Bioscience Research Center at KAUST.
Is Supplemented By:
Soufan, O., Ba-Alawi, W., Moataz Afeef, Magbubah Essack, Kalnis, P., & Bajic, V. (2016). DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning. Figshare. https://doi.org/10.6084/m9.figshare.c.3696499; DOI:10.6084/m9.figshare.c.3696499; HANDLE:http://hdl.handle.net/10754/624144
Additional Links:
http://jcheminf.springeropen.com/articles/10.1186/s13321-016-0177-8
Appears in Collections:
Articles; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Soufan, Othman | en
dc.contributor.author | Ba Alawi, Wail | en
dc.contributor.author | Afeef, Moataz A. | en
dc.contributor.author | Essack, Magbubah | en
dc.contributor.author | Kalnis, Panos | en
dc.contributor.author | Bajic, Vladimir B. | en
dc.date.accessioned | 2016-11-23T13:48:38Z | -
dc.date.available | 2016-11-23T13:48:38Z | -
dc.date.issued | 2016-11-10 | en
dc.identifier.citation | Soufan O, Ba-Alawi W, Afeef M, Essack M, Kalnis P, et al. (2016) DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning. Journal of Cheminformatics 8. Available: http://dx.doi.org/10.1186/s13321-016-0177-8. | en
dc.identifier.issn | 1758-2946 | en
dc.identifier.doi | 10.1186/s13321-016-0177-8 | en
dc.identifier.uri | http://hdl.handle.net/10754/621869 | -
dc.description.sponsorship | Research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) and KAUST Office of Sponsored Research (OSR) under Award No. URF/1/1976-02. The computational analysis for this study was performed on the Dragon and Snapdragon compute clusters of the Computational Bioscience Research Center at KAUST. | en
dc.publisher | Springer Nature | en
dc.relation.url | http://jcheminf.springeropen.com/articles/10.1186/s13321-016-0177-8 | en
dc.rights | This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. | en
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en
dc.title | DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning | en
dc.type | Article | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | Journal of Cheminformatics | en
dc.eprint.version | Publisher's Version/PDF | en
kaust.author | Soufan, Othman | en
kaust.author | Ba Alawi, Wail | en
kaust.author | Afeef, Moataz A. | en
kaust.author | Essack, Magbubah | en
kaust.author | Kalnis, Panos | en
kaust.author | Bajic, Vladimir B. | en
kaust.grant.number | URF/1/1976-02 | en
dc.relation.isSupplementedBy | Soufan, O., Ba-Alawi, W., Moataz Afeef, Magbubah Essack, Kalnis, P., & Bajic, V. (2016). DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning. Figshare. https://doi.org/10.6084/m9.figshare.c.3696499 | en
dc.relation.isSupplementedBy | DOI:10.6084/m9.figshare.c.3696499 | en
dc.relation.isSupplementedBy | HANDLE:http://hdl.handle.net/10754/624144 | en