Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

Handle URI:
http://hdl.handle.net/10754/617533
Title:
Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes
Authors:
Abusamra, Heba; Bajic, Vladimir B. (0000-0001-5435-4750)
Abstract:
The inherently high-dimensional, low-sample-size nature of gene expression data makes the classification task more challenging. Feature (gene) selection is therefore clearly needed. Selecting meaningful and relevant genes for a classifier not only decreases computational time and cost, but also improves classification performance. Many feature selection approaches exist, but most of them suffer from problems such as lack of robustness and validation issues. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods: We used a leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated with four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes that belong to that cluster. A gene cluster is considered important if it satisfies conditions that depend on thresholds and a percentage; otherwise it is eliminated. Results: Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a). After applying our feature selection methodology, we ended up with 1117 specific genes discriminating two classes of leukemia (Fig. 15b). Applying the same method again with a more stringent condition (a higher positive and a lower negative threshold) reduced the number to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions: The feature selection method gave good results with minimum classification error. Our heat map shows a distinct pattern of refined genes discriminating between the two classes of leukemia.
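The abstract does not spell out the exact selection rule, so the following Python sketch is only one possible reading of it, not the authors' implementation. It sums the expression levels of the genes in each gene cluster per sample, then keeps a cluster if at least a given fraction of samples in one class has a sum at or above a positive threshold while at least the same fraction of samples in the other class has a sum at or below a negative threshold. The function name `select_gene_clusters`, the parameters `pos_threshold`, `neg_threshold`, and `min_fraction`, the class labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def select_gene_clusters(expr, gene_cluster, sample_group,
                         pos_threshold, neg_threshold, min_fraction):
    """Return sorted IDs of gene clusters passing the ASSUMED rule.

    expr:         dict gene -> list of expression values, one per sample
    gene_cluster: dict gene -> cluster ID
    sample_group: list of class labels, one per sample (two distinct labels)
    """
    n_samples = len(sample_group)

    # Sum the expression levels of all genes in each cluster, per sample.
    sums = defaultdict(lambda: [0.0] * n_samples)
    for gene, values in expr.items():
        row = sums[gene_cluster[gene]]
        for j, value in enumerate(values):
            row[j] += value

    groups = sorted(set(sample_group))
    if len(groups) != 2:
        raise ValueError("this sketch expects exactly two sample classes")
    first, second = groups

    def fraction(row, group, test):
        # Share of samples in `group` whose cluster sum satisfies `test`.
        idx = [j for j, g in enumerate(sample_group) if g == group]
        return sum(test(row[j]) for j in idx) / len(idx)

    def is_high(s):
        return s >= pos_threshold

    def is_low(s):
        return s <= neg_threshold

    kept = []
    for cluster_id, row in sums.items():
        # Assumed condition: high in one class and low in the other.
        up_in_first = (fraction(row, first, is_high) >= min_fraction
                       and fraction(row, second, is_low) >= min_fraction)
        up_in_second = (fraction(row, second, is_high) >= min_fraction
                        and fraction(row, first, is_low) >= min_fraction)
        if up_in_first or up_in_second:
            kept.append(cluster_id)
    return sorted(kept)


# Toy data (made up): four genes, four samples, two gene clusters.
expr = {
    "g1": [2.0, 1.5, -1.0, -2.0],
    "g2": [1.0, 2.5, -2.0, -1.5],
    "g3": [0.1, -0.2, 0.3, 0.0],
    "g4": [-0.1, 0.2, -0.1, 0.1],
}
gene_cluster = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
sample_group = ["class_1", "class_1", "class_2", "class_2"]

selected = select_gene_clusters(expr, gene_cluster, sample_group,
                                pos_threshold=2.0, neg_threshold=-2.0,
                                min_fraction=1.0)
selected_genes = sorted(g for g, c in gene_cluster.items() if c in selected)
print(selected, selected_genes)
```

Under this reading, raising `pos_threshold` and lowering `neg_threshold` makes the filter more stringent, which mirrors the reduction from 1117 to 58 genes described in the Results.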
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science Program; Applied Mathematics and Computational Science Program; Computational Bioscience Research Center (CBRC)
Publisher:
Springer Nature
Journal:
BMC Genomics
Conference/Event name:
The 3rd International Genomic Medicine Conference (3rd IGMC 2015)
Issue Date:
20-Jul-2016
DOI:
10.1186/s12864-016-2858-0
Type:
Presentation
Additional Links:
http://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2858-0
Appears in Collections:
Applied Mathematics and Computational Science Program; Computer Science Program; Computational Bioscience Research Center (CBRC); Presentations; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Abusamra, Heba | en
dc.contributor.author | Bajic, Vladimir B. | en
dc.date.accessioned | 2016-07-26T09:28:24Z | -
dc.date.available | 2016-07-26T09:28:24Z | -
dc.date.issued | 2016-07-20 | -
dc.identifier.doi | 10.1186/s12864-016-2858-0 | en
dc.identifier.uri | http://hdl.handle.net/10754/617533 | -
dc.description.abstract | The inherently high-dimensional, low-sample-size nature of gene expression data makes the classification task more challenging. Feature (gene) selection is therefore clearly needed. Selecting meaningful and relevant genes for a classifier not only decreases computational time and cost, but also improves classification performance. Many feature selection approaches exist, but most of them suffer from problems such as lack of robustness and validation issues. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods: We used a leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated with four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes that belong to that cluster. A gene cluster is considered important if it satisfies conditions that depend on thresholds and a percentage; otherwise it is eliminated. Results: Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a). After applying our feature selection methodology, we ended up with 1117 specific genes discriminating two classes of leukemia (Fig. 15b). Applying the same method again with a more stringent condition (a higher positive and a lower negative threshold) reduced the number to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions: The feature selection method gave good results with minimum classification error. Our heat map shows a distinct pattern of refined genes discriminating between the two classes of leukemia. | en
dc.publisher | Springer Nature | en
dc.relation.url | http://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2858-0 | en
dc.title | Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes | en
dc.type | Presentation | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computer Science Program | en
dc.contributor.department | Applied Mathematics and Computational Science Program | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.identifier.journal | BMC Genomics | en
dc.conference.date | 30 November - 3 December 2015 | en
dc.conference.name | The 3rd International Genomic Medicine Conference (3rd IGMC 2015) | en
dc.conference.location | Jeddah, Kingdom of Saudi Arabia | en
dc.contributor.institution | Center of Excellence in Genomic Medicine Research, King Abdulaziz University, PO Box 80216, Jeddah 21589, Saudi Arabia | en
kaust.author | Abusamra, Heba | en
kaust.author | Bajic, Vladimir B. | en