Discovering highly informative feature set over high dimensions

Handle URI:
http://hdl.handle.net/10754/564625
Title:
Discovering highly informative feature set over high dimensions
Authors:
Zhang, Chongsheng; Masseglia, Florent; Zhang, Xiangliang (ORCID: 0000-0002-3574-5665)
Abstract:
For many textual collections, the number of features is overly large, and the features can be highly redundant. It is therefore desirable to have a small, succinct, yet highly informative collection of features that describes the key characteristics of a dataset. Information theory is one tool for obtaining such a feature collection. In this paper, our main contribution is to improve the efficiency of selecting the most informative feature set over high-dimensional unlabeled data. We propose a heuristic theory for informative feature set selection from high-dimensional data. Moreover, we design data structures that enable us to compute the entropies of candidate feature sets efficiently. We also develop a simple pruning strategy that eliminates hopeless candidates at each forward selection step. Experiments on real-world data sets show that our proposal is very efficient. © 2012 IEEE.
KAUST Department:
Machine Intelligence & kNowledge Engineering Lab; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science Program
Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Journal:
2012 IEEE 24th International Conference on Tools with Artificial Intelligence
Conference/Event name:
2012 IEEE 24th International Conference on Tools with Artificial Intelligence, ICTAI 2012
Issue Date:
Nov-2012
DOI:
10.1109/ICTAI.2012.149
Type:
Conference Paper
ISSN:
10823409
ISBN:
9780769549156
Appears in Collections:
Conference Papers; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
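
Illustrative sketch:
The abstract describes forward selection of a feature set scored by entropy, with hopeless candidates pruned at each step. The record does not give the paper's data structures or its pruning rule, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes discrete features, scores a candidate set by its empirical joint entropy, and prunes with the standard bound H(S ∪ {f}) ≤ H(S) + H(f). The names `joint_entropy` and `forward_select` are hypothetical.

```python
from collections import Counter
from math import log2


def joint_entropy(rows, features):
    """Empirical joint entropy, in bits, of the given feature columns."""
    n = len(rows)
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def forward_select(rows, k):
    """Greedily pick up to k feature indices that maximize joint entropy.

    Assumption (not from the paper): candidates are pruned with the
    subadditivity bound H(S + [f]) <= H(S) + H(f).
    """
    num_features = len(rows[0])
    # Entropy of each feature on its own, computed once up front.
    single = {f: joint_entropy(rows, [f]) for f in range(num_features)}
    selected, current = [], 0.0
    while len(selected) < min(k, num_features):
        best_f, best_h = None, -1.0
        # Visit candidates from highest to lowest single-feature entropy.
        candidates = sorted(
            (f for f in single if f not in selected),
            key=lambda f: (-single[f], f),
        )
        for f in candidates:
            # If even the upper bound cannot beat the best score so far,
            # no later candidate can either, so stop scanning.
            if current + single[f] <= best_h:
                break
            h = joint_entropy(rows, selected + [f])
            if h > best_h:
                best_f, best_h = f, h
        selected.append(best_f)
        current = best_h
    return selected
```

For example, with `rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]`, where columns 0 and 1 are identical copies, `forward_select(rows, 2)` returns `[0, 2]`: the redundant copy adds no entropy and is passed over.
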

Full metadata record

DC Field | Value | Language
dc.contributor.author | Zhang, Chongsheng | en
dc.contributor.author | Masseglia, Florent | en
dc.contributor.author | Zhang, Xiangliang | en
dc.date.accessioned | 2015-08-04T07:05:25Z | en
dc.date.available | 2015-08-04T07:05:25Z | en
dc.date.issued | 2012-11 | en
dc.identifier.isbn | 9780769549156 | en
dc.identifier.issn | 10823409 | en
dc.identifier.doi | 10.1109/ICTAI.2012.149 | en
dc.identifier.uri | http://hdl.handle.net/10754/564625 | en
dc.description.abstract | For many textual collections, the number of features is overly large, and the features can be highly redundant. It is therefore desirable to have a small, succinct, yet highly informative collection of features that describes the key characteristics of a dataset. Information theory is one tool for obtaining such a feature collection. In this paper, our main contribution is to improve the efficiency of selecting the most informative feature set over high-dimensional unlabeled data. We propose a heuristic theory for informative feature set selection from high-dimensional data. Moreover, we design data structures that enable us to compute the entropies of candidate feature sets efficiently. We also develop a simple pruning strategy that eliminates hopeless candidates at each forward selection step. Experiments on real-world data sets show that our proposal is very efficient. © 2012 IEEE. | en
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en
dc.subject | Feature Selection | en
dc.subject | high dimensions | en
dc.subject | Unsupervised | en
dc.title | Discovering highly informative feature set over high dimensions | en
dc.type | Conference Paper | en
dc.contributor.department | Machine Intelligence & kNowledge Engineering Lab | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computer Science Program | en
dc.identifier.journal | 2012 IEEE 24th International Conference on Tools with Artificial Intelligence | en
dc.conference.date | 7 November 2012 through 9 November 2012 | en
dc.conference.name | 2012 IEEE 24th International Conference on Tools with Artificial Intelligence, ICTAI 2012 | en
dc.conference.location | Athens | en
dc.contributor.institution | Henan University, 475004 Kaifeng, China | en
dc.contributor.institution | Zenith Team, INRIA, 34095 Montpellier, France | en
kaust.author | Zhang, Xiangliang | en