Self-adaptive change detection in streaming data with non-stationary distribution
Type
Conference Paper
Authors
Zhang, Xiangliang
Wang, Wei

KAUST Department
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Computer Science Program
Machine Intelligence & kNowledge Engineering Lab
Date
2010
Permanent link to this record
http://hdl.handle.net/10754/564264
Abstract
Non-stationary distributions, in which the data distribution evolves over time, are common in many application fields, e.g., intrusion detection and grid computing. Detecting changes in massive streaming data with a non-stationary distribution helps to raise alarms on anomalies, clean out noise, and report new patterns. In this paper, we employ a novel approach for detecting changes in streaming data, with the aim of improving the quality of data stream modeling. By observing outliers, the approach uses a weighted standard deviation to monitor how the distribution of the data stream evolves. A cumulative statistical test, Page-Hinkley, is employed to collect evidence of changes in the distribution. The parameter used for reporting changes is adjusted self-adaptively according to the distribution of the data stream, rather than set to a fixed empirical value. This self-adaptability makes data stream modeling more effective by catching distribution changes in a timely manner. We validated the approach in an online clustering framework on the benchmark KDD Cup 1999 intrusion detection data set and on a real-world grid data set. The results show higher accuracy and a lower percentage of outliers compared with other change detection approaches. © 2010 Springer-Verlag.
Citation
Zhang, X., & Wang, W. (2010). Self-adaptive Change Detection in Streaming Data with Non-stationary Distribution. Lecture Notes in Computer Science, 334–345. doi:10.1007/978-3-642-17316-5_33
Publisher
Springer Nature
Conference/Event name
6th International Conference on Advanced Data Mining and Applications, ADMA 2010
ISBN
3642173152; 9783642173158
DOI
10.1007/978-3-642-17316-5_33
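
The abstract names the Page-Hinkley test as the mechanism that accumulates evidence of a distribution change. As a rough illustration only, the following is a minimal Python sketch of the textbook form of that test for an upward shift in the mean. It is not the authors' implementation: the class and function names, the `delta` and `threshold` defaults, and the synthetic step stream are all assumptions made for this example. In particular, the threshold here is a fixed value, whereas the paper adjusts it self-adaptively by a rule the abstract does not spell out, and the paper monitors a weighted standard deviation of outliers rather than raw values.

```python
class PageHinkley:
    """Textbook Page-Hinkley test for an upward shift in a stream's mean.

    Assumption: `delta` and `threshold` are fixed illustrative values;
    the paper described above adapts the threshold to the data instead.
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change (assumed)
        self.threshold = threshold  # fixed alarm threshold (assumed)
        self.reset()

    def reset(self):
        self.n = 0          # number of samples seen since the last reset
        self.mean = 0.0     # running mean of the samples
        self.cum = 0.0      # cumulative deviation m_t
        self.cum_min = 0.0  # running minimum M_t of the cumulative deviation

    def update(self, x):
        """Consume one sample; return True if a change is reported."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.cum - self.cum_min > self.threshold:
            self.reset()  # start collecting evidence afresh after an alarm
            return True
        return False


def detect_changes(stream, **kwargs):
    """Return the 0-based indices at which the test reports a change."""
    ph = PageHinkley(**kwargs)
    return [i for i, x in enumerate(stream) if ph.update(x)]


if __name__ == "__main__":
    # Synthetic stream (assumed for illustration): the mean steps from 0 to 1.
    stream = [0.0] * 50 + [1.0] * 50
    print(detect_changes(stream))
```

On the synthetic stream, the detector stays silent during the flat segment and reports a single change a few samples after the step at index 50. The delay reflects the evidence the cumulative statistic has to gather before it exceeds the threshold, which is the quantity the paper proposes to tune automatically.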