dc.contributor.advisor: Zhang, Xiangliang
dc.contributor.author: Qahtan, Abdulhakim Ali Ali
dc.date.accessioned: 2016-05-11T12:19:23Z
dc.date.available: 2016-05-11T12:19:23Z
dc.date.issued: 2016-05-11
dc.identifier.doi: 10.25781/KAUST-F235N
dc.identifier.uri: http://hdl.handle.net/10754/609049
dc.description.abstract: Recent advances in computing technology allow for collecting vast amounts of data that arrive continuously in the form of streams. Mining data streams is challenged by the speed and volume of the arriving data. Furthermore, the underlying distribution of the data changes over time in unpredictable ways. To reduce the computational cost, data streams are often studied in the form of a condensed representation, e.g., a Probability Density Function (PDF). This thesis aims at developing an online density estimator that builds a model called KDE-Track for characterizing the dynamic density of data streams. KDE-Track estimates the PDF of the stream at a set of resampling points and uses interpolation to estimate the density at any given point. To reduce the interpolation error and computational complexity, we introduce adaptive resampling, where more resampling points are used in high-curvature regions of the PDF and fewer in low-curvature regions. The PDF values at the resampling points are updated online to provide an up-to-date model of the data stream. Compared with other existing online density estimators, KDE-Track is often more accurate (as reflected by smaller error values) and more computationally efficient (as reflected by shorter running time). The anytime-available PDF estimated by KDE-Track can be applied to visualizing the dynamic density of data streams, outlier detection, and change detection in data streams. In this thesis, the first application is to visualize the taxi traffic volume in New York City. Utilizing KDE-Track allows for visualizing and monitoring the traffic flow in real time without extra overhead and provides insightful analysis of the pick-up demand that can be utilized by service providers to improve service availability. The second application is to detect outliers in data streams from sensor networks based on the estimated PDF. The method detects outliers accurately and outperforms baseline methods designed for detecting and cleaning outliers in sensor data. The third application is to detect changes in data streams. We propose a framework based on Principal Component Analysis (PCA) that reduces the problem of detecting changes in multidimensional data to the problem of detecting changes in the data projected onto the principal components. We provide a theoretical analysis, supported by experimental results, showing that different types of changes in data streams are reflected in the data projected onto one or more principal components. Our framework detects changes accurately at low computational cost and scales well to high-dimensional data.
dc.language.iso: en
dc.subject: data streams
dc.subject: density estimation
dc.subject: dynamic density
dc.subject: change detection
dc.subject: outlier detection
dc.title: Efficient Estimation of Dynamic Density Functions with Applications in Streaming Data
dc.type: Dissertation
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Computer Science
thesis.degree.grantor: King Abdullah University of Science and Technology
dc.contributor.committeemember: Wang, Suojin
dc.contributor.committeemember: Gama, Joao
dc.contributor.committeemember: Moshkov, Mikhail
dc.contributor.committeemember: Gao, Xin
thesis.degree.discipline: Computer Science
thesis.degree.name: Doctor of Philosophy
refterms.dateFOA: 2017-05-11T00:00:00Z
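
The core idea described in the abstract (estimate the PDF at a set of resampling points, interpolate between them, and update the values online) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it assumes a fixed uniform grid rather than adaptive resampling, a Gaussian kernel with a fixed bandwidth, and a running average over all samples seen so far. The class name GridDensity and all of its parameters are hypothetical.

```python
import math
from bisect import bisect_right


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


class GridDensity:
    """Illustrative grid-based online density estimator (hypothetical, 1-D only)."""

    def __init__(self, lo, hi, n_points, bandwidth):
        step = (hi - lo) / (n_points - 1)
        # Fixed, uniformly spaced resampling points (the thesis adapts them instead).
        self.grid = [lo + i * step for i in range(n_points)]
        self.values = [0.0] * n_points  # estimated PDF at each resampling point
        self.h = bandwidth
        self.count = 0

    def update(self, x):
        """Fold one new sample into the estimate at every resampling point."""
        self.count += 1
        w = 1.0 / self.count  # running-average weight: all samples count equally
        for i, g in enumerate(self.grid):
            k = gaussian_kernel((g - x) / self.h) / self.h
            self.values[i] = (1.0 - w) * self.values[i] + w * k

    def density(self, x):
        """Linearly interpolate the PDF between the two neighbouring grid points."""
        if x < self.grid[0] or x > self.grid[-1]:
            return 0.0  # outside the modelled range (a simplifying assumption)
        j = min(bisect_right(self.grid, x), len(self.grid) - 1)
        x0, x1 = self.grid[j - 1], self.grid[j]
        t = (x - x0) / (x1 - x0)
        return (1.0 - t) * self.values[j - 1] + t * self.values[j]


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    model = GridDensity(lo=-5.0, hi=5.0, n_points=101, bandwidth=0.3)
    for _ in range(2000):
        model.update(rng.gauss(0.0, 1.0))
    print(model.density(0.0))
```

Each update costs time proportional to the number of resampling points, and each query costs a binary search plus one interpolation. Replacing the running-average weight with a constant would make the sketch forget old data, which is one simple way to follow a changing distribution.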


Files in this item

Name: AbdulhakimQahtanThesis copy.pdf
Size: 15.20Mb
Format: PDF
