Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking

Handle URI:
http://hdl.handle.net/10754/625200
Title:
Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking
Authors:
Huang, Huang (0000-0002-5950-4698)
Abstract:
This thesis focuses on two topics: computational methods for large spatial datasets and functional data ranking. Both address the challenges of big and high-dimensional data. The first topic is motivated by the prohibitive computational burden of fitting Gaussian process models to large and irregularly spaced spatial datasets. Various approximation methods have been introduced to reduce the computational cost, but many rely on unrealistic assumptions about the process, and retaining statistical efficiency remains an issue. We propose a new scheme to approximate the maximum likelihood estimator and the kriging predictor when exact computation is infeasible. The proposed method provides different types of hierarchical low-rank approximations that are both computationally and statistically efficient. We study the improvement of the approximation theoretically and investigate its performance through simulations. For real applications, we use the hierarchical low-rank approximation to analyze a soil moisture dataset with 2 million measurements and apply the proposed fast kriging to fill gaps in satellite images. The second topic is motivated by rank-based outlier detection methods for functional data. Shape outliers are more challenging to detect than magnitude outliers because they are often masked among the samples. We develop a new notion of functional data depth by integrating a univariate depth function. Because it takes the form of an integrated depth, it shares many desirable features. Furthermore, the novel formulation leads to a useful decomposition for detecting both shape and magnitude outliers. Our simulation studies show that the proposed outlier detection procedure outperforms competitors under various outlier models. We also illustrate our methodology using real datasets of curves, images, and video frames.
Finally, we introduce the functional data ranking technique to spatio-temporal statistics for visualizing and assessing covariance properties, such as separability and full symmetry. We formulate test functions as functions of temporal lags for each pair of spatial locations and develop a rank-based testing procedure, induced by functional data depth, for assessing these properties. The method is illustrated using simulated data from widely used spatio-temporal covariance models, as well as real datasets from weather stations and climate model outputs.
Advisors:
Sun, Ying (0000-0001-6703-4270)
Committee Member:
Alouini, Mohamed-Slim (0000-0003-4827-1793); Genton, Marc G. (0000-0001-6467-2998); Keyes, David E. (0000-0002-4052-7224)
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Applied Mathematics and Computational Science
Issue Date:
16-Jul-2017
Type:
Dissertation
Appears in Collections:
Dissertations

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Sun, Ying | en
dc.contributor.author | Huang, Huang | en
dc.date.accessioned | 2017-07-16T12:50:32Z | -
dc.date.available | 2017-07-16T12:50:32Z | -
dc.date.issued | 2017-07-16 | -
dc.identifier.uri | http://hdl.handle.net/10754/625200 | -
dc.description.abstract | This thesis focuses on two topics: computational methods for large spatial datasets and functional data ranking. Both address the challenges of big and high-dimensional data. The first topic is motivated by the prohibitive computational burden of fitting Gaussian process models to large and irregularly spaced spatial datasets. Various approximation methods have been introduced to reduce the computational cost, but many rely on unrealistic assumptions about the process, and retaining statistical efficiency remains an issue. We propose a new scheme to approximate the maximum likelihood estimator and the kriging predictor when exact computation is infeasible. The proposed method provides different types of hierarchical low-rank approximations that are both computationally and statistically efficient. We study the improvement of the approximation theoretically and investigate its performance through simulations. For real applications, we use the hierarchical low-rank approximation to analyze a soil moisture dataset with 2 million measurements and apply the proposed fast kriging to fill gaps in satellite images. The second topic is motivated by rank-based outlier detection methods for functional data. Shape outliers are more challenging to detect than magnitude outliers because they are often masked among the samples. We develop a new notion of functional data depth by integrating a univariate depth function. Because it takes the form of an integrated depth, it shares many desirable features. Furthermore, the novel formulation leads to a useful decomposition for detecting both shape and magnitude outliers. Our simulation studies show that the proposed outlier detection procedure outperforms competitors under various outlier models. We also illustrate our methodology using real datasets of curves, images, and video frames. Finally, we introduce the functional data ranking technique to spatio-temporal statistics for visualizing and assessing covariance properties, such as separability and full symmetry. We formulate test functions as functions of temporal lags for each pair of spatial locations and develop a rank-based testing procedure, induced by functional data depth, for assessing these properties. The method is illustrated using simulated data from widely used spatio-temporal covariance models, as well as real datasets from weather stations and climate model outputs. | en
dc.language.iso | en | en
dc.subject | Large spatial data set | en
dc.subject | low rank approximation | en
dc.subject | Functional Data Analysis | en
dc.subject | spatio-temporal covariance | en
dc.subject | Statistical efficiency | en
dc.subject | Outlier detection | en
dc.title | Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking | en
dc.type | Dissertation | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
thesis.degree.grantor | King Abdullah University of Science and Technology | en_GB
dc.contributor.committeemember | Alouini, Mohamed-Slim | en
dc.contributor.committeemember | Genton, Marc G. | en
dc.contributor.committeemember | Keyes, David E. | en
thesis.degree.discipline | Applied Mathematics and Computational Science | en
thesis.degree.name | Doctor of Philosophy | en
dc.person.id | 133284 | en