Analysis and Design of Linear Classifiers for High-Dimensional, Small Sample Size Data Using Asymptotic Random Matrix Theory
Embargo End Date2024-12-20
AuthorsNiyazi, Lama B.
Committee MembersKammoun, Abla
ProgramElectrical and Computer Engineering
KAUST DepartmentComputer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Access RestrictionsAt the time of archiving, the student author of this dissertation opted to temporarily restrict access to it. The full text of this dissertation will become available to the public after the expiration of the embargo on 2024-12-20.
AbstractDue to a variety of potential barriers to sample acquisition, many of the datasets encountered in important classification applications, ranging from tumor identification to facial recognition, are characterized by small samples of high-dimensional data. In such situations, linear classifiers are popular as they have less risk of overfitting while being faster and more interpretable than non-linear classifiers. They are also easier to understand and implement for the inexperienced practitioner.
In this dissertation, several gaps in the literature regarding the analysis and design of linear classifiers for high-dimensional data are addressed using tools from the field of asymptotic Random Matrix Theory (RMT) which facilitate the derivation of limits of relevant quantities or distributions, such as the probability of misclassification of a particular classifier or the asymptotic distribution of its discriminant, in the RMT regime where both the sample size and dimensionality of the data grow together. The resulting insights extracted from these limits allow for a deeper understanding of the classifier's behavior as well as lay out the groundwork from which to propose modifications to the classifier in order to improve its performance. Asymptotic RMT is also used in this dissertation to derive estimators of quantities of interest which are consistent in the RMT regime. Besides this, the estimators facilitate the tuning of classifier hyperparameters without resort to empirical methods such as cross-validation which can be very computationally-taxing when high-dimensional data is involved.
This work begins with an asymptotic study of the discriminant-averaging and vote-averaging Randomly Projected Linear Discriminant Analysis (RP-LDA) ensemble classifiers -- two high-dimensional variants of the classical Linear Discriminant Analysis (LDA) classifier based on random projections. The asymptotically optimal ensemble based on randomly-projected LDA discriminants for Gaussian data is found to be a form of discriminant-averaging and it is shown that selecting projections for inclusion in the ensemble based on some metric of expected performance offers no performance advantage. Furthermore, a closer look at the infinite ensemble version of the discriminant-averaging RP-LDA ensemble classifier, where the Marzetta estimator of the precision matrix arises in the discriminant, reveals that the Marzetta estimator behaves as an inversion of a linear regularization of the sample covariance matrix. This has the implication that the discriminant-averaging RP-LDA ensemble classifier asymptotically behaves as a special case of Regularized LDA (R-LDA) with coarser parameter tuning since its regularization parameter varies with the integer projection dimension. From there, the class of rotationally-invariant estimators--to which the Marzetta estimator belongs--is studied in the context of classification. A modified LDA classifier based on a rotationally-invariant estimator of the sample covariance matrix having non-linear shrinkage which minimizes the probability of misclassification is proposed. Finally, a technique for tuning the weight vector of a generic binary linear classifier is developed. This technique is shown to not only yield performance gains for LDA in a small sample scenario, but to also compensate for the performance loss due to non-optimal native hyperparameters of classifiers such as the Support Vector Machine (SVM) with linear kernel, which would otherwise be computationally costly to tune optimally. The dissertation is concluded with an application of the weight vector tuning technique to the transfer learning of a deep neural network, showcasing the ubiquity of linear classifiers and the potential for widespread applicability of the proposed technique.