Asymptotic Performance Analysis of the Randomly-Projected RLDA Ensemble Classifier
Permanent link to this record: http://hdl.handle.net/10754/655994
Abstract: Reliability and computational efficiency of classification error estimators are critical factors in classifier design. In a high-dimensional data setting where data is scarce, the conventional method of error estimation, cross-validation, can be very computationally expensive. In this thesis, we consider a particular discriminant-analysis-type classifier, the Randomly-Projected RLDA ensemble classifier, which operates under the assumption of such a 'small sample' regime. We conduct an asymptotic study of the generalization error of this classifier under this regime, which necessitates the use of tools from the field of random matrix theory. The main outcome of this study is a deterministic function of the true statistics of the data and the problem dimension that approximates the generalization error well for large enough dimensions. This is demonstrated by simulation on synthetic data. The main advantage of this approach is that it is computationally efficient. It also constitutes a major step towards the construction of a consistent estimator of the error that depends on the training data and not the true statistics, and so can be applied to real data. An analogous quantity for the Randomly-Projected LDA ensemble classifier, which appears in the literature and is a special case of the former, is also derived. We motivate its use for tuning the parameter of this classifier by simulation on synthetic data.