Learning via Query Synthesis

Handle URI:
http://hdl.handle.net/10754/623482
Title:
Learning via Query Synthesis
Authors:
Alabdulmohsin, Ibrahim Mansour ( 0000-0002-9387-5820 )
Abstract:
Active learning is a subfield of machine learning that has been successfully used in many applications. One of the main branches of active learning is query synthe- sis, where the learning agent constructs artificial queries from scratch in order to reveal sensitive information about the underlying decision boundary. It has found applications in areas, such as adversarial reverse engineering, automated science, and computational chemistry. Nevertheless, the existing literature on membership query synthesis has, generally, focused on finite concept classes or toy problems, with a limited extension to real-world applications. In this thesis, I develop two spectral algorithms for learning halfspaces via query synthesis. The first algorithm is a maximum-determinant convex optimization method while the second algorithm is a Markovian method that relies on Khachiyan’s classical update formulas for solving linear programs. The general theme of these methods is to construct an ellipsoidal approximation of the version space and to synthesize queries, afterward, via spectral decomposition. Moreover, I also describe how these algorithms can be extended to other settings as well, such as pool-based active learning. Having demonstrated that halfspaces can be learned quite efficiently via query synthesis, the second part of this thesis proposes strategies for mitigating the risk of reverse engineering in adversarial environments. One approach that can be used to render query synthesis algorithms ineffective is to implement a randomized response. In this thesis, I propose a semidefinite program (SDP) for learning a distribution of classifiers, subject to the constraint that any individual classifier picked at random from this distributions provides reliable predictions with a high probability. This algorithm is, then, justified both theoretically and empirically. A second approach is to use a non-parametric classification method, such as similarity-based classification. In this thesis, I argue that learning via the empirical kernel maps, also commonly referred to as 1-norm Support Vector Machine (SVM) or Linear Programming (LP) SVM, is the best method for handling indefinite similarities. The advantages of this method are established both theoretically and empirically.
Advisors:
Zhang, Xiangliang ( 0000-0002-3574-5665 )
Committee Member:
Keyes, David E. ( 0000-0002-4052-7224 ) ; Wang, Wei; Gao, Xin ( 0000-0002-7108-3574 )
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Computer Science
Issue Date:
7-May-2017
Type:
Dissertation
Appears in Collections:
Dissertations

Full metadata record

DC FieldValue Language
dc.contributor.advisorZhang, Xiangliangen
dc.contributor.authorAlabdulmohsin, Ibrahim Mansouren
dc.date.accessioned2017-05-14T07:48:52Z-
dc.date.available2017-05-14T07:48:52Z-
dc.date.issued2017-05-07-
dc.identifier.urihttp://hdl.handle.net/10754/623482-
dc.description.abstractActive learning is a subfield of machine learning that has been successfully used in many applications. One of the main branches of active learning is query synthe- sis, where the learning agent constructs artificial queries from scratch in order to reveal sensitive information about the underlying decision boundary. It has found applications in areas, such as adversarial reverse engineering, automated science, and computational chemistry. Nevertheless, the existing literature on membership query synthesis has, generally, focused on finite concept classes or toy problems, with a limited extension to real-world applications. In this thesis, I develop two spectral algorithms for learning halfspaces via query synthesis. The first algorithm is a maximum-determinant convex optimization method while the second algorithm is a Markovian method that relies on Khachiyan’s classical update formulas for solving linear programs. The general theme of these methods is to construct an ellipsoidal approximation of the version space and to synthesize queries, afterward, via spectral decomposition. Moreover, I also describe how these algorithms can be extended to other settings as well, such as pool-based active learning. Having demonstrated that halfspaces can be learned quite efficiently via query synthesis, the second part of this thesis proposes strategies for mitigating the risk of reverse engineering in adversarial environments. One approach that can be used to render query synthesis algorithms ineffective is to implement a randomized response. In this thesis, I propose a semidefinite program (SDP) for learning a distribution of classifiers, subject to the constraint that any individual classifier picked at random from this distributions provides reliable predictions with a high probability. This algorithm is, then, justified both theoretically and empirically. A second approach is to use a non-parametric classification method, such as similarity-based classification. In this thesis, I argue that learning via the empirical kernel maps, also commonly referred to as 1-norm Support Vector Machine (SVM) or Linear Programming (LP) SVM, is the best method for handling indefinite similarities. The advantages of this method are established both theoretically and empirically.en
dc.language.isoenen
dc.subjectactive learningen
dc.subjectquery synthesisen
dc.subjectreverse engineeringen
dc.subjectsupport vector machineen
dc.subjectindefinite kernelsen
dc.subjectlinear classificationen
dc.titleLearning via Query Synthesisen
dc.typeDissertationen
dc.contributor.departmentComputer, Electrical and Mathematical Sciences and Engineering (CEMSE) Divisionen
thesis.degree.grantorKing Abdullah University of Science and Technologyen_GB
dc.contributor.committeememberKeyes, David E.en
dc.contributor.committeememberWang, Weien
dc.contributor.committeememberGao, Xinen
thesis.degree.disciplineComputer Scienceen
thesis.degree.nameDoctor of Philosophyen
dc.person.id124269en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.