dc.contributor.advisorBajic, Vladimir B.
dc.contributor.authorIsmail, Anas
dc.date.accessioned2019-11-03T11:06:09Z
dc.date.available2019-11-03T11:06:09Z
dc.date.issued2019-10-31
dc.identifier.doi10.25781/KAUST-587QT
dc.identifier.urihttp://hdl.handle.net/10754/659504
dc.description.abstractThe need to measure similarity between two objects is everywhere. It is not always clear what it means for two objects to be similar, and the definition changes depending on the area of application. However, similarity between two objects is generally defined as an inverse function of the distance between them. It is also not always easy to apply distance functions to objects directly. Sometimes we have to transform them, or embed them in another space, before we can calculate distance and subsequently similarity. We introduce three similarity algorithms/measures to quantify similarity between objects in different applications. First, we propose the first non-brute-force algorithm to calculate the Gromov hyperbolicity constant. We present several approximate and exact algorithms to solve this problem. For example, we provide an exact algorithm to compute the hyperbolicity constant in time O(n^3.686) for a discrete metric space. We also show that hyperbolicity at a fixed base-point cannot be computed in O(n^2.05) time, unless there exists a faster algorithm for (max,min) matrix multiplication than currently known. Then, we present a new system to find proteins similar in functionality. We employ text mining techniques to map text similarity to similarity in functionality. We use manually curated data from Swiss-Prot to train and build our system. The result is a search engine that, given a query protein, reports the top similar proteins in functionality with 99% accuracy. The system is tested extensively using GO annotations. We used this system, which predicts similarity in function, to enhance protein annotations. In particular, we were able to predict that some GO annotations should be added to some proteins. After careful literature reviews we were able to confirm many of those predictions; in one case, for example, we achieved 96% prediction accuracy. We also present a new algorithm for measuring the similarity between GPS traces. Our algorithm is robust against subsampling and supersampling. We perform experiments to compare this new similarity measure with the two main approaches that have been used so far: Dynamic Time Warping (DTW) and the Euclidean distance. Our algorithm outperforms both of them in most cases.
dc.language.isoen
dc.subjectalgorithms
dc.subjecttrajectory
dc.subjectprotein
dc.subjectdata mining
dc.subjectGromov
dc.subjectDoc2Vec
dc.titleSimilarity Algorithms for Embeddable Objects
dc.typeDissertation
dc.contributor.departmentComputer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
thesis.degree.grantorKing Abdullah University of Science and Technology
dc.contributor.committeememberLaleg-Kirati, Taous-Meriem
dc.contributor.committeememberMoshkov, Mikhail
dc.contributor.committeememberAl Jumaily, Adel
thesis.degree.disciplineComputer Science
thesis.degree.nameDoctor of Philosophy
kaust.request.doiyes


Files in this item

Name: PhD_thesis_final.pdf
Size: 1.368Mb
Format: PDF
Embargo End Date: 2020-10-31
