Name:
MasterThesisMdNurulMuttakin.pdf
Size:
2.973Mb
Format:
PDF
Description:
MS Thesis
Embargo End Date:
2024-05-11
Type
Thesis
Authors
Muttakin, Md Nurul
Advisors
Hoehndorf, Robert
Committee members
Ombao, Hernando
Elhoseiny, Mohamed

Program
Computer Science
Date
2023-05
Embargo End Date
2024-05-11
Permanent link to this record
http://hdl.handle.net/10754/691654
Access Restrictions
At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis will become available to the public after the expiration of the embargo on 2024-05-11.
Abstract
Machine learning models such as AlphaFold can generate a protein's 3D conformation from its primary sequence to experimental accuracy, which has given rise to a number of research works that predict protein functions from 3D structures. Almost all of these works use graph neural networks (GNNs) to learn the 3D structures of proteins from 2D contact maps (graphs). Most also use rich 1D features, such as ESM and LSTM embeddings, in addition to the contact graph. These rich 1D features essentially obscure the learning capability of the GNNs themselves. In this thesis, we evaluate the capability of graph convolutional networks (GCNs) to learn from contact-map graphs within the existing framework, and we attempt to incorporate distance information to improve predictive performance. We found that GCNs fall far short of a 1D-CNN without language models, even with distance information. Consequently, we further investigate the capability of GCNs to distinguish subgraph patterns corresponding to InterPro domains. We found that GCNs perform better than an MLP on highly rich sequence embeddings in recognizing these structural patterns. Finally, we investigate the capability of GCNs to predict GO terms (functions) individually. We found that GCNs perform almost on par in identifying GO terms when only hard positive and hard negative examples are present. We also identified some GO terms that are indistinguishable by GCNs and ESM2-based MLP models. These findings raise new research questions for future work.
Citation
Muttakin, Md Nurul. (2023). Learning 3D structures for protein function prediction [KAUST Research Repository]. https://doi.org/10.25781/KAUST-VXC7P
DOI
10.25781/KAUST-VXC7P