Protein Function Prediction Based on Sequence and Structure Information

Access Restrictions
At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis became available to the public after the expiration of the embargo on 2017-05-25.

The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

Smaili, F. Z. (2016). Protein Function Prediction Based on Sequence and Structure Information. KAUST Research Repository.


Permanent link to this record