dc.contributor.author: Umarov, Ramzan
dc.contributor.author: Li, Yu
dc.contributor.author: Van Neste, Christophe
dc.date.accessioned: 2020-01-27T08:09:18Z
dc.date.available: 2020-01-27T08:09:18Z
dc.date.issued: 2020-1-20
dc.identifier.uri: http://hdl.handle.net/10754/661203
dc.description.abstract: RNA molecules have a plethora of functions within the cell. These functions can be divided into information-carrying, catalytic, and structural (scaffolding of other molecules) roles, or a combination of these. For catalytic and regulatory functions, the structure of the RNA molecule is pivotal, so predicting the structure into which it is most likely to fold is essential to fully understanding its biological role. In general, RNA extensively affects protein regulation through its control of gene expression, post-transcriptional modifications, and translational regulation. RNA secondary structure can be obtained by techniques such as X-ray diffraction and NMR. However, these experimental methods are still inefficient and expensive, so computational prediction algorithms are still widely used for predicting RNA secondary structures. Taking the raw sequence represented as a string, we first apply a one-hot encoding. The encoded matrix has dimensions L by 4. The encoding then goes through two models, the local model and the global model, which extract local and global contact information, respectively. Regarding the local model, the input is two chunks of the raw encoding, whose dimensions are 20 by 20. We then concatenate those 20 by 20 chunk matrices into the L by L local contact information matrix. We used six 1D convolutional layers and one fully-connected layer to model the local information. In the global model, we use three 1D convolutional layers to predict whether a base can pair with any other base; the output is a vector of length L. In this vector, 1 means that the corresponding base may pair with another base and 0 means that it does not pair with any base.
To combine the local and global information, we convert the global vector into a symmetric L by L matrix and perform an element-wise multiplication between the global information and the local information, enforcing the global constraint on the preliminary contact map. After this combination, the resulting contact map may still violate the two constraints mentioned above. We used a greedy sorting algorithm to resolve these conflicts. We introduce NNfold, a sequence-based deep learning method to predict RNA secondary structure. The predictions are made in two steps: first, we construct a matrix with the likelihood of each nucleotide pairing by predicting all potential interactions using a convolutional deep learning model. Next, we modify the list of base pairs obtained from the matrix using a second model, whose output is used to ensure the validity of the final secondary structure. NNfold performed much better than thermodynamics-based methods on a diverse set of RNA sequences, improving the average F1 score by 0.20. It is also capable of predicting pseudoknots, which is a challenging task for other approaches.
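The encoding and combination steps described in the abstract can be illustrated with a short sketch. This is not the NNfold implementation: all function names are hypothetical, the conversion of the global vector to a symmetric matrix is assumed to be an outer product, and the greedy step assumes the constraint that each base takes part in at most one pair, since the record states neither the constraints nor the algorithm. The local contact scores are made-up numbers standing in for the output of the convolutional model.

```python
# Illustrative sketch only. Names, the outer-product conversion, and the
# "one partner per base" constraint are assumptions, not taken from NNfold.

NUCLEOTIDES = "ACGU"


def one_hot(sequence):
    """Encode an RNA string as an L-by-4 matrix (one row per base).
    Characters outside ACGU become an all-zero row."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in sequence]


def global_to_matrix(global_vector):
    """Assumed conversion: the outer product of the length-L vector with
    itself, which is a symmetric L-by-L matrix."""
    return [[gi * gj for gj in global_vector] for gi in global_vector]


def combine(local_map, global_vector):
    """Element-wise product of the local contact map and the global matrix."""
    global_map = global_to_matrix(global_vector)
    size = len(global_vector)
    return [[local_map[i][j] * global_map[i][j] for j in range(size)]
            for i in range(size)]


def greedy_pairs(contact_map, threshold=0.5):
    """Assumed greedy step: visit candidate pairs from highest to lowest
    score and keep a pair only if neither base is already paired."""
    size = len(contact_map)
    candidates = [(contact_map[i][j], i, j)
                  for i in range(size) for j in range(i + 1, size)
                  if contact_map[i][j] >= threshold]
    candidates.sort(key=lambda c: -c[0])
    paired = set()
    pairs = []
    for _score, i, j in candidates:
        if i not in paired and j not in paired:
            pairs.append((i, j))
            paired.update((i, j))
    return sorted(pairs)


# Hypothetical local contact scores for the 4-base sequence "GACU".
LOCAL_MAP = [
    [0.0, 0.2, 0.6, 0.9],
    [0.2, 0.0, 0.8, 0.7],
    [0.6, 0.8, 0.0, 0.1],
    [0.9, 0.7, 0.1, 0.0],
]

if __name__ == "__main__":
    print(one_hot("GACU"))
    # All bases allowed to pair by the global vector.
    print(greedy_pairs(combine(LOCAL_MAP, [1, 1, 1, 1])))
    # Base 1 is marked as unpaired, so every pair involving it is zeroed out.
    print(greedy_pairs(combine(LOCAL_MAP, [1, 0, 1, 1])))
```

In this sketch, a 0 in the global vector zeroes the whole row and column of that base in the combined map, which is one simple way to enforce the global constraint on the preliminary contact map.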
dc.relation.url: https://epostersonline.com//dh2020/node/34
dc.title: NNfold: RNA secondary structure prediction by deep learning
dc.type: Poster
dc.conference.date: JAN 20 - 22, 2020
dc.conference.name: Digital Health 2020
dc.conference.location: KAUST
dc.contributor.institution:
refterms.dateFOA: 2020-01-27T08:09:18Z