FrameRate: learning the coding potential of unassembled metagenomic reads
dc.contributor.author | Wang, Liu-Wei | |
dc.contributor.author | Aubrey, Wayne | |
dc.contributor.author | Clare, Amanda | |
dc.contributor.author | Hoehndorf, Robert | |
dc.contributor.author | Creevey, Christopher J | |
dc.contributor.author | Dimonaco, Nicholas J | |
dc.date.accessioned | 2022-09-20T12:53:37Z | |
dc.date.available | 2022-09-20T12:53:37Z | |
dc.date.issued | 2022-09-19 | |
dc.identifier.citation | Wang, L.-W., Aubrey, W., Clare, A., Hoehndorf, R., Creevey, C. J., & Dimonaco, N. J. (2022). FrameRate: learning the coding potential of unassembled metagenomic reads. bioRxiv. https://doi.org/10.1101/2022.09.16.508314 | |
dc.identifier.doi | 10.1101/2022.09.16.508314 | |
dc.identifier.uri | http://hdl.handle.net/10754/681615 | |
dc.description.abstract | Motivation: Metagenomic assembly is a slow and computationally intensive process, and despite requiring iterative rounds for improvement and completeness, the resulting assembly often fails to incorporate many of the input sequencing reads. This is further complicated by reduced read depth and/or artefacts that result in chimeric assemblies, both of which are especially prominent in the assembly of metagenomic datasets. Many of these limitations could potentially be overcome by exploiting the information stored directly in the reads, thereby eliminating the need for assembly in a number of situations. Results: We explored the prediction of the coding potential of DNA reads by training a machine learning model on existing protein sequences. This model, named 'FrameRate', predicts the coding frame(s) directly from unassembled DNA sequencing reads, greatly reducing the computational resources otherwise required for genome assembly and for similarity-based inference against pre-computed databases. Using the eggNOG-mapper functional annotation tool, the coding frames predicted by FrameRate were functionally verified by comparison with full-length protein sequences reconstructed from the same metagenomic sample using an established metagenome assembly and gene prediction pipeline. FrameRate captured equivalent functional profiles from the coding frames while significantly reducing the required storage and run time. FrameRate was also able to annotate reads that were not represented in the assembly, capturing this 'missing' information. As an ultra-fast, read-level, assembly-free coding profiler, FrameRate enables rapid characterisation of almost every sequencing read directly, whether or not it can be assembled, and thus circumvents many of the problems caused by contemporary assembly workflows. | |
dc.description.sponsorship | N.J.D. was funded by an IBERS Aberystwyth PhD fellowship. C.J.C. wishes to acknowledge funding from the BBSRC (BB/E/W/10964A01 & BBS/OS/GC/000011B); DAFM Ireland/DAERA Northern Ireland (Meth-Abate, R3192GFS); and the EC via Horizon 2020 (818368, MASTER). W.L.W. was funded by the European Union’s Horizon 2020 research and innovation program, under the Marie Skłodowska-Curie Actions Innovative Training Networks grant agreement no. 955974 (VIROINF). | |
dc.publisher | Cold Spring Harbor Laboratory | |
dc.relation.url | http://biorxiv.org/lookup/doi/10.1101/2022.09.16.508314 | |
dc.rights | This is a preprint version of a paper and has not been peer reviewed. Archived with thanks to Cold Spring Harbor Laboratory. It is made available under a CC-BY 4.0 International license. | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.title | FrameRate: learning the coding potential of unassembled metagenomic reads | |
dc.type | Preprint | |
dc.contributor.department | Bio-Ontology Research Group (BORG) | |
dc.contributor.department | Computational Bioscience Research Center (CBRC) | |
dc.contributor.department | Computer Science Program | |
dc.contributor.department | Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division | |
dc.eprint.version | Pre-print | |
kaust.person | Hoehndorf, Robert | |
refterms.dateFOA | 2022-09-20T12:54:32Z | |
This item appears in the following Collection(s):
- Bio-Ontology Research Group (BORG). For more information visit: https://cemse.kaust.edu.sa/borg
- Preprints
- Computer Science Program. For more information visit: https://cemse.kaust.edu.sa/cs
- Computational Bioscience Research Center (CBRC)
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division. For more information visit: https://cemse.kaust.edu.sa/