Long Read Based Individual Molecule Sequencing and Real-time Pathogen Detection

Access Restrictions
At the time of archiving, the student author of this dissertation opted to temporarily restrict access to it. The full text of this dissertation will become available to the public after the expiration of the embargo on 2022-10-04.

Long-read sequencing can produce reads hundreds of kilobases in length, making it a powerful tool for decoding complex genetic sequences that were previously inaccessible to short reads. Although sequencing chemistry and base-calling algorithms are under active development, the accuracy of current long-read sequencing remains considerably low, which limits its applications. In this dissertation, I present three long-read-based DNA sequencing methods. They overcome the limitation of read accuracy, contribute to a better understanding of Cas9 editing outcomes and mitochondrial DNA heterogeneity, and pave the way for real-time pathogen detection and mutation surveillance. IDMseq enables quantitative, haplotype-resolved characterization of diverse types of rare variants at single-base resolution. IDMseq provides the first quantitative evidence of persistent, nonrandom large structural variants following the repair of double-strand breaks induced by CRISPR-Cas9 in human embryonic stem cells (ESCs). iMiGseq is the first mitochondrial DNA sequencing method to provide ultra-sensitive variant detection, complete haplotyping, and unbiased evaluation of heteroplasmy levels, all at the level of individual mitochondrial DNA molecules. In single healthy human oocytes, iMiGseq uncovers previously unappreciated levels of heteroplasmic variants well below the current 1% detection limit; many of these variants are deleterious and associated with late-onset mitochondrial disease and cancer. In cells derived from NARP/Leigh syndrome patients, iMiGseq comprehensively characterizes and haplotypes single-nucleotide and structural variants of mitochondrial DNA and resolves their genetic linkage. NIRVANA was developed in response to the COVID-19 pandemic. It can simultaneously detect SARS-CoV-2 and three co-infecting respiratory viruses, and monitor mutations in up to 96 samples in real time. NIRVANA thus offers a promising solution for rapid, field-deployable detection and mutation surveillance of pandemic viruses. Taken together, IDMseq, iMiGseq and NIRVANA exploit the advantages of long reads, overcome the limitation of low accuracy, and facilitate the application of long-read sequencing technologies across multidisciplinary fields.

Bi, C. (2021). Long Read Based Individual Molecule Sequencing and Real-time Pathogen Detection. KAUST Research Repository. https://doi.org/10.25781/KAUST-V01RD
