At the time of archiving, the student author of this dissertation opted to temporarily restrict access to it. The full text of this dissertation became available to the public after the expiration of the embargo on 2017-12-01.
Advances in nucleic acid (DNA and RNA) sequencing technology have enabled many projects aimed at characterizing the genome structure and transcriptome complexity of organisms. The first conclusions of the human and mouse projects underscored two important, yet unexpected, findings. First, while almost the entire genome is transcribed, only 5% of it encodes proteins. Consequently, most transcripts are noncoding RNAs (ncRNAs). These include short RNAs (<200 nucleotides (nt)), comprising Piwi-interacting RNAs (piRNAs), microRNAs (miRNAs), and endogenous short interfering RNAs (siRNAs), among others, as well as long noncoding RNAs (lncRNAs; >200 nt). Second, a significant portion of the mammalian genome (45%) is composed of repeat elements (REs). REs are mostly relics of ancestral viruses that invaded the host genome during evolution and produced thousands of copies. Their roles within host genomes have yet to be fully explored, particularly given that they sometimes produce lncRNAs and have been shown to influence gene expression at the transcriptional and post-transcriptional levels. Moreover, because some REs can still mobilize within host genomes, hosts have evolved mechanisms, mainly epigenetic, to keep REs under tight control. Recent reports indicate that RE activity is regulated in somatic cells, particularly in the brain, suggesting a physiological role for RE mobilization during normal development. In this thesis, I focus on the analysis of ncRNAs, specifically REs, piRNAs, and lncRNAs, in human and mouse post-mitotic somatic cells. The main aspects of this analysis are:
Using small RNA sequencing (sRNA-Seq), I show that piRNAs, a class of ncRNAs responsible for silencing transposable elements (TEs) in testes, are also present in the adult mouse brain. Furthermore, only a subset of testes piRNAs is expressed in the brain, and their expression may be controlled by known neurogenesis factors.
To investigate the dynamics of the transcriptome during cellular differentiation, I examined deep RNA-Seq and Cap Analysis of Gene Expression (CAGE) data from a time course of primary human skeletal muscle cell differentiation, and contrasted this program with that of cells from Duchenne muscular dystrophy (DMD) donors. I identified novel candidates, both protein-coding genes and lncRNAs, that may be involved in myogenesis, and reaffirmed known myogenic players.
Using RNA-Seq data, I designed a novel pipeline to identify possible de novo RE insertion sites during muscle differentiation, which I also tested on embryonic mouse cerebral cortex.