An alternative class of targets for microRNAs containing CG dinucleotide

Background MicroRNAs are endogenous ∼23 nt RNAs that regulate mRNA targets mainly through perfect pairing with their seed region (positions 2-7). Several instances of bulged UTR sequences can also be recognized by miRNAs as targets. However, such non-Watson-Crick base pairings are incompletely understood. Results We found a group of miRNAs that had very few conserved targets while potentially having a subclass of bulge messenger RNA targets. Compared with the canonical targets, these bulge targets had a lower negative correlation with miRNA expression, and were either downregulated in miRNA overexpression experiments or upregulated in miRNA knock-down experiments. Conclusions We show that bulge targets exist widely in certain groups of miRNAs and that such non-canonical targets can be recognized by miRNAs. Incorporating these bulge targets, combined with evolutionary conservation, will reduce the false-positive rate of computational microRNA target prediction.


Background

MicroRNAs (miRNAs) are ~23-nucleotide RNAs that regulate eukaryotic gene expression post-transcriptionally [1]. miRNAs use base-pairing to guide RNA-induced silencing complexes (RISCs) to specific messenger RNAs with fully or partly complementary sequences, primarily in the 3' untranslated region [2]. The best characterized features determining animal miRNA-target recognition are six-nucleotide (nt) long seed sites, which perfectly complement the 5' end of the miRNA [...] predicting conserved targets above the noise of false-positive predictions in most miRNAs [4].

Most miRNA-target prediction algorithms rely heavily on seed rules and evolutionary conservation [5,6]. However, such strategies miss noncanonical target sites [7]. Several biological studies have functionally validated the existence of imperfect binding sites [8-10].

Recently, Ago HITS-CLIP was used to precisely map miRNA-binding sites in both Caenorhabditis elegans [11] and mouse brain [7]. However, about one-quarter of the total binding sites in mouse brain did not follow the classical seed rules [7]. Further analysis revealed that miR-124, one of the most abundant miRNAs in the Ago complex in mouse brain, has many noncanonical bulge sites. More recently, an improved CLIP-seq method, CLASH (cross-linking, ligation and sequencing of hybrids), revealed that around 60% of seed interactions are noncanonical, containing bulged or mismatched nucleotides [12].

Although these studies strongly suggest the existence of bulge sites, the general features of their interactions with miRNAs are largely unknown, partly because it is difficult to determine how frequently such atypical sites are used in vivo and what general rules predict them.

Here, we analyze a group of miRNAs that are highly conserved in vertebrates but have relatively few conserved targets under the seed rule. These miRNAs share a common feature: their seed region contains a CG dinucleotide (hereafter referred to as the CG dimer). We found that the potential regulatory sites of these miRNAs have a nucleotide bulge compared with a fully complementary sequence. This expands our insight into miRNA-target interaction.

MicroRNAs containing a CG dimer have fewer canonical targets

Evolutionary conservation has been widely used, together with the seed rule, to identify miRNA-binding sites. We searched for the orthologs of all the miRNAs annotated by miRBase (version 17) [13] using their mature sequences in the genomes of 23 species (Supplement Table 1

To uncover possible bulge sites, we allowed a one-nucleotide insertion at every position in the seed region (Figure 1) for all the miRNAs conserved in vertebrates. Using these artificial seed sequences, we found that only a bulge inserted between the CG dimer increases the number and the conservation of the target sites (Figure 2). In contrast, a random bulge at the target binding site did not increase the conservation rate.

Transcriptome-wide evidence for miRNA repression through bulge target sites

We used human age-series mRNA and miRNA expression data [15] to quantify the transcriptome correlation between CG dimer miRNAs and their bulge targets. The [...] expression than the background (Wilcoxon test, p < 0.01), and for miR-191 the bulge targets even outperform the seed targets (Wilcoxon test, p < 0.01) (Figure 3).

We also used public GEO data on transcriptome changes after over-expression or knock-down of individual miRNAs. For miR-126, miR-210 and miR-184, all the bulge targets were significantly down-regulated after over-expression (Table 2), and in the knock-down experiment for miR-1204 the bulge targets were expressed at much higher levels than non-target genes (Wilcoxon test, p < 0.01).

Free energies of CG bulge target duplexes are significantly lower than those of random bulges

We compared the minimum free energy (MFE) of the canonical targets, CG bulge targets and targets with random bulges using RNAhybrid [16]. Non-canonical targets with a bulge between the CG have a significantly lower MFE than targets with a random bulge (Wilcoxon test, p < 0.05, Figure 4).
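The group comparisons reported in this section (correlation coefficients, expression changes, MFE values) are described only as Wilcoxon-type tests. As a minimal sketch, assuming an unpaired two-sided rank-sum (Mann-Whitney) test with a normal approximation, the comparison could look like the following. The function names are hypothetical, and the approximation omits tie and continuity corrections, so it is only indicative for small or heavily tied samples.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided rank-sum test of x versus y (normal approximation).

    Returns (U statistic for x, approximate p-value).
    Assumption: no tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p

# Hypothetical MFE values (kcal/mol), for illustration only.
cg_bulge_mfe = [-24.1, -23.5, -22.8, -25.0]
random_bulge_mfe = [-19.2, -20.4, -18.7, -21.0]
u, p = rank_sum_test(cg_bulge_mfe, random_bulge_mfe)
print(u, p)
```

A U statistic near zero for the first group means its values are almost always lower than those of the second group, which is the direction claimed for CG bulge duplexes relative to random bulges.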

To allow direct mapping of miRNA-target interactions, we used the CLASH dataset [12] to validate our bulge targets for the miRNAs containing a CG dimer. Briefly, the RNA molecules present in AGO-associated miRNA-target duplexes were partially hydrolyzed, ligated, reverse transcribed and subjected to Illumina sequencing. Unlike the HITS-CLIP and PAR-CLIP datasets, CLASH generates a group of reads that contain a miRNA and its target site sequence together (chimeric reads). Across the six independent CLASH experiments, 10 CG dimer miRNAs were detected among the chimeric reads, and 8 miRNAs had, in total, 264 chimeric reads containing a bulge nucleotide between the CG dinucleotide of the target site (Supplementary Table 4). For all miRNAs detected in the CLASH dataset, non-canonical interactions (G·U pairs, all possible one-nt mismatches or bulges; non-canonical seed) were about 1.7-fold more frequent than perfect seed base pairing. Within the CG dimer miRNAs, however, only the bulge targets between the CG dimer showed strong enrichment over randomized sequences among all the interactions (Figure 5).

The aim of this study is to identify the general features of interactions between miRNAs and their bulge targets. We used the non-canonical miRNA-target interactome, in which a bulge nucleotide lies between the CpG dinucleotide, to test whether the bulge position is random or follows specific rules. This sub-class of miRNAs was observed to have few seed targets, and these seed targets are evolutionarily less conserved, which makes these miRNAs potentially rich in non-canonical targets. Multistep validation, which included evolutionary, overexpression, correlation and CLASH analyses, supports the conclusion that, between the CpG in the seed region, there is a bulge containing a target [...]

The seed sequences for the CG dimer miRNAs were extracted to find three types of [...] defined as a seed target. For the bulge target, we allowed one extra nucleotide to exist between the CG dimer. Seed sequences with a randomly inserted single nucleotide were used as controls. The occurrences of the homologous target sites in different species were summed for seed, bulge and control separately to give the conservation rates.

The miRNA-mRNA interaction sequences were downloaded from the journal's website, in the Supplementary Data section [12].

The miR-184 seed sequence was used to illustrate a canonical target match and a non-canonical target match with a bulge nucleotide between the CG dinucleotide.
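The three kinds of target site described in the methods text above (seed, CG bulge and inserted-nucleotide control) can be sketched as simple sequence patterns. This is a minimal illustration under stated assumptions, not the authors' pipeline: the seed string, the toy 3' UTR fragments and all function names are hypothetical, controls are enumerated at every other internal position instead of being drawn at random, and sites are counted by plain pattern search per species without the alignment that a homology-based analysis would need.

```python
import re

COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}

def seed_site(seed):
    """Canonical target site: the reverse complement of the miRNA seed."""
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def insertion_pattern(site, pos):
    """Regex matching `site` with one extra nucleotide inserted before index `pos`."""
    return site[:pos] + "[ACGU]" + site[pos:]

def cg_bulge_pattern(seed):
    """Target-site pattern with one bulged nucleotide between the C and the G."""
    site = seed_site(seed)
    i = site.find("CG")  # the reverse complement of CG is again CG
    if i < 0:
        raise ValueError("seed has no CG dinucleotide")
    return insertion_pattern(site, i + 1)

def control_patterns(seed):
    """Insertion patterns at every internal position except between C and G."""
    site = seed_site(seed)
    cg_pos = site.find("CG") + 1
    return [insertion_pattern(site, p) for p in range(1, len(site)) if p != cg_pos]

def species_with_site(pattern, utrs_by_species):
    """Count species whose 3' UTR contains at least one match to `pattern`."""
    rx = re.compile(pattern)
    return sum(1 for utr in utrs_by_species.values() if rx.search(utr))

# Illustrative seed containing a CG dinucleotide and toy UTR fragments.
seed = "GGACGG"
utrs = {
    "human": "AAUCCAGUCCAA",
    "mouse": "AAUCCUGUCCAA",
    "chicken": "AAUCCGUCCAAA",
}
print(seed_site(seed), species_with_site(seed_site(seed), utrs))
print(cg_bulge_pattern(seed), species_with_site(cg_bulge_pattern(seed), utrs))
```

Patterns from neighbouring insertion positions can match the same sequence when the inserted base equals an adjacent base, so a fuller analysis would need a rule for assigning such sites to one class.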

Cumulative distribution of correlation coefficient between miR-191 and target expression level.