Thursday, June 25, 2009

Reconstructing non-coding RNAs

We observed that sequence tags from deep sequencing data sets in the microRNA size range also cover other types of non-coding RNAs, such as tRNAs and snoRNAs. Using publicly available data sets, we set out to reconstruct full-length non-coding RNAs from Drosophila. We started with 12 GEO data sets comprising 90 experiments and 56M sequence tags, of which 11M were unique; 6M could be mapped perfectly to the genome, yielding 68M hits. Disregarding coding and repetitive regions, these mapped tags were assembled into tag contigs (TCs) in which all tags lie on the same strand, yielding 0.5M TCs (Fig. 1). For each TC, the Tag depth was determined as the maximum number of overlapping sequence tags, which serves as an indicator of the expression level.

Figure 1. Assembly of sequence tags into Tag Contigs.
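
The assembly step described above can be illustrated with a minimal sketch. It is not the pipeline used for this work: the names `Tag`, `TagContig`, `assemble_tag_contigs` and `max_depth` are hypothetical, coordinates are assumed to be 0-based and half-open, tags that merely touch are assumed not to merge, each mapped tag is assumed to count once toward depth, and the filtering of coding and repetitive regions is left out.

```python
from collections import defaultdict
from typing import NamedTuple


class Tag(NamedTuple):
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


class TagContig(NamedTuple):
    chrom: str
    strand: str
    start: int
    end: int
    depth: int  # maximum number of tags overlapping at any position


def max_depth(tags):
    """Sweep over tag boundaries and return the highest overlap count."""
    events = []
    for t in tags:
        events.append((t.start, 1))
        events.append((t.end, -1))
    # At equal positions (pos, -1) sorts before (pos, +1), so a tag ending
    # where another begins is not counted as overlapping it.
    events.sort()
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


def assemble_tag_contigs(tags):
    """Merge overlapping tags on the same chromosome and strand into contigs."""
    by_strand = defaultdict(list)
    for tag in tags:
        by_strand[(tag.chrom, tag.strand)].append(tag)

    contigs = []
    for (chrom, strand), group in sorted(by_strand.items()):
        group.sort(key=lambda t: (t.start, t.end))
        start, end = group[0].start, group[0].end
        members = [group[0]]
        for tag in group[1:]:
            if tag.start < end:  # overlaps the contig being built
                end = max(end, tag.end)
                members.append(tag)
            else:  # gap: close the current contig and start a new one
                contigs.append(TagContig(chrom, strand, start, end, max_depth(members)))
                start, end = tag.start, tag.end
                members = [tag]
        contigs.append(TagContig(chrom, strand, start, end, max_depth(members)))
    return contigs


if __name__ == "__main__":
    example = [
        Tag("2L", "+", 0, 22),
        Tag("2L", "+", 10, 32),
        Tag("2L", "+", 15, 37),
        Tag("2L", "+", 100, 121),
        Tag("2L", "-", 5, 27),
    ]
    for tc in assemble_tag_contigs(example):
        print(tc, "length:", tc.end - tc.start)
```

In this toy example the three overlapping plus-strand tags form one contig with a Tag depth of 3, while the isolated tag and the minus-strand tag each form their own contig with a depth of 1. TC length (`end - start`) and depth are the two quantities plotted against each other in Figure 2.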

Inspection of TCs overlapping with annotated non-coding RNAs revealed that it was indeed possible to reconstruct tRNAs as well as both box H/ACA and box C/D snoRNAs. Moreover, plotting TC length against Tag depth revealed that these non-coding RNAs form well-defined clusters (Fig. 2). By testing unannotated TCs from these clusters by Northern blotting, we validated the existence of transcripts of the expected length for 8 previously unrecognized box H/ACA snoRNAs and 26 previously unrecognized box C/D snoRNAs. However, as indicated in grey in Figure 2, a large number of unannotated TCs remain, which suggests the existence of many more non-coding RNAs.

Figure 2. Tag Contig length plotted against Tag depth.

Manuscript abstract is here.