Wednesday, July 30, 2008

Non-coding RNA

Currently I am working as a Postdoctoral Fellow in Professor John Mattick's group at the Institute for Molecular Bioscience, University of Queensland, where I am once again engaged in the world of RNA. And the world of RNA is getting bigger every day, as it turns out that the eukaryotic cell (as well as the prokaryotic) is not a protein machine but an RNA machine. It is becoming evident that introns and untranslated regions of mRNAs harbor non-coding RNAs. Moreover, the so-called "junk DNA" in intergenic regions is full of RNA-encoding regions, repeats all of a sudden express rasiRNAs and piRNAs, annotated mRNAs turn out to be wrong and are in fact long non-coding RNAs, and some mRNAs have a non-coding functional state prior to translation.

Non-coding RNA discovery

Addressing this novel world of RNA is tricky because many of these RNAs function as digital interfaces, where function arises from base pairing and not from a catalytic three-dimensional structure of the RNA moiety. Such digital RNAs may be short, like the miRNAs, and their base-pairing interactions may be imperfect, which makes analysis based on primary structure difficult. Not only is it difficult to identify the encoding loci in the genome; the targets are just as elusive. Prediction of secondary structure is unreliable and only helps in identifying stem-containing RNAs such as pre-miRNAs, while piRNAs, for example, do not show any stem-forming potential.

So in order to discover novel non-coding RNAs or novel classes of non-coding RNAs, it is necessary to create a platform that allows integration of comparative genomics, motif searches, expression data, deep sequencing data, etc. The UCSC Genome Browser can be used as such a platform, and the Institute for Molecular Bioscience hosts a mirror that we have direct access to for fast generation of custom tracks and tweaking of the source code.

I have used the comparative PhastCons data to locate and cluster distinct conservation profiles; some are specific to miRNAs, others to snoRNAs, and some belong to putative non-coding RNAs. The next step will be to evaluate these putative RNAs with expression data from deep sequencing. Because our group is part of the RIKEN consortium, we have access to a very large amount of sequence data, along with the publicly available data that is arriving at an impressive rate. I use suffix array software for fast sequence mapping (I mapped 60 million tags to the human genome in one hour), which covers the conserved blocks with tiled tags that can be scored to give an idea of uniqueness and expression. Finally, probes can be designed for DNA arrays.
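The suffix-array mapping step can be illustrated with a toy sketch. This is not the software used for the mapping described here; it is a minimal Python example under simplifying assumptions: exact matches only, forward strand only, and a naive suffix array construction that would never scale to a real genome. The function names (`build_suffix_array`, `find_occurrences`, `map_tags`) and the scoring fields are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix itself.
    # Acceptable for a toy sequence; a real genome needs a specialised builder.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of exact matches of pattern in text."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


def map_tags(genome, tags):
    """Map sequence tags to a genome and attach two simple scores.

    'expression' is how often the tag was sequenced (hypothetical score);
    'unique' is True when the tag matches exactly one genomic position.
    """
    sa = build_suffix_array(genome)
    mapped = {}
    for tag, count in Counter(tags).items():
        positions = find_occurrences(genome, sa, tag)
        mapped[tag] = {
            "positions": positions,
            "expression": count,
            "unique": len(positions) == 1,
        }
    return mapped
```

Calling `map_tags("ACGTACGTTTGA", ["ACGT", "ACGT", "TTGA"])` would report the tag `ACGT` as sequenced twice but non-unique (it matches two positions), and `TTGA` as sequenced once and unique. Once the suffix array is built, each tag lookup is a binary search, which is why this family of methods suits mapping very large numbers of short tags.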

Below is an image from the UCSC Genome Browser showing a miRNA locus. From this it is evident that not only is the miRNA expressed from the stem indicated by the RNAfold track, but so are the star sequences.
