The Friedman Lab Chronicles: Full-length transcriptome assembly from RNA-Seq data without a reference genome

Sunday, May 15, 2011

Full-length transcriptome assembly from RNA-Seq data without a reference genome

Hot off the press: a new paper has just appeared in advance online publication in Nature Biotechnology. The paper describes a new computational pipeline to recover RNA transcript sequences and abundances (aka the "transcriptome") from RNA-seq data without using the genome as a reference.

This paper is joint work of Moran from our group with Brian Haas and Manfred Grabherr, both from the Broad Institute. The three of them (Brian, Manfred and Moran) developed a three-part tool that handles a massive number of sequencing reads and assembles them into an accurate account of the transcriptome. They also carried out a very thorough evaluation of the method and of how it compares to most of the state-of-the-art methods in the field. This evaluation is of interest in its own right to this emerging area of analysis.

The basic problem is that RNA-seq returns many (millions of) short sequences, each read from a different fragment of the original RNA molecules in the sample. To make sense of them, we need to assemble them (like a puzzle) into longer pieces. The two general strategies for doing so are nicely illustrated in this figure (from a review by Haas and Zody):

The "straightforward" approach is to align the reads to the reference genome, and then use this mapping to guide the reconstruction. The less obvious approach, which we took here, is to first assemble the puzzle, and only then map to the genome. This often turns out to be as accurate (or even more so), since it is less susceptible to problems in mapping to the genome, to differences between the reference genome and the actual sample, and to partial or fragmented references.
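To make the puzzle analogy a bit more concrete, here is a toy sketch of the "assemble first" idea in Python. It is not the method from the paper, just a minimal illustration under simplifying assumptions: error-free reads, a single transcript, and no repeated sequence. The function names (count_kmers, assemble_contig), the choice of k=5 and the example sequence are all made up for this illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def assemble_contig(reads, k=5):
    """Greedily grow one contig from the most abundant k-mer.

    Toy illustration only: assumes error-free reads from one transcript.
    """
    counts = count_kmers(reads, k)
    if not counts:
        return ""
    seed = counts.most_common(1)[0][0]
    contig, used = seed, {seed}

    def best_next(anchor, to_right):
        # Try each base and keep the most abundant unused k-mer that
        # overlaps the current contig end by k-1 characters.
        options = []
        for base in "ACGT":
            kmer = anchor + base if to_right else base + anchor
            if counts.get(kmer, 0) > 0 and kmer not in used:
                options.append((counts[kmer], kmer))
        return max(options)[1] if options else None

    # Extend to the right until no overlapping k-mer is left.
    while (kmer := best_next(contig[-(k - 1):], True)) is not None:
        used.add(kmer)
        contig += kmer[-1]
    # Then extend to the left in the same way.
    while (kmer := best_next(contig[:k - 1], False)) is not None:
        used.add(kmer)
        contig = kmer[0] + contig
    return contig


# Made-up "transcript", cut into overlapping 10-letter reads.
transcript = "ATGGCGTACCTTAGACTGAA"
reads = [transcript[i:i + 10] for i in range(0, 11, 2)]
print(assemble_contig(reads, k=5) == transcript)
```

Note that no genome is consulted anywhere: the pieces are joined purely by their overlaps with each other. Real data adds sequencing errors, alternative isoforms and uneven coverage, which is where the hard work is.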

Our strategy is based on three steps, each of which processes the data very efficiently while preserving information and dealing with sequencing errors and rare events.
