RNASeq Pilot Project

For the last decade, MeV has been used by researchers around the world to explore and visualize high-throughput genomic data with a simple, button-driven user interface. It has brought sophisticated statistical tools to bench-top researchers who would otherwise have had no easy alternatives.

MeV has served us well all this time, but the world of genomics is changing (again), and MeV needs to adapt. mRNA-Seq technologies are becoming cheaper and more widely used, yet there remains a dearth of useful analysis tools that biologists can use to investigate their data.

We intend to address this gap in the currently available tools by adapting MeV to handle next-generation high-throughput data. The required analyses are similar: differential expression detection, clustering and network prediction are just as important in next-generation sequencing analysis as they have been for DNA microarrays.

The TM4 development team has built a pilot project in which MeV has been adapted to load and analyze mRNA-Seq data. This project has shown that it is indeed feasible to adjust MeV's data model and processing functions to handle this new data, that the memory footprint remains manageable, and that the existing features so important to microarray data analysis can easily be applied to the richer datasets now available.

The project also includes four new mRNA-Seq-specific modules: one based on the gene list enrichment package GOSeq, and three differential expression analysis modules based on the R packages DESeq, DEGseq and edgeR. These modules use the same simple user interface that has made MeV accessible to researchers of all levels of computer literacy. They sit alongside classic modules such as K-means clustering, EASE and Bayes Networks.

This project paves the way for future support of other next-generation sequencing data. We will use the lessons learned in building it to bring fully supported next-generation genomic data analysis to future versions of MeV, and we hope soon to provide the community with the same powerful graphical tool that has helped so many get the most out of their genomics data.

A few caveats:

  • The pilot project is currently available only for Windows users. Apologies to our Mac and Linux community; we will fully support RNASeq analysis on your platforms in the next full release of MeV. Focusing on a single platform at this early stage greatly shortens our development time.
  • We do not yet fully support annotation-dependent modules in the pilot. Therefore, the EASE, BN and LM modules have been disabled until that support has been implemented. We will fully support those modules in future MeV releases.

Please download the pilot and try it out. Use the Quickstart Guide to get started.

Count to RPKM and vice versa

When either RPKM values or read counts are provided, MeV calculates the other using the approach published by Mortazavi et al., Nature Methods 5, 621-628 (2008); the paper's supplementary material describes it in detail. The basic formula is:

RPKM = Count / Library Size / Transcript Length * 1e+9

where Library Size is the total number of mapped reads in the sample and Transcript Length is measured in base pairs.
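The formula can be sketched in both directions as below. This is a minimal illustration assuming library size is a read total and transcript length is in base pairs; the function names are hypothetical and are not part of MeV's code.

```python
# Minimal sketch of the Count <-> RPKM conversion (Mortazavi et al., 2008).
# Function names are hypothetical; MeV's internal implementation may differ.


def count_to_rpkm(count: float, library_size: float, transcript_length: float) -> float:
    """RPKM = Count / Library Size / Transcript Length * 1e9.

    library_size: total mapped reads for the sample.
    transcript_length: transcript length in base pairs.
    """
    if library_size <= 0 or transcript_length <= 0:
        raise ValueError("library_size and transcript_length must be positive")
    return count / library_size / transcript_length * 1e9


def rpkm_to_count(rpkm: float, library_size: float, transcript_length: float) -> float:
    """Inverse of the formula: Count = RPKM * Library Size * Transcript Length / 1e9."""
    if library_size <= 0 or transcript_length <= 0:
        raise ValueError("library_size and transcript_length must be positive")
    return rpkm * library_size * transcript_length / 1e9


# Example: 500 reads on a 2,000 bp transcript in a library of 10 million reads.
rpkm = count_to_rpkm(500, 10_000_000, 2_000)
count = rpkm_to_count(rpkm, 10_000_000, 2_000)
```

Note that converting RPKM back to counts generally yields non-integer values, so a recovered "count" is an estimate rather than an exact read tally.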

Rules and requirements

  • When RPKM values are provided, a library size file is required. When counts are provided, the file is optional; if it is omitted, MeV uses the sum of each sample's counts as that sample's library size.
  • When the transcript_length annotation column is left empty in the data file, MeV calculates the transcript length from the locus as the difference between the start and end base-pair positions.
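The two rules above can be sketched as follows. This assumes a simple layout in which the expression matrix is a list of rows (one per gene) with one value per sample; the function names are hypothetical, and the length fallback follows the text's "difference between start and end" (some annotation conventions add 1 for inclusive coordinates).

```python
# Minimal sketch of the library-size and transcript-length rules.
# Data layout and function names are assumptions, not MeV's actual API.
from typing import List, Optional, Sequence


def resolve_library_sizes(matrix: List[List[float]], is_rpkm: bool,
                          provided: Optional[Sequence[float]] = None) -> List[float]:
    """Return one library size per sample (column)."""
    if provided is not None:
        # A library size file was supplied: use it as-is.
        return list(provided)
    if is_rpkm:
        # RPKM input cannot be un-normalized without known library sizes.
        raise ValueError("a library size file is required when RPKM values are provided")
    if not matrix:
        raise ValueError("cannot infer library sizes from an empty matrix")
    # Count input: library size defaults to the column sum of counts.
    return [float(sum(column)) for column in zip(*matrix)]


def resolve_transcript_length(annotated: Optional[int], start: int, end: int) -> int:
    """Use the annotated length if present, otherwise end - start from the locus."""
    if annotated is not None:
        return annotated
    return end - start
```

Keeping the library-size check separate from the conversion step makes the RPKM-specific requirement explicit and fails early, before any values are transformed.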