Getting Started with RNA-Seq Analysis

This guide introduces the new RNA-Seq functions in MeV. It gives a brief tour of the new RNA-Seq file loader and demonstrates a few of the functions added specifically to support RNA-Seq data. The guide first walks you through loading data with the RNA-Seq file loader. It then describes using an RNA-Seq-optimized module, edgeR, to find differentially expressed genes between two groups of samples. Finally, it demonstrates how to examine these differentially expressed genes for functional themes using the new GOSeq module.

These new options were added in MeV v4.7. If you already have MeV v4.7 installed, you can skip the Setup step and go directly to Loading an RNA-Seq Data Set.

Loading an RNA-Seq Data Set

  1. In the Multiple Array Viewer, go to File -> Load Data.
  2. When the window titled Expression File Loader appears, click Select File loader -> RNASeq DGE Files. The RNASeq file loader screen will appear. The loader accepts raw count data, RPKM, or FPKM, mapped to either ENSEMBL IDs or RefSeq IDs.
  3. Click the Browse button at the upper right side of the screen. In the file browser that appears, navigate to the MeV folder, then open the data/rnaseq folder. Choose the file TagSeqExample.txt. This file contains raw count data. 
  4. Choose the appropriate parameters for each of the drop-down menus at the top of the file loader screen. For the data file we have selected, choose the Data Type Count, the Species Human, the Reference Genome RefSeq, and the UCSC build hg19. Leave Read Length blank.
  5. Click the Load button.
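
The loader distinguishes raw counts from RPKM and FPKM values, so it can help to see how the two relate. The sketch below uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads); FPKM is the same calculation with fragments in place of reads. It is an illustration only, not MeV's own code, and the function name and example numbers are hypothetical.

```python
def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    count: raw number of reads mapped to the gene
    gene_length_bp: transcript length in base pairs
    total_mapped_reads: total mapped reads in the sample (library size)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million reads)
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical numbers, for illustration only.
example = rpkm(count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(example)
```

A file that already holds RPKM or FPKM values has had this length and library-size scaling applied, which is why the Data Type menu needs to match the file contents.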





 


Differential Expression Detection

Begin your RNASeq analysis by testing for differential expression of all of the unique reads. To do this, we will use a module called edgeR, based on the R package edgeR (Empirical Analysis of Digital Gene Expression data in R) written by Mark Robinson.

  1. In the row of colorful buttons across the top of the MultiExperiment Viewer window, click the one labeled Statistics. Choose Empirical Analysis of Digital Gene Expression data in R (edgeR) from the drop-down menu. An initialization dialog will appear.
  2. Select the group membership for each of the six samples. Click "Group 1" for the first four samples, and "Group 2" for the remaining two samples.
  3. Leave the default values for the Inference Algorithm and p-value/FDR parameters.
  4. Click Ok. The analysis will run and display the results in the result tree, on the left of the Multiple Array Viewer window.

EdgeR Initialization Dialog: The edgeR initialization dialog.
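
The dialog's p-value/FDR parameters control how per-gene p-values are turned into a significant gene list. The guide does not say which FDR method the MeV module applies, so the sketch below shows the Benjamini-Hochberg adjustment purely as a common example of FDR control. The function name, the p-values, and the 0.05 cutoff are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is never
    # larger than the adjusted value of the next-larger p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes, for illustration only.
p_values = [0.001, 0.04, 0.03, 0.2, 0.5]
adjusted = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in adjusted]
print(list(zip(p_values, adjusted, significant)))
```

Note that a gene with a raw p-value below 0.05 can still fail the cutoff after adjustment, which is the point of controlling the false discovery rate across many genes.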

 

 

 

Differential Expression Results

 

  1. Open up the result node labeled edgeR, and expand the nodes to find one labeled Significant Gene List. Click on this node to select it and display the list of genes found to be differentially expressed between the two sample groups you selected in the previous section. You can click on the links to launch a web browser displaying more information about individual genes. 
  2. Right-click on the window in a cell with no links (the Stored Color column is a good bet) to reveal a context menu with many powerful options. Choose Store entire cluster and click Ok to label each of the genes in this window with a color. This color label will be visible anywhere a gene display is shown in MeV - even in the results of other modules.

EdgeR Output: Results of the edgeR module, showing significantly differentially expressed genes/transcripts.

Examining RNA-Seq Differential expression list for signature themes

Now that we have a list of differentially expressed genes, we can examine it for themes. To do this, we will use the GOSeq module. This module is based on the R package GOSeq, by Matthew Young. It is designed to find enriched gene groups in length-biased data, such as RNASeq data. It plays a role comparable to that of tools like EASE for microarray data.

  1. From the Statistics drop-down menu, choose the item Gene Ontology analysis for RNA-seq. The GOSeq initialization dialog will appear.
  2. Click the Cluster Analysis tab at the top of the Initialization Dialog.
  3. Leave the GOSeq parameters Significance Level: Alpha, Number of Permutations and Number of Genes per Transcript Length Bin set at their default values.
  4. You should have a cluster pre-selected in the cluster selector dialog. If you have more than one cluster available in this dialog, choose the one you want to examine for geneset enrichment.
  5. Choose Download from GeneSigDb from the drop-down menu. Click the Download button.
  6. Check that the Choose Annotation Type drop-down menu is set at GENE_SYMBOL.
  7. Leave the File Location field blank.
  8. Click Ok. GOSeq will run.
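
The Number of Genes per Transcript Length Bin parameter relates to the length bias that GOSeq is designed to handle: longer transcripts collect more reads, so they are more likely to be called differentially expressed. The sketch below is not the GOSeq algorithm; it only illustrates the bias by sorting genes by length, grouping them into fixed-size bins, and reporting the fraction of differentially expressed genes in each bin. The function name and gene data are hypothetical.

```python
def de_fraction_by_length_bin(genes, genes_per_bin):
    """Summarize differential expression rate per transcript-length bin.

    genes: list of (transcript_length, is_de) pairs
    genes_per_bin: number of genes grouped into each bin
    Returns a list of (mean_length, fraction_de) tuples, shortest bin first.
    """
    if genes_per_bin <= 0:
        raise ValueError("genes_per_bin must be positive")
    ordered = sorted(genes, key=lambda g: g[0])
    summary = []
    for start in range(0, len(ordered), genes_per_bin):
        chunk = ordered[start:start + genes_per_bin]
        lengths = [length for length, _ in chunk]
        de_count = sum(1 for _, is_de in chunk if is_de)
        summary.append((sum(lengths) / len(lengths), de_count / len(chunk)))
    return summary


# Hypothetical genes as (length in bp, differentially expressed?) pairs.
genes = [
    (500, False), (800, False), (1200, False), (1500, True),
    (3000, True), (4200, False), (6000, True), (9000, True),
]
summary = de_fraction_by_length_bin(genes, genes_per_bin=4)
print(summary)
```

If the fraction rises with length, as in this made-up data, an enrichment test that ignores length would favor gene sets made up of long transcripts.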

 

 

Signature theme results

In the Result Tree, you will see a new result node named GOSEQ.

  1. Open this node and select the node labeled Results Table. This table contains the complete list of gene lists downloaded from the GeneSigDb database, as well as a rating for each list indicating whether its contents are enriched in the group of differentially expressed genes used to run GOSeq.
  2. Double-click on the header labeled p-value to sort the list. Those gene lists with low p-values, like Human StemCell_Brendel05_21genes, listed here, are enriched in the set of differentially expressed genes we found in our previous edgeR analysis. You can explore this gene list by going to the GeneSigDb website.

 
Gene signatures, published in GeneSigDb, with enrichment in the list of selected genes. Future plans include adding links from this display directly to the gene signature web page, where the list of genes in the signature and the source publication can be found.

From here, you can continue examining gene signatures of interest by searching the GeneSigDb website, or continue with another analysis by selecting it from one of the drop-down menus. For this pilot, most of the standard MeV modules are available to use. A few of them, like the EASE and GSEA modules, require specific annotation files that are currently only available for DNA microarray data. Part of the full RNASeq implementation project will be to adapt MeV to fully support RNASeq analysis in all modules. However, that support is not yet available.