This guide is an introduction to using the new RNASeq functions in MeV. The guide contains a brief tour of the new RNASeq file loader and a demonstration of a few of the new functions we have added specifically to support RNASeq data. The guide will first walk you through loading the data using the new RNA-Seq file loader. Then it will describe using an RNA-Seq-optimized module, EdgeR, to find differentially expressed genes between two groups of samples. Finally, it will demonstrate how to examine these differentially expressed genes for functional themes using the new module GOSeq.
These new options were added in MeV v4.7. If you already have MeV v4.7 installed, you can skip the Setup step and go directly to Loading a Data Set.
Differential Expression Detection
Begin your RNASeq analysis by testing for differential expression of all of the unique reads. To do this, we will use a module called edgeR, based on the R package edgeR (Empirical Analysis of Digital Gene Expression data in R), written by Mark Robinson.
EdgeR module selection: The edgeR module can be found in the Statistics drop-down menu.
- In the row of colorful buttons across the top of the MultiExperiment Viewer window, click the one labeled Statistics. Choose Empirical Analysis of Digital Gene Expression data in R (edgeR). An initialization dialog will appear.
- Select the group membership for each of the six samples. Click "Group
1" for the first four samples, and "Group 2" for the remaining two
samples.
- Leave the default values for the Inference Algorithm and p-value/FDR parameters.
- Click Ok. The analysis will run and display the results in the result tree, on the left of the Multiple Array Viewer window.
EdgeR Initialization Dialog: The edgeR initialization dialog.
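To give a feel for what an FDR setting controls, here is a minimal Python sketch of the Benjamini-Hochberg adjustment, one common way to turn raw p-values into FDR-adjusted values. It is an illustration only: whether the MeV edgeR module applies exactly this correction is an assumption, and the function names and the 0.05 cutoff are hypothetical, not taken from MeV or edgeR.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(gene_p_values, fdr=0.05):
    """Pick genes whose adjusted p-value is at or below the cutoff.

    gene_p_values: dict mapping a gene name to its raw p-value
    (hypothetical input; MeV produces its own table).
    """
    genes = list(gene_p_values)
    adjusted = benjamini_hochberg([gene_p_values[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q <= fdr]
```

Calling `significant_genes({"geneA": 0.001, "geneB": 0.8})` would keep only `geneA`, which mirrors the idea behind the Significant Gene List that the module reports.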
Differential Expression Results
- Open up the result node
labeled edgeR, and expand the nodes to find one labeled Significant Gene List. Click on
this node to select it and display the list of genes found to be
differentially expressed between the two sample groups you selected in the
previous section. You can click on the links to launch a web browser
displaying more information about individual genes.
- Right-click on the window in a cell with no links (the Stored Color column is a good bet). Choose Store entire cluster and click Ok to label each of the genes in this window with a color. This color label will be visible anywhere a gene display is shown in MeV, even in the results of other modules.
EdgeR Output: Results of the edgeR module, showing significantly differentially expressed genes/transcripts. Right-click to reveal a context menu with many powerful options.
Examining RNA-Seq Differential expression list for signature themes
Now that we have a list of differentially expressed genes, we can examine it for themes. To do this, we will use the GOSeq module. This module is based on the R package GOSeq, by Matthew Young. It is designed to find enriched gene groups in length-biased data, such as RNASeq data. It plays a role similar to that of tools like EASE for microarray data.
GOSeq Initialization Dialog: The GOSeq initialization dialog.
- From the Statistics drop-down menu, choose the item Gene Ontology analysis for RNA-seq.
- Click the Cluster Analysis tab at the top of the Initialization Dialog.
- Leave the GOSeq parameters Significance Level: Alpha, Number of
Permutations and Number of Genes per Transcript Length Bin set at their
default values.
- You should have a cluster pre-selected in the cluster selector dialog.
If you have more than one cluster available in this dialog, choose the
one you want to examine for geneset enrichment.
- Choose Download from GeneSigDb from the drop-down menu. Click the Download button.
- Check that the Choose Annotation Type drop-down menu is set at GENE_SYMBOL.
- Leave the File Location field blank.
- Click Ok. GOSeq will run.
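As a rough picture of what "enrichment" means here, the Python sketch below computes a plain hypergeometric over-representation p-value for one gene signature. This is deliberately the simpler, length-unaware baseline and not what GOSeq does: GOSeq additionally corrects for transcript length bias, which this sketch ignores. The function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, signature_size, selected_size, overlap):
    """Probability of seeing at least `overlap` signature genes by chance.

    universe_size:  number of genes tested overall
    signature_size: number of those genes that belong to the signature
    selected_size:  number of differentially expressed genes selected
    overlap:        number of selected genes that are in the signature
    """
    total = comb(universe_size, selected_size)
    upper = min(signature_size, selected_size)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(signature_size, k)
        * comb(universe_size - signature_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

A small p-value means the selected gene list shares more genes with the signature than random selection would be expected to produce.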
Signature theme results
In the Result Tree, you will see a new result node named GOSEQ.
GOSeq Output: Gene signatures, published in GeneSigDb, with enrichment in the list of selected genes. Future plans include adding links from this display directly to the gene signature web page, where the list of genes in the signature and the source publication can be found.
- Open this node and select the node labeled Results Table. This table contains the complete list of gene lists downloaded from the GeneSigDb database, as well as a rating for each list indicating whether its contents are enriched in the selected group of differentially expressed genes used to run GOSeq.
- Double-click on the header labeled p-value to sort the list. Those gene
lists with low p-values, like Human StemCell_Brendel05_21genes, listed
here, are enriched in the set of differentially expressed genes we found
in our previous edgeR analysis. You can explore this gene list by going
to the GeneSigDb website.
From here, you can continue examining gene signatures of interest by searching the GeneSigDb website, or continue with another analysis by selecting it from one of the drop-down menus. For this pilot, most of the standard MeV modules are available to use. A few of them, like the EASE and GSEA modules, require specific annotation files that are currently only available for DNA microarray data. Part of the full RNASeq implementation project will be to adapt MeV to fully support RNASeq analysis in all modules. However, that support is not yet available.