Each tutorial found here focuses on a particular aspect of using MeV.
MeV has supported the Gaggle framework since September 2008 (MeV v4.2). Gaggle is a powerful communications system that connects supported programs (geese) so that they can seamlessly transmit data to one another, without the need for intermediate flat files. MeV can use the Gaggle to send and receive data with other systems biology platforms, such as R, Cytoscape, and various web databases.
To start using the Gaggle, you need to connect to the framework. To do this, simply select the Utilities -> Gaggle -> Connect to Gaggle menu at the top of any Multiple Array Viewer (MAV) window. A Gaggle Boss will launch, and the MAV window you used will connect to it. (The Boss may be minimized when it starts.) You should be able to see the Multiple Array Viewer window listed in the Boss window as one of the "listening" Geese. As you launch other Gaggle-enabled programs or open more Multiple Array Viewer windows, you will see them listed here.
You can check whether MeV is connected correctly with a quick test. Open a result viewer from one of MeV's many analysis modules, say, KMC. Choose a cluster, open an Experiment Viewer window (the one with a heatmap) and right-click. Choose the menu option Broadcast Namelist to Gaggle. Now open the Boss window and choose the tab marked Clipboard. You should see a list of the genes that were displayed in the MeV result window. Note that if you have more than one annotation type loaded into MeV, the annotation type that will be broadcast to the Gaggle will be the one currently displayed in the viewer. You can change the displayed annotation type with the Display -> Gene/Row Labels menu option.
MeV will allow broadcasts out of many of its viewers, including the Excel-like Table Viewer, the Expression Graphs and the Centroid Graphs. The Cluster Managers also allow broadcasting. Just click to select the cluster or clusters you want, and right-click to pull up the popup menu with the Gaggle broadcasting options. You can also broadcast data from the viewers in the bottom half of the Cluster Manager.
Each Multiple Array Viewer window is treated as a separate program by the Gaggle Boss. This means that different Multiple Array Viewer windows can broadcast data to each other via the Gaggle. This is an excellent way to move genelists between MAVs.* To do this, launch another Multiple Array Viewer window with the File -> New Multiple Array Viewer menu option in the small, skinny window at the top of your screen. Load a dataset in this window (for the purposes of this tutorial, choose the same dataset that was loaded into the first window). If you look at the Gaggle Boss, you will see that two Multiple Array Viewers are listed in the Gaggle tab. The second Viewer window automatically connected to the Boss, since the first one was already connected.
Now broadcast a namelist from the first MAV. In the second MAV, a new window will appear, asking you what type of annotation is contained in the incoming broadcast. Choose the correct one from the drop-down menu, and MeV will search through the list of annotations of that type. It will select only the exact matches and present a list of those matching genes in another table. You can inspect the results of the match here, and choose whether to store the resulting matches as a cluster.
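The exact-match step can be pictured with a short sketch. It is a hypothetical Python illustration, not MeV's own code; the function name `match_broadcast`, the sample gene symbols, and the choice to treat "exact" as case-sensitive are all assumptions.

```python
def match_broadcast(incoming_names, annotations):
    """Return row indices whose annotation exactly matches an incoming name.

    incoming_names: names received in the broadcast.
    annotations: one annotation value per row of the loaded dataset, for the
                 annotation type chosen in the drop-down menu.
    """
    wanted = set(incoming_names)
    # Assumption: matching is case-sensitive string equality.
    return [row for row, value in enumerate(annotations) if value in wanted]


# Hypothetical gene symbols loaded in the second MAV.
loaded = ["TP53", "BRCA1", "tp53", "MYC"]
print(match_broadcast(["TP53", "MYC", "EGFR"], loaded))
```

In this sketch the lower-case "tp53" row and the unknown "EGFR" name are simply left out of the result, which mirrors the idea of keeping only exact matches for review.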
*It is not recommended to use Gaggle broadcasts to move expression data (matrices) between MAVs. Each Gaggle matrix broadcast can contain only one annotation value per row (gene) in the matrix, whereas MeV may have many different annotation values loaded for each one. MeV provides a method for creating a new MAV and populating it with a complete set of data from a viewer. This method is called Launch New Session and is found in the right-click menu of most viewers. In the Cluster Manager, it is in the popup menu under Open/Launch -> Launch MeV Session.
MeV communicates with online resources through a Firefox plugin named Firegoose. This plugin can be used to send lists of gene names to websites such as DAVID, EGRIN, EMBL String, Entrez Gene, Entrez Protein, Gaggle Web Applications, Halo Annotations, KEGG Pathway database and others. Here we will focus on DAVID and Entrez Gene. You will need to install Firegoose and restart your browser to follow along with this part of the tutorial.
Firegoose does not automatically connect to the Gaggle. To connect, go to the Firegoose toolbar in your web browser and choose GAGGLE -> Connect to Gaggle. In the Gaggle Boss window, both Multiple Array Viewer and Firegoose should be listed as "listening."
DAVID (the Database for Annotation, Visualization and Integrated Discovery) is a useful tool for identifying enriched biological themes, finding enriched functionally related gene groups, clustering annotation terms, visualizing pathways, and more. (See the DAVID website for more details.)
Choose a list of genes to broadcast to the DAVID website. DAVID accepts the following annotation types provided by RESOURCERER: AffyID, Refseq Accession, Genbank Accession, ENTREZ ID, Gene Symbol, Unigene ID. After broadcasting from MeV, you should see the Firegoose toolbar change to show a dataset name in the drop-down menu. Your list of genes is now in Firegoose. In the Target drop-down menu to the right, choose DAVID. The DAVID website should load into your browser window. Now click the Broadcast button. Firegoose will load the list of genes into the DAVID input panel. A new tab will appear with the gene list on the right. Select the Identifier type of the gene list (the gene/row names being viewed in MeV), then click "Submit List."
Entrez Gene is NCBI's database of genes. It provides a gene summary, genomic regions, transcripts, and products, genomic context, a bibliography, interactions, GO terms, protein information, reference assembly, etc. Entrez Gene only accepts 100 genes at a time, and it accepts the following annotation types: Entrez Gene ID, Genbank Accession, Refseq Accession, and Gene Name. If using Entrez Gene IDs, any "NA" values must be trimmed out of the gene list by going to the clipboard in the Gaggle Boss. Delete the "NA"s and the white space left behind, then click "Broadcast" and proceed as usual through Firegoose.
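The two constraints above (no "NA" entries, at most 100 genes per submission) can be illustrated with a small sketch. This is only an illustration of the clean-up rule in Python, not part of MeV, Firegoose, or the Gaggle Boss; the function name `prepare_entrez_batches` and the sample IDs are hypothetical.

```python
def prepare_entrez_batches(gene_ids, batch_size=100):
    """Drop "NA" and blank entries, then split the list into batches."""
    cleaned = []
    for gene_id in gene_ids:
        gene_id = gene_id.strip()  # remove the white space left behind
        if gene_id and gene_id != "NA":
            cleaned.append(gene_id)
    # Slice the cleaned list into consecutive chunks of at most batch_size.
    return [cleaned[i:i + batch_size] for i in range(0, len(cleaned), batch_size)]


# Hypothetical ID list containing an "NA" and a blank entry.
ids = ["7157", "NA", "", " 672 "] + [str(n) for n in range(1000, 1150)]
batches = prepare_entrez_batches(ids)
print([len(batch) for batch in batches])
```

Each resulting batch is small enough to be submitted on its own.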
The Genome Browser
A very useful Goose is the ISB Genome Browser, a Java application that displays systems biology data on a genomic coordinate track. If MeV has chromosomal coordinate information loaded into its annotation model, it can broadcast expression data to the Genome Browser. There, the expression data will be displayed as a heatmap along with any other genomic tracks you chose to load. Full documentation for the Genome Browser is at that program's website.
To send expression data to the Genome Browser, right-click on one of MeV's many viewer windows, such as the KMC Experiment Viewer. Choose the popup menu option Broadcast matrix to Genome Browser. Since there is no Genome Browser currently running and connected to the Gaggle Boss, MeV will attempt to launch the Genome Browser via Java WebStart. MeV will also pop up a window explaining that you will need to load an appropriate genome into the Genome Browser. Follow the Getting Started instructions at the Genome Browser website to do this. When the genome is loaded, return to MeV's warning window and click OK.
If the Genome Browser receives the broadcast correctly, a window titled "Received Matrix Broadcast" will appear. You should be able to accept all of the defaults in this window. Your data should be in the browser now, though you may need to zoom out quite a bit to see it. Use the slider on the left-hand side of the window to do that.
Those who use the R or Matlab environments can make use of the available Goose plugins as well. A tutorial on these tools is beyond the scope of this document, and we refer you to the R Goose and the MatGoose websites.
Advanced MeV and Gaggle users may want to read the MeV and Gaggle developer documentation, which describes the various metadata values that can be included in both incoming and outgoing broadcasts to the Gaggle.
Bayesian Network (BN) analysis attempts to learn biologically meaningful gene interaction networks (Directed Acyclic Graphs, DAGs) from microarray data. The underlying assumption is that most popular bioinformatics methods are good at identifying genes that discriminate between two or more groups or experimental conditions but often fail to elucidate the underlying mechanism of gene interaction that captures a biological process. This novel method uses seeds learned from biomedical literature, protein-protein interactions, KEGG interactions, or any combination of these to construct a starting 'prior' network. The machine learning algorithm then uses information from the microarray data to learn and refine the network constructed from prior information and predicts a high-confidence network. We also bootstrap the samples to control for overfitting the network.
The BN module in MeV can be accessed from under the "Miscellaneous" category of algorithms. We provide the annotations and support files required to run this module for all major Affymetrix platforms, and we are constantly adding new arrays and platforms to our database.
Bin(State): The biological expression levels of a gene with possible values Over-expressed/Up, Under-expressed/Down, and Unchanged/Neutral.
Perturbation: Changing the state of a gene.
Conditional Probability Table (CPT): A table indicating the likelihood of finding a gene in each of the three bins: over-expressed, under-expressed, and unchanged. This table is dependent on the states of all direct parent genes.
Bin(State) Probability Table (BPT): The probability table indicating the chances that a gene will be over-expressed, under-expressed, or unchanged. These values are calculated for each gene and are based on the set of all Conditional Probability Tables for every ancestor gene that lies upstream of a particular gene. Perturbations in the network have an effect on all the downstream BPTs.
As an extension to the already published method, we have also developed a Cytoscape plug-in that does predictive modeling of the BN described above. For each network loaded with an associated conditional probability table (CPT), the plug-in calculates probabilities of expression (bin/state probabilities) for each network node. For each node, the CPT is used to determine a specific probability table that is not dependent on its parent node(s). When a network is perturbed using BnPredict, the set of CPTs for the network is updated to reflect the set of conditions applied to the system. From this updated set of CPTs, a new collection of bin probability tables is calculated. By examining the expression patterns and the changes to the expression patterns as a result of perturbations, it is possible to predict network reactions to specific stimuli.
In a nutshell, it attempts to predict the state of gene A given the state of its parent gene B. The possible states a gene can exist in are: up-regulated, down-regulated, or unchanged. Once a network has been learned, we find the conditional probability table (CPT) associated with each node (gene). The CPT of a node consists of the individual probabilities of the gene being up, down, or unchanged given that its parent(s) is/are in state up, down, or unchanged (any combination of parent and state). Knowing the CPT of any node, we can then predict the probability of the node being in any one state given any combination of parents and their states.
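The relationship between a CPT and the resulting bin probabilities can be sketched for the simplest case of one child gene with a single parent. This is a minimal Python illustration under stated assumptions, not the BnPredict implementation: the CPT values, the parent probabilities, and the function name `bin_probabilities` are hypothetical, and a perturbation is modeled simply as fixing the parent in one state.

```python
STATES = ("up", "down", "unchanged")


def bin_probabilities(cpt, parent_probs):
    """Marginal state probabilities of a child gene with a single parent.

    cpt[parent_state][child_state] = P(child_state | parent_state)
    parent_probs[parent_state]     = P(parent_state)
    """
    return {
        child: sum(cpt[parent][child] * parent_probs[parent] for parent in STATES)
        for child in STATES
    }


# Hypothetical CPT for gene A given the state of its parent gene B.
cpt_a = {
    "up":        {"up": 0.7, "down": 0.1, "unchanged": 0.2},
    "down":      {"up": 0.1, "down": 0.6, "unchanged": 0.3},
    "unchanged": {"up": 0.2, "down": 0.2, "unchanged": 0.6},
}

unperturbed_b = {"up": 0.25, "down": 0.25, "unchanged": 0.5}
perturbed_b = {"up": 0.0, "down": 1.0, "unchanged": 0.0}  # B fixed to "down"

print(bin_probabilities(cpt_a, unperturbed_b))
print(bin_probabilities(cpt_a, perturbed_b))
```

Fixing gene B in the down state makes gene A's bin probabilities equal to the "down" row of its CPT, which is the kind of shift that the perturbation analysis is meant to expose.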
This method allows researchers to conditionally alter gene expression and predict resulting changes in a biological process based on microarray data, which can then be experimentally validated. We hope this novel approach will enable researchers to (a) get a better understanding of the biological interactions that exist in an enriched set of genes and (b) predict as yet unknown processes and/or interactions.
The BnPredict plug-in colors nodes to convey their expression probabilities. The display may use either a two-color system (default) or a multi-color system. Many find the two-color system simpler and more visually intuitive, though it discards potentially important information. The multi-color system is more complex, as it incorporates an additional color dimension. In the two-color system, bin probabilities of over expression and under expression are compared while the neutrally expressed bin is ignored.
If the over expressed bin is more probable than the under expressed bin, the node will appear as a shade of the up-color. The relative dominance of expression (either under or over) will determine the intensity of the expression color. Equal probabilities for the up bin and down bin will result in a white node. The following example illustrates the two-color system.
In this example (subset of the network shown), we can see that the network has been perturbed at gene CCL5, which has been set to under expressed. A quick visual examination shows that the down regulation of gene CCL5 has an immediate impact on its direct children. Both genes BAD and UBE2L3 can be seen to be significantly more likely to be under expressed when CCL5 is in the down state. As we move downstream in the network, we can see that the effect on gene WARS is also towards down-regulation, but the faintness of the color indicates a significantly reduced response. On the other arm, we see that the grandchild node, ESRRA, has a greater relative probability of being over expressed. This might suggest that there is an inhibitory relationship between ESRRA and UBE2L3. The white color of gene NUCB1 indicates that responses have significantly diminished that far down the network. No other genes have been affected by this perturbation in this network.
The multi-color system uses a similar approach to the two-color system, but includes the information contained in the neutral bin. The neutral bin's probability affects the color brightness: a more probable neutral bin will result in a darker color. For example, a node which is frequently over-expressed, rarely neutral and rarely under-expressed will appear brighter than a node which is frequently over-expressed, frequently neutral and rarely under-expressed. In this example, the two-color system would treat the two nodes as equal while the multi-color system makes a distinction.
The specific probabilities that the coloring system uses depend on the current display selection. When Probability is selected in the BnAttribute Panel, the coloring scheme will use the raw bin probability tables. When the Probability Change button is selected, each node's bin probability will be compared to the original bin probability. In this case, relative changes in bin probability as a result of network perturbations are more easily observed visually.
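The two-color rule can be summarized in a few lines of Python. This is a sketch under stated assumptions rather than the plug-in's actual code: the text does not give the formula for "relative dominance", so the intensity used here, |up - down| / (up + down), is an assumption, as is the function name `two_color`.

```python
def two_color(p_up, p_down):
    """Return (color, intensity) for a node under the two-color scheme.

    The neutral bin is ignored. Intensity is an ASSUMED measure of relative
    dominance in the range 0 to 1; an intensity of 0 corresponds to white.
    """
    total = p_up + p_down
    if total == 0 or p_up == p_down:
        return ("white", 0.0)
    intensity = abs(p_up - p_down) / total
    return ("up-color" if p_up > p_down else "down-color", intensity)


print(two_color(0.3, 0.3))   # equal up and down probabilities
print(two_color(0.6, 0.2))   # over expression dominates
print(two_color(0.1, 0.7))   # under expression dominates
```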
By clicking the Up-Color or Down-Color buttons, users may select a desirable color scheme. For both coloring systems, the color sensitivity bar allows the user to adjust BnPredict's color sensitivity so that changes can be visually discernible relative to others in the network.
MeV provides a useful management system for evaluating groups of genes and samples. These groups, or clusters, can be generated through a number of expression analysis or annotation based techniques. Once stored, they can be visualized, manipulated and evaluated through the Cluster Manager tool, which is accessible from the MeV Result Tree.
Clusters of interest can be stored to a repository from the basic cluster viewers. Highlight the cluster of interest by clicking it, then select the option to store the cluster from the right-click menu. Once a cluster is stored, the Cluster Manager node on the result navigation tree will contain a list of stored clusters. Gene clusters and sample clusters are maintained in separate spreadsheets which are viewable from the Cluster Manager node. When storing a cluster to the repository, an input dialog is presented that allows three user-defined fields to be associated with the cluster. Two optional text fields are used to capture a cluster name and a description of the algorithm or interesting features of the cluster. The third user input is a color used to identify genes or experiments which are members of the cluster. These colors can be tracked while performing analyses so that clustering consensus can be established. No two clusters may use the same identifying color.
The cluster tables contain the following columns:
-The Serial Number is a unique number which is sequentially assigned to easily identify a particular cluster.
-The Source field describes whether the cluster source was an algorithm, a cluster operation, or some other means of selecting a group of elements.
-The Algorithm Node or Factor field identifies the algorithm used, if the source was an algorithm, and includes the navigation tree result index (in parentheses).
-The cluster node field identifies the specific cluster node under the Algorithm Node from which the cluster was stored.
-The cluster label is an optional user-defined name for the cluster.
-The remarks field can be used to contain details about the process used to create the cluster or specific features of interest in the cluster.
-The size field shows the number of elements in the cluster.
-The color field displays the user-defined color for the cluster. If you click the color box, a screen will appear that allows you to change the color if you wish.
-The show-color check box allows you to show or hide the displayed color. This option can be useful when visualizing cluster intersections in viewers. Selecting only one cluster color to view can simplify interpretation.
Users have 6 options for display types and 5 options for the data to display.
Four MeV viewers are accessible from the Cluster Manager:
Five data options are available based on the clusters selected.
The spreadsheet allows single or multiple row selection (by holding down the control key when left-clicking the mouse). A right-click with one or more rows selected will display a menu that contains several options detailed below. Double-clicking on a cluster will open the corresponding dialog.
-The modify option allows the user to change the cluster label, remarks, or cluster color by displaying the input form with the current settings filled in.
-The Open/Launch menu has two options. The first will pull up the source cluster viewer. The second is Launch MeV Session, which opens a new Multiple Array Viewer containing only the data from the selected cluster, or the union of the members of several clusters if several clusters are selected.
-The cluster operations menu allows three possible operations to be performed if two or more clusters are selected. The union operation combines the members of the selected clusters and stores the resulting cluster on the list. Elements present in more than one of the input clusters are represented only once in the output cluster. The intersection operation takes the elements from two or more clusters and produces a cluster containing all elements which are common to all clusters. The XOR (exclusive OR) operation produces a cluster containing elements that are members of one cluster or another but not members of more than one cluster.
-Options also exist to delete selected clusters or all clusters in the list as well as to save a selected cluster to a specified file.
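The three cluster operations described in the list above correspond to ordinary set operations. The following Python sketch illustrates their semantics only; it is not MeV code, and the function names and gene identifiers are hypothetical. XOR is implemented exactly as worded above: an element is kept only if it belongs to exactly one of the selected clusters.

```python
from collections import Counter


def union(clusters):
    """Every element that appears in at least one cluster, once each."""
    return set().union(*clusters)


def intersection(clusters):
    """Only the elements common to all clusters."""
    return set.intersection(*map(set, clusters))


def xor(clusters):
    """Elements that are members of exactly one cluster."""
    counts = Counter(element for cluster in clusters for element in set(cluster))
    return {element for element, n in counts.items() if n == 1}


# Hypothetical stored clusters.
a = {"g1", "g2", "g3"}
b = {"g2", "g3", "g4"}
c = {"g3", "g5"}

print(sorted(union([a, b, c])))
print(sorted(intersection([a, b, c])))
print(sorted(xor([a, b, c])))
```

With three clusters, this "exactly one cluster" reading differs from chaining pairwise symmetric differences, which would keep an element found in all three clusters.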
Clusters can be deleted by selecting one or more clusters in the cluster table or by selecting the delete public cluster option from the menu in the viewer which contains the cluster.
You can also save a cluster to a tab-delimited text file. Selecting this option from the right-click menu will cause a file chooser to appear. Select a file name and a place to save row/column data, log ratio expression values, and (optionally) Cy3 and Cy5 values for each gene in the cluster. A companion option will allow you to save the genes in all clusters in a similar way. This option is available from the cluster table as well as in the viewer.
One additional option is to delete all gene clusters or all sample clusters. These global operations, which affect all colored clusters, are selected from the menu in the Multiple Array Viewer or can be performed from the cluster tables.
The List Import dialog allows one to create a cluster based on supplied identifiers. For example, if you wish to make a cluster out of specific genes that you know are important, you can paste those genes into the dialog box and the cluster will be created from them. Identifiers belonging to the cluster are pasted into the text area. The drop-down list indicates the type of annotation being loaded. After searching for matches, the List Import Result dialog will be displayed. This intermediate dialog displays the results of the import in a table of matching elements and allows you to select a subset of the identified elements before saving them as a cluster. Rows in the table can be selected to remove unwanted entries before clicking the button that stores the items to a cluster. The bottom section of the dialog also reports which identifiers were found and which were not found in the loaded data set. After you review the identified elements, a cluster attributes dialog will be presented so that a cluster name, description, and color can be defined for the new cluster.
The List Import Dialog is used to import gene or experiment identifiers for the purpose of imposing or creating clusters within the loaded data set. This enables the user to mark genes or experiments of interest for tracking during analysis.
Import ID Type
This drop down list contains the gene or experiment annotation types in the loaded data set. Select the annotation type that corresponds to the input ID list.
Paste List (Text Area)
Paste the ID list into the text area by left-clicking in the text area and then pressing Ctrl-V. Reset will clear the selections on the dialog.
Once the dialog is dismissed, if genes or experiments were found in the data set that match the input parameters, a cluster attributes dialog will be presented to collect cluster attributes including the desired cluster color.
The Import Dialog is used to import gene or experiment identifiers for the purpose of imposing or creating clusters within the loaded data set. This enables the user to mark genes or experiments of interest for tracking during analysis.
Select one or more annotation types from the left list and click the '>>>' button to move them to the list of annotations to be used.
The annotation types that you have selected will be used to create clusters. Each unique annotation in each annotation type will be used to create a distinct cluster. Only samples (or genes) that have identical annotations for the specified type will be placed in the same cluster.
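The grouping rule can be sketched as follows. This is a hypothetical Python illustration of the behavior described above, not MeV's implementation; the function name `clusters_by_annotation` and the sample labels are assumptions.

```python
from collections import defaultdict


def clusters_by_annotation(annotations):
    """Map each unique annotation value to the indices that carry it."""
    clusters = defaultdict(list)
    for index, value in enumerate(annotations):
        clusters[value].append(index)
    return dict(clusters)


# Hypothetical sample annotations for a single annotation type.
sample_types = ["B-ALL", "T-ALL", "B-ALL", "T-ALL", "B-ALL"]
print(clusters_by_annotation(sample_types))
```

Each key in the result stands for one cluster, and its list holds the positions of the samples that would be placed in it.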
Import Parameters
Using the combo box, select the annotation type that contains the numerical values that you wish to cluster.
Fill in values for the upper and lower limits that are to contain the desired cluster.
The upper and lower limits that you have input will be used to create a cluster. All samples (or genes) with a numerical value for the selected annotation type within the specified limits will be placed in a single cluster.
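A minimal sketch of the range rule follows, in Python. It is only an illustration: the function name `cluster_by_range` and the sample values are hypothetical, and treating both limits as inclusive is an assumption, since the text does not say how values exactly on a limit are handled.

```python
def cluster_by_range(values, lower, upper):
    """Indices of elements whose numerical annotation lies within the limits.

    Assumption: both limits are inclusive.
    """
    return [index for index, value in enumerate(values) if lower <= value <= upper]


# Hypothetical numerical annotation, one value per sample.
ages = [0.5, 2.0, 3.5, 5.0]
print(cluster_by_range(ages, lower=2.0, upper=4.0))
```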
Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
Traditional statistics use adjusted P-values with some arbitrary cutoff, treating genes with slightly different P-values as different entities. Also, small differences in mRNA abundance are often not detected, nor are large changes in just a few genes.
GSEA remedies this by using all the genes in your expression data for the analysis. GSEA also compiles per-gene statistics across genes within a gene set, allowing for the detection of small changes in many genes or large changes in few genes.
The GSEA algorithm implemented in MeV v4.3 is based on Zhen Jiang and Robert Gentleman's 2007 Bioinformatics paper (Jiang, Z., Gentleman, R. (2007). Extensions to gene set enrichment analysis. Bioinformatics 23(3):306-13).
The GSEA algorithm can be roughly divided into three steps:
GSEA uses a set of parameter input dialogs that open sequentially to provide input options that correspond to each step of the process. The first step of the process is data selection, which lets you assign phenotype/class labels to your samples.
The default assignment (Figure 1) is two groups (factors) with two levels per group. This can be changed to reflect the groups present in your data.
B-ALL and T-ALL are the two levels of this phenotype, so enter 2 in the “Number of levels of factor” textbox. If you have preselected sample clusters and decide to use the “Cluster Selection” tab, just assign group numbers using the Group Assignment drop-down as shown in Figure 2.
You can also use the “Button Selection” tab shown in Figure 3 to achieve the same result. Group1 and Group2 represent B-ALL and T-ALL. You can save these groupings using the “Save settings” button. To load saved groupings, use the “Load settings” button. The Reset button will clear all your choices. Once you are done, click the Next > button.
This brings you to the “Parameter Selection” section shown in Figure 4.
Sample Probe Values:
Using Maximum Probe, the substituted values for Samples 1-3 would be 20, 20, 30. Using Median Probe, the values would be 20, 10, 15. With Standard Deviation (SD), the probe that has maximum SD across samples is used, so the values would be 20, 5, 10. NOTE: SD will be calculated by MeV on the fly. You do not have to do anything. This is just an example.
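The three probe-collapsing options can be sketched in Python. The probe values below are hypothetical: they were chosen so that the results reproduce the numbers quoted above and are not necessarily those of the original example. The sketch assumes that Maximum Probe and Median Probe are taken per sample across the probes of a gene, and that the SD option keeps the single probe with the largest sample standard deviation; the function names are illustrative.

```python
from statistics import median, stdev

# Hypothetical values for three probes of one gene.
# Rows are probes; columns are Samples 1-3.
probes = [
    [20, 5, 10],
    [20, 10, 15],
    [20, 20, 30],
]


def max_probe(rows):
    """Per-sample maximum across probes (assumed meaning of Maximum Probe)."""
    return [max(column) for column in zip(*rows)]


def median_probe(rows):
    """Per-sample median across probes (assumed meaning of Median Probe)."""
    return [median(column) for column in zip(*rows)]


def max_sd_probe(rows):
    """The single probe whose values vary most across samples."""
    return max(rows, key=stdev)


print(max_probe(probes))
print(median_probe(probes))
print(max_sd_probe(probes))
```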
The ‘Browse’ button corresponding to “Select the directory containing your gene sets” lets you choose the directory containing gene set files. You can select the files you want to use from the “Available” panel.
The “Selected” panel lists the gene set files that you chose to use for this analysis. Gene sets can also be downloaded from the MIT/Broad website http://www.broad.mit.edu/gsea/msigdb/downloads.jsp
If your gene set file is in *.gmt or *.gmx format, the “Select the identifier used to annotate genes in your selected gene set” drop-down is automatically populated with “GENE_SYMBOL” as shown in the figure above. In the case of a custom gene set file, you must manually choose the gene identifier from the drop-down.
The “Load Annotation Data” panel lets you upload annotations. Annotations are a MUST for running GSEA. Details on how to load annotations are described in the “Using the Annotation Feature” section of the MeV manual.
The last step is to click the Execute button. Besides the standard MeV viewers, GSEA outputs three new viewers: “Test Statistic Graph”, “Leading Edge Graph Viewer” and “Geneset p-value graph”. The “Significant Gene Sets” table under “Table Views” lists gene sets sorted by their over-enrichment (upper) p-values. Lower p-values are the probability of seeing a test statistic lower than the observed one. Upper p-values are the probability of seeing a test statistic higher than the observed one.
Right-clicking on the rows in the table as shown in Figure 5 lets you navigate to different viewers. The "Excluded Gene Sets" table contains gene sets which do not pass the minimum genes per gene set criterion. These gene sets are not included in the analysis. The "Probe to Gene Mapping" table shows all the probes mapping to a gene.
The “Test Statistic Graph” shown in Figure 6 shows how genes within a gene set contribute to the overall gene-set-level metric. This metric is computed by summing, over the genes in the set, the distance from the green line to the orange point and then normalizing this sum by the square root of the number of genes in the gene set.
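The gene-set-level metric described above can be written out as a short Python sketch. It assumes that the plotted distance for each gene is that gene's signed test statistic measured from a baseline of zero, which the text does not state explicitly; the function name and the sample values are hypothetical.

```python
import math


def gene_set_statistic(gene_stats):
    """Sum of per-gene test statistics divided by sqrt(number of genes)."""
    return sum(gene_stats) / math.sqrt(len(gene_stats))


# Hypothetical per-gene test statistics for a four-gene set.
print(gene_set_statistic([2.0, 1.0, -1.0, 2.0]))
```

Dividing by the square root of the set size keeps large gene sets from scoring highly merely because they contain more genes.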
The “Leading Edge Graph” in Figure 7 shows which subset of genes within the gene set contributes most to the significance of the gene-set-level metric. The leading edge subset is calculated by first ranking the genes from largest to smallest test statistic.
We then calculate the Jiang-Gentleman statistic for successive subsets of the gene set, starting with the subset containing only the gene with the largest t-statistic, and then growing the subset to include the gene with the next largest t-statistic.
We iterate until the final subset contains all the genes in the gene set. The subset that maximizes the Jiang-Gentleman statistic is the group of genes that contributes the most to the gene-set-level metric.
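The leading-edge search described above can be sketched in Python. This is an illustration under stated assumptions, not MeV's code: the statistic for each subset is taken to be the sum of its test statistics divided by the square root of the subset size, and the gene names, values, and function name are hypothetical.

```python
import math


def leading_edge(gene_stats):
    """Return (subset, statistic) for the prefix of ranked genes that
    maximizes sum(statistics) / sqrt(subset size).

    gene_stats: dict mapping gene name -> test statistic.
    """
    ranked = sorted(gene_stats, key=gene_stats.get, reverse=True)
    best_subset, best_value = [], -math.inf
    running_sum = 0.0
    for size, gene in enumerate(ranked, start=1):
        running_sum += gene_stats[gene]         # add the next-largest gene
        value = running_sum / math.sqrt(size)   # statistic for this subset
        if value > best_value:
            best_subset, best_value = ranked[:size], value
    return best_subset, best_value


# Hypothetical per-gene test statistics.
stats = {"g1": 3.0, "g2": 2.5, "g3": 0.5, "g4": -1.0}
subset, value = leading_edge(stats)
print(subset, round(value, 3))
```

In this example, adding the third and fourth genes lowers the statistic, so the leading edge stops after the two strongest genes.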
This guide is an introduction to using the new RNASeq functions in MeV. It contains a brief tour of the new RNASeq file loader and a demonstration of a few of the new functions we have added specifically to support RNASeq data. The guide will first walk you through loading the data using the new RNASeq file loader. Then it will describe using an RNASeq-optimized module, edgeR, to find differentially expressed genes between two groups of samples. Finally, it will demonstrate how to examine these differentially expressed genes for functional themes using the new module GOSeq.
These new options were added in MeV v4.7. If you already have MeV v4.7 installed, you can skip the Setup step and go directly to Loading a Data Set.
Begin your RNASeq analysis by testing for differential expression of all of the unique reads. To do this, we will use a module called edgeR, based on the Empirical Analysis of Digital Gene Expression data in R package written by Mark Robinson.
Differential Expression Results
Now that we have a list of differentially expressed genes, we can examine it for themes. To do this, we will use the GOSeq module. This module is based on the R package GOSeq, by Matthew Young. It is designed to find enriched gene groups in length-biased data, such as RNASeq data. Compare it to tools like EASE for microarray data.
Gene signatures, published in GeneSigDb, with enrichment in the list of selected genes. Future plans include adding links from this display directly to the gene signature web page, where the list of genes in the signature and the source publication can be found.
From here, you can continue examining gene signatures of interest by searching the GeneSigDb website, or continue on with another analysis by simply selecting it from one of the drop-down menus. For this pilot, most of the standard MeV modules are available to use. A few of them, like the EASE and GSEA modules, require specific annotation files that are currently only available for DNA microarray data. Part of the full RNASeq implementation project will be to adapt MeV to fully support RNASeq analysis in all modules. However, that support is not yet available.
Nested EASE (nEASE) is an extension of EASE. The nEASE algorithm includes a second, sub-level, iterative Fisher’s Exact Test on significantly enriched GO terms identified in a first-level EASE analysis. This sub-classification approach provides increased sensitivity for detecting enriched GO terms and thus affords a deeper understanding of possible mechanisms underlying a given condition under study. nEASE was added to MeV as a new feature for version 4.5.
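For readers unfamiliar with the underlying test, the following Python sketch computes a plain one-sided Fisher's exact p-value for the enrichment of one GO term in a gene list. It is an illustration only: it does not reproduce the score that MeV's EASE module reports or the nested iteration that nEASE adds, and the function name, parameter names, and counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(list_hits, list_size, pop_hits, pop_size):
    """One-sided Fisher's exact p-value for over-representation.

    Probability of drawing at least `list_hits` genes annotated with a term
    when `list_size` genes are drawn without replacement from a population of
    `pop_size` genes, of which `pop_hits` carry the term.
    """
    max_hits = min(list_size, pop_hits)
    total = comb(pop_size, list_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 3 of 5 listed genes carry a term that annotates
# 5 of the 20 genes in the background population.
print(fisher_enrichment_p(list_hits=3, list_size=5, pop_hits=5, pop_size=20))
```

A nested analysis would repeat a test of this kind on sub-terms, using the genes of an already enriched term as the new starting point.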
Begin this tutorial after installing MeV.
Launch MeV by double-clicking on the TMEV.bat file (Windows), the tmev.sh file (Linux) or the MeV application icon (Mac OSX).
When you launch MeV, two windows will open. The small narrow window across the top of the screen is called the MeV main menubar. This window is used normally to open new MultipleArrayViewer windows and manage other MeV properties. We will not be using this menubar window for the purposes of this tutorial. The larger window that opens is called a MultipleArrayViewer (MAV). This is where the majority of our work will take place.
Download and unzip the file nease_example_files.zip. This file contains the expression data and supporting GO term files we will use to replicate the analysis presented in the manuscript.
Choose File -> Open Analysis from the MAV window. In the file chooser that opens, navigate to the folder where you unzipped nease_example_files.zip, select the file ER_status_SAM_1_Miller.anl and choose Open. A saved analysis will be loaded into MeV. This may take some time.
These data are fully described in the manuscript. They are based on the data from Miller et al. (2005) and Minn et al. (2005).
After loading the analysis file, click the Meta-Analysis drop-down menu, and select EASE Cluster Analysis. An initialization dialog will appear.
Step 1: Selecting the EASE file system
To manually load an EASE file system, click the button marked Custom in the EASE Annotation Analysis dialog. This brings up the EASE Advanced Parameters dialog. Click the Browse button and navigate to the folder you previously downloaded, nease_example_files. Inside it, open the folder named data\ease\affy_HT_HG-U133A_EASE. You should see three folders inside: Data, Enhance and Lists. Do not select any of these folders. Simply click Open.
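If the dialog does not accept the folder, it can help to confirm that you picked the folder one level above Data, Enhance and Lists. The following is a small, optional sketch for checking this outside of MeV. The helper name `missing_ease_subfolders` is hypothetical and is not part of MeV; the only thing it relies on from this tutorial is the names of the three subfolders.

```python
from pathlib import Path

# Subfolders this tutorial says should be visible inside the EASE file system folder.
EXPECTED_SUBFOLDERS = ("Data", "Enhance", "Lists")


def missing_ease_subfolders(root):
    """Return the expected subfolder names that are not present under root."""
    root = Path(root)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]
```

Call the function with the path to the folder you intend to select. An empty list means all three subfolders were found. A non-empty list names the missing ones, which usually means the selected folder is one level too high or too low.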
Make sure the option Select Background Population from File is selected in the EASE Advanced Parameters dialog. Click the button labeled File Browser in the Population panel and select the file Lists/affy_HT_HG-U133A/Populations/ProbesetIDs.txt.
Step 3: Annotation Parameter Selection
From the drop-down menu labeled Annotation Key in the MeV Annotation Key panel, select the heading PROBE_ID. Click the checkbox labeled Use Annotation Converter, then click the button labeled File Browser. The file selection dialog that opens should already be set to the correct directory. (If it is not, navigate to nease_example_files\affy_HT_HG-U133A_EASE\Data\Convert.) Select the file affy_HT_HG-U133A_ProbesetIDs.txt and click Open. Under the heading Gene Annotation / Gene Ontology Linking Files, click the Add Files button. Again, the resulting file browser should already be displaying a list of the available GO term files. (If not, navigate to the nease_example_files\affy_HT_HG-U133A_EASE\Data\Class directory within the MeV folder.) Hold down the Control key and click the files GO Biological Process.txt, GO Molecular Function.txt and GO Cellular Component.txt to select them. When the three files are selected, click Open.
Click the OK button to return to the main EASE dialog.
Select the check box next to Run Nested EASE, near the bottom of the EASE Annotation Analysis dialog. Now click the OK button to run EASE and Nested EASE. This will take some time.
The standard EASE analysis, as described in Hosack et al. (2003), will run, followed by the Nested EASE analysis. The nEASE results are included as part of the standard EASE results, as a subnode on the result tree labeled Nested EASE. Double-click the result node labeled EASE Analysis to see the Nested EASE results appear. We recommend enlarging the Result Tree panel by clicking and dragging the dividing bar between the Result Tree and the larger viewing window on the right. Click on the result nodes to explore the result data within.
The most useful view is found in a node labeled Nested EASE Summary Table, which contains the summarized data for all of the nEASE results. These are the data reported in the manuscript. To examine the genes which drive a result row in the nEASE Summary Table, left-click to select a row, and right-click to open up a context-sensitive menu. Choose Open Viewer -> Expression Image. MeV will open a heatmap view of the probe expression values that correspond to the genes that were members of that group.
All of the usual heatmap-related options are available in this window, including adjusting the color display, changing the gene annotation displayed, and storing the displayed genes in a new cluster. Note that the left-hand panel, the Result Tree, does not obviously change when this heatmap is opened. However, scrolling the Result Tree down will reveal that a new result node has been selected, corresponding to the Nested EASE result that was selected from the Summary Table.
References
D. A. Hosack, G. Dennis, B. T. Sherman, H. C. Lane, R. A. Lempicki, Genome Biol 4 (2003).
A. J. Minn, et al., Nature 436, 518 (2005).
L. D. Miller, et al., Proceedings of the National Academy of Sciences 102, 13550 (2005).