Documentation

The MeV development team has assembled several tutorials that detail the use of various aspects of MeV.

Quickstart Guide

This guide is meant to be a quick reference for those who are just beginning to install and use MeV. Any questions that are not answered in this guide are most likely covered in the complete manual.

Don't have Java Installed?

It seems you don't have Java installed on your computer. MeV requires the Java Runtime Environment (JRE) to function. This is easily fixed. Java is free and easy to install on your computer. Just click the link below and follow the instructions.

 

Download and install Java.

 

We also recommend you install Java 3D, to get the most out of MeV.

After you have installed Java, just double-click MeV.exe to launch MeV.

 

 

Installing MeV

  1. First make sure that Java is properly installed on your computer (not to be confused with JavaScript, which is a different technology). You will need to have Java v1.6 or higher for a Windows PC/Linux and v1.5 or higher for Mac OSX. Go to http://java.com/ to get the latest version. Certain MeV modules also require Java 3D, which can be found here: http://java.sun.com/products/java-media/3D/download.html
  2. If you are reading this guide offline, go to mev.tm4.org. If you are reading this guide on the MeV website, look to the right hand side of the screen and click the “Download” link.
  3. A screen should pop up that asks what you want to do with the files. Whether you are using a PC, Mac, or Linux, choose to download the files to your computer. If the files are not saved automatically, save them to your computer yourself.
  4. Once downloaded, open the folder and unzip all files.
  5. To make the folder easy to find again, copy the MeV_4_X folder to a location on your hard drive. Make sure the location path does not contain any spaces. The desktop is not a good place for the unzipped MeV folder, because its full system path contains spaces. Putting the folder directly under a drive, such as C:\, is a good option.
  6. Open the MeV_4_X folder. If using Windows, double click the file called TMEV/tmev.bat to run the program. For Linux or Unix users, navigate to the MeV_4_X folder in a terminal, and make tmev.sh executable by typing “chmod u+x tmev.sh” then “./tmev.sh”. MeV can hereafter be run by double-clicking the tmev.sh file. For Macintosh users, open the TMEV_MacOSX_4_X file.

When MeV starts up, two windows will appear.

Installation Path

Users are strongly discouraged from installing MeV under a directory that has spaces in its location path. Examples of such directories on a Windows machine are ‘C:\Documents and Settings’ and ‘C:\Program Files’. Some modules will not work if MeV is installed in any such directory.
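As a quick check before copying the folder, a short script can flag a candidate install path that contains spaces. This is a minimal sketch in Python, which is not part of MeV; the function name has_space_in_path and the example paths are made up for illustration.

```python
def has_space_in_path(path: str) -> bool:
    """Return True if the full location path contains a space character."""
    return " " in path


# Hypothetical candidate locations for the unzipped MeV folder.
candidates = [r"C:\MeV_4_X", r"C:\Program Files\MeV_4_X"]

for candidate in candidates:
    verdict = "avoid (contains spaces)" if has_space_in_path(candidate) else "ok"
    print(f"{candidate}: {verdict}")
```

The check looks at the whole path rather than only the last folder name, since a space anywhere in the location is what this guide warns against.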

Loading a Data Set

  1. In the Multiple Array Viewer, go to File -> Load Data. (There is also a Single Array Viewer that can only open one set of data at a time; only the Multiple Array Viewer allows you to view many samples together. The real power of MeV is in the program’s analysis modules, found only in the Multiple Array Viewer. That is where the clustering and visualization of data take place. Therefore, the remainder of this guide will focus on the Multiple Array Viewer. Also, in the Multiple Experiment Viewer, you can go to File -> New Multiple Array Viewer and have more than one experiment window open at a time.)
  2. When the Expression File Loader comes up, click Browse at the upper right side of the screen. (Figure: Tab-delimited file loader.)
  3. To choose one of the data sets that were downloaded with the program, choose Desktop from the drop down menu at the top. Open the MeV_4_2 folder, then open the “data” folder. If you have your own data set, find that file on your computer instead.
  4. Choose a TDMS (Tab Delimited, Multiple Sample) file that ends with *.txt and click Open. (The full MeV manual has a complete list and description of different file formats. The link to the manual can be found at http://tm4.org/mev.html under the heading Appendix: File Format Descriptions.)
  5. When do I use Annotation? Annotation gives you much more specific information about the gene, which could be more useful down the line.
    1. To use the automatic Annotation loader, you must be using an Affymetrix dataset. If so, make sure to click the Affymetrix Array radio button. (The program automatically selects one of the two radio buttons, either Affymetrix Array or Spotted DNA/cDNA Array or Other Array type, and both cannot be selected at the same time.) (Figure: TDMS loader with data ready to load.)
    2. Note that in the sample data, the Affymetrix files have their own folder within the data folder. These have already been normalized. Again, for descriptions on other types of files please see the MeV Manual.
    3. If you want to use the automatic annotation downloader, you must be connected to the Internet. Click the Connect button. A new screen will pop up with two drop down menus. From the top one, choose the species you are working with, which in the example case is human. Then in the bottom drop down menu choose the correct annotation file, which is the type of chip used in the experiment. This file will begin with “affy” and is found a little further down in the menu. Then click Connect.
    4. Once the Annotation files are downloaded, click OK to return to the previous screen.
    5. If you want to use an annotation file from the data folder or from your computer instead, click Browse within the Annotation box and choose the file you want. Then click Open.
  6. Click the upper-leftmost expression value. That is, under the Expression Table, click the first box that contains a quantitative value. These include columns that contain numerical values such as sample data, but not columns that are labels, names, or other such unquantifiable values. (Note: There may be columns that have 0 or 1 in them to identify data, but are not expression values.)
  7. Click Load at the bottom of the screen.
  8. A diagram should come up that looks like lots of tiny green and red boxes. This is the Heatmap View. Each row represents a specific gene, whereas each column represents each sample, or experiment. The lightest green boxes are the most underexpressed genes, the brightest red being the most overexpressed genes. MeV expects that each sample loaded will have the same number of elements, in the same order, and that each gene (or spot) is aligned with that element in every other sample loaded. For example, using that rule, all input files will have data for gene x in row y. Clicking on a rectangle displays a dialog with detailed information about that spot.
  9. The MeV toolbar on the left hand side should have 5 options. This is called the Navigation Tree. Any heading that has a node to the left of it means that if you double click it, the information within that heading will appear below it. Click Main View to return to the original Heatmap View at any time.
  10. To find the basic information for the data set, double click Analysis Results -> Data Source Selection Information. This will tell you how many genes are in the data set and how many trials of the experiment were performed with each gene.
  11. Under History, the History Log will show you which tests were done on the data set and when.
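The idea behind step 6, finding the upper-leftmost expression value, can be sketched in code. The Python below is only an illustration under stated assumptions: it assumes a single header row followed by data rows, with annotation columns on the left and sample columns on the right, and the tiny data set is made up. As the note in step 6 warns, a purely numeric test cannot tell a 0/1 flag column from real expression values, so a heuristic like this does not replace checking the table by eye.

```python
import csv
import io


def is_number(text):
    """Return True if the cell text parses as a floating point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def first_expression_column(rows):
    """Index of the leftmost column from which every data cell to the right is numeric.

    Assumes rows[0] is a single header row (an assumption of this sketch).
    Returns None if no such column exists.
    """
    data_rows = rows[1:]
    for col in range(len(rows[0])):
        if all(is_number(cell) for row in data_rows for cell in row[col:]):
            return col
    return None


# Made-up tab-delimited sample: two annotation columns, two sample columns.
sample = "ID\tSymbol\tS1\tS2\nprobe1\tgeneA\t0.5\t-1.2\nprobe2\tgeneB\t1.1\t0.3\n"
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
print(first_expression_column(rows))  # prints 2
```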

Filtering your data

  1. Why filter? Many data sets contain tens to hundreds of thousands of genes, so filtering out the ones that contain little biological information makes doing analyses much easier. If the data set is not large, then you may not need to filter it.
  2. Go to the main menu on top and click Adjust Data -> Data Filters -> Variance Filters. Put in the percentage that you wish to filter out. We will use 50% here as a default, but you can choose a lower percentage to filter out even more genes.
  3. The filtered data can be seen by using the navigation tree on the left. It is under Analysis Results -> Data Filter -> Expression Image. If you go further and look under parameters, it will tell you the type of filter that was used and the percentage filtered out.
From now on, all tests done on the sample will be done on the filtered data. The algorithm toolbar on top, where the icons say Clustering, Statistics,… etc, is where all the main modules are found. The same modules can also be found under Analysis in the main toolbar above it.
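To make the idea of a variance filter concrete, here is a minimal Python sketch that ranks genes by their variance across samples and retains the most variable fraction. It is a generic illustration, not MeV's own implementation: the function name variance_filter, the keep_fraction parameter, and the small data set are all assumptions made for this example.

```python
from statistics import pvariance


def variance_filter(expression, keep_fraction=0.5):
    """Keep the given fraction of genes with the highest variance across samples.

    expression maps a gene name to its list of expression values, one per sample.
    """
    ranked = sorted(expression, key=lambda gene: pvariance(expression[gene]), reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return {gene: expression[gene] for gene in ranked[:n_keep]}


# Made-up data: g1 is flat, g4 barely moves, g2 and g3 vary across samples.
demo = {
    "g1": [1.0, 1.0, 1.0, 1.0],
    "g2": [0.0, 2.0, 0.0, 2.0],
    "g3": [0.0, 4.0, 0.0, 4.0],
    "g4": [1.0, 1.2, 1.0, 1.2],
}
print(sorted(variance_filter(demo, keep_fraction=0.5)))  # prints ['g2', 'g3']
```

Genes whose values hardly change across samples carry little information for clustering or testing, which is why they are the ones dropped.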

Clustering

Clustering basically groups similar samples or genes together. This way it makes it much easier to test the data, since you can see which groups are more alike.

Clustering with the Hierarchical Clustering Module (HCL)

  1. HCL is a commonly used clustering algorithm. From the top left click the Clustering drop down menu. Choose the first option, Hierarchical Clustering. NOTE: The downside to this test is that it can take a long time to perform if there are a lot of genes, and if there are too many, an error message may appear. 10,000 genes is a standard cutoff point, but even with that many genes it may take up to 20 minutes. If you just want to look at the similarities among samples and not genes, simply uncheck the box at the top that says Gene Tree. If you do have a lot of genes and want to group them into similar groups, the k-means clustering described in the next section may be more useful.
  2. Click OK at the bottom right.
  3. To view the clusters, find and double click HCL on the navigation tree on the left.
  4. You can store the clusters by clicking on the groups that you want together. That is, click the line that connects a certain group of genes. That group will then be highlighted. Right click on the line and choose Store Cluster.
  5. The program should automatically choose a new color for each cluster so that you can easily view and distinguish them.
  6. If you right click in the gene tree and select Gene Tree Properties, you can get much more detailed results.
  7. Under Distance Threshold Adjustment, you can adjust how many clusters will be created. The lower the distance threshold, the more clusters there will be, since requiring a shorter distance between similar genes means more groups. You will see light gray triangular ‘wedges’ along the side as this distance threshold changes. Either use the slider or type in a value for the distance threshold until you get the desired number of clusters.
  8. If you check the Create Cluster Viewers box on the right, you will get different graphs and charts for each cluster created. These can be very useful visualization tools, and they let you look more closely at each cluster.
  9. Click OK.
  10. If you did not get the desired number of clusters, you can always go back. Also, if your navigation tree gets too cluttered, simply right click the item you wish to delete and click Delete.
  11. You can view each individual cluster in the navigation tree on the left. You can either view different graphs of an individual cluster, or choose All Clusters and see all of them on one screen.
  12. The Expression images are the red and green rectangles like that of the main view. The Centroid Graphs show the means and ranges of each sample. The Expression Graphs give a more detailed representation of the gene behavior. Each individual line represents one gene, and the pink line in the middle shows the average expression of all the genes in each sample.
  13. If you wish to do an analysis on just one particular cluster (such as a t-test, which will be discussed in the Modules section), you can right click on the individual cluster and choose Launch New Session. This will open a new window with just the particular cluster you wish to analyze.
  14. If you double click Cluster Manager in the navigation tree, it will show you the list of all the clusters you made.
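The relationship between the distance threshold and the number of clusters (step 7) can be illustrated with a small agglomerative clustering sketch. This is plain Python written for this guide, not MeV's HCL code: it assumes Euclidean distance and average linkage, stops merging once the closest pair of clusters is farther apart than the threshold, and uses made-up two-sample expression profiles.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance between all pairs drawn from clusters a and b."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def cluster_by_threshold(points, threshold):
    """Merge the closest pair of clusters until no pair is within the threshold."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        if average_linkage(clusters[x], clusters[y], points) > threshold:
            break  # every remaining pair is farther apart than the threshold
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters


# Made-up expression profiles (two samples per gene).
profiles = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (10.0, 0.0)]
for threshold in (0.1, 0.5, 20.0):
    print(threshold, len(cluster_by_threshold(profiles, threshold)))
```

With these profiles the three thresholds give 5, 3 and 1 clusters, matching the rule that a lower distance threshold produces more clusters.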

Clustering with the K-Means Clustering Module (KMC)

K-Means clustering is useful for large data sets, as it does not use nearly as much memory as HCL. This module can also be found in the Clustering drop down menu. The benefits of this method are that it is much faster than HCL and that you can choose exactly how many clusters you want.
  1. Select Clustering -> k-Means/Medians Clustering from the drop down menu.
  2. Choose whether you want to cluster the genes or the samples.
  3. Choose how many clusters you wish to form. Here we will use 10, but you can always go back and select fewer (or more) if those results do not give you the information you want.
  4. You can view the clusters in the navigation tree on the left just as you could for the HCL.
For more information on clustering, there is a more detailed section in the MeV Manual called Working with Clusters.
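For readers curious about what k-means does with the requested number of clusters, the following is a minimal Python sketch of the basic algorithm. It is an illustration only and not MeV's KMC implementation: it assumes Euclidean distance, starts from the first k points instead of a random choice so that the result is repeatable, and uses made-up profiles.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Basic k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [points[i] for i in range(k)]  # deterministic start: the first k points
    assignment = []
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave a centroid in place if no point was assigned to it
                centroids[c] = tuple(sum(values) / len(members) for values in zip(*members))
    return assignment


# Made-up expression profiles forming two obvious groups.
profiles = [(0.0, 0.1), (5.0, 5.1), (0.1, 0.0), (5.1, 5.0)]
print(k_means(profiles, 2))  # prints [0, 1, 0, 1]
```

Because the number of clusters is fixed in advance and no full distance tree is built, the work per iteration grows only with the number of genes times the number of clusters.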

Modules

What do these do? They test for differences, patterns, and other such relationships among genes in the samples.

T-test Module

A t-test is the basic test to look at the differences among groups, if any, and how significant those differences are. In other words, it takes the means of two groups and determines if the difference in those means is significant.
  1. Below the main drop down menu, click the icon that says Statistics, and then choose T-tests.
  2. You first want to make sure you’re choosing a two-sample t-test, since the screen will automatically come up with a one-sample test. The difference is that you want to compare two groups, such as cancerous cells versus normal ones, to try to find the significant differences between them. Do this by choosing the Between Groups tab at the top of the t-test screen.
  3. Select which samples belong to which group. For example, put all the columns that are labeled as cancerous in Group A. For all the normal samples, click the bubble next to Group B. If you have other groups that you don’t want to include in the comparison, you can mark them as Neither group.
  4. You also want to check Assume equal variance on the right. This means that the variance within all of the samples is the same, since the data has all been normalized.
  5. Note that the T-test box is rather large and the OK button may not automatically appear at the bottom of the page. This is merely a technical problem. In order to fix that, move the window down by clicking the bar on the top that says TTEST: T-test and dragging it. Shrink it down using the arrows at the edge of the window. Once shrunken, move the window back up and the OK button should appear on the bottom right. Click it to run the test.
  6. The result of the t-test is in the navigation tree along with the filtering and clustering results.
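For reference, the statistic behind a two-sample t-test with equal variances assumed can be computed by hand. The Python sketch below shows the standard pooled-variance formula for a single gene; it is not taken from MeV's source, the function name and the numbers are made up, and it stops at the t statistic and degrees of freedom (turning these into a p-value needs a t-distribution, which the Python standard library does not provide).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic assuming equal variances, plus its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Made-up expression values for one gene in Group A and Group B.
group_a = [2.0, 2.5, 3.0]
group_b = [1.0, 1.5, 2.0]
t_value, df = pooled_t_statistic(group_a, group_b)
print(round(t_value, 4), df)  # prints 2.4495 4
```

A larger absolute t value means the difference between the group means is large relative to the variation within the groups.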

User Manual

 
The latest version of the MeV user manual is kept on the MeV website. A version of the manual is also included in your MeV download. Please check that you are referring to the correct version.
 
MeV User Manual
 
 

Tutorials

The tutorials found here focus on a particular aspect of using MeV.

MeV and Gaggle

MeV has supported the Gaggle framework since September 2008 (MeV v4.2). Gaggle is a powerful communications system that connects supported programs (geese), allowing them to seamlessly transmit data to one another without the need for intermediate flat files. MeV can use the Gaggle to send and receive data with other systems biology platforms, such as R, Cytoscape, and various web databases.

 

 

Connecting to the Gaggle

Figure: The Gaggle Boss listing one Multiple Array Viewer as a Goose.

To start using the Gaggle, you need to connect to the framework. To do this, simply select the Utilities -> Gaggle -> Connect to Gaggle menu at the top of any Multiple Array Viewer (MAV) window. A Gaggle Boss will launch, and the MAV window you used will connect to it. (The Boss may be minimized when it starts.) You should be able to see the Multiple Array Viewer window listed in the Boss window as one of the "listening" Geese. As you launch other Gaggle-enabled programs or open more Multiple Array Viewer windows, you will see them listed here.

 

A simple broadcast

Figure: Broadcasting a namelist from an Experiment Viewer.

You can see whether MeV is connected correctly with a quick test. Open a result viewer from one of MeV's many analysis modules, say, KMC. Choose a cluster, open an Experiment Viewer window (the one with a heatmap) and right click. Choose the menu option Broadcast Namelist to Gaggle. Now open the Boss window and choose the tab marked Clipboard. You should see a list of the genes that were displayed in the MeV result window. Note that if you have more than one annotation type loaded into MeV, the annotation type that will be broadcast to the Gaggle will be the one currently displayed in the viewer. You can change the displayed annotation type with the Display -> Gene/Row Labels menu option.

Figure: The Cluster Manager allows many broadcast options.

MeV will allow broadcasts out of many of its viewers, including the Excel-like Table Viewer, the Expression Graphs and the Centroid Graphs. The Cluster Managers also allow broadcasting. Just click to select the cluster or clusters you want, and right-click to pull up the popup menu with the Gaggle broadcasting options. You can also broadcast data from the viewers in the bottom half of the Cluster Manager.

 

Transferring gene lists between Multiple Array Viewers

Each Multiple Array Viewer window is treated as a separate program by the Gaggle Boss. This means that different Multiple Array Viewer windows can broadcast data to each other via the Gaggle. This is an excellent way to move genelists between MAVs.* To do this, launch another Multiple Array Viewer window with the File -> New Multiple Array Viewer menu option in the small, skinny window at the top of your screen. Load a dataset in this window (for the purposes of this tutorial, choose the same dataset that was loaded into the first window). If you look at the Gaggle Boss, you will see that two Multiple Array Viewers are listed in the Gaggle tab. The second Viewer window automatically connected to the Boss, since the first one was already connected.

Figure: Choose from the drop-down the type of annotation being sent.

Now return to your first MAV window and choose a result viewer. Choose Broadcast Gene List to Gaggle from the popup menu.

In the second MAV, a new window will appear, asking you what type of annotation is contained in the incoming broadcast. Choose the correct one from the drop-down menu, and MeV will search through the list of annotations of that type. It will choose only the exact matches, and present a list of those matching genes in another table. You can inspect the results of the match here, and choose whether to store the resulting matches as a cluster.
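The exact-match step described above can be pictured with a short Python sketch. It only illustrates the idea of matching an incoming name list against one annotation type; the function name, the annotation values, and the identifiers are hypothetical and are not taken from MeV or the Gaggle API.

```python
def match_namelist(incoming_names, annotation_by_row):
    """Return the row indices whose annotation value exactly equals a broadcast name."""
    wanted = set(incoming_names)
    return [row for row, value in enumerate(annotation_by_row) if value in wanted]


# Hypothetical annotation values of the chosen type, one per row of the receiving data set.
annotation_by_row = ["id_001", "id_002", "id_003", "id_004"]
# Hypothetical incoming broadcast; "ID_003" differs in case, so it is not an exact match.
incoming = ["id_002", "ID_003", "id_004", "id_999"]

print(match_namelist(incoming, annotation_by_row))  # prints [1, 3]
```

Because only exact matches are accepted, choosing the wrong annotation type in the drop-down menu, or broadcasting names in a different spelling, will typically produce few or no matching genes.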

 

*It is not recommended to use Gaggle broadcasts to move expression data (matrices) between MAVs. Each Gaggle matrix broadcast can contain only one annotation value per row (gene) in the matrix, whereas MeV may have many different annotation values loaded for each one. MeV provides a method for creating a new MAV and populating it with a complete set of data from a viewer. This method is called Launch New Session and is found in the right-click menu of most viewers. In the Cluster Manager, it is in the popup menu under Open/Launch -> Launch MeV Session.

Firegoose and Web Resources

MeV communicates with online resources through a Firefox plugin named Firegoose. This plugin can be used to send lists of gene names to websites such as DAVID, EGRIN, EMBL String, Entrez Gene, Entrez Protein, Gaggle Web Applications, Halo Annotations, KEGG Pathway database and others. Here we will focus on DAVID and Entrez Gene. You will need to install Firegoose and restart your browser to follow along with this part of the tutorial.

Firegoose does not automatically connect to the Gaggle. To connect, go to the Firegoose toolbar in your web browser and choose GAGGLE -> Connect to Gaggle. In the Gaggle Boss window, both Multiple Array Viewer and Firegoose should be listed as "listening."

 

DAVID (the Database for Annotation, Visualization and Integrated Discovery) is a useful tool for identifying enriched biological themes, finding enriched functional-related gene groups, clustering annotation terms, visualizing pathways, and more. (See the DAVID website for more details.)

Choose a list of genes to broadcast to the DAVID website. DAVID accepts the following annotation types provided by RESOURCERER: AffyID, Refseq Accession, Genbank Accession, ENTREZ ID, Gene Symbol, Unigene ID. After broadcasting from MeV, you should see the Firegoose toolbar change to show a dataset name in the drop-down menu. Your list of genes is now in Firegoose. In the Target drop-down menu to the right, choose DAVID. The DAVID website should load into your browser window. Now click the Broadcast button. Firegoose will load the list of genes into the DAVID input panel. A new tab will appear with the gene list on the right. Select the Identifier type of the gene list (the gene/row names being viewed in MeV), then click "Submit List."

Entrez Gene is NCBI's database of genes, found here. It provides a gene summary, genomic regions, transcripts and products, genomic context, a bibliography, interactions, GO terms, protein information, reference assembly, etc. Entrez Gene only accepts 100 genes at a time, and it accepts the following annotation types: Entrez Gene ID, Genbank Accession, Refseq Accession, and Gene Name. If using Entrez Gene IDs, any "NA" values must be trimmed out of the gene list by going to the clipboard in the Gaggle Boss. Delete the "NA"s and the white space left behind, then click "Broadcast" and proceed as usual through Firegoose.
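If you prepare a gene list outside the Gaggle Boss clipboard, the same clean-up can be scripted. The Python sketch below drops "NA" entries and blank lines and then splits the list into groups of at most 100, following the limit stated above. It is a stand-alone illustration with made-up identifiers; it does not talk to Firegoose, the Gaggle, or NCBI.

```python
def prepare_batches(gene_ids, batch_size=100):
    """Remove "NA" and blank entries, then split the list into batches of batch_size."""
    cleaned = [g.strip() for g in gene_ids if g.strip() and g.strip() != "NA"]
    return [cleaned[i:i + batch_size] for i in range(0, len(cleaned), batch_size)]


# Made-up list: 250 placeholder identifiers plus some entries that must be trimmed.
raw_ids = [str(n) for n in range(1, 251)] + ["NA", "  ", "NA"]

batches = prepare_batches(raw_ids)
print([len(batch) for batch in batches])  # prints [100, 100, 50]
```

Each batch can then be broadcast separately.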

 

The Genome Browser

A very useful Goose is the ISB Genome Browser, a Java application that displays systems biology data on a genomic coordinate track. If MeV has chromosomal coordinate information loaded into its annotation model, it can broadcast expression data to the Genome Browser. There, the expression data will be displayed as a heatmap along with any other genomic tracks you chose to load. Full documentation for the Genome Browser is at that program's website.

Figure: MeV waits patiently while the Genome Browser starts up and gets a genome loaded.

To send expression data to the Genome Browser, right-click on one of MeV's many viewer windows, such as the KMC Experiment Viewer. Choose the popup menu option Broadcast matrix to Genome Browser. Since there is no Genome Browser currently running and connected to the Gaggle Boss, MeV will attempt to launch the Genome Browser via Java WebStart. MeV will also pop up a window explaining to you that you will need to load an appropriate genome into the Genome Browser. Follow the Getting Started instructions at the Genome Browser website to do this. When the genome is loaded, return to MeV's warning window and click OK.

 

Figure: Expression data displayed on a genomic map.

If the Genome Browser receives the broadcast correctly, a window titled "Received Matrix Broadcast" will appear. You should be able to accept all of the defaults in this window. Your data should be in the browser now, though you may need to zoom out quite a bit to see it. Use the slider on the left-hand side of the window to do that.

 

 

 

Advanced Gaggle Use

Those who use the R or Matlab environments can make use of the available Goose plugins as well. A tutorial on these tools is beyond the scope of this document, and we refer you to the R Goose and the MatGoose websites.

Advanced MeV and Gaggle users may want to read the MeV and Gaggle developer documentation, which describes the various metadata values that can be included in both incoming and outgoing broadcasts to the Gaggle.

BNPredict Plugin

By MeV Bn Team (Raktim, Dan)


Bayesian Network (BN) analysis attempts to learn biologically meaningful gene interaction networks (Directed Acyclic Graphs, or DAGs) from microarray data. The motivation is that most popular bioinformatics methods are good at identifying genes that discriminate between two or more groups or experimental conditions, but often fail to elucidate the underlying mechanism of gene interaction that captures a biological process. This novel method uses seeds learned from biomedical literature, protein-protein interactions, KEGG interactions, or any combination of these to construct a starting ‘prior’ network. The machine learning algorithm then uses information from the microarray data to learn and refine the network constructed from prior information, and predicts a high-confidence network. We also bootstrap the samples to control for overfitting the network.

The BN module in MeV can be accessed from under the "Miscellaneous" category of algorithms. We provide the annotations and support files required to run this module for all major Affymetrix platforms, and we are constantly adding new arrays and platforms to our database.

Definitions

Bin(State): The biological expression levels of a gene with possible values Over-expressed/Up, Under-expressed/Down, and Unchanged/Neutral.

Perturbation: Changing the state of a gene.

Conditional Probability Table(CPT): A table indicating the likelihood of finding a gene in each of the three bins; over-expressed, under-expressed, and unchanged. This table is dependent on the states of all direct parent genes.

Bin(State) Probability Table (BPT): The probability table indicating the chances that a gene will be over-expressed, under-expressed, or unchanged. These values are calculated for each gene and are based on the set of all Conditional Probability Tables for every ancestor gene that lies upstream of a particular gene. Perturbations in the network have an effect on all the downstream BPTs.

Cytoscape Plugin: BnPredict

As an extension to the already published method, we have also developed a Cytoscape plug-in that does predictive modeling of the BN described above. For each network loaded with an associated conditional probability table (CPT), the plugin calculates probabilities of expression (bin/state probabilities) for each network node. For each node, the CPT is used to determine a specific probability table that is not dependent on its parent node(s). When a network is perturbed using BNPredict, the set of CPTs for the network is updated to reflect the set of conditions applied to the system. From this updated set of CPTs, a new collection of bin probability tables is calculated. By examining the expression patterns and the changes to the expression patterns as a result of perturbations, it is possible to predict network reactions to specific stimuli.
In a nutshell, it attempts to predict the state of gene A given the state of its parent gene B. The possible states a gene can exist in are: up regulated, down regulated or unchanged. Once a network has been learned, we find the conditional probability table (CPT) associated with each node (gene). The CPT of a node consists of the individual probabilities of the gene being up, down or unchanged given that its parent(s) is/are in state up, down or unchanged (any combination of parent and state). Knowing the CPT of any node, we can then predict the exact probability of the node being in any one state given any combination of parents and their states.
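The calculation described above can be sketched for the simplest case of one gene with a single parent. The Python below is a hedged illustration, not BNPredict's code: the function names are invented, the CPT numbers are made up, and only one parent is modeled. The bin probability of the child is the sum over parent states of P(parent state) times P(child state given parent state); a perturbation is modeled by giving one parent state a probability of 1.

```python
STATES = ("up", "neutral", "down")


def bin_probabilities(parent_probs, cpt):
    """P(child = s) = sum over parent states p of P(parent = p) * P(child = s | parent = p)."""
    return {s: sum(parent_probs[p] * cpt[p][s] for p in STATES) for s in STATES}


def perturb(state):
    """Probability table for a gene that has been forced into a single state."""
    return {s: 1.0 if s == state else 0.0 for s in STATES}


# Made-up CPT for gene A: cpt[parent_state][child_state]; each row sums to 1.
cpt_a = {
    "up": {"up": 0.7, "neutral": 0.2, "down": 0.1},
    "neutral": {"up": 0.2, "neutral": 0.6, "down": 0.2},
    "down": {"up": 0.1, "neutral": 0.2, "down": 0.7},
}
# Made-up bin probabilities for the parent, gene B, before any perturbation.
parent_b = {"up": 0.25, "neutral": 0.5, "down": 0.25}

baseline = bin_probabilities(parent_b, cpt_a)
after_up = bin_probabilities(perturb("up"), cpt_a)
print({s: round(v, 3) for s, v in baseline.items()})
print({s: round(v, 3) for s, v in after_up.items()})
```

With these numbers the baseline is roughly 0.3 up, 0.4 neutral and 0.3 down; forcing gene B up simply selects the "up" row of the CPT, giving 0.7, 0.2 and 0.1. In a larger network the same update would be repeated for every gene downstream of the perturbed node.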

This method allows researchers to conditionally alter gene expression and predict the resulting changes in a biological process based on microarray data, which can then be experimentally validated. We hope this novel approach will enable researchers to (a) get a better understanding of the biological interactions that exist in an enriched set of genes and (b) predict as yet unknown processes and/or interactions.

CPT Panel

Interactive but view-only panel. This panel allows the user to navigate the raw conditional probability tables associated with a node given its parent(s). The tables cannot be edited. The main components of the panel, with short descriptions, follow.
  1. Displays the Cytoscape ID of the selected node.
  2. Displays the Cytoscape IDs of 0 or more parents of the selected node.
  3. A set of 3 radio buttons for each parent denoting parent states: Up, Neutral or Down. The user can select any state for any parent to see the corresponding CPT for the selected node being Up, Neutral or Down.
  4. Displays the probability of the selected node being Up, Neutral or Down given its parent(s) being Up, Neutral, Down or any combination thereof.
  5. Displays a textual probability statement based on the user's selection of the node's parent(s) and their state(s).
  6. A note of warning: only nodes with a maximum of 3 parents are displayed. As the number of parents grows, the number of possible combinations of their states grows exponentially, making it impractical to display all possible situations. We think limiting the display to 3 parents is reasonable, as the majority of nodes will not have more than 3 parents in a meaningful network.
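The growth mentioned in the warning is easy to quantify: with three possible states per parent, a node with n parents has 3 to the power n parent-state combinations, each needing its own row in the CPT. The short Python sketch below enumerates them; the function name is made up for this example.

```python
from itertools import product

STATES = ("Up", "Neutral", "Down")


def parent_state_combinations(n_parents):
    """All combinations of states for n_parents parents; one CPT row is needed per combination."""
    return list(product(STATES, repeat=n_parents))


for n in range(0, 6):
    print(n, "parents:", len(parent_state_combinations(n)), "combinations")
```

Three parents already give 27 combinations, and five parents give 243.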

The Attributes Panel

Main interactive panel. This panel has controls to perturb the network (conditionally change the state of a node) and to view the predicted network based on the changes. It also has visual controls that can be customized to user-specific needs, such as the colors of states and their transitions.

Figure: Attribute Panel.

The BNAttribute Panel allows users to visually and quantitatively investigate network perturbations. When a node is selected, it can be set into any of the three bins/states (up, down, and neutral) by pressing the first three buttons on the left. Activating a perturbation will typically result in a visually observable change in the network appearance. In the BNAttribute Panel table, a probability for each bin/state and selected node is displayed. By selecting "Probability", the table will display the probabilities that a node will fall into each bin. Invariably, these three values will add up to 1.0 or 100%.

The main controls of the panel, with short descriptions, follow.
  1. By clicking on the Green Up Arrow the user can change the state of a node to Up.
    This results in a change in the network. It changes all the nodes that are children of the selected node to reflect the change in their parent's state.
  2. By clicking on the Orange Down Arrow the user can change the state of a node to Down.
    This results in a change in the network. It changes all the nodes that are children of the selected node to reflect the change in their parent's state.
  3. By clicking on the Blue 3 dotted circle, the user can change the state of a node to Neutral.
    This results in a change in the network. It changes all the nodes that are children of the selected node to reflect the change in their parent's state.
  4. This button resets the network to its original condition, as it was before all user perturbations.
  5. The Undo button can restore the network to any state in the series of perturbations induced by the user. The Redo button can re-apply all the perturbation steps that were undone.
  6. The blue question marked button launches this help window.
  7. The Probability radio button displays a table with the absolute probability of the user-set value of a node's state.
  8. The Probability Change radio button displays the relative change in the probability of the states of the nodes resulting from a perturbation.
  9. The Up Color button opens a color palette. The user can select the color to represent the Up state of a node.
    It is also used as the gradient value to represent the color of its state relative to the other states in the network view.
  10. The Down Color button opens a color palette. The user can select the color to represent the Down state of a node.
    It is also used as the gradient value to represent the color of its state relative to the other states in the network view.
  11. The Slider control or the color sensitivity bar allows the user to adjust BNPredict’s color sensitivity so that changes in nodes can be visually discernible relative to others in the network. See here for more details.
  12. The Multi-scale coloring uses a similar approach as the two-color system, but includes the information contained in the neutral bin. See here for more details.


The Network Views

The Network views allow users to interact with the nodes by changing their states and viewing the resulting changes.
There are 2 views associated with each network:

The network view available by clicking the "Probability" radio button.

Probability View

This network shows the nodes colored by their probability of being Up, Down or Neutral. In the BnAttribute Panel table, a probability for each bin and selected node is displayed. By selecting "Probability", the table will display the probabilities that a node will fall into (or belong to) each bin/state. These three values always add up to 1.0 (100%). A color gradient reflects how strongly Up or Down a node is. A marker identifies any node whose state has been changed by the user.

The network view available by clicking the "Probability Change" radio button.

Probability Change View

This network shows the nodes colored by their change in probability of being Up, Down or Neutral from their previous state. The "Probability Change" table displays the changes in probability compared to the original bin probability table. These values begin at 0.0, and any deviation represents a response to a perturbation of the network. A positive value indicates that the associated bin has become more probable as a result of the perturbation. A negative value indicates that the bin has become less probable. These values sum to 0.0 for each node.
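The relationship between the two tables can be sketched in a few lines of Python. This is only an illustration: the probabilities below are made-up values for a single hypothetical node, and the function name is not part of MeV.

```python
# Hypothetical bin probabilities for one node, before and after a perturbation.
original = {"up": 0.30, "neutral": 0.50, "down": 0.20}
perturbed = {"up": 0.10, "neutral": 0.45, "down": 0.45}


def probability_change(before, after):
    """Return the per-bin change in probability (after minus before)."""
    return {state: after[state] - before[state] for state in before}


change = probability_change(original, perturbed)
for state, delta in change.items():
    # Positive: the bin became more probable; negative: less probable.
    print(f"{state:8s} {delta:+.2f}")

# Each probability table sums to 1.0, so the changes cancel out to 0.0.
total_change = sum(change.values())
```

Because both tables sum to 1.0, any probability gained by one bin must be lost by the others, which is why the changes for a node sum to 0.0.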

The LM (Literature Mining) Network

Literature Mining Network

The LM Network view is non-interactive. It does not have a Conditional Probability table associated with it and is not directed. This view provides the starting point of the BN network search and is for supporting informational purposes only.


BnPredict Coloring System

The BnPredict plugin displays nodes in colors that carry a great deal of significance. The display may use either a two-color system (default) or a multi-color system. Many find the two-color system more understandable and visually intuitive, though it discards potentially important information. The multi-color system is more complex because it incorporates an additional color dimension.

In the two-color system, bin probabilities of over-expression and under-expression are compared while the neutrally expressed bin is ignored. If the over-expressed bin is more probable than the under-expressed bin, the node will appear as a shade of the up-color; if the under-expressed bin is more probable, it will appear as a shade of the down-color. The relative dominance of expression (either under or over) determines the intensity of the expression color. Equal probabilities for the up bin and down bin result in a white node. The following example illustrates the two-color system.

Color-coding

In this example (a subset of the network is shown), the network has been perturbed at gene CCL5, which has been set to under-expressed. A quick visual examination shows that down-regulation of gene CCL5 has an immediate impact on its direct children. Both genes BAD and UBE2L3 can be seen to be significantly more likely to be under-expressed when CCL5 is in the down state. As we move downstream in the network, we can see that the effect on gene WARS is also towards down-regulation, but the faintness of the color indicates a significantly reduced response. On the other arm, we see that the grandchild node, ESRRA, has a greater relative probability of being over-expressed. This might suggest that there is an inhibitory relationship between ESRRA and UBE2L3. The white color of gene NUCB1 indicates that responses have significantly diminished that far down the network. No other genes have been affected by this perturbation in this network.

The multi-color system uses a similar approach to the two-color system, but includes the information contained in the neutral bin. The neutral bin's probability affects the color brightness: a more probable neutral bin results in a darker color. For example, a node which is frequently over-expressed, rarely neutral and rarely under-expressed will appear brighter than a node which is frequently over-expressed, frequently neutral and rarely under-expressed. In this example, the two-color system would treat the two nodes as equal while the multi-color system makes a distinction.

The specific probabilities that the coloring system uses depend on the currently selected display option. When Probability is selected in the BnAttribute Panel, the coloring scheme uses the raw bin probability tables. When the Probability Change button is selected, each node's bin probability is compared to the original bin probability. In this case, relative changes in bin probability as a result of network perturbations are more easily observed visually.
By clicking the Up-Color or Down-Color buttons, users may select a desirable color scheme. For both coloring systems, the color sensitivity bar allows the user to adjust BnPredict's color sensitivity so that changes can be visually discernible relative to others in the network.
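The two-color rule can be illustrated with a short Python sketch. The exact formula BnPredict uses is not given here, so the linear blend from white, the default colors, and the `sensitivity` multiplier below are all assumptions made for illustration.

```python
def two_color(p_up, p_down, up_rgb=(255, 0, 0), down_rgb=(0, 255, 0), sensitivity=1.0):
    """Blend from white toward the up or down color (illustrative formula only).

    The neutral bin is ignored, as in the two-color system described above.
    The default colors and the linear blend are assumptions, not MeV's values.
    """
    diff = p_up - p_down
    target = up_rgb if diff >= 0 else down_rgb
    # Larger dominance of one bin gives a more intense color, capped at full color.
    strength = min(1.0, abs(diff) * sensitivity)
    return tuple(round(255 + (channel - 255) * strength) for channel in target)


white = two_color(0.4, 0.4)        # equal up and down probabilities
full_up = two_color(1.0, 0.0)      # completely dominated by the up bin
faint_down = two_color(0.2, 0.6)   # moderately dominated by the down bin
```

Raising `sensitivity` in this sketch makes small differences between the up and down bins more visible, which is the role the color sensitivity bar plays in the interface.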

Cluster Manager

Working with Clusters

MeV provides a useful management system for evaluating groups of genes and samples.  These groups, or clusters, can be generated through a number of expression analysis or annotation based techniques.  Once stored, they can be visualized, manipulated and evaluated through the Cluster Manager tool, which is accessible from the MeV Result Tree.

  ClusterManager
 

Storing Clusters and Using the Cluster Manager

 

Clusters of interest can be stored to a repository from the basic cluster viewers. Highlight the cluster of interest by clicking it, then in the right click menu select Store Cluster. Once a cluster is stored, the Cluster Manager node on the result navigation tree will contain a list of stored clusters. Gene clusters and sample clusters are maintained in separate spreadsheets which are viewable from the Cluster Manager node. When storing a cluster to the repository an input dialog is presented which allows for three user defined fields to be associated with the cluster. Two optional text fields are used to capture a cluster name and a description of the algorithm or interesting features of the cluster. The third user input is a color used to identify genes or experiments which are members of the clusters. These colors can be tracked while performing analyses so that clustering consensus can be established. No two clusters may use the same identifying color.

The cluster tables contain the following columns:

-The Serial Number is a unique number which is sequentially assigned to easily identify a particular cluster.

-The Source field describes whether the cluster source was an algorithm, a cluster operation, or some other means of selecting a group of elements.

-The Algorithm Node or Factor field identifies the algorithm used, if the source was an algorithm, and includes the navigation tree result index (in parentheses).


 Cluster Attributes Dialog

-The Cluster Node field identifies the specific cluster node under the Algorithm Node from which the cluster was stored.

-The Cluster Label is an optional user defined name for the cluster.

-The Remarks field can be used to contain details about the process used to create the cluster or specific features of interest in the cluster.

-The Size field shows the number of elements in the cluster.

-The Color displays the user defined color for the cluster. If you click the color box, a dialog will appear that allows you to change the color if you wish.

-The Show Color check box allows you to show or suppress the displayed color. This option can be useful when visualizing cluster intersections in viewers. Selecting only one cluster color to view can simplify interpretation.

Users have 6 options for display types and 5 options for the data to display.
Six display types are accessible from the Cluster Manager:

  1. Table: This option displays a table in the Cluster Manager showing every available annotation for each gene or sample included.
  2. Expression Image: This option displays a heatmap in the Cluster Manager showing expression values for each gene and sample included.
  3. Expression Graph: This option displays a graph in the Cluster Manager showing expression values for each gene and sample included.
  4. Centroid Graph: This option displays a graph similar to the Expression Graph, but individual lines are omitted. Instead, a single mean expression line is displayed and bars representing standard deviation at a particular sample/gene are shown.
  5. Venn Diagram: This option displays a Venn diagram in the Cluster Manager showing the overlap between 2 or 3 clusters. A p-value is reported given a null hypothesis of zero membership correlation between clusters.
  6. Expression Charts: This option displays charts in the Sample Cluster Manager showing expression values for each gene or gene cluster expressed over the set of selected sample clusters.
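As an illustration of what the Centroid Graph summarizes, the following Python sketch computes a mean line and a standard deviation per sample for a small gene cluster. The expression values are invented, and the use of the sample standard deviation is an assumption; MeV may use a different estimator.

```python
from statistics import mean, stdev

# Hypothetical gene cluster: each row holds one gene's expression across three samples.
cluster = {
    "gene_a": [1.0, 2.0, 3.0],
    "gene_b": [2.0, 2.0, 5.0],
    "gene_c": [3.0, 2.0, 4.0],
}


def centroid(rows):
    """Return one (mean, standard deviation) pair per sample column."""
    columns = zip(*rows.values())
    return [(mean(col), stdev(col)) for col in columns]


summary = centroid(cluster)
for index, (avg, spread) in enumerate(summary, start=1):
    print(f"Sample{index}: mean={avg:.2f} sd={spread:.2f}")
```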

Five data options are available based on the clusters selected.

  1. Show All: All elements in the loaded data are shown, regardless of cluster membership.
  2. Show selected: All elements belonging to any of the currently selected clusters are displayed.
  3. Show excluded: All elements that belong to none of the currently selected clusters are displayed.
  4. Show intersect: All elements belonging to every currently selected cluster are displayed.
  5. Show except: All elements belonging to one and only one of the currently selected clusters are displayed.
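The five data options correspond to simple set operations. The Python sketch below shows one way to express them; the function names and gene identifiers are hypothetical and are not part of MeV.

```python
from collections import Counter

# Hypothetical loaded data and two currently selected clusters.
loaded = {"g1", "g2", "g3", "g4", "g5", "g6"}
selected = [{"g1", "g2", "g3"}, {"g2", "g3", "g4"}]


def show_all(loaded, clusters):
    return set(loaded)


def show_selected(loaded, clusters):
    # Elements belonging to any selected cluster.
    return set().union(*clusters)


def show_excluded(loaded, clusters):
    # Elements belonging to none of the selected clusters.
    return set(loaded) - set().union(*clusters)


def show_intersect(loaded, clusters):
    # Elements belonging to every selected cluster.
    return set.intersection(*map(set, clusters))


def show_except(loaded, clusters):
    # Elements belonging to one and only one selected cluster.
    counts = Counter(element for cluster in clusters for element in cluster)
    return {element for element, n in counts.items() if n == 1}
```

With exactly two clusters selected, `show_except` is the symmetric difference of the two sets; counting memberships extends the same idea to three or more clusters.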

The spreadsheet allows single or multiple row selection (by holding down the control key when left clicking the mouse). A right click with one or more rows selected will display a menu that contains several options detailed below. Double-clicking on a cluster will open the Modify Attributes dialog.

-The Modify Attributes option allows the user to modify cluster label, remarks or the cluster color by displaying the input form with the current settings displayed.

-The Open/Launch menu has two options. Open ClusterViewer will pull up the source cluster viewer. The second option is Launch MeV Session which opens a new multiple array viewer containing only the data from the selected cluster or the union of the members of several clusters if several clusters are selected.

-The Cluster Operations menu allows for three possible operations to be performed if two or more clusters are selected. Union combines the members of the selected clusters and stores the resulting cluster on the list. Elements represented in more than one cluster of the input clusters are only represented once in the output cluster. The Intersection operation takes the elements from two or more clusters and produces a cluster containing all elements which are common to all clusters. The XOR (exclusive OR) operation produces a cluster containing elements that are members of one cluster or another but not members of more than one cluster.

-Options also exist to delete selected clusters or all clusters in the list as well as to save a selected cluster to a specified file.

Deleting clusters can be performed by selecting a single or multiple clusters in the cluster table or by selecting delete public cluster option from the menu in the viewer which contains the cluster.

You can also Save Cluster Data to a tab-delimited text file. Selecting this option from the right-click menu will cause a file chooser to appear. Select a file name and a place to save row/column data, log ratio expression values, and (optionally) Cy3 and Cy5 values for each gene in the cluster. Selecting Save All Clusters will allow you to save the genes in all clusters in a similar way. This option is available from the cluster table as well as in the viewer.

  

One additional option is to delete all gene clusters or all sample clusters. These global operations, which affect all colored clusters, are selected from the Utilities menu in the multiple array viewer by choosing Delete All Gene Clusters or Delete All Sample Clusters, or they can be performed from the cluster tables.

The Import Gene/Experiment List option allows one to create a cluster based on supplied identifiers. For example, if you wish to make a cluster out of specific genes that you know are important, you can paste those genes into the dialog box and the cluster will be created from them. Identifiers belonging to the cluster are pasted into the text area. The drop-down list indicates the type of annotation being loaded. After searching for matches, an intermediate dialog (the List Import Result dialog) appears to display the results of the import and to allow you to select a subset of the identified elements before saving them as a cluster. This dialog displays a table that contains the matching elements. Rows in the table can be selected to remove unwanted entries before hitting the Store Cluster button to store the items to a cluster. The bottom section of the dialog also reports which identifiers were found and which were not found in the loaded data set. After review of the identified elements, a cluster attributes dialog is presented so that a cluster name, description, and color can be defined for the new cluster.

The List Import Dialog is used to import gene or experiment identifiers for the purpose of imposing or creating clusters within the loaded data set. This enables the user to mark genes or experiments of interest for tracking during analysis.

Clustering by Gene List Import

 

Import ID Type

This drop down list contains the gene or experiment annotation types in the loaded data set. Select the annotation type that corresponds to the input ID list.

Paste List (Text Area)

Paste the ID list into the text area by left clicking the mouse in the text area and then using the ctrl-v key strokes to paste the identifiers in the list. Reset will clear the selections on the dialog.

Once the dialog is dismissed, if genes or experiments were found in the data set that match the input parameters, a cluster attributes dialog will be presented to collect cluster attributes including the desired cluster color.
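The matching step behind the list import can be sketched as follows. This is a simplified illustration that assumes exact, case-sensitive matching of one identifier per line; how MeV actually compares identifiers is not specified here, and the function name is hypothetical.

```python
def match_identifiers(pasted_text, annotations):
    """Split a pasted ID list and look each ID up in the loaded annotation column.

    Returns (found, missing): found maps each matched ID to the row indices
    where it occurs, and missing lists the IDs with no match.
    """
    wanted = [line.strip() for line in pasted_text.splitlines() if line.strip()]
    index = {}
    for row, identifier in enumerate(annotations):
        index.setdefault(identifier, []).append(row)
    found = {ident: index[ident] for ident in wanted if ident in index}
    missing = [ident for ident in wanted if ident not in index]
    return found, missing


# Hypothetical annotation column for the selected Import ID Type.
gene_symbols = ["TP53", "BRCA1", "MYC", "TP53"]
found, missing = match_identifiers("TP53\nEGFR\n\nMYC\n", gene_symbols)
```

The `found` rows would become the candidate cluster members, while `missing` corresponds to the identifiers reported as not found in the loaded data set.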

Automatic Identifier Clustering Dialog

 

The Import Dialog is used to import gene or experiment identifiers for the purpose of imposing or creating clusters within the loaded data set. This enables the user to mark genes or experiments of interest for tracking during analysis.

Import Parameters

Select one or more annotation types from the left list and click the '>>>' button to move them to the list of annotations to be used.

The annotation types that you have selected will be used to create clusters. Each unique annotation in each annotation type will be used to create a distinct cluster. Only samples (or genes) that have identical annotations for the specified type will be placed in the same cluster.
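The grouping rule described above amounts to collecting samples by annotation value. A minimal Python sketch, using invented sample names and a hypothetical "Tissue" annotation type:

```python
from collections import defaultdict


def cluster_by_annotation(samples, annotation_type):
    """Group sample names by their value for one annotation type."""
    clusters = defaultdict(list)
    for name, annotations in samples.items():
        clusters[annotations[annotation_type]].append(name)
    return dict(clusters)


# Hypothetical sample annotations.
samples = {
    "S1": {"Tissue": "liver"},
    "S2": {"Tissue": "brain"},
    "S3": {"Tissue": "liver"},
}
tissue_clusters = cluster_by_annotation(samples, "Tissue")
```

Each distinct value ("liver", "brain") yields one cluster, and only samples with identical values end up together.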

Binned Clustering Dialog

 

The Import Dialog is used to import gene or experiment identifiers for the purpose of imposing or creating clusters within the loaded data set. This enables the user to mark genes or experiments of interest for tracking during analysis.

Import Parameters

Using the combo box, select the annotation type that contains the numerical values that you wish to cluster.

Fill in values for the upper and lower limits that are to contain the desired cluster.

The upper and lower limits that you have input will be used to create a cluster. All samples (or genes) with a numerical value for the selected annotation type within the specified limits will be placed in a single cluster.
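Binned clustering can be sketched as a range filter over one numeric annotation. The example below is illustrative only: it assumes both limits are inclusive and that non-numeric values are skipped, neither of which is confirmed by this guide, and the "Age" annotation is invented.

```python
def binned_cluster(samples, annotation_type, lower, upper):
    """Return the names of samples whose annotation value lies within [lower, upper]."""
    members = []
    for name, annotations in samples.items():
        try:
            value = float(annotations[annotation_type])
        except (KeyError, ValueError):
            continue  # assumption: skip missing or non-numeric annotations
        if lower <= value <= upper:
            members.append(name)
    return members


# Hypothetical samples with a numeric "Age" annotation stored as text.
samples = {
    "S1": {"Age": "34"},
    "S2": {"Age": "51"},
    "S3": {"Age": "n/a"},
    "S4": {"Age": "45"},
}
age_cluster = binned_cluster(samples, "Age", 40, 55)
```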

Gene Set Enrichment Analysis (GSEA)

What is GSEA?

Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
Reference: http://www.broad.mit.edu/gsea/

Why GSEA?

Traditional statistics use adjusted P-values with some arbitrary cutoff, treating genes with slightly different P-values as different entities. Also, small differences in mRNA abundance are often not detected, nor are large changes in just a few genes.

GSEA remedies this by using all the genes in your expression data for the analysis. GSEA also compiles per-gene statistics across genes within a gene set, allowing for the detection of small changes in many genes or large changes in few genes.

The GSEA algorithm implemented in MeV v4.3 is based on Zhen Jiang and Robert Gentleman's 2007 Bioinformatics paper (Jiang, Z., Gentleman, R. (2007). Extensions to gene set enrichment analysis. Bioinformatics 23(3):306-13).

Brief Description of the GSEA Algorithm

The GSEA algorithm can be roughly divided into three steps:

  • Calculate the per gene statistic. This is done by fitting a linear model to all the genes, separately and simultaneously.
  • Calculate the gene set statistic.
  • Estimate significance by:
    • Permuting factor/phenotype/class labels
    • Compute the per gene statistic for every permutation
    • Compute the gene set statistic for every permutation
    • Calculate and report Unadjusted p-values
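The three steps can be sketched in Python for the simple case of one factor with two levels. This is a rough illustration under several assumptions, not MeV's implementation: the per-gene statistic is a pooled-variance t-statistic (which matches the linear-model coefficient test only in this two-class case), the gene set statistic is the sum of per-gene statistics divided by the square root of the set size, only genes in the set are scored, and the expression values are invented.

```python
import math
import random
from statistics import mean, variance


def t_statistic(values, labels):
    """Pooled-variance t-statistic comparing class 1 against class 0."""
    a = [v for v, g in zip(values, labels) if g == 0]
    b = [v for v, g in zip(values, labels) if g == 1]
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled * (1 / len(a) + 1 / len(b)))


def gene_set_statistic(expression, labels, gene_set):
    """Sum of per-gene statistics, normalized by sqrt(number of genes)."""
    stats = [t_statistic(expression[g], labels) for g in gene_set if g in expression]
    return sum(stats) / math.sqrt(len(stats))


def run_gsea(expression, labels, gene_set, n_perm=200, seed=0):
    """Return the observed statistic and unadjusted lower/upper permutation p-values."""
    rng = random.Random(seed)
    observed = gene_set_statistic(expression, labels, gene_set)
    permuted = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # permute the class labels
        permuted.append(gene_set_statistic(expression, shuffled, gene_set))
    lower_p = sum(s <= observed for s in permuted) / n_perm
    upper_p = sum(s >= observed for s in permuted) / n_perm
    return observed, lower_p, upper_p


# Hypothetical expression values for six samples (three per class).
expression = {
    "g1": [1.0, 1.2, 0.9, 3.1, 2.8, 3.3],
    "g2": [2.0, 2.1, 1.8, 3.9, 4.2, 4.0],
    "g3": [5.0, 4.8, 5.1, 5.2, 4.9, 5.0],
}
labels = [0, 0, 0, 1, 1, 1]
observed, lower_p, upper_p = run_gsea(expression, labels, ["g1", "g2"])
```

With so few samples there are only a handful of distinct label permutations, so the p-values in this toy example are coarse; real data sets with more samples give finer resolution.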

How to Run GSEA

 

GSEA uses a set of parameter input dialogs that open sequentially to provide input options that correspond to each step of the process. The first step of the process is data selection, which lets you assign phenotype/class labels to your samples.

Figure 1

The default assignment (Figure 1) is two groups (factors) with two levels per group. This can be changed to reflect the groups present in your data.

Figure 2

B-ALL and T-ALL are the two levels of this phenotype, so enter 2 in the “Number of levels of factor” textbox. If you have preselected sample clusters and decide to use the “Cluster Selection” tab, assign group numbers using the Group Assignment drop-down as shown in Figure 2.

 

Figure 3

You can also use the “Button Selection” tab shown in Figure 3 to achieve the same result. Group1 and Group2 represent B-ALL and T-ALL. You can save these groupings using the “Save settings” button. To load saved groupings, use the “Load settings” button. The Reset button will clear all your choices. Once you are done, click the Next > button.

 

This brings you to the “Parameter Selection” section shown in Figure 4.

 

Sample Probe Values:

 

Probe     Sample1  Sample2  Sample3  St. Dev
Probe_1   10       20       30       3
Probe_2   20       5        10       4
Probe_3   20       15       10       2

 

Using Maximum Probe, the substituted values for Samples 1-3 would be 20, 20, 30. Using Median Probe, the values would be 20, 15, 10. With Standard Deviation (SD), the probe that has the maximum SD across samples (Probe_2 in this table) is used, so the values would be 20, 5, 10. NOTE: MeV calculates the SD on the fly; you do not have to do anything. This table is just an example.
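The three substitution rules can be checked against the sample table with a short Python sketch. The function names are hypothetical, and the St. Dev column is copied from the table as given rather than recomputed, since the table is only an example.

```python
from statistics import median

# Probe values for one gene, copied from the sample table (Sample1..Sample3).
probes = {
    "Probe_1": [10, 20, 30],
    "Probe_2": [20, 5, 10],
    "Probe_3": [20, 15, 10],
}
# St. Dev column as listed in the table (illustrative; MeV computes its own).
listed_sd = {"Probe_1": 3, "Probe_2": 4, "Probe_3": 2}


def max_probe(probes):
    """Per sample, take the largest value among the probes."""
    return [max(column) for column in zip(*probes.values())]


def median_probe(probes):
    """Per sample, take the median value among the probes."""
    return [median(column) for column in zip(*probes.values())]


def sd_probe(probes, sd):
    """Use all values from the single probe with the largest SD across samples."""
    chosen = max(sd, key=sd.get)
    return list(probes[chosen])


by_max = max_probe(probes)
by_median = median_probe(probes)
by_sd = sd_probe(probes, listed_sd)
```

Note the difference in approach: Maximum Probe and Median Probe combine values sample by sample, whereas the SD rule selects one whole probe to represent the gene.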

 The ‘Browse’ button corresponding to “Select the directory containing your gene sets” lets you choose the directory containing gene set files. You can select the files you want to use from the “Available” panel. 

The “Selected” panel lists the gene set files that you chose to use for this analysis. Gene sets can also be downloaded from the MIT/Broad website http://www.broad.mit.edu/gsea/msigdb/downloads.jsp

If your gene set file is in *.gmt or *.gmx format, the “Select the identifier used to annotate genes in your selected gene set” drop-down is automatically populated with “GENE_SYMBOL” as shown in the figure above. In the case of a custom gene set file, you must manually choose the gene identifier from the drop-down.

 

The “Load Annotation Data” panel lets you upload annotations. Annotations are a MUST for running GSEA. Details on how to load annotations are described in the “Using the Annotation Feature” section of the MeV manual.

 

The last step is to hit the Execute button. In addition to the standard MeV viewers, GSEA outputs three new viewers: “Test Statistic Graph”, “Leading Edge Graph Viewer” and “Geneset p-value graph”. “Significant Gene Sets” under “Table Views” lists gene sets sorted by their over-enrichment (upper) p-values. Lower p-values are the probability of seeing a test statistic lower than the observed one. Upper p-values are the probability of seeing a test statistic higher than the observed one.

Figure 5

Right-clicking on the rows in the table, as shown in Figure 5, lets you navigate to different viewers. The "Excluded Gene Sets" table contains gene sets which do not pass the minimum genes per gene set criterion. These gene sets are not included in the analysis. The "Probe to Gene Mapping" table shows all the probes mapping to a gene.

Figure 6

The “Test Statistic Graph” shown in Figure 6 shows how genes within a gene set contribute to the overall gene-set-level metric. This metric is computed by summing the distances from the green line to the orange points and then normalizing this sum by the square root of the number of genes in the gene set.

 

Figure 7

The “Leading Edge Graph” in Figure 7 shows which subset of genes within the gene set contributes to the significance of the gene-set-level metric. The leading edge subset is calculated by first ranking the genes from largest to smallest test statistic.

We then calculate the Jiang-Gentleman statistic for subsets of the gene set, starting with the subset containing only the gene with the largest t-statistic, and then growing the subset to include the gene with the next largest t-statistic.

We iterate until the final subset contains all the genes in the gene set. The subset that maximizes the Jiang-Gentleman statistic is the group of genes that contributes the most to the gene-set-level metric.
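The leading edge search can be sketched in Python as a scan over growing prefixes of the ranked gene list. This sketch assumes the Jiang-Gentleman statistic for a subset is the sum of its t-statistics divided by the square root of the subset size, in line with the normalization described for the Test Statistic Graph; the t-statistics are invented.

```python
import math


def leading_edge(t_stats):
    """Return (genes, score) for the ranked prefix with the largest set statistic."""
    ranked = sorted(t_stats.items(), key=lambda item: item[1], reverse=True)
    best_score, best_size, running_sum = -math.inf, 0, 0.0
    for size, (_, t) in enumerate(ranked, start=1):
        running_sum += t
        score = running_sum / math.sqrt(size)  # assumed form of the statistic
        if score > best_score:
            best_score, best_size = score, size
    return [gene for gene, _ in ranked[:best_size]], best_score


# Hypothetical per-gene t-statistics for one gene set.
t_stats = {"A": 4.0, "B": 3.0, "C": 0.5, "D": -1.0}
edge_genes, edge_score = leading_edge(t_stats)
```

In this example, adding gene C or D dilutes the statistic more than it adds to the sum, so the scan stops improving after the first two genes.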

Getting Started with RNA-Seq Analysis

This guide is an introduction to using the new RNASeq functions in MeV. The guide contains a brief tour of the new RNASeq file loader and a demonstration of a few of the new functions we have added specifically to support RNASeq data. The guide will first walk you through loading the data using the new RNA-Seq file loader. Then it will describe using an RNA-Seq-optimized module, EdgeR, to find differentially expressed genes between two groups of samples. Finally, it will demonstrate how to examine these differentially expressed genes for functional themes using the new module GOSeq.

These new options were added in MeV v4.7. If you already have MeV v4.7 installed, you can skip the Setup step and go directly to Loading a Data Set.

Loading an RNA-Seq Data Set

  1. In the Multiple Array Viewer, go to File -> Load Data.
  2. When the window titled Expression File Loader appears, click Select File loader -> RNASeq DGE Files. The RNASeq file loader screen will appear.
    RNA-Seq file loader: The RNASeq data loader accepts raw count data, RPKM or FPKM, mapped to either ENSEMBL IDs or RefSeq IDs.
  3. Click the Browse button at the upper right side of the screen. In the file browser that appears, navigate to the MeV folder, then open the data/rnaseq folder. Choose the file TagSeqExample.txt. This file contains raw count data. 
  4. Choose the appropriate parameters for each of the drop-down menus at the top of the file loader screen. For the data file we have selected, choose the Data Type Count, the Species Human, the Reference Genome RefSeq, and the UCSC build hg19. Leave Read Length blank.
  5. Click the Load button.





 


Differential Expression Detection

Begin your RNASeq analysis by testing for differential expression of all of the unique reads. To do this, we will use a module called edgeR, based on the Empirical Analysis of Digital Gene Expression data in R package written by Mark Robinson.

  1. In the row of colorful buttons across the top of the MultiExperiment Viewer window, click the one labeled Statistics. Choose Empirical Analysis of Digital Gene Expression data in R (edgeR). An initialization dialog will appear.
    EdgeR module selection: The edgeR module can be found in the Statistics drop-down menu.
  2. Select the group membership for each of the six samples. Click "Group 1" for the first four samples, and "Group 2" for the remaining two samples.
  3. Leave the default values for the Inference Algorithm and p-value/FDR parameters.
  4. Click Ok. The analysis will run and display the results in the result tree, on the left of the Multiple Array Viewer window.

EdgeR Initialization Dialog: The edgeR initialization dialog.

 

 

 

Differential Expression Results

 

  1. Open up the result node labeled edgeR, and expand the nodes to find one labeled Significant Gene List. Click on this node to select it and display the list of genes found to be differentially expressed between the two sample groups you selected in the previous section. You can click on the links to launch a web browser displaying more information about individual genes. 
  2. Right-click on the window in a cell with no links (the Stored Color column is a good bet). Choose Store entire cluster and click Ok to label each of the genes in this window with a color. This color label will be visible anywhere a gene display is shown in MeV, even in the results of other modules.
    EdgeR Output: Results of the edgeR module, showing significantly differentially expressed genes/transcripts. Right-click to reveal a context menu with many powerful options.

Examining RNA-Seq Differential expression list for signature themes

Now that we have a list of differentially expressed genes, we can examine it for themes. To do this, we will use the GOSeq module. This module is based on the R package GOSeq, by Matthew Young. It is designed to find enriched gene groups in length-biased data, such as RNASeq data. Compare it to tools like EASE for microarray data.

  1. From the Statistics drop-down menu, choose the item Gene Ontology analysis for RNA-seq.
    GOSeq Initialization Dialog: The GOSeq initialization dialog.
  2. Click the Cluster Analysis tab at the top of the Initialization Dialog.
  3. Leave the GOSeq parameters Significance Level: Alpha, Number of Permutations and Number of Genes per Transcript Length Bin set at their default values.
  4. You should have a cluster pre-selected in the cluster selector dialog. If you have more than one cluster available in this dialog, choose the one you want to examine for geneset enrichment.
  5. Choose Download from GeneSigDb from the drop-down menu. Click the Download button.
  6. Check that the Choose Annotation Type drop-down menu is set at GENE_SYMBOL.
  7. Leave the File Location field blank.
  8. Click Ok. GOSeq will run.

 

 

Signature theme results

In the Result Tree, you will see a new result node named GOSEQ.

GOSeq Output: Gene signatures, published in GeneSigDb, with enrichment in the list of selected genes. Future plans include adding links from this display directly to the gene signature web page, where the list of genes in the signature and the source publication can be found.

  1. Open this node and select the node labeled Results Table. This table contains the complete list of gene lists downloaded from the GeneSigDb database, as well as a rating for each list as to whether the contents of that list are enriched in the selected group of differentially expressed genes used to run GOSeq.
  2. Double-click on the header labeled p-value to sort the list. Those gene lists with low p-values, like Human StemCell_Brendel05_21genes, listed here, are enriched in the set of differentially expressed genes we found in our previous edgeR analysis. You can explore this gene list by going to the GeneSigDb website.

 

From here, you can continue examining gene signatures of interest by searching the GeneSigDb website, or continue on with another analysis by simply selecting it from one of the drop-down menus. For this pilot, most of the standard MeV modules are available to use. A few of them, like the EASE and GSEA modules, require specific annotation files that are currently only available for DNA microarray data. Part of the full RNASeq implementation project will be to adapt MeV to fully support RNASeq analysis in all modules. However, that support is not yet available.


Using Nested EASE (nEASE)

Nested EASE (nEASE) is an extension of EASE. The nEASE algorithm includes a second, sub-level, iterative Fisher’s Exact Test on significantly enriched GO terms identified in a first-level EASE analysis. This sub-classification approach provides increased sensitivity for detecting enriched GO terms and thus affords a deeper understanding of possible mechanisms underlying a given condition under study. nEASE was added to MeV as a new feature for version 4.5.
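The nesting idea can be sketched in Python. This is one plausible reading of the description above and not MeV's implementation: it uses a plain one-sided Fisher's exact test (EASE itself uses a modified score), and it assumes the second-level test re-runs the enrichment with the genes of each enriched first-level term as the background population. All gene and term names are invented.

```python
from math import comb


def fisher_upper_p(list_hits, list_size, pop_hits, pop_size):
    """One-sided Fisher's exact test: P(X >= list_hits) under the hypergeometric null."""
    most = min(list_size, pop_hits)
    tail = sum(comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
               for k in range(list_hits, most + 1))
    return tail / comb(pop_size, list_size)


def enriched_terms(gene_list, population, terms, alpha=0.05):
    """Return {term: p} for terms whose p-value is below alpha."""
    hits = {}
    for term, members in terms.items():
        in_pop = members & population
        if not in_pop:
            continue
        p = fisher_upper_p(len(in_pop & gene_list), len(gene_list), len(in_pop), len(population))
        if p < alpha:
            hits[term] = p
    return hits


def nested_enrichment(gene_list, population, terms, alpha=0.05):
    """First-level test, then a second test inside each enriched term (assumed design)."""
    first_level = enriched_terms(gene_list, population, terms, alpha)
    second_level = {}
    for term in first_level:
        sub_population = population & terms[term]
        sub_list = gene_list & sub_population
        other_terms = {t: m for t, m in terms.items() if t != term}
        second_level[term] = enriched_terms(sub_list, sub_population, other_terms, alpha)
    return first_level, second_level


# Hypothetical data: 40 genes, a 10-gene list, one broad term and one narrower term.
population = {f"g{i}" for i in range(40)}
gene_list = {f"g{i}" for i in range(10)}
terms = {
    "T1": {f"g{i}" for i in range(20)},
    "T2": {"g0", "g1", "g2", "g3"} | {f"g{i}" for i in range(20, 36)},
}
first_level, second_level = nested_enrichment(gene_list, population, terms)
```

In this toy data set, term T2 is not significant against the full population but becomes significant when tested within the genes of T1, which illustrates the kind of added sensitivity the nested approach is meant to provide.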

Begin this tutorial after installing MeV.

Loading Example Data

Launch MeV by double-clicking on the TMEV.bat file (Windows), the tmev.sh file (Linux) or the MeV application icon (Mac OSX).

When you launch MeV, two windows will open. The small narrow window across the top of the screen is called the MeV main menubar. This window is used normally to open new MultipleArrayViewer windows and manage other MeV properties. We will not be using this menubar window for the purposes of this tutorial. The larger window that opens is called a MultipleArrayViewer (MAV). This is where the majority of our work will take place.

Download and unzip the file nease_example_files.zip. This file contains the expression data and supporting GO term files we will use to replicate the analysis presented in the manuscript.

Choose File-> Open Analysis from the MAV window. In the file chooser that opens, navigate to the folder where you unzipped nease_example_files.zip, and select the file ER_status_SAM_1_Miller.anl and choose Open. A saved analysis will be loaded into MeV. This may take some time.

 

These data are fully described in the manuscript. They are based on the data from Miller et al. (2005) and Minn et al. (2005). 

Choosing Nested EASE Parameters

EASE is found under the Meta Analysis menu item

Near the top of the MAV is a row of colorful drop-down menus. These menus contain the analysis options available in MeV.

After loading the analysis file, click the Meta-Analysis drop-down menu, and select EASE Cluster Analysis. An initialization dialog will appear.

 

 

 

 

Step 1: Selecting the EASE file system

To manually load an EASE file system click the button marked Custom in the EASE Annotation Analysis dialog which will bring up the EASE Advanced Parameters dialog, then click the Browse button. Proceed to navigate to the folder you previously downloaded, nease_example_files. Inside it, open the folder named data\ease\affy_HT_HG-U133A_EASE. You should see three folders inside, Data, Enhance and Lists. Do not select any of these folders. Simply click Open.

 

Step 2: Population Selection

EASE Advanced Parameter Dialog

Make sure the option Select Background Population from File is selected in the EASE Advanced Parameter dialog. Click the button labeled File Browser in the Population panel and select the file Lists/affy_HT_HG-U133A/Populations/ProbesetIDs.txt.

Step 3: Annotation Parameter Selection

From the drop-down menu labeled Annotation Key in the MeV Annotation Key panel, select the heading PROBE_ID.

Click the checkbox labeled Use Annotation Converter, then click the button labeled File Browser. The file selection dialog that opens should already be set to the correct directory. (If it is not, navigate to nease_example_files\affy_HT_HG-U133A_EASE\Data\Convert.) Select the file affy_HT_HG-U133A_ProbesetIDs.txt and click Open.

Under the heading Gene Annotation / Gene Ontology Linking Files, click the Add Files button. Again, the resulting file browser should already be displaying a list of the available GO term files. (If not, navigate to the nease_example_files\affy_HT_HG-U133A_EASE\Data\Class directory within the MeV folder.) Hold down the Control key and click the files GO Biological Process.txt, GO Molecular Function.txt and GO Cellular Component.txt to select them. When all three files are selected, click Open.

 

Click the OK button to return to the main EASE dialog.

 

Step 4: Statistical Parameter Selection

Select the check box next to Run Nested EASE, near the bottom of the EASE Annotation Analysis dialog. Now click the OK button to run EASE and Nested EASE. nEASE is an intensive module, so this will take some time.

Viewing the nEASE results

The standard EASE analysis, as described in Hosack et al. (2003), will run first, followed by the Nested EASE analysis. The nEASE results are included as part of the standard EASE results, as a subnode on the result tree labeled Nested EASE. Double-click the result node labeled EASE Analysis to make the Nested EASE results appear. We recommend that you enlarge the Result Tree panel by clicking and dragging the dividing bar between the Result Tree and the larger viewing window on the right. Click on the result nodes to explore the result data within.
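For readers who want a feel for the kind of statistic an EASE-style analysis reports, the sketch below computes a one-sided over-representation probability from the hypergeometric distribution, plus a more conservative variant that removes one category gene from the list before testing. This is an illustration only, based on the general description in Hosack et al. (2003); the function names are hypothetical, the choice to leave the population counts unchanged is an assumption, and MeV's own implementation may differ in its details.

```python
from math import comb


def hypergeom_upper_tail(list_hits, list_total, pop_hits, pop_total):
    """P(X >= list_hits) when list_total genes are drawn without replacement
    from pop_total genes, of which pop_hits belong to the category."""
    if list_hits <= 0:
        return 1.0
    upper = min(list_total, pop_hits)
    favourable = sum(
        comb(pop_hits, x) * comb(pop_total - pop_hits, list_total - x)
        for x in range(list_hits, upper + 1)
    )
    return favourable / comb(pop_total, list_total)


def ease_like_score(list_hits, list_total, pop_hits, pop_total):
    """Conservative variant: drop one category gene from the list, then test.

    Assumption: the population counts are left unchanged.
    """
    return hypergeom_upper_tail(list_hits - 1, list_total - 1, pop_hits, pop_total)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(hypergeom_upper_tail(3, 4, 5, 10))
    print(ease_like_score(3, 4, 5, 10))
```

Removing one gene penalizes categories supported by very few genes, which is why the second value is larger (less significant) than the first for the same counts.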

 

Nested EASE Summary Table.

The most useful view is found in a node labeled Nested EASE Summary Table, which contains the summarized data for all of the nEASE results. These are the data reported in the manuscript. To examine the genes that drive a result row in the nEASE Summary Table, left-click to select the row, then right-click to open a context-sensitive menu. Choose Open Viewer -> Expression Image. MeV will open a heatmap view of the probe expression values for the genes that are members of that group.

 

 

View the expression profiles for the genes in the EASE group of interest.

All of the usual heatmap-related options are available in this window, including adjusting the color display, changing the gene annotation displayed, and storing the displayed genes in a new cluster. Note that the left-hand panel, the Result Tree, does not obviously change when this heatmap is opened. However, scrolling the Result Tree down will reveal that a new result node has been selected, corresponding to the Nested EASE result that was selected from the Summary Table. 

 

References

D. A. Hosack, G. Dennis, B. T. Sherman, H. C. Lane, R. A. Lempicki, Genome Biol 4 (2003).
A. J. Minn, et al., Nature 436, 518 (2005).
L. D. Miller, et al., Proceedings of the National Academy of Sciences 102, 13550 (2005).

Example Data Files

MeV can load genomic data from many different types of files. Loaders are available for Affymetrix, Agilent, Illumina, GenePix and other formats. MeV also supports several platform-independent file formats such as TDMS, MAGE-TAB and GEO.

Tab-delimited Multiple Sample files

The Tab-Delimited Multiple Sample (TDMS) file loader.

Download an example file

These files are loaded with the TDMS file loader. TDMS is a flexible, vendor-independent format in which each row contains a record for one gene, with samples arranged by column. Any number of gene or sample annotation rows are allowed. MeV will attempt to "guess" which rows and columns contain annotation and will color-code them accordingly. Please verify that MeV has guessed well. If it has not, click the upper-leftmost cell that contains expression data. MeV will re-color the cells to reflect your selection. Gene annotations are colored blue/purple, sample annotations are colored blue-green, and sample annotation headers are colored yellow. Expression values are striped in blue and white.
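The role of the upper-leftmost expression cell can be illustrated with a short sketch. The code below is not MeV's loader: the function name and the tiny example table are hypothetical, and treating empty cells as missing values (NaN) is an assumption. It only shows how choosing that one cell splits a tab-delimited table into sample annotations, gene annotations and expression values.

```python
import csv
import io

# Hypothetical example: two annotation rows, two annotation columns.
EXAMPLE_TDMS = (
    "ID\tSymbol\tSampleA\tSampleB\n"
    "\tTissue\ttumor\tnormal\n"
    "g1\tTP53\t1.5\t-0.2\n"
    "g2\tESR1\t0.3\t2.1\n"
)


def split_tdms(text, first_data_row, first_data_col):
    """Split a tab-delimited table at the upper-leftmost expression cell.

    first_data_row and first_data_col are 0-based indices of that cell.
    Returns (sample_annotations, gene_annotations, expression).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    header_rows = rows[:first_data_row]
    data_rows = rows[first_data_row:]
    # Cells above the expression block describe the samples (one list per row).
    sample_annotations = [row[first_data_col:] for row in header_rows]
    # Cells to the left of the expression block describe the genes.
    gene_annotations = [row[:first_data_col] for row in data_rows]
    # Assumption: an empty cell is a missing value and becomes NaN.
    expression = [
        [float(cell) if cell else float("nan") for cell in row[first_data_col:]]
        for row in data_rows
    ]
    return sample_annotations, gene_annotations, expression


if __name__ == "__main__":
    samples, genes, values = split_tdms(EXAMPLE_TDMS, first_data_row=2, first_data_col=2)
    print(samples)
    print(genes)
    print(values)
```

In this example the first expression value sits in the third row and third column, so everything above it is sample annotation and everything to its left is gene annotation.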


MAGE-TAB Files

Sample IDF file

Sample SDRF file

Sample Data file

The MGED consortium has written a generalized file format called MAGE-TAB. MeV's MAGE-TAB file loader accepts the MAGE-TAB IDF, SDRF and data files, though only the data file is required.
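To inspect an IDF file before loading it, a few lines of code are enough, because an IDF file is tab-delimited with a tag in the first column followed by one or more values. The sketch below is a minimal reader and not MeV's loader: the function name and the example values are hypothetical, and skipping lines that begin with "#" as comments is an assumption.

```python
# Hypothetical IDF content; the values are made up for illustration.
EXAMPLE_IDF = (
    "Investigation Title\tExample study\n"
    "Person Last Name\tSmith\tJones\n"
    "SDRF File\texample.sdrf.txt\n"
)


def parse_idf(text):
    """Map each IDF tag to the list of non-empty values on its row."""
    fields = {}
    for line in text.splitlines():
        # Assumption: blank lines and lines starting with '#' carry no fields.
        if not line.strip() or line.startswith("#"):
            continue
        tag, *values = line.split("\t")
        fields[tag.strip()] = [value.strip() for value in values if value.strip()]
    return fields


if __name__ == "__main__":
    for tag, values in parse_idf(EXAMPLE_IDF).items():
        print(tag, "->", values)
```

The SDRF File entry is a convenient way to check which SDRF file belongs to a given IDF file.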

Other Methods

Data can also be loaded into empty Multiple Array Viewer windows with the Gaggle framework. See the Gaggle tutorial for more details.

 

Attachment                                    Size
TDMS_format_file.txt                          62.9 KB
E-ATMX-12.sdrf_.txt                           5.36 KB
E-ATMX-12.idf_.txt                            4.14 KB
E-ATMX-12-processed-data-1343527784.txt       4.01 MB