Clustering

Clustering groups similar samples or genes together. This makes the data much easier to explore, since you can see which samples or genes behave alike.

Clustering with the Hierarchical Clustering Module (HCL)

  1. HCL is a commonly used clustering algorithm. From the top left, click the Clustering drop down menu and choose the first option, Hierarchical Clustering. NOTE: The downside to this method is that it can take a long time to run if there are a lot of genes, and if there are too many, an error message may appear. 10,000 genes is a standard cutoff point, but even with that many genes it may take up to 20 minutes. If you just want to look at the similarities among samples and not genes, simply uncheck the box at the top that says Gene Tree. If you do have a lot of genes and want to sort them into similar groups, the k-means clustering module in the next section may be more useful.
  2. Click OK at the bottom right.
  3. To view the clusters, find and double click HCL on the navigation tree on the left.
  4. You can store a cluster by selecting the group you want to keep together. That is, click the line that connects a certain group of genes, and that group will be highlighted. Then right click on the line and choose Store Cluster.
  5. The program should automatically choose a new color for each cluster so that you can easily view and distinguish them.
  6. If you right click in the gene tree and select Gene Tree Properties, you can get much more detailed results.
  7. Under Distance Threshold Adjustment, you can adjust how many clusters will be created. The lower the distance threshold, the more clusters there will be, since genes must be closer together to end up in the same group. You will see light gray triangular ‘wedges’ along the side change as the distance threshold changes. Either use the slider or type in a value for the distance threshold until you get the desired number of clusters.
  8. If you check the box on the right, Create Cluster Viewers, you will get different graphs and charts for each cluster created. These are useful visualization tools and also let you look more closely at each cluster.
  9. Click OK.
  10. If you did not get the desired number of clusters, you can always go back. Also, if your navigation tree gets too cluttered, simply right click the item you wish to delete and click Delete.
  11. You can view each individual cluster in the navigation tree on the left. You can either view different graphs of an individual cluster, or choose All Clusters and see all of them on one screen.
  12. The Expression Images are the red and green rectangles like those in the main view. The Centroid Graphs show the means and ranges of each sample. The Expression Graphs give a more detailed representation of the gene behavior. Each individual line represents one gene, and the pink line in the middle shows the average expression of all the genes in each sample.
  13. If you wish to do an analysis on just one particular cluster (such as a t-test, which will be discussed in the Modules section), you can right click on the individual cluster and choose Launch New Session. This will open a new window with just the particular cluster you wish to analyze.
  14. If you double click Cluster Manager in the navigation tree, it will show you the list of all the clusters you made.
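
To see why a lower distance threshold produces more clusters, here is a minimal Python sketch that runs outside of MeV. It is not MeV's own implementation: it assumes Euclidean distance and average linkage, the gene values are made up for illustration, and the function names are hypothetical. Clusters are merged, closest pair first, until no two clusters are within the chosen threshold.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def cluster_at_threshold(data, threshold):
    """Merge the closest clusters until no pair is within `threshold`."""
    clusters = [[i] for i in range(len(data))]  # start with one gene per cluster
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # the closest remaining pair is too far apart to merge
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Made-up expression values: 6 genes (rows) measured in 3 samples (columns).
genes = [
    [1.0, 1.1, 0.9],
    [1.2, 1.0, 1.0],
    [5.0, 5.2, 4.9],
    [5.1, 5.0, 5.0],
    [9.0, 0.5, 9.1],
    [9.2, 0.4, 9.0],
]

for threshold in (0.1, 1.0, 7.0, 20.0):
    clusters = cluster_at_threshold(genes, threshold)
    print(f"threshold {threshold}: {len(clusters)} clusters -> {clusters}")
```

With these toy values, the thresholds 0.1, 1.0, 7.0, and 20.0 give 6, 3, 2, and 1 clusters, which is the same trend you see when moving the slider in Gene Tree Properties.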

Clustering with the K-Means Clustering Module (KMC)

K-Means clustering is useful for large data sets, as it does not use nearly as much memory as HCL. This module can also be found in the Clustering drop down menu. The benefits of this method are that it is much faster than HCL and that you can choose exactly how many clusters you want.
  1. Select Clustering -> k-Means/Medians Clustering from the drop down menu.
  2. Choose whether you want to cluster the genes or the samples.
  3. Choose how many clusters you wish to form. Here we will use 10, but you can always go back and select fewer (or more) if those results do not give you the information you want.
  4. You can view the clusters in the navigation tree on the left just as you could for the HCL.
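
If you are curious what k-means does with the number of clusters you choose, the following Python sketch shows the basic idea outside of MeV. It is only an illustration under stated assumptions, not MeV's code: it uses squared Euclidean distance, a simple farthest-point rule to pick starting centroids, made-up gene values, and hypothetical function names. It uses 3 clusters rather than 10 because the toy data set has only 6 genes.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_profile(rows):
    """Column-wise mean of a list of profiles (the cluster centroid)."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def init_centroids(data, k):
    """Assumed starting rule: first profile, then the farthest remaining ones."""
    centroids = [data[0]]
    while len(centroids) < k:
        farthest = max(data, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids

def k_means(data, k, max_iter=100):
    """Assign each profile to its nearest centroid, then recompute centroids."""
    centroids = init_centroids(data, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(data, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mean_profile(members)
    return labels, centroids

# Made-up expression values: 6 genes (rows) measured in 3 samples (columns).
genes = [
    [1.0, 1.1, 0.9],
    [1.2, 1.0, 1.0],
    [5.0, 5.2, 4.9],
    [5.1, 5.0, 5.0],
    [9.0, 0.5, 9.1],
    [9.2, 0.4, 9.0],
]

labels, centroids = k_means(genes, k=3)
for gene_index, label in enumerate(labels):
    print(f"gene {gene_index} -> cluster {label}")
```

Because the number of clusters is fixed up front and each pass only compares genes to the k centroids, this approach never has to build a full tree of all pairwise gene distances. The centroids it returns are per-sample means, the same kind of summary the Centroid Graphs display.
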
For more information on clustering, see the more detailed section in the MeV Manual called Working with Clusters.