To load Affymetrix data (typically .CEL files) into MeV, the CEL files must first be normalized to account for technical variation between the arrays. The team at TM4 recommends using Robust Multichip Average (RMA) normalization for Affymetrix chips. For more information, see:
Bolstad, B.M., Irizarry, R.A., Astrand, M., and Speed, T.P. (2003). A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Bias and Variance. Bioinformatics 19(2):185-193.
Before beginning any analysis: as general housekeeping, put all of the .CEL files to be analyzed as one batch into a separate folder.
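For readers who already have R installed, this housekeeping step can also be scripted. The following is only a minimal sketch in base R; the function name collect_cel_files and the folder names in the usage note are hypothetical and not part of MeV or any TM4 tool.

```r
# Hypothetical helper: copy every .CEL file found in `src_dir` into `batch_dir`.
collect_cel_files <- function(src_dir, batch_dir) {
  # Create the batch folder if it does not exist yet.
  dir.create(batch_dir, recursive = TRUE, showWarnings = FALSE)
  # Match file names ending in .CEL or .cel.
  cel_files <- list.files(src_dir, pattern = "\\.cel$", ignore.case = TRUE,
                          full.names = TRUE)
  # file.copy returns one TRUE/FALSE per file; existing files are not overwritten.
  copied <- file.copy(cel_files, batch_dir)
  # Return the names of the files that were actually copied.
  basename(cel_files[copied])
}
```

For example, collect_cel_files("downloads", "batch1") would copy the .CEL files from a folder named "downloads" into a folder named "batch1" (both names are placeholders).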
Also, annotating the normalized data before loading it into MeV makes clustering results easier to read. In Excel, Notepad, or another text editor, sample names or experimental treatments corresponding to each .CEL file can be added in a new row directly above the .CEL names, leaving the first cell of that row empty so that the labels line up with the .CEL columns.
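The annotation step can also be done with a short script instead of by hand. The sketch below uses base R and assumes that the first line of the normalized file is the header row, with the probe ID column first and the .CEL names after it; the function name add_label_row and its arguments are hypothetical.

```r
# Hypothetical helper: write a copy of a tab-delimited expression file with a
# row of sample labels placed directly above the header row of .CEL names.
add_label_row <- function(in_file, out_file, labels) {
  lines <- readLines(in_file)
  # Assumption: line 1 is the header; its first field heads the probe ID
  # column and the remaining fields are the .CEL file names.
  header <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]
  n_arrays <- length(header) - 1
  # One label is required per .CEL column.
  stopifnot(length(labels) == n_arrays)
  # Leave the first cell empty so the labels line up with the .CEL columns.
  label_row <- paste(c("", labels), collapse = "\t")
  writeLines(c(label_row, lines), out_file)
  invisible(out_file)
}
```

For a file with two arrays, add_label_row("normalized.txt", "annotated.txt", c("control", "treated")) would write the annotated copy; the file names and labels here are placeholders.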
If the microarray service provides .CDF and .CEL files for each GeneChip, the Windows GUI program RMAExpress can be used. RMAExpress can be found here: http://rmaexpress.bmbolstad.com/. After installing RMAExpress, documentation is available through the User's Guide link under the Start Menu, or at: http://rmaexpress.bmbolstad.com/RMAExpress_UsersGuide.pdf.
If the .CDF file is not provided by the microarray service, one can be downloaded by registering on Affymetrix.com and navigating to: http://www.affymetrix.com/support/technical/libraryfilesmain.affx. Scroll down to find the appropriate array type, and download the .zip file. Extract the .zip file, then navigate to the .CDF file (it should be found in "Full->Name of Chip->LibFiles"). Copy this file to an easily accessible location, such as the folder where the .CEL files are stored, and use it as the .CDF file required by RMAExpress.
RMAExpress is straightforward to use. First open the application, choose "File->Read Unprocessed Files", then navigate to where the .CDF file is saved and select it. Next, navigate to the directory where the .CEL files are saved and select the ones to be normalized. After loading the files, normalize them by selecting "File->Compute RMA measure." To load the RMA-normalized data into MeV, choose the option "Write Results to File (log scale)," which saves the results as a tab-delimited text file that MeV can read. RMAExpress's other features include raw data visualization: PM intensity log2 boxplots, density plots, and residuals plots. RMAExpress also allows the user to visualize QC statistics, showing RLE and NUSE values, medians, and IQRs. For more information, consult the user's guide.
For a faster, cross-platform alternative, the statistical language R can be used. Even for those without any familiarity with R, using R for RMA normalization can be quite easy and less computationally intensive than other methods. R and its companion project Bioconductor provide many features for data visualization and several methods of normalization. What follows is a quick starter guide to using R to normalize Affymetrix GeneChip data.
First, R can be obtained from: http://www.r-project.org. On the left-hand side of the page it says, "Download" then "CRAN" under it; click "CRAN". Find a mirror that is currently working (geographically closer is often better) and then choose the appropriate operating system distribution. Download and install as appropriate for the operating system used.
Open R, and from within R use the Bioconductor installer script biocLite to install the packages needed for RMA normalization and QC analysis of the Affymetrix GeneChip data.
In R, set the working directory to the folder containing the .CEL files (File->Change dir), or do so from the R prompt:
>setwd("your full directory here with quotation marks")
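On current R releases the biocLite script has been replaced by the BiocManager package: run install.packages("BiocManager"), then BiocManager::install("affy"), and use BiocManager::install in place of any later biocLite call. On older releases, run the installer at the prompt by copying the code below without the leading > prompt character (this also installs the affy packages needed for normalization):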
>source("http://bioconductor.org/biocLite.R")
>biocLite()
Load the affy package for normalization:
>library(affy)
Load the .CEL files from the working directory and perform RMA normalization:
>Data<-ReadAffy()
>eset<-rma(Data)
Write a tab-delimited file of the normalized data:
>write.exprs(eset, file="insertFileNameHereWithQuotes.txt", sep="\t")
Optional: QC Plots and Analysis
Install and load the made4 package:
>biocLite("made4")
>library(made4)
Get a quick overview of the data: hierarchical clustering dendrogram, boxplot, histogram.
>overview(eset)
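As a further optional check, the RLE (Relative Log Expression) medians and IQRs mentioned earlier can be computed directly from the normalized values. The sketch below uses base R only and assumes the standard RLE definition: each probeset's median across arrays is subtracted from its log-scale values, and the result is then summarized per array. The function name rle_summary is hypothetical. NUSE is not covered, since it requires standard errors from a probe-level model fit.

```r
# Hypothetical helper: per-array median and IQR of Relative Log Expression.
# `log_expr` is a numeric matrix of log-scale expression values
# (rows = probesets, columns = arrays), such as the RMA output.
rle_summary <- function(log_expr) {
  # Median of each probeset across all arrays.
  probeset_medians <- apply(log_expr, 1, median)
  # Subtract each probeset's median from its row (sweep defaults to "-").
  rle <- sweep(log_expr, 1, probeset_medians)
  # Summarize the relative values array by array.
  data.frame(array = colnames(log_expr),
             median = apply(rle, 2, median),
             IQR = apply(rle, 2, IQR),
             row.names = NULL)
}
```

With the objects created above, this could be called as rle_summary(exprs(eset)). Arrays of good quality are generally expected to show a median near zero and an IQR similar to that of the other arrays in the batch.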
AMP, or Automated Microarray Pipeline, is a part of the TM4 suite designed to normalize and do preliminary analysis on Affymetrix GeneChip data. AMP, which is still in Beta phase, is a web application that processes the data on its own server, freeing up the user's resources for other processes. AMP can therefore handle larger sets of .CEL files that some computers may not be able to load and normalize. AMP requires free registration, and is found here: http://compbio.dfci.harvard.edu/amp/.
AMP currently supports Affymetrix Version 3 CEL files. To determine if a .CEL file is Version 3 or Version 4, open the .CEL file in a text editor. If the file appears as plain text, it is Version 3 and no conversion is required. Version 4 files will appear as a binary "jumble" of characters, and these files will need to be converted with Affymetrix's freely available CEL File Conversion Tool software for Windows. This requires a free account with Affymetrix, and can be found at: http://www.affymetrix.com/support/developer/tools/devnettools.affx. Extract the .zip file, and then run the program with the .exe file. Use "Browse" to navigate to the Version 4 .CEL directory, select the "Version 4 to 3" radio button, and click "Convert."
For other operating systems, there is Celutil, available at http://www.bioinformatics.org/celutil/. Celutil requires a C++ compiler. See the Celutil web site for more details.
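The plain-text versus binary check described above can also be scripted when there are many files to inspect. The sketch below is a heuristic in base R, not an official Affymetrix test: it assumes that a plain-text (Version 3) file contains no NUL bytes near its start, whereas a binary (Version 4) file does. The function name looks_like_text_cel is hypothetical.

```r
# Hypothetical helper: guess whether a .CEL file is plain text (Version 3)
# or binary (Version 4) by inspecting its first bytes.
looks_like_text_cel <- function(path, n_bytes = 1024) {
  # Read up to n_bytes raw bytes from the start of the file.
  bytes <- readBin(path, what = "raw", n = n_bytes)
  # Heuristic assumption: text files contain no NUL (zero) bytes.
  length(bytes) > 0 && !any(bytes == as.raw(0))
}
```

Applied to every .CEL file in the working directory, for example with sapply(list.files(pattern = "\\.cel$", ignore.case = TRUE), looks_like_text_cel), a result of FALSE suggests that the file needs conversion before upload to AMP.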
After converting .CEL files to Version 3, AMP can be used for normalization, and can also display boxplots, histograms, and RLE and NUSE plots. In AMP, create a "New Dataset," and file it under an existing study or create a new study, then click the "Upload Data" button. Allow the Java Web Start Launcher to open, and click the "Run" button. In the AMP upload window, click "Browse," navigate to the .CEL directory, and choose the files to be normalized. When finished, click "Upload." Back in the main AMP browser menu, select the study that contains the uploaded .CEL files, and for the dataset, check the box on the right-hand side next to "Analyze Further," then at the top of the right-hand column click "Analyze Selected Datasets." RMA normalization is already selected by default. Select any additional plots desired. Preliminary analysis is not necessary, since this is easily done in MeV. When finished, click "Submit," and AMP will send an email notification when the normalization is completed.
Once notification has been received, navigate back to the Study page in AMP, and under the normalized dataset, click "Normalization Results." Scroll down to the file named "rma.txt" (the normalized data); right-click to save it to the local machine. This file can be loaded directly into MeV as a tab-delimited text file (Affymetrix).