Affymetrix Microarray Gene Expression Analysis Complete Tutorial

Hey,
BACKGROUND
So I was given a bunch of .cel files and imported them into MATLAB, where I learned that they came from Affymetrix MOE430A chips. I started going through your tutorials on microarray analysis (preprocessing, exploring, etc.) and I am having great difficulty because of some missing information. For example, if you start with the example files in the preprocessing tutorial, work through it, and then move on to exploring, you cannot complete the exploring microarray gene expression tutorial. This is because the data structures are manipulated to some degree. Further, when it is convenient, files are loaded with no explanation of where they came from or how they were made, for example HuGeneFL_GeneSymbol_Map. There is no repository of all of Affymetrix's annotations prepared for MATLAB. So how does one go from the annotation CSV file for any given chip (e.g. in my case MOE430A) to that map structure?
QUESTION
I guess most simply my question is: can someone please 1.) clarify the discontinuity between the MathWorks examples, 2.) a.) explain where the necessary files that are just loaded impromptu during the tutorials (e.g. the map) came from and b.) explain what files they are using and why, and 3.) generalise for those trying a different Affymetrix chip.
OTHER
There were other issues (such as there not being any MPIntensities when I loaded my chip), but I do not think those are related specifically to MATLAB.
Also, is there an easy way to fix poorly made CSV files? Affymetrix's annotation CSV file is actually imported as just one column, written as:
|"field 1" , "field 2" , ... | " " | " " | " " | " " | ... | where the pipes mark the actual field boundaries, so all the information ends up in the first field...

Answers (1)

Luuk van Oosten on 27 Jul 2016
Dear Summer,
I'm having some trouble understanding your exact questions, but I'll take a shot at it in the hope that at least some of it helps:
1) I do not believe there is a discontinuity between the MathWorks examples. They just use different data to get the point [of each individual tutorial] across. It might be that some operations cannot be performed on other data due to [structural] differences in the files themselves. No expert on microarray data here... so you will have to figure that one out yourself.
2.a & b). Each example uses some data to work with. Is this data loaded impromptu? Maybe, but you need some data to work with, right? Where does this data come from? It is shipped with the rest of MATLAB, so there is no need to download it from somewhere, unless you see a sentence like this:
"Both raw and normalized Illumina expression data are available on the Gene Expression Omnibus (GEO) database Web site"
Otherwise, you can just type:
load HuGeneFL_GeneSymbol_Map
And MATLAB will load the corresponding .mat file with the HuGeneFL gene symbol map in it. Note that you must have all the right toolboxes installed.
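If you want to check where that file lives and what it actually contains, something like the following may help. It is only a sketch: it assumes the .mat file is on your MATLAB path (which should be the case when the Bioinformatics Toolbox is installed), and it makes no assumption about the variable names stored inside.

```matlab
% Locate the sample MAT-file on the MATLAB path.
% which returns an empty result if the file cannot be found.
matFile = which('HuGeneFL_GeneSymbol_Map.mat');

if isempty(matFile)
    disp('File not found on the path; is the toolbox installed?')
else
    disp(matFile)

    % List the variables stored in the file without loading them.
    whos('-file', matFile)

    % Load into a struct so nothing in the workspace is overwritten,
    % then report the class of each stored variable.
    S = load(matFile);
    names = fieldnames(S);
    for k = 1:numel(names)
        fprintf('%s is a %s\n', names{k}, class(S.(names{k})));
    end
end
```

Knowing the class of the stored variable tells you what kind of object you would need to build yourself for a different chip.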
3) This I do not understand.
Other:
- other issues with MPIntensities: ok.
- other issues with *.csv files: How are they poorly made? (1) Can you not import them, or (2) is it just the general structure that is not very appealing? If (2) is the case, then you can just adapt your data import script in such a way that you get a nicely formatted matrix/struct/file/whatever (and save it again as a *.csv file, if you want...)?
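As an illustration of such an import script, here is a minimal sketch that reads a CSV file whose fields are all wrapped in double quotes and builds a containers.Map from it. Several things are assumptions on my part: that comment lines start with '#', that every field is quoted and contains no escaped quotes, that every row has the same number of fields, that the columns are called "Probe Set ID" and "Gene Symbol", and that a map from probe set ID to gene symbol is what you are after. The sample rows are made up so the script runs on its own; point csvFile at your real annotation file instead.

```matlab
% Write a tiny stand-in file so the sketch runs on its own.
% The rows are made up; a real annotation file has many more columns.
csvFile = [tempname '.csv'];
fid = fopen(csvFile, 'w');
fprintf(fid, '#%%comment line, skipped below\n');
fprintf(fid, '"Probe Set ID","Gene Title","Gene Symbol"\n');
fprintf(fid, '"probe_1_at","title, with a comma","GeneA"\n');
fprintf(fid, '"probe_2_at","another title","GeneB"\n');
fclose(fid);

% Read line by line, skipping empty lines and lines starting with '#'.
fid = fopen(csvFile, 'r');
rows = {};
txt = fgetl(fid);
while ischar(txt)
    if ~isempty(txt) && txt(1) ~= '#'
        % Each field is the text between a pair of double quotes,
        % so commas inside a field do not split it.
        tok = regexp(txt, '"([^"]*)"', 'tokens');
        rows(end+1, :) = [tok{:}]; %#ok<SAGROW>
    end
    txt = fgetl(fid);
end
fclose(fid);
delete(csvFile);   % remove the stand-in file again

% First row is the header; the rest is data.
header = rows(1, :);
data   = rows(2:end, :);
idCol  = find(strcmp(header, 'Probe Set ID'));   % assumed column name
symCol = find(strcmp(header, 'Gene Symbol'));    % assumed column name

% Map probe set ID -> gene symbol.
symbolMap = containers.Map(data(:, idCol), data(:, symCol));
disp(symbolMap('probe_1_at'))   % prints GeneA
```

You can then save symbolMap to a .mat file with save and load it later, the same way the tutorial loads its own map.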
