---
author: einar
comments: true
date: 2006-11-10 22:04:54+00:00
layout: page
slug: the-joy-of-meta-analysis
title: The joy of meta-analysis
wordpress_id: 128
categories:
- Science
header:
  image_fullwidth: banner_other.jpg
---

Recently, I needed to retrieve some records on renal cell carcinoma referenced in papers by Zhao et al. and Higgins et al. The records from the former were hosted on NCBI's Gene Expression Omnibus (GEO), while the latter's were uploaded to EBI's ArrayExpress database. Getting data from others and using it for your own analysis is called meta-analysis, and it's often used to validate methods and algorithms on different data sets.

The problem is, getting the **right** data is not always easy. I spent the whole afternoon yesterday trying to figure out how I could retrieve already analyzed data (usually you get only the processed, i.e. normalized, data). From GEO I could download individual sample data (something I didn't need) or the whole data set (a whopping 1.6 GB) in SOFT text format. [Biopython]({{ site.url }}/biopython.org) has a SOFT parser, but the set was so big that I just crashed my own machine. Of course, the data wasn't available in tabular format.
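
A file that size does not have to be held in memory all at once. The following is a minimal sketch, not what was used at the time and not Biopython's parser: it reads a SOFT file line by line, keeps only one sample in memory, and writes a long-format table (sample, probe, value). It relies on the SOFT conventions that `^` lines start a new entity, `!` lines carry attributes, and sample tables sit between `!sample_table_begin` and `!sample_table_end`. The function names, the `VALUE` column default, and the demo records are assumptions made up for illustration.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO, Tuple

Sample = Tuple[str, Dict[str, List[str]], List[List[str]]]


def iter_soft_samples(handle: TextIO) -> Iterator[Sample]:
    """Yield (sample_id, attributes, table_rows) one sample at a time.

    Entities other than ^SAMPLE (database, series, platform) are skipped.
    The first element of table_rows is the table's header row.
    """
    sample_id = None
    attrs: Dict[str, List[str]] = {}
    rows: List[List[str]] = []
    in_table = False
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.startswith("^"):
            # A new entity begins: hand over the sample collected so far.
            if sample_id is not None:
                yield sample_id, attrs, rows
            kind, _, value = line[1:].partition("=")
            is_sample = kind.strip().upper() == "SAMPLE"
            sample_id = value.strip() if is_sample else None
            attrs, rows, in_table = {}, [], False
        elif sample_id is None:
            continue  # inside an entity we do not care about
        elif line.lower() == "!sample_table_begin":
            in_table = True
        elif line.lower() == "!sample_table_end":
            in_table = False
        elif in_table:
            rows.append(line.split("\t"))
        elif line.startswith("!"):
            key, _, value = line[1:].partition("=")
            attrs.setdefault(key.strip(), []).append(value.strip())
    if sample_id is not None:
        yield sample_id, attrs, rows


def soft_to_tsv(handle: TextIO, out: TextIO, value_column: str = "VALUE") -> None:
    """Write one 'sample <TAB> probe <TAB> value' line per table row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for sample_id, _attrs, rows in iter_soft_samples(handle):
        if not rows or value_column not in rows[0]:
            continue  # sample without a usable table
        idx = rows[0].index(value_column)
        for row in rows[1:]:
            if len(row) > idx:
                writer.writerow([sample_id, row[0], row[idx]])


# Made-up records, only to show the shape of the input (not real GEO data).
DEMO_SOFT = (
    "^DATABASE = GeoMiame\n"
    "!Database_name = Gene Expression Omnibus (GEO)\n"
    "^SAMPLE = GSM_DEMO_1\n"
    "!Sample_title = demo tumor A\n"
    "#ID_REF = probe identifier\n"
    "!sample_table_begin\n"
    "ID_REF\tVALUE\n"
    "probe_1\t0.5\n"
    "probe_2\t-1.2\n"
    "!sample_table_end\n"
    "^SAMPLE = GSM_DEMO_2\n"
    "!Sample_title = demo tumor B\n"
    "!sample_table_begin\n"
    "ID_REF\tVALUE\n"
    "probe_1\t1.0\n"
    "!sample_table_end\n"
)

if __name__ == "__main__":
    table = io.StringIO()
    soft_to_tsv(io.StringIO(DEMO_SOFT), table)
    print(table.getvalue(), end="")
```

For a real download, the `io.StringIO` objects would be replaced with file handles, for example `gzip.open(path, "rt")` for a compressed file. Because the attributes of each sample are yielded alongside its table, the same loop could also be used to keep only the samples whose annotation matches a given criterion.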

ArrayExpress was no better in that respect. Perhaps I don't understand well the format used by two-color arrays, but again, it was impossible to group the samples the way I wanted, and the sample information file was missing (a critical requirement, since I needed to choose only clear cell histotypes), though with some fiddling I managed to get the right files. Of course, they included only a normalized mean of the log2 ratio of the two channels, and I didn't want to run an analysis (such as SAM) myself...

Science is all about being able to reproduce results. It's a shame that sometimes doing so is so hard.