dennogumi/content/post/2007-06-20-data-handling.markdown


---
author: einar
categories:
- Science
comments: true
date: 2007-06-20T17:50:39Z
header:
  image_fullwidth: banner_other.jpg
slug: data-handling
title: Data handling
omit_header_text: true
disable_share: true
wordpress_id: 264
---

As the people who read my science-related posts already know, [I'm in the middle of doing a meta-analysis]({{ site.url }}/2007/05/28/more-meta-analysis-difficulty/). That brought up a problem, so to speak, and it has to do with annotations.

Probes on microarrays are referenced to genes (to over-simplify), and these references are usually made against the latest version of the genome available at the time. Since the map of the genome is not static but a moving target, the annotations tend to become obsolete. Unfortunately, that leads to problems when you compare experiments performed at different times.

To be precise, the papers I'm taking the data from were published in 2005 and 2006, but the actual experiments were performed earlier. One uses the annotation data from the Affymetrix HG-U133A chip, whose annotations (like those of the whole HG-U133 family) Dai and coworkers have shown to be outdated. The other uses Entrez Gene identifiers, but some of the IDs are no longer valid, and others overlap.

How can such a situation be solved? For some experiments there isn't much to do, beyond perhaps reannotating the IDs with an automated system (I believe this is possible). For others (Affy chips), the paper by Dai and coworkers gives a possible solution, and an effective one, since we've tested it in our group: creating new "meta-probes" that reflect the updated annotations.
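
To give an idea of what automated reannotation of gene IDs could look like, here is a minimal Python sketch. It assumes you already have a table that maps retired identifiers to their replacements (or to nothing, if the record was withdrawn). The `HISTORY` table, the IDs in it, and the `resolve` and `reannotate` functions are all made up for illustration; this is not the method used in any of the papers mentioned here.

```python
from collections import defaultdict

# Hypothetical history table: retired ID -> replacement ID,
# or None when the record was withdrawn without a replacement.
# These IDs are invented for the example, not real Entrez Gene IDs.
HISTORY = {
    "1001": "2001",
    "1002": "2001",
    "1003": None,
}


def resolve(gene_id, history):
    """Follow the chain of replacements until a current ID is reached.

    Returns None if the ID was withdrawn or the table contains a cycle.
    """
    seen = set()
    while gene_id in history:
        if gene_id in seen:  # guard against a circular history table
            return None
        seen.add(gene_id)
        gene_id = history[gene_id]
        if gene_id is None:  # withdrawn, no replacement
            return None
    return gene_id


def reannotate(ids, history):
    """Split IDs into remapped ones, withdrawn ones, and overlaps."""
    mapping = {}
    withdrawn = []
    for old in ids:
        new = resolve(old, history)
        if new is None:
            withdrawn.append(old)
        else:
            mapping[old] = new

    # Several old IDs collapsing onto one current ID is the "overlap" case.
    merged = defaultdict(list)
    for old, new in mapping.items():
        merged[new].append(old)
    overlaps = {new: olds for new, olds in merged.items() if len(olds) > 1}
    return mapping, withdrawn, overlaps


if __name__ == "__main__":
    mapping, withdrawn, overlaps = reannotate(
        ["1001", "1002", "1003", "3001"], HISTORY
    )
    print(mapping)
    print(withdrawn)
    print(overlaps)
```

IDs that do not appear in the table are treated as still current. The overlaps are the ones to look at by hand, because measurements attached to the merged IDs have to be combined or dropped before datasets are compared.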

In any case, keep this in mind if you want to compare different microarray datasets.