---
author: einar
comments: true
date: 2007-06-20 17:50:39+00:00
layout: page
slug: data-handling
title: Data handling
wordpress_id: 264
categories:
- Science
header:
    image_fullwidth: banner_other.jpg
---

As readers of my science-related posts already know, [I'm in the middle of a meta-analysis]({{ site.url }}/2007/05/28/more-meta-analysis-difficulty/). That has brought up a problem, and it is related to annotations.

<!-- more -->Probes on microarrays are mapped to genes (to oversimplify), usually using the latest version of the genome available at the time. Since the map of the genome is not static but a moving target, these annotations tend to become obsolete. Unfortunately, that leads to problems when you compare experiments performed at different times.

To be precise, the papers I'm taking the data from date from 2005 to 2006, but the actual experiments were performed earlier. One uses the annotation data from the Affymetrix HG-U133A chip, which (along with the whole HG-U133 family) [has been shown to be outdated by Dai and coworkers](http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=16284200&ordinalpos=2&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum). The other uses Entrez Gene identifiers, but some of the IDs are no longer valid or overlap.

How can such a situation be solved? For some experiments there is not much to do beyond perhaps reannotating the IDs with an automated system (I believe this is possible). For others (Affy chips), the paper I linked offers a possible solution, and an effective one, as we've tested it in our group: creating new "meta-probes" that reflect the updated annotations.

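To give a rough idea of what automated reannotation of IDs could look like, here is a minimal Python sketch. It is not the method from the paper or from any particular tool: the `HISTORY` table, the function names and all the identifiers are made up for illustration. It assumes you have a lookup table that says, for each retired ID, which ID replaced it (or that nothing did). Each ID is followed through the table until it is current, withdrawn IDs are set aside, and the two datasets are then compared on the updated IDs only.

```python
# Hypothetical history table: retired ID -> replacement ID,
# or None when the record was withdrawn without a replacement.
# All identifiers here are invented for illustration.
HISTORY = {
    "1001": "2001",  # merged into another record
    "1002": None,    # withdrawn, nothing replaces it
    "1003": "1001",  # replaced twice: 1003 -> 1001 -> 2001
}


def resolve(gene_id, history):
    """Follow the history table until the ID is current.

    Returns None if the ID was withdrawn or the history is circular.
    """
    seen = set()
    while gene_id in history:
        if gene_id in seen:
            return None  # circular history, treat as unresolvable
        seen.add(gene_id)
        gene_id = history[gene_id]
    return gene_id


def reannotate(ids, history):
    """Return (old ID -> current ID mapping, list of IDs that had to be dropped)."""
    mapping = {}
    dropped = []
    for gene_id in ids:
        current = resolve(gene_id, history)
        if current is None:
            dropped.append(gene_id)
        else:
            mapping[gene_id] = current
    return mapping, dropped


# Two made-up datasets annotated at different times.
dataset_a = ["1001", "1002", "3001"]
dataset_b = ["2001", "3001", "1003"]

map_a, dropped_a = reannotate(dataset_a, HISTORY)
map_b, dropped_b = reannotate(dataset_b, HISTORY)

# Genes that can be compared across the two datasets after updating.
shared = set(map_a.values()) & set(map_b.values())
print(sorted(shared))  # ['2001', '3001']
print(dropped_a)       # ['1002']
```
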
In any case, be wary of outdated annotations if you want to compare different microarray datasets.