Add the whole blog
		
parent 0d2f58ce7a
commit c4f23c1529
					 418 changed files with 15708 additions and 0 deletions
				
			
		
							
								
								
									
content/post/2006-11-10-the-joy-of-meta-analysis.markdown (Normal file, 23 additions)
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,23 @@ | |||
| --- | ||||
| author: einar | ||||
| categories: | ||||
| - Science | ||||
| comments: true | ||||
| date: "2006-11-10T22:04:54Z" | ||||
| header: | ||||
|   image_fullwidth: banner_other.jpg | ||||
| slug: the-joy-of-meta-analysis | ||||
| title: The joy of meta-analysis | ||||
| disable_share: true | ||||
| wordpress_id: 128 | ||||
| --- | ||||
| 
 | ||||
| Recently, I needed to retrieve some records on renal cell carcinoma referenced in papers by [Zhao _et al._](http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=16318415&query_hl=1&itool=pubmed_docsum) and [Higgins _et al._](http://ajp.amjpathol.org/cgi/content/full/162/3/925). The records of the former were hosted on [NCBI's Gene Expression Omnibus](http://www.ncbi.nlm.nih.gov/geo), while those of the latter were uploaded to EBI's [ArrayExpress](http://www.ebi.ac.uk/arrayexpress) database. Taking data published by others and using it in your own analysis is, loosely speaking, _meta-analysis_, and it's often used to validate methods and algorithms against different data sets. | ||||
| 
 | ||||
| <!--more--> | ||||
| 
 | ||||
| The problem is, getting the **right** data is not always easy. I spent the whole afternoon yesterday trying to figure out how I could retrieve already analyzed data (usually you only get the processed, i.e. normalized, data). From GEO I could download individual sample data (something I didn't need) or the whole data set (a whopping 1.6 GB) in SOFT text format. [Biopython](http://biopython.org) has a SOFT parser, but the set was so big that parsing it just crashed my own machine. Of course, the data wasn't available in tabular format. | ||||
| 
 | ||||
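One way around the memory problem is to read the SOFT file line by line and keep only one sample in memory at a time. What follows is only a minimal sketch in plain Python, not Biopython's parser. It assumes the usual SOFT layout: a `^SAMPLE = ...` line opens each sample, and the data rows sit between `!sample_table_begin` and `!sample_table_end` with `ID_REF` and `VALUE` columns. The function name `iter_soft_samples` and the demo records are made up for illustration.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def iter_soft_samples(handle: TextIO) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (sample_id, {ID_REF: VALUE}) for one sample at a time."""
    sample_id = None
    in_table = False
    id_col = value_col = None
    values: Dict[str, str] = {}
    for raw in handle:
        line = raw.rstrip("\r\n")
        lowered = line.lower()
        if line.startswith("^SAMPLE"):
            # Entity line, e.g. "^SAMPLE = GSM0001"
            sample_id = line.split("=", 1)[1].strip()
        elif lowered.startswith("!sample_table_begin"):
            in_table = True
            id_col = value_col = None
            values = {}
        elif lowered.startswith("!sample_table_end"):
            in_table = False
            if sample_id is not None:
                yield sample_id, values
            sample_id = None
        elif in_table:
            fields = line.split("\t")
            if id_col is None:
                # The first row of the table is the column header.
                # Assumption: it contains ID_REF and VALUE columns.
                id_col = fields.index("ID_REF")
                value_col = fields.index("VALUE")
            elif len(fields) > max(id_col, value_col):
                values[fields[id_col]] = fields[value_col]


# Tiny made-up file with two samples, standing in for the real 1.6 GB set.
demo = io.StringIO(
    "^SAMPLE = GSM0001\n"
    "!Sample_title = made-up sample A\n"
    "!sample_table_begin\n"
    "ID_REF\tVALUE\n"
    "probe_1\t0.52\n"
    "probe_2\t-1.10\n"
    "!sample_table_end\n"
    "^SAMPLE = GSM0002\n"
    "!sample_table_begin\n"
    "ID_REF\tVALUE\n"
    "probe_1\t0.10\n"
    "!sample_table_end\n"
)

for sample_id, values in iter_soft_samples(demo):
    # Each sample could be written out as one tab-separated column here.
    print(sample_id, values)
```

Because the function is a generator, each sample's values can be written to a tabular file and discarded before the next one is read, so memory use depends on the largest sample rather than on the whole data set.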
| ArrayExpress wasn't any better in that respect. Perhaps I don't understand the format used by two-color arrays well, but again, it was impossible to group the samples the way I wanted, and the sample information file was missing (a critical requirement, since I needed to choose only clear cell histotypes), though with some fiddling I managed to get the right files. Of course, they included only a normalized mean of the log2 ratio of the two channels, and I didn't want to run an analysis (such as SAM) myself... | ||||
| 
 | ||||
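For readers unfamiliar with two-color arrays, here is a rough sketch of what a "normalized mean of the log2 ratio" could look like. It is one plausible reading, not the procedure the data submitters actually used: per-spot log2(red/green), median centering as the normalization step, then a mean over replicate spots of the same gene. All names and intensities below are made up.

```python
import math
from statistics import mean, median
from typing import Dict, List, Sequence


def log2_ratios(red: Sequence[float], green: Sequence[float]) -> List[float]:
    """Per-spot log2 ratio of the two channel intensities."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_center(ratios: Sequence[float]) -> List[float]:
    """Assumed normalization: shift ratios so the array median is 0."""
    m = median(ratios)
    return [x - m for x in ratios]


def mean_per_gene(genes: Sequence[str], ratios: Sequence[float]) -> Dict[str, float]:
    """Average the ratios of replicate spots that share a gene label."""
    grouped: Dict[str, List[float]] = {}
    for gene, ratio in zip(genes, ratios):
        grouped.setdefault(gene, []).append(ratio)
    return {gene: mean(vals) for gene, vals in grouped.items()}


# Made-up intensities for four spots (two replicate spots per gene).
genes = ["gene_a", "gene_a", "gene_b", "gene_b"]
red = [400.0, 400.0, 3200.0, 800.0]
green = [400.0, 100.0, 200.0, 200.0]

ratios = median_center(log2_ratios(red, green))
print(mean_per_gene(genes, ratios))  # {'gene_a': -1.0, 'gene_b': 1.0}
```

A file that ships only the final per-gene numbers has already gone through steps like these, so the choices behind them cannot be revisited without the raw channel intensities.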
| Science is all about being able to reproduce results. It's a shame that sometimes doing so is so hard. | ||||