---
author: einar
categories:
- General
- Linux
- Science
comments: true
date: "2006-11-25T09:06:10Z"
header:
  image_fullwidth: banner_other.jpg
slug: a-simple-annotator
title: 'A simple annotator '
disable_share: true
wordpress_id: 132
---

Over the past two days I've written a simple annotator program that, given an input list of RefSeq genes, automatically determines the relevant Entrez Gene IDs and annotates them using the flat files provided by the [NCBI](http://www.ncbi.nlm.nih.gov). A direct conversion was not possible due to limitations in Biopython's parsers, but I managed to use the GenBank parser to identify the references to the Gene IDs and extract them into a list.
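
To give an idea of what the extraction step involves, here is a minimal sketch. It is not the actual code: instead of Biopython's GenBank parser it uses a plain regular expression, and it assumes the Gene IDs appear as `/db_xref="GeneID:..."` qualifiers in the records. The function name and the sample record are made up for illustration.

```python
import re

# Matches qualifier lines such as:  /db_xref="GeneID:7157"
# (assumption: this is how the records reference their Gene ID)
GENE_ID_RE = re.compile(r'/db_xref="GeneID:(\d+)"')


def extract_gene_ids(lines):
    """Return the Gene IDs found in GenBank-format lines, in order, without duplicates."""
    seen = set()
    gene_ids = []
    for line in lines:
        match = GENE_ID_RE.search(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            gene_ids.append(match.group(1))
    return gene_ids


# Hypothetical, heavily trimmed record fragment, for illustration only.
sample = '''\
LOCUS       NM_000546    mRNA    linear   PRI
FEATURES             Location/Qualifiers
     gene            1..2591
                     /gene="TP53"
                     /db_xref="GeneID:7157"
     CDS             203..1384
                     /gene="TP53"
                     /db_xref="GeneID:7157"
'''.splitlines()

print(extract_gene_ids(sample))  # ['7157']
```

The same ID usually shows up under several features of one record, hence the duplicate check.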

Once that was done, I built a series of dictionaries while reading the annotation file, one each for data such as gene name, symbol, chromosome and cytoband. Using the list I had already obtained, it was easy to create a new file with the required fields.
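
The dictionary approach could look roughly like the sketch below. The column names (`GeneID`, `Symbol`, `chromosome`, `map_location`, `description`), the tab-separated layout, the choice of Gene ID as the dictionary key and the two sample rows are all assumptions for the sake of the example; the real annotation file has more columns and its own header.

```python
import csv
import io


def build_dictionaries(handle):
    """Read a tab-separated annotation table into one dict per field, keyed by Gene ID."""
    symbols, names, chromosomes, cytobands = {}, {}, {}, {}
    for row in csv.DictReader(handle, delimiter="\t"):
        gene_id = row["GeneID"]
        symbols[gene_id] = row["Symbol"]
        names[gene_id] = row["description"]
        chromosomes[gene_id] = row["chromosome"]
        cytobands[gene_id] = row["map_location"]
    return symbols, names, chromosomes, cytobands


def write_annotation(gene_ids, dictionaries, out):
    """Write one tab-separated line per Gene ID that has an annotation."""
    symbols, names, chromosomes, cytobands = dictionaries
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["GeneID", "Symbol", "Name", "Chromosome", "Cytoband"])
    for gene_id in gene_ids:
        if gene_id not in symbols:
            continue  # no annotation available for this ID
        writer.writerow([gene_id, symbols[gene_id], names[gene_id],
                         chromosomes[gene_id], cytobands[gene_id]])


# Hypothetical sample data standing in for the annotation file.
SAMPLE = (
    "GeneID\tSymbol\tchromosome\tmap_location\tdescription\n"
    "7157\tTP53\t17\t17p13.1\ttumor protein p53\n"
    "672\tBRCA1\t17\t17q21.31\tBRCA1 DNA repair associated\n"
)

out = io.StringIO()
write_annotation(["7157"], build_dictionaries(io.StringIO(SAMPLE)), out)
print(out.getvalue())
```

A single dictionary of records would work just as well; separate dictionaries simply mirror the description above.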

During this process I learnt a bit more about using iterators to skip headings and the like. The code is not yet sufficiently generic, but once I finish toying with it I may publish it under the GPL v2 for "general" consumption (assuming anyone would use it).
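
For the curious, the iterator trick for skipping headings can be sketched in two ways: consuming a fixed number of lines with `next()`, or dropping every leading line that starts with a comment marker using `itertools.dropwhile`. The helper names and the `#` prefix are assumptions for illustration, not taken from the actual code.

```python
from itertools import dropwhile


def skip_first(lines, n=1):
    """Return an iterator over lines with the first n items consumed."""
    it = iter(lines)
    for _ in range(n):
        next(it, None)  # the default avoids StopIteration on short input
    return it


def skip_comment_header(lines, prefix="#"):
    """Return an iterator that drops leading lines starting with prefix."""
    return dropwhile(lambda line: line.startswith(prefix), lines)


# Hypothetical input lines, for illustration only.
lines = ["#tax_id GeneID Symbol", "9606 7157 TP53", "9606 672 BRCA1"]

print(list(skip_first(lines)))
print(list(skip_comment_header(lines)))
```

Both work on an open file object too, since a file is itself an iterator over its lines. Note that `dropwhile` only skips the leading block: a `#` line further down is passed through.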