---
author: einar
comments: true
date: 2007-11-15 19:57:16+00:00
layout: page
slug: gene-identifiers
title: Gene identifiers
wordpress_id: 336
categories:
- Science
tags:
- annotation
- bioinformatics
- microarray
- python
---

While working today on an annotation class in Python I stumbled on a problem. Normally I work with gene lists that are consistent, i.e. all Entrez Gene IDs (or all RefSeq IDs, or all Genome Browser IDs...), but today I had a list of mixed identifiers.

The obvious next idea was "let's implement auto-detection of common identifiers in the class". The problem is: is there any actual documentation on how these identifiers are constructed? So far, using regular expressions, I've managed to cover a few:

  * RefSeq
  * GenBank
  * Entrez Gene
  * UCSC Genome Browser
  * Ensembl

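A minimal sketch of what such auto-detection could look like, for illustration only. The patterns below are assumptions based on commonly seen accession formats, not on an official specification, and they are not the ones used in the actual class; the names `ID_PATTERNS` and `detect_id_type` are hypothetical.

```python
import re

# Assumed formats (not exhaustive, not taken from official documentation):
#   RefSeq:       two letters, underscore, digits, optional version (NM_000546.5)
#   Ensembl:      "ENS", optional species letters, G/T/P, 11 digits (ENSG00000139618)
#   UCSC:         "uc", 3 digits, 3 lowercase letters, optional version (uc001aaa.3)
#   GenBank:      1 letter + 5 digits, 2 letters + 6 digits, or 3 letters + 5 digits
#   Entrez Gene:  a plain integer (7157)
# Order matters: the most specific patterns are tried first.
ID_PATTERNS = [
    ("RefSeq", re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")),
    ("Ensembl", re.compile(r"^ENS[A-Z]*[GTP]\d{11}(\.\d+)?$")),
    ("UCSC Genome Browser", re.compile(r"^uc\d{3}[a-z]{3}(\.\d+)?$")),
    ("GenBank", re.compile(r"^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{3}\d{5})(\.\d+)?$")),
    ("Entrez Gene", re.compile(r"^\d+$")),
]


def detect_id_type(identifier):
    """Return the name of the first matching ID type, or None if nothing matches."""
    token = identifier.strip()
    for name, pattern in ID_PATTERNS:
        if pattern.match(token):
            return name
    return None


if __name__ == "__main__":
    for gene_id in ["NM_000546", "ENSG00000139618", "uc001aaa.3", "U49845", "7157"]:
        print(gene_id, "->", detect_id_type(gene_id))
```

Anything that returns `None` is an identifier the patterns do not cover, which is a quick way to find the formats still missing from the list.
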
However, I have no idea whether I have covered every variant of these IDs. Does anyone know where this information can be looked up?

(On a related note: my thesis defense will be on January 14th, 2008, so I have to get the printing going.)