---
author: einar
categories:
- Science
comments: true
date: "2007-11-15T19:57:16Z"
header:
  image_fullwidth: banner_other.jpg
slug: gene-identifiers
tags:
- annotation
- bioinformatics
- microarray
- python
title: Gene identifiers
omit_header_text: true
disable_share: true
wordpress_id: 336
---
While working today on an annotation class in Python, I stumbled on a problem. Normally I work with gene lists that are consistent, i.e. all Entrez Gene IDs (or all RefSeq IDs, or all Genome Browser IDs...), but today I had a list of mixed identifiers.

My next thought was "let's implement auto-detection of common identifiers in the class". The problem is... is there any actual documentation on how these identifiers are constructed? So far, using regular expressions, I've tracked down a few:

  * RefSeq
  * GenBank
  * Entrez Gene
  * UCSC Genome Browser
  * Ensembl

However, I have no idea whether I have covered every variant of these IDs. Does anyone know where this information can be looked up?
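
As a rough illustration of the regular-expression approach, here is a minimal sketch. The post does not show the actual patterns, so everything below is an assumption: the patterns reflect only the commonly seen shapes of each identifier (for example `NM_000546.5`, `U49845`, `7157`, `uc001aaa.3`, `ENSG00000139618`) and are almost certainly incomplete, and the names `ID_PATTERNS`, `detect_id_type` and `group_by_type` are hypothetical.

```python
import re

# Illustrative patterns only (assumptions, not taken from the post): each one
# covers a commonly seen shape of the identifier, not every variant in use.
ID_PATTERNS = [
    # RefSeq: a subset of the two-letter prefixes, underscore, digits, optional version
    ("refseq", re.compile(r"^(?:NM|NR|NP|XM|XR|XP|NC|NG|NT|NW)_\d+(?:\.\d+)?$")),
    # Ensembl: ENS, optional species code, feature letter (G/T/P), 11 digits
    ("ensembl", re.compile(r"^ENS[A-Z]*[GTP]\d{11}(?:\.\d+)?$")),
    # UCSC known gene: "uc", three digits, three lowercase letters, optional version
    ("ucsc", re.compile(r"^uc\d{3}[a-z]{3}(?:\.\d+)?$")),
    # GenBank nucleotide: 1 letter + 5 digits, or 2 letters + 6 digits
    ("genbank", re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?$")),
    # Entrez Gene: a plain integer
    ("entrez_gene", re.compile(r"^\d+$")),
]


def detect_id_type(identifier):
    """Return the name of the first matching pattern, or None if nothing matches."""
    token = identifier.strip()
    for name, pattern in ID_PATTERNS:
        if pattern.match(token):
            return name
    return None


def group_by_type(identifiers):
    """Split a mixed list into a dict keyed by detected type (None = unrecognised)."""
    groups = {}
    for identifier in identifiers:
        groups.setdefault(detect_id_type(identifier), []).append(identifier)
    return groups


if __name__ == "__main__":
    mixed = ["NM_000546.5", "7157", "ENSG00000141510", "uc001aaa.3", "U49845", "TP53"]
    for id_type, members in group_by_type(mixed).items():
        print(id_type, members)
```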

(On a related note: my thesis defense will be on January 14th, 2008, so I have to get the printing going.)