---
author: einar
categories:
- Science
comments: true
date: "2007-11-15T19:57:16Z"
header:
  image_fullwidth: banner_other.jpg
slug: gene-identifiers
tags:
- annotation
- bioinformatics
- microarray
- python
title: Gene identifiers
omit_header_text: true
disable_share: true
wordpress_id: 336
---

While working today on an annotation class in Python, I ran into a problem. Normally I work with gene lists that are consistent, i.e. all Entrez Gene IDs (or all RefSeq IDs, or all Genome Browser IDs...), but today I had a list of mixed identifiers. The obvious next idea was "let's implement auto-detection of common identifiers in the class".

The problem is: is there any actual documentation on how these identifiers are constructed? So far, using regular expressions, I've tracked down a few:

* RefSeq
* GenBank
* Entrez Gene
* UCSC Genome Browser
* Ensembl

However, I have no idea whether I have covered every variant of these IDs. Does anyone know where to look this information up?

(On a related note: my thesis defense will be on January 14th, 2008, so I have to get the printing going.)
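
To make the auto-detection idea concrete, here is a minimal sketch of what such a lookup could look like. It is not the code from my class: the names `ID_PATTERNS`, `guess_id_type` and `group_by_type` are made up for illustration, and every regular expression is an approximation based on commonly seen examples of each ID type, so each one covers only a subset of the real formats (for instance, only a handful of RefSeq prefixes).

```python
import re

# Hypothetical pattern table. Each regex is an assumption based on typical
# examples of the ID type, not on an official specification, and is almost
# certainly incomplete. Order matters: the first matching pattern wins.
ID_PATTERNS = [
    # RefSeq: two-letter prefix, underscore, digits, optional version (NM_000546.5)
    ("refseq", re.compile(r"^(?:NM|NR|NP|XM|XR|XP|NC|NG|NT|NW)_\d+(?:\.\d+)?$")),
    # Ensembl: ENS, optional species code, feature letter, 11 digits (ENSG00000139618)
    ("ensembl", re.compile(r"^ENS[A-Z]*[GTPE]\d{11}(?:\.\d+)?$")),
    # UCSC Genome Browser known-gene style IDs (uc001aaa.3)
    ("ucsc", re.compile(r"^uc\d{3}[a-z]{3}(?:\.\d+)?$")),
    # GenBank: 1 letter + 5 digits, 2 letters + 6 digits, or 3 letters + 5 digits
    ("genbank", re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{3}\d{5})(?:\.\d+)?$")),
    # Entrez Gene: a plain integer (7157)
    ("entrez_gene", re.compile(r"^\d+$")),
]


def guess_id_type(identifier):
    """Return the name of the first pattern matching the ID, or None."""
    token = identifier.strip()
    for name, pattern in ID_PATTERNS:
        if pattern.match(token):
            return name
    return None


def group_by_type(identifiers):
    """Split a mixed list of IDs into a dict keyed by the guessed ID type."""
    groups = {}
    for identifier in identifiers:
        groups.setdefault(guess_id_type(identifier), []).append(identifier)
    return groups


mixed = ["NM_000546", "7157", "ENSG00000139618", "uc001aaa.3", "AF123456", "foo"]
print(group_by_type(mixed))
```

Anything that matches no pattern ends up under the `None` key, which makes it easy to spot the identifiers the patterns do not cover yet.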