dennogumi/content/post/2006-11-25-a-simple-annotator.markdown at cf5d4f6d7ff75c4f2c801659be3fc8d051fff3e2


Luca Beltrame 64b24842b8


Update all posts to not show the header text

2021-01-13 00:05:30 +01:00


---
author: einar
categories:
- General
- Linux
- Science
comments: true
date: 2006-11-25T09:06:10Z
header:
  image_fullwidth: banner_other.jpg
slug: a-simple-annotator
title: A simple annotator
omit_header_text: true
disable_share:
wordpress_id: 132
---

In the past two days I've written a simple annotator program that, given an input list of RefSeq genes, automatically determines the relevant Entrez Gene IDs and annotates them using the flat files provided by the NCBI. A direct conversion was not possible due to limitations in Biopython's parsers, but I managed to use the GenBank parser to identify and extract the references to the Gene IDs and collect them in a list.
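To make the extraction step concrete, here is a minimal sketch, not the original code. It swaps Biopython's GenBank parser for a plain regular expression from the standard library, on the assumption that the Gene ID appears in the record as a `/db_xref="GeneID:..."` qualifier. The function name `extract_gene_ids` and the sample record (accession, gene name, ID and coordinates) are all invented for illustration.

```python
import re

# Assumption: Gene IDs appear as qualifiers of the form /db_xref="GeneID:1234".
GENE_ID_PATTERN = re.compile(r'/db_xref="GeneID:(\d+)"')


def extract_gene_ids(genbank_text):
    """Return the Gene IDs found in the text, in order, without duplicates."""
    seen = set()
    gene_ids = []
    for match in GENE_ID_PATTERN.finditer(genbank_text):
        gene_id = match.group(1)
        # The same ID is usually repeated on several features; keep it once.
        if gene_id not in seen:
            seen.add(gene_id)
            gene_ids.append(gene_id)
    return gene_ids


# Hypothetical, heavily trimmed record used only to exercise the function.
SAMPLE_RECORD = '''\
LOCUS       NM_000000
FEATURES             Location/Qualifiers
     gene            1..100
                     /gene="EXAMPLE1"
                     /db_xref="GeneID:1234"
     CDS             10..90
                     /gene="EXAMPLE1"
                     /db_xref="GeneID:1234"
//
'''
```

Calling `extract_gene_ids(SAMPLE_RECORD)` gives a list with one entry per distinct Gene ID, which is the list the later annotation step works from.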

Once that was done, I built a series of dictionaries while reading the annotation file, one for each kind of data: gene name, symbol, chromosome and cytoband. Using the list of Gene IDs I had already obtained, it was easy to write a new file with the required fields.
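The dictionary approach can be sketched roughly as follows. This is an illustration under assumptions, not the original program: the annotation file is taken to be tab-separated with the columns Gene ID, symbol, chromosome, cytoband and name, which is a simplified, hypothetical layout rather than the real NCBI one. The names `load_annotations`, `write_annotated` and `FIELDS` are likewise invented.

```python
import csv
import io

# Assumed column order after the Gene ID column (hypothetical layout).
FIELDS = ("symbol", "chromosome", "cytoband", "name")


def load_annotations(handle):
    """Build one dictionary per field, each keyed by Gene ID."""
    tables = {field: {} for field in FIELDS}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        gene_id = row[0]
        for field, value in zip(FIELDS, row[1:]):
            tables[field][gene_id] = value
    return tables


def write_annotated(gene_ids, tables, out_handle):
    """Write one tab-separated line per requested Gene ID."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(("gene_id",) + FIELDS)
    for gene_id in gene_ids:
        # IDs missing from the annotation file are marked "NA".
        values = [tables[field].get(gene_id, "NA") for field in FIELDS]
        writer.writerow([gene_id] + values)


# Made-up annotation data, only for demonstration.
SAMPLE_ANNOTATION = (
    "#gene_id\tsymbol\tchromosome\tcytoband\tname\n"
    "1234\tEX1\t17\t17p13.1\texample gene one\n"
    "5678\tEX2\tX\tXq28\texample gene two\n"
)
```

With the sample data, `write_annotated(["5678"], load_annotations(io.StringIO(SAMPLE_ANNOTATION)), sys.stdout)` would print a header line followed by the row for gene `5678`. Keeping one dictionary per field makes each lookup a single key access, at the cost of holding the whole annotation file in memory.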

During this process I learnt a bit more about how to use iterators to skip headings and the like. The code is not yet sufficiently generic, but once I finish toying with it, I may publish it for "general" consumption (assuming anyone would use it), under the GPL v2.
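For readers curious about the iterator trick, here are two common ways to skip heading lines with the standard library. This is a sketch rather than the code from the annotator, and the helper names `skip_fixed_header` and `skip_comment_header`, as well as the sample data, are made up for the example.

```python
import io
from itertools import dropwhile, islice


def skip_fixed_header(lines, count):
    """Skip exactly `count` leading lines, whatever they contain."""
    return islice(lines, count, None)


def skip_comment_header(lines, prefix="#"):
    """Skip leading lines that start with `prefix`.

    Only the leading run is dropped: a line starting with the prefix
    later in the file is passed through unchanged.
    """
    return dropwhile(lambda line: line.startswith(prefix), lines)


# Made-up file content, only for demonstration.
SAMPLE_FILE = (
    "# generated file\n"
    "# columns: id, symbol\n"
    "1234\tEX1\n"
    "5678\tEX2\n"
)
```

Both helpers return lazy iterators, so `for line in skip_comment_header(open_file): ...` reads the file one line at a time instead of loading it whole. When only a single heading line needs to go, calling `next(lines, None)` once before the loop does the same job.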
