
author

tags

title

omit_header_text

disable_share

wordpress_id

einar

Science

true

2007-10-09T20:00:23Z

image_fullwidth
banner_other.jpg

soft-file-woes

bioinformatics

python

Science

software

SOFT file woes

true

298

Today I started working on a data set published on GEO. As the sample data were somewhat inconsistent (they mentioned 23 controls, while I found 28), I decided to parse the SOFT file from GEO to get the exact sample information.

I made a grave mistake. First of all, Biopython's SOFT parser is horribly broken (it doesn't work at all) and largely undocumented: I could work around the lack of API documentation, but not around the fact that it wouldn't work. So I turned to R, which offers a GEO query module through Bioconductor.

Again, that proved to be a terrible mistake. For a file containing 183 samples, the analysis has been running for four hours with no sign of completing anytime soon (not to mention a possible memory leak). After this, I gave up. I'm going to get the reduced data sheet and write a small parser in Python myself.
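As a rough illustration of what such a small parser could look like, here is a minimal sketch, not the parser actually written for this work. It assumes only the basic SOFT conventions: lines starting with `^` open a new entity (for example `^SAMPLE = GSM...`), and lines starting with `!` carry `key = value` attributes. The function name `parse_soft_samples` is hypothetical, and data table rows are simply skipped.

```python
from collections import defaultdict


def parse_soft_samples(lines):
    """Collect the '!' attributes of every '^SAMPLE' block.

    Returns a dict mapping sample accession -> {attribute: [values]}.
    Non-sample entities and data table rows are ignored.
    """
    samples = {}
    current = None  # attributes of the sample block being read, if any
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("^"):
            # Entity indicator line, e.g. "^SAMPLE = GSM12345"
            kind, _, name = line[1:].partition("=")
            if kind.strip().upper() == "SAMPLE":
                current = samples.setdefault(name.strip(), defaultdict(list))
            else:
                current = None
        elif line.startswith("!") and current is not None:
            key, sep, value = line[1:].partition("=")
            if sep:  # skip markers without a value, e.g. "!sample_table_begin"
                current[key.strip()].append(value.strip())
    return samples
```

Since an open file iterates line by line, passing a file object to `parse_soft_samples` should avoid loading the whole SOFT file into memory at once. Counting the controls then comes down to inspecting the collected attributes of each sample, whichever field the submitter used to record the group.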

What is frustrating is the lack of quality: I could concentrate on my own work rather than reinventing the wheel for the nth time if the existing implementations worked. What's the point in releasing non-working software? I could understand bugs, but this is one step further.
