This repository has been archived on 2021-01-06. You can view files and clone it, but cannot push or open issues or pull requests.
dennogumi.org-archive/_posts/2008-06-29-dataframes-in-python-datamatrix.markdown

---
author: einar
comments: true
date: 2008-06-29 08:13:55+00:00
layout: page
slug: dataframes-in-python-datamatrix
title: data.frames in Python - DataMatrix
wordpress_id: 405
categories:
- Linux
- Science
header:
image_fullwidth: "banner_other.jpg"
tags:
- programming
- python
---
For a long time I have tried to handle text files in Python the same way R's [data.frame](http://pbil.univ-lyon1.fr/library/base/html/data.frame.html) does, that is, with direct access to the columns and rows of a loaded text file. As I don't like R at all, I struggled to find a Pythonic equivalent, and since I found none, I decided to write my own implementation, which is what you'll find below.
<!-- more -->
The idea is to store the values of the text file as a dictionary keyed by column name, where each column holds a list of (row name, row value) tuples. This way, you can access the columns by their name (I need to see if it's workable to also use numbers), or you can view specific rows, including all or a subset of the columns. It's decently fast, and it allows for non-sequential access, which you can't do when reading a file (or a file-like object) line by line.
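To make the layout concrete, here is a minimal sketch of that dictionary-of-columns idea. It is not the code in datamatrix.py: the function name `load_columns`, its parameters, and the sample data are all made up for illustration, values are left as strings, and it is written for a current Python 3 rather than the 2.5 I tested the module on.

```python
import csv
import io


def load_columns(handle, row_name_index=0, delimiter="\t"):
    """Read delimited text into {column name: [(row name, value), ...]}.

    Hypothetical helper: the first line is taken as the header, and the
    field at row_name_index is used as the row name for every column.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    columns = dict((name, []) for name in header)
    for fields in reader:
        row_name = fields[row_name_index]
        for name, value in zip(header, fields):
            columns[name].append((row_name, value))
    return columns


# Made-up sample data standing in for a tab-delimited text file.
sample = io.StringIO(
    "GeneID\tGreat_Exp1\tGreat_Exp2\n"
    "Gene1\t56.34\t1.2\n"
    "Gene2\t2.55\t4.56\n"
)
columns = load_columns(sample)
print(columns["Great_Exp1"])
```

Because every column is an ordinary list, any row can be reached by index without re-reading the file.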
**Requirements**
I have tested this on Python 2.5.1. Older versions may or may not work. All modules called by this one should be shipped with Python itself.
**Download and installation**
[Download the py file directly]({{ site.url }}/files/datamatrix.py). Currently there is no installation mechanism, so copy it wherever Python can find it.  There's [some API documentation]({{ site.url }}/files/datamatrix.html) generated with pydoc.
This module is licensed under the GNU General Public License, version 2.
**Usage**
First of all, import the module:
[code lang="python"]
import datamatrix
[/code]
Then open a file and instantiate a DataMatrix object:
[code lang="python"]
fh = open("somefile.txt")
data = datamatrix.DataMatrix(fh)
[/code]
By default, no column is treated as holding the row names, so if your file has one, you have to specify it:
[code lang="python"]
data = datamatrix.DataMatrix(fh, row_names=1)
[/code]
More options are in the documentation.
Once the DataMatrix is initialized, you can list its columns, access a column by name, and view rows with the getRow method:
[code lang="python"]
>>> data.columns
["GeneID","Great_Exp1","Great_Exp2"]
>>> data["Great_Exp1"]
[("Gene1",56.34),
...
]
>>> data.getRow(5)
["NOT_EXISTENT","56.545","4.56"]
[/code]
Sometimes you'd want to get only the column without the row identifier, and that's where getColumn comes in:
[code lang="python"]
>>> data.getColumn("Great_Exp1")
[56.34, 2.55, ...]
[/code]
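As a rough illustration of what these row and column lookups amount to on a dictionary of columns, here is a sketch that uses plain dictionaries rather than the DataMatrix class itself. The helper names `row_at` and `column_values`, the `column_order` list, and the numbers are assumptions made up for the example.

```python
# Hypothetical stand-in for the dictionary-of-columns layout: each column
# maps to a list of (row name, value) tuples. The data is made up.
columns = {
    "GeneID": [("Gene1", "Gene1"), ("Gene2", "Gene2")],
    "Great_Exp1": [("Gene1", 56.34), ("Gene2", 2.55)],
    "Great_Exp2": [("Gene1", 1.2), ("Gene2", 4.56)],
}
# Dictionaries don't record the file's column order here, so keep it aside.
column_order = ["GeneID", "Great_Exp1", "Great_Exp2"]


def row_at(columns, column_order, index, subset=None):
    """Return the values of one row, for all columns or only those in subset."""
    names = subset if subset is not None else column_order
    return [columns[name][index][1] for name in names]


def column_values(columns, name):
    """Return a column's values without the row identifiers."""
    return [value for _row_name, value in columns[name]]


print(row_at(columns, column_order, 1))
print(row_at(columns, column_order, 0, subset=["Great_Exp2"]))
print(column_values(columns, "Great_Exp1"))
```

Rows are picked by position, so jumping straight to any row costs the same as reading the first one.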
Should you want to save a DataMatrix instance, you can use the writeMatrix function:
[code lang="python"]
datamatrix.writeMatrix(data,fname="/path/to/somewhere/file.txt")
[/code]
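For the curious, writing such a structure back to a tab-delimited text file could look roughly like the sketch below. This is not how writeMatrix is implemented: `write_columns` and its parameters are hypothetical, the data is made up, and the code targets a current Python 3.

```python
import csv
import io


def write_columns(columns, column_order, handle, delimiter="\t"):
    """Write a {column name: [(row name, value), ...]} dict as delimited text.

    Hypothetical helper: emits a header line, then one line per row,
    with columns in the order given by column_order.
    """
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(column_order)
    n_rows = len(columns[column_order[0]]) if column_order else 0
    for i in range(n_rows):
        writer.writerow([columns[name][i][1] for name in column_order])


# Made-up data in the dictionary-of-columns layout.
columns = {
    "GeneID": [("Gene1", "Gene1"), ("Gene2", "Gene2")],
    "Great_Exp1": [("Gene1", 56.34), ("Gene2", 2.55)],
}

# An in-memory buffer stands in for a real file opened for writing.
buffer = io.StringIO()
write_columns(columns, ["GeneID", "Great_Exp1"], buffer)
print(buffer.getvalue())
```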
That's all. Questions and suggestions, especially on coding and improvements, are very welcome.