---
author: einar
categories:
- General
- Linux
- Science
comments: true
date: "2006-07-08T08:16:30Z"
header:
  image_fullwidth: banner_other.jpg
slug: the-power-of-the-shell
title: The power of the shell
omit_header_text: true
disable_share: true
wordpress_id: 86
---

Yesterday I was trying to adjust some files so that a program could use Affymetrix SNP array data (instead of the arrayCGH data it was designed for). I had a big (116,000 rows) tab-delimited text file and I needed only some of its columns.

<!--more-->

Most people would just reach for Excel (ugh), but it has way too many limitations, is unstable, and runs on Windows, so I had to find another way. Since my input was a text file, the _awk_ command was exactly what I needed:
[code]awk ' { print $1"\t"$7 } ' CAKI1_CNAT.txt  > CAKI-1.txt
awk ' { print $1"\tchr"$2"\t"$3"\t"$3 } ' CAKI1_CNAT.txt  > CAKI-1.ann [/code]

With two commands I created the two files I needed for the obscure software I was testing, without a single headache. The first command writes a file with only columns 1 and 7. The second writes four fields built from the first three columns: column 1, column 2 prefixed with "chr", and column 3 twice.
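
One caveat: by default awk splits fields on any run of whitespace, so a tab-delimited file with empty cells or with spaces inside a cell would have its columns shifted. A minimal sketch of a more defensive variant sets the input and output separators explicitly. The file `sample.txt` and its contents are made up for illustration; only the column layout is taken from the commands in this post.

```shell
# Hypothetical two-row sample: tab-delimited, 7 columns.
# Row 1 has a space inside column 4, row 2 has an empty column 4.
printf 'SNP_A-1\t1\t1000\ta b\tx\ty\t2.0\n' > sample.txt
printf 'SNP_A-2\t1\t2000\t\tx\ty\t1.5\n' >> sample.txt

# -F'\t' splits on tabs only; OFS makes print join its arguments with tabs
awk -F'\t' -v OFS='\t' '{ print $1, $7 }' sample.txt > sample_values.txt
awk -F'\t' -v OFS='\t' '{ print $1, "chr" $2, $3, $3 }' sample.txt > sample.ann
```

With the default whitespace splitting, `$7` would point at the wrong cell in both of these sample rows, which is why the explicit separator is worth the extra typing on files you have not inspected.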

A simpler and more elegant solution for the first file would probably have been _cut_:
[code]cut -f1,7 CAKI1_CNAT.txt > CAKI-1.txt[/code]
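
As a quick sanity check, here is a small sketch comparing _cut_ with the equivalent tab-aware awk call. The file `demo.txt` and its values are invented for the example. Note that cut uses the tab as its default delimiter, and that it always emits fields in file order and cannot repeat or prefix a column, so the second file (with the "chr" prefix) still needs awk.

```shell
# Hypothetical sample file: tab-delimited, 7 columns, made up for illustration
printf 'SNP_A-1\t1\t1000\tNA\tx\ty\t2.0\n' > demo.txt
printf 'SNP_A-2\t2\t2000\tNA\tx\ty\t1.5\n' >> demo.txt

# Keep columns 1 and 7 with cut (tab is the default delimiter)
cut -f1,7 demo.txt > demo_cut.txt

# Same selection with awk, splitting and joining on tabs explicitly
awk -F'\t' -v OFS='\t' '{ print $1, $7 }' demo.txt > demo_awk.txt

# cmp -s prints nothing and exits 0 when the two files are identical
cmp -s demo_cut.txt demo_awk.txt && echo "same output"
```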

Either way, these are things that make my job easier. Try doing that with cmd.exe.