Thursday, April 9, 2009

Dealing with gzip file in Perl

There are several ways to process gzip compressed files in Perl.
Here I show some of them, in the chronological order in which
I came across them.

Year 2005

Before I joined Jong Park's lab, I had never needed to process gzip-compressed files. That changed there. The first gzip-format data set I had to handle was the PDB data files, which contain the 3-D structure information of proteins.

I solved the problem simply by uncompressing the gzip files first!
Simple and easy, huh?
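For completeness, here is a minimal sketch of that "uncompress first, then read" approach written as Perl. It assumes the external `gunzip` command is on the PATH, and `read_after_gunzip` is a hypothetical helper name. Note that `gunzip` replaces `xx.gz` with the uncompressed `xx`, so the compressed copy is gone afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: decompress a .gz file on disk with the external
# gunzip command, then read the resulting plain-text file line by line.
sub read_after_gunzip {
    my ($gz_path) = @_;
    (my $plain_path = $gz_path) =~ s/\.gz\z//
        or die "Expected a .gz file name: $gz_path\n";

    # gunzip replaces xx.gz with xx; -f overwrites an existing xx.
    system('gunzip', '-f', $gz_path) == 0
        or die "gunzip failed on $gz_path: exit status $?\n";

    open(my $fh, '<', $plain_path) or die "Cannot open $plain_path: $!\n";
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;    # process $line here instead of collecting it
    }
    close $fh;
    return @lines;
}
```

The drawback is the extra disk space and the lost .gz file, which is what the later approaches avoid.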

Year 2006

One day, a lab member told me about the Linux command 'zcat', which works like 'cat' on gzip-compressed files. So I simply called that command from my Perl script.

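my @file = `zcat xx.gz`;   # one decompressed line per array element
for my $line (@file) {     # while(@file) would loop forever
    # process $line here
}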
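The backtick version loads the whole decompressed file into memory at once, which can hurt with large data files. A variant of the same zcat idea, sketched below under the assumption that `zcat` (Linux/GNU gzip) is on the PATH, opens a pipe instead so Perl reads zcat's output one line at a time. `each_line_via_zcat` is an illustrative name, not part of any module.

```perl
use strict;
use warnings;

# Illustrative helper: stream a gzip file through zcat, calling $callback
# once per line, so the whole file never has to fit in memory.
sub each_line_via_zcat {
    my ($gz_path, $callback) = @_;

    # List-form pipe open: no shell is involved, so odd file names are safe.
    open(my $fh, '-|', 'zcat', $gz_path)
        or die "Cannot start zcat: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        $callback->($line);
    }
    # close() on a pipe waits for zcat and returns false if it exited non-zero.
    close($fh) or die "zcat failed on $gz_path: exit status $?\n";
}
```

For example, `each_line_via_zcat('xx.gz', sub { print "$_[0]\n" })` prints the file without ever holding all of it in an array.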

Year 2007

I was not a CPAN lover before 2007, so I didn't try to look for a solution on CPAN. But after I took part in 'Agile programming education', I changed my mind, and the more I used CPAN, the more I loved it.

The module I found for this problem at that time was 'PerlIO::gzip'.

use PerlIO::gzip;
open(my $fh, '<:gzip', 'xx.gz') or die "Cannot open xx.gz: $!";
while (my $line = <$fh>) {
    # process $line here
}
close $fh;

Year 2009

Recently, I set up my new computer with Ubuntu 8.04 and 8.10. But I had trouble installing the PerlIO::gzip module; there seemed to be a conflict between the OS setup and the module's default configuration. So I looked on CPAN for another module that can process gzip files.

And there is one, as TMTOWTDI (There's More Than One Way To Do It) promises!

use IO::Zlib;
my $fh = IO::Zlib->new('xx.gz', 'r') or die "Cannot open xx.gz\n";
while (my $line = <$fh>) {
    # process $line here
}
$fh->close;
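IO::Zlib can also write gzip files, using a write mode such as 'wb' in place of 'rb'. Below is a small round-trip sketch: one hypothetical helper writes lines into a .gz file, and another reads them back with the same `while (<$fh>)` loop as above. The helper names are illustrative only.

```perl
use strict;
use warnings;
use IO::Zlib;

# Hypothetical helper: write lines to a gzip file through IO::Zlib.
sub write_gz_lines {
    my ($gz_path, @lines) = @_;
    my $fh = IO::Zlib->new($gz_path, 'wb')
        or die "Cannot open $gz_path for writing\n";
    print $fh "$_\n" for @lines;
    $fh->close;    # flushes and finishes the gzip stream
}

# Hypothetical helper: read a gzip file back, line by line.
sub read_gz_lines {
    my ($gz_path) = @_;
    my $fh = IO::Zlib->new($gz_path, 'rb')
        or die "Cannot open $gz_path for reading\n";
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    $fh->close;
    return @lines;
}
```

Because IO::Zlib does the decompression inside Perl via Compress::Zlib, it needs neither an external zcat nor a special PerlIO layer.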