[SciPy-user] read/write compressed files
Thu Jun 21 05:57:02 CDT 2007
I meant bz2 over zlib due to its higher compression, at the cost of slower
performance. This common belief used to match my experience. However, a
simple test (below), made with fresh morning data, clearly undermines it:
> du -hsc test9*.dat
> time gzip test9*.dat
> du -hsc test9*.dat.gz
> time gunzip test9*.dat.gz
> time bzip2 test9*.dat
> du -hsc test9*.dat.bz2
> time bunzip2 test9*.dat.bz2
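The same comparison can be run from Python with the standard zlib and bz2
modules. A minimal sketch; the synthetic numeric text here is only a
hypothetical stand-in for the test9*.dat files, so ratios and timings will
differ on real data:

```python
import bz2
import time
import zlib

# Hypothetical stand-in for the test9*.dat files from the thread:
# repetitive numeric text (real results will vary with the data).
data = b"".join(b"%d %d %d\n" % (i, i * i, i % 7) for i in range(20000))

for name, compress in (("zlib", lambda d: zlib.compress(d, 9)),
                       ("bz2", lambda d: bz2.compress(d, 9))):
    t0 = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - t0
    print("%-4s: %d -> %d bytes (%.1fx) in %.3f s"
          % (name, len(data), len(packed), len(data) / len(packed), elapsed))
```

Both calls use compression level 9; dropping the level trades ratio for speed
in both libraries.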
I am surprised, as I well remember cases where I could gain 20%. But
indeed, given the much slower performance, you have convinced me to use
zlib over bz2.
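For actually reading and writing such files from Python, the standard gzip
module produces files compatible with the gzip/gunzip commands above. A
minimal sketch; the file name is hypothetical:

```python
import gzip
import os
import tempfile

# Write some numeric text to a .gz file and read it back; the name
# "test9.dat.gz" is hypothetical, echoing the files in the thread.
payload = "\n".join("%d %f" % (i, i * 0.5) for i in range(1000))

path = os.path.join(tempfile.mkdtemp(), "test9.dat.gz")

with gzip.open(path, "wt") as f:   # text mode; use "wb" for raw bytes
    f.write(payload)

with gzip.open(path, "rt") as f:
    restored = f.read()

assert restored == payload
```

The resulting file can also be unpacked on the command line with gunzip.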
thanks for forcing me to do this test,
Francesc Altet wrote:
> On Wed, 20 Jun 2007 at 21:01 +0200, Dominik Szczerba wrote:
>> PyTables is great (and big) while I just need to read in a sequence of
> Ok, that's fine. In any case, I'm interested in knowing the reasons
> why you are using bzip2 instead of zlib. Have you detected some data
> pattern where you get significantly more compression than with zlib,
> for example?
> I'm asking this because, in my experience with numerical data, I was
> unable to detect important compression level differences between bzip2
> and zlib. See:
> for some experiments in that regard.
> I'd appreciate any input on this subject (bzip2 vs zlib).
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich