[Numpy-discussion] [Fwd: compression in storage of Numeric/numarray objects]

Warren Focke focke at slac.stanford.edu
Fri Sep 9 13:56:10 CDT 2005


You may be able to avoid the tostring() overhead by using tofile():

s.tofile(gzip.open('compressed.dat', 'wb'))

You are probably SOL on the mmapping, though.
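
A minimal sketch of the same idea that goes through write() instead: the data is streamed into the gzip file one slice at a time, so no full-size string copy is ever built. Current NumPy documents ndarray.tofile as bypassing the file object's write method, so tofile may not actually compress there. The stdlib array module stands in for a Numeric/numarray array here, and it is an assumption that your array type exposes the buffer protocol in the same way. The names write_compressed, read_compressed and chunk_bytes are made up for illustration.

```python
import array
import gzip
import os
import tempfile


def write_compressed(buf, path, chunk_bytes=1 << 20):
    """Stream a contiguous buffer into a gzip file, chunk_bytes at a time."""
    view = memoryview(buf).cast("B")  # flat byte view of the data, no copy
    with gzip.open(path, "wb") as f:
        for start in range(0, len(view), chunk_bytes):
            # Slicing a memoryview does not copy the underlying data either.
            f.write(view[start:start + chunk_bytes])


def read_compressed(path, typecode, chunk_bytes=1 << 20):
    """Rebuild an array.array by decompressing sequentially in chunks."""
    out = array.array(typecode)
    # Keep each read a whole number of items so frombytes() accepts it.
    step = max(out.itemsize, chunk_bytes - chunk_bytes % out.itemsize)
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(step)
            if not chunk:
                break
            out.frombytes(chunk)
    return out


if __name__ == "__main__":
    data = array.array("d", (float(i % 7) for i in range(10000)))
    path = os.path.join(tempfile.mkdtemp(), "compressed.dat")
    write_compressed(data, path, chunk_bytes=4096)
    assert read_compressed(path, "d", chunk_bytes=4096) == data
    print(os.path.getsize(path), "bytes on disk,",
          len(data) * data.itemsize, "bytes uncompressed")
```

The read side has to walk the file front to back, which is also why mmapping does not fit: a gzip stream has no fixed mapping from array offsets to file offsets.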

w

On Fri, 9 Sep 2005, Joost van Evert wrote:

> On Fri, 2005-09-09 at 15:06 -0500, John Hunter wrote:
> > >>>>> "Joost" == Joost van Evert <phjoost at gmail.com> writes:
> >
> >     Joost> is it possible to use compression while storing
> >     Joost> numarray/Numeric objects?
> >
> >
> > Sure
> >
> >     In [35]: s = rand(10000)
> >
> >     In [36]: file('uncompressed.dat', 'wb').write(s.tostring())
> >
> >     In [37]: ls -l uncompressed.dat
> >     -rw-r--r--  1 jdhunter jdhunter 80000 2005-09-09 15:04 uncompressed.dat
> >
> >     In [38]: gzip.open('compressed.dat', 'wb').write(s.tostring())
> >
> >     In [39]: ls -l compressed.dat
> >     -rw-r--r--  1 jdhunter jdhunter 41393 2005-09-09 15:04 compressed.dat
> >
> Thanks, this helps, but I think not enough, because the arrays I work
> on are sometimes >1 GB (correlation matrices). The tostring method would
> explode the memory use and result in a lot of swapping. Ideally the
> compression would also work with memory-mapped arrays.
>
> Greets,
>
> Joost



