[Numpy-discussion] numpy.savez(_compressed) in loop

David Warde-Farley d.warde.farley@gmail....
Mon Oct 29 15:37:49 CDT 2012


On Mon, Oct 29, 2012 at 6:29 AM, Radek Machulka
<radek.machulka@gmail.com> wrote:
> Hi,
>
> is there a way to save multiple arrays into a single npy (npz if possible) file
> in a loop? Something like:
>
> fw = open('foo.bar', 'wb')
> while foo:
>         arr = np.array(bar)
>         np.savez_compressed(fw, arr)
> fw.close()
>
> Or is there some workaround? I go through hundreds of thousands of arrays and
> cannot keep them in memory. Yes, I can save each array into a separate file, but
> I would rather have them all in a single one.

Note that you can save several .npy arrays into a single file. The NPY
format is self-describing: from the header, numpy.load() knows how many
bytes belong to each array. Moreover, if passed an open file object,
numpy.load() leaves the file position just past the array it read, so
something like this works:

In [1]: import numpy as np

In [2]: f = open('x.dat', 'wb')

In [3]: np.save(f, np.arange(5))

In [4]: np.save(f, np.arange(10))

In [5]: f.close()

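In [6]: with open('x.dat', 'rb') as f:
   ...:     while True:
   ...:         try:
   ...:             print(np.load(f))
   ...:         except (EOFError, IOError):
   ...:             # newer NumPy raises EOFError at end of file;
   ...:             # older versions raised IOError
   ...:             break
   ...: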
[0 1 2 3 4]
[0 1 2 3 4 5 6 7 8 9]

This is something of a hack. You could do better than catching the
IOError: for example, save a 0-d array containing the number of
forthcoming arrays first, or append a sentinel array containing None at
the end to indicate that there are no more real arrays to read.
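Both alternatives can be sketched as follows; all helper names are hypothetical. The count prefix needs the number of arrays before writing starts, so `save_counted` takes a sequence, while the sentinel variant accepts any iterable, which fits the question's open-ended loop better. Because the None sentinel is a 0-d object array, NumPy pickles it, and reading it back requires `allow_pickle=True`; only do that with files you trust. Note also that a genuine 0-d object array holding None would be mistaken for the sentinel.

```python
import os
import tempfile

import numpy as np


def save_counted(f, arrays):
    """Count prefix: a 0-d array holding len(arrays), then the arrays themselves."""
    np.save(f, np.array(len(arrays)))  # the count must be known up front
    for arr in arrays:
        np.save(f, arr)


def load_counted(f):
    n = int(np.load(f))  # the 0-d header array says how many records follow
    return [np.load(f) for _ in range(n)]


def save_with_sentinel(f, arrays):
    """Sentinel: the arrays, then a 0-d object array containing None."""
    for arr in arrays:  # any iterable works; the count need not be known
        np.save(f, arr)
    np.save(f, np.array(None, dtype=object))  # object arrays are stored pickled


def iter_until_sentinel(f):
    while True:
        # allow_pickle=True is needed to read the object sentinel back;
        # unpickling can execute code, so use it only on trusted files
        arr = np.load(f, allow_pickle=True)
        if arr.dtype == object and arr.shape == () and arr.item() is None:
            return
        yield arr


path = os.path.join(tempfile.mkdtemp(), "x.dat")

with open(path, "wb") as f:
    save_counted(f, [np.arange(5), np.arange(10)])
with open(path, "rb") as f:
    counted = load_counted(f)

with open(path, "wb") as f:
    save_with_sentinel(f, (np.arange(n) for n in (5, 10)))
with open(path, "rb") as f:
    streamed = list(iter_until_sentinel(f))
```

Either layout lets the reader stop cleanly without relying on an exception at end of file.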

As Pauli said, it's probably worth considering HDF5 (e.g. via h5py or PyTables).

David

