[Numpy-discussion] saving groups of numpy arrays to disk

Pauli Virtanen pav@iki...
Sun Aug 21 07:24:46 CDT 2011


On Sat, 20 Aug 2011 16:18:55 -0700, Chris Withers wrote:
> I've got a tree of nested dicts that at their leaves end in numpy arrays
> of identical sizes.
> 
> What's the easiest way to persist these to disk so that I can pick up
> with them where I left off?

Depends on your requirements.

If you do *not* have a requirement for:

- real persistence, i.e., being able to easily read the data years later
- a standard data format
- access from non-Python programs
- safety against malicious parties (unpickling can execute code embedded
  in the input -- although this is possible to control)

then you can use Python pickling:

	import pickle

	file = open('out.pck', 'wb')
	pickle.dump(tree, file, protocol=pickle.HIGHEST_PROTOCOL)
	file.close()

	file = open('out.pck', 'rb')
	tree = pickle.load(file)
	file.close()

This should just work (TM) directly with your tree-of-dicts-and-arrays.
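
As a quick sanity check, here is a minimal round-trip sketch. The tree
contents, the temporary file location, and the trees_equal helper are
made up for illustration; only pickle and numpy are assumed.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical tree: nested dicts whose leaves are arrays of identical size.
tree = {
    "a": {"b": np.arange(4), "c": np.linspace(0.0, 1.0, 4)},
    "d": {"e": np.zeros(4)},
}

# Write to a temporary directory so the sketch leaves nothing behind in cwd.
path = os.path.join(tempfile.mkdtemp(), "out.pck")

with open(path, "wb") as f:
    pickle.dump(tree, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:
    restored = pickle.load(f)


def trees_equal(a, b):
    """Compare two nested dicts of arrays, leaf by leaf."""
    if isinstance(a, dict):
        return (
            isinstance(b, dict)
            and a.keys() == b.keys()
            and all(trees_equal(a[k], b[k]) for k in a)
        )
    return np.array_equal(a, b)


print(trees_equal(tree, restored))
```

The with blocks close the file even if dumping or loading raises.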

> What's the most "correct" way to do so?
> 
> I'm using IPython if that makes things easier...
>
> I had wondered about PyTables, but that seems a bit too heavyweight for 
> this, unless I'm missing something?

If I had one or more of the requirements listed above, I'd use the HDF5
format, via either PyTables or h5py. If I just needed to cache the trees,
I'd use pickling.
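
For the HDF5 route, a rough sketch with h5py could look like the
following. It assumes h5py is installed, that all dict keys are strings,
and that every leaf is a numpy array; save_tree, load_tree, and the
example tree are hypothetical names, not part of h5py.

```python
import os
import tempfile

import h5py
import numpy as np


def save_tree(group, tree):
    """Recursively write a nested dict of arrays into an HDF5 group."""
    for key, value in tree.items():
        if isinstance(value, dict):
            # A nested dict becomes an HDF5 sub-group.
            save_tree(group.create_group(key), value)
        else:
            # A leaf array becomes an HDF5 dataset.
            group.create_dataset(key, data=np.asarray(value))


def load_tree(group):
    """Recursively read an HDF5 group back into a nested dict of arrays."""
    tree = {}
    for key, item in group.items():
        if isinstance(item, h5py.Group):
            tree[key] = load_tree(item)
        else:
            tree[key] = item[()]  # read the whole dataset into memory
    return tree


# Hypothetical example tree.
tree = {
    "run1": {"x": np.arange(5.0), "y": np.ones(5)},
    "run2": {"x": np.zeros(5)},
}

path = os.path.join(tempfile.mkdtemp(), "out.h5")

with h5py.File(path, "w") as f:
    save_tree(f, tree)

with h5py.File(path, "r") as f:
    restored = load_tree(f)
```

The dict nesting maps directly onto HDF5 groups, so the file can be
inspected with any HDF5 tool, not only from Python.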

I think the only reason to worry about how heavyweight they are is
distribution: does your target audience have these libraries already
installed (they are pre-installed in several Python-for-science
distributions), and how difficult would it be for you to ship them with
your stuff or to require users to install them?

-- 
Pauli Virtanen


