[Numpy-discussion] Question about np.savez
Tue Sep 1 22:39:29 CDT 2009
On 1-Sep-09, at 10:11 PM, Jorge Scandaliaris wrote:
> David Warde-Farley <dwf <at> cs.toronto.edu> writes:
>> If you actually want to save multiple arrays, you can use
>> savez('fname', *[a,b,c]) and they will be accessible under the names
>> arr_0, arr_1, etc. and a list of these names is in the 'files'
>> attribute on the NpzFile object. To retrieve your list of arrays when
>> you load, you can just do
>> mynewlist = [data[arrname] for arrname in data.files]
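For illustration, here is a minimal round trip of the savez tip above. It assumes a temporary file path and three small example arrays (`a`, `b`, `c` are placeholders). Positional arrays are stored under `arr_0`, `arr_1`, ..., and `data.files` lists those names.

```python
import os
import tempfile

import numpy as np

# Placeholder arrays standing in for the poster's a, b, c.
a = np.arange(5)
b = np.eye(3)
c = np.array([1.5, 2.5])
arrays = [a, b, c]

path = os.path.join(tempfile.mkdtemp(), "fname.npz")

# Unpacking the list passes the arrays positionally, so savez stores
# them under the automatic names arr_0, arr_1, arr_2.
np.savez(path, *arrays)

# np.load returns an NpzFile; using it as a context manager closes the file.
with np.load(path) as data:
    names = data.files  # list of stored array names
    mynewlist = [data[arrname] for arrname in data.files]
```

If you prefer meaningful names over `arr_N`, `np.savez(path, a=a, b=b)` stores the arrays under the keyword names instead.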
> Thanks for the tip. I have realized, though, that I might need more
> flexibility than just the ability to save ndarrays. The data I am dealing
> with is best kept in a hierarchical way (I could represent the structure
> with ndarrays too, but I think it would be messy and difficult). I am
> having a look at h5py to see if it fulfills my needs. I know there is
> PyTables, too, but from a quick look it seems h5py is simpler. Am I right
> on this?
I wouldn't say one is 'simpler' or 'more complicated'; they're
different in approach. From the h5py FAQ:
The two projects have different design goals. PyTables presents a
database-like approach to data storage, providing features like
indexing and fast "in-kernel" queries on dataset contents. It also has
a custom system to represent data types.
In contrast, h5py is an attempt to map the HDF5 feature set to NumPy
as closely as possible. For example, the high-level type system uses
NumPy dtype objects exclusively, and method and attribute naming
follows Python and NumPy conventions for dictionary and array access
(i.e. ".dtype" and ".shape" attributes for datasets, obj[name]
indexing syntax for groups, etc).
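For illustration, here is a minimal sketch of that NumPy-like mapping, which also covers the hierarchical case from the question. It assumes h5py is installed. The group name `run_01`, the dataset names and the `sample_rate` attribute are hypothetical. Groups nest like dictionaries, and datasets expose `.shape` and `.dtype` and support slicing.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "experiment.h5")

with h5py.File(path, "w") as f:
    # Groups act like nested dictionaries (names here are hypothetical).
    run = f.create_group("run_01")
    run.create_dataset("signal", data=np.linspace(0.0, 1.0, 100))
    run.create_dataset("labels", data=np.arange(10, dtype=np.int32))
    # Small pieces of metadata can be attached as attributes.
    run.attrs["sample_rate"] = 1000.0

with h5py.File(path, "r") as f:
    signal = f["run_01/signal"]    # obj[name] indexing; slash paths work
    sig_shape = signal.shape       # NumPy-style .shape attribute
    sig_dtype = signal.dtype       # a plain NumPy dtype object
    first_ten = signal[:10]        # slicing reads just this part into an ndarray
    group_names = list(f.keys())
    rate = f["run_01"].attrs["sample_rate"]
```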
So, if you have huge amounts of data and you want to do complicated
queries on discontiguous subsets of it, PyTables is the clear winner.
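To show what such a query looks like, here is a rough sketch using PyTables' `Table.read_where`. It assumes PyTables (`tables`) is installed. The field names `sensor_id` and `reading` and the data are made up. The condition string is evaluated inside PyTables, so only the matching rows come back as a NumPy structured array.

```python
import os
import tempfile

import numpy as np
import tables  # PyTables

# Hypothetical records: a structured array with two fields.
records = np.zeros(1000, dtype=[("sensor_id", "i4"), ("reading", "f8")])
records["sensor_id"] = np.arange(1000)
records["reading"] = np.sin(records["sensor_id"] / 50.0)

path = os.path.join(tempfile.mkdtemp(), "records.h5")

with tables.open_file(path, mode="w") as h5:
    # The structured array supplies both the table layout and its rows.
    table = h5.create_table("/", "records", obj=records)
    # Optional index on the queried column; it pays off on large tables.
    table.cols.reading.create_index()

with tables.open_file(path, mode="r") as h5:
    table = h5.root.records
    # Condition evaluated in-kernel; result is a NumPy structured array.
    selected = table.read_where("(reading > 0.9) & (sensor_id < 500)")
```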
The type systems are quite similar, but there is some extra work
involved with PyTables. h5py, on the other hand, provides a nearly
complete wrapping of the HDF5 C API, in addition to the NumPy
The truth is, either of them integrates nicely with NumPy.
They have overlapping feature sets, just different design philosophies.