[Numpy-discussion] Home for pyhdf5io?

Stephen Simmons mail@stevesimmons....
Sun May 24 07:23:22 CDT 2009


David Warde-Farley wrote:
> On 23-May-09, at 4:25 PM, Albert Thuswaldner wrote:
>> Actually my vision with pyhdf5io is to have hdf5 to replace numpy's
>> own binary file format (.npy, npz). Pyhdf5io (or an incarnation of it)
>> should be the standard (binary) way to store data in scipy/numpy. A
>> bold statement, I know, but I think that it would be an improvement,
>> especially for those users who are replacing Matlab with scipy/numpy.
>>     
> In that it introduces a dependency on pytables (and the hdf5 C  
> library) I doubt it would be something the numpy core developers would  
> be eager to adopt.
>
> The npy and npz formats (as best I can gather) exist so that there is  
> _some_ way of persisting data to disk that ships with numpy. It's not  
> meant necessarily as the best way, or as an interchange format, just  
> as something that works "out of the box", the code for which is  
> completely contained within numpy.
>
> It might be worth mentioning the limitations of numpy's built-in  
> save(), savez() and load() in the docstrings and recommending more  
> portable alternatives, though.
>
> David
>   

I tend to agree with David that PyTables is too big a dependency for 
inclusion in core Numpy. It does a lot more than simply loading and 
saving arrays.

While I haven't tried Andrew Collette's h5py 
(http://code.google.com/p/h5py), it looks like a very 'thin' wrapper 
around the HDF5 C libraries. Maybe numpy's save(), savez(), load() and 
memmap() could be enhanced so that saving/loading files with HDF5-like 
file extensions used the HDF5 format, with code based on h5py and 
pyhdf5io. This could, I imagine, be a relatively small/simple addition 
to numpy, with the only external dependency being the HDF5 libraries 
themselves.

Stephen

