[Numpy-discussion] fast numpy i/o

Derek Homeier derek@astro.physik.uni-goettingen...
Mon Jun 27 11:17:45 CDT 2011

On 21.06.2011, at 8:35PM, Christopher Barker wrote:

> Robert Kern wrote:
>> https://raw.github.com/numpy/numpy/master/doc/neps/npy-format.txt
> Just a note. From that doc:
> """
>     HDF5 is a complicated format that more or less implements
>     a hierarchical filesystem-in-a-file.  This fact makes satisfying
>     some of the Requirements difficult.  To the author's knowledge, as
>     of this writing, there is no application or library that reads or
>     writes even a subset of HDF5 files that does not use the canonical
>     libhdf5 implementation.
> """
> I'm pretty sure that the NetcdfJava libs, developed by Unidata, use 
> their own home-grown code. netcdf4 is built on HDF5, so that qualifies 
> as "a library that reads or writes a subset of HDF5 files". Perhaps 
> there are lessons to be learned there. (too bad it's Java)
> """
>     Furthermore, by
>     providing the first non-libhdf5 implementation of HDF5, we would
>     be able to encourage more adoption of simple HDF5 in applications
>     where it was previously infeasible because of the size of the
>     library.
> """
> I suppose this point is still true -- a C lib that supported a subset of 
> hdf would be nice.
> That being said, I like the simplicity of the .npy format, and I don't 
> know that anyone wants to take any of this on anyway.

Some late comments on the note. (I was a bit surprised that HDF5 installation seems to be a serious hurdle for many; maybe I've just been profiting from the fink build system for OS X here. I also was not aware that the current netCDF is built to be backwards-compatible with the HDF5 standard, something useful learnt again... :-)

Some more confusion arose when I found that the NCAR netCDF distribution includes C and Fortran versions, but these actually also depend on HDF5 for netCDF 4 access. While the Java version appears not to, it also provides only *read* access to those formats, so it probably would not be of much help anyway.
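
To see which of these containers a given file actually uses, one can look at its first few bytes. The following is only a rough sketch: the function name sniff_format and the labels are made up for illustration, and it only checks offset 0 (an HDF5 file with a user block may carry its signature at a later offset, which this would miss).

```python
# Leading "magic" bytes of the formats discussed in this thread.
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "HDF5 (includes netCDF 4)"),
    (b"CDF\x01", "netCDF 3 classic"),
    (b"CDF\x02", "netCDF 3 64-bit offset"),
    (b"\x93NUMPY", ".npy"),
]


def sniff_format(path):
    """Return a label for the container format of `path`, judged by its first bytes.

    Hypothetical helper: only offset 0 is inspected.
    """
    with open(path, "rb") as fh:
        head = fh.read(8)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"
```

A file reported as HDF5 here would need libhdf5 (or something built on it) to read, while the two netCDF 3 variants would not.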

The netCDF4-Python package mentioned before unfortunately builds on HDF5 again, and the same goes for the PyNIO module, which is probably explained by the above dependencies.

Finally, the former Scientific.IO NetCDF interface is now part of scipy.io, but I assume it only supports netCDF 3 (the documentation is not specific about that). This might be the easiest option for a portable data format (if Matlab supports it). 
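
For what it's worth, a round trip through that interface might look like the sketch below. It assumes a reasonably recent SciPy, where the class is importable as scipy.io.netcdf_file; the helper names write_series and read_series, the dimension name "n" and the variable name "data" are invented for the example.

```python
import numpy as np
from scipy.io import netcdf_file


def write_series(path, values):
    """Write a 1-D array to a netCDF file as float64 variable 'data' (names are illustrative)."""
    values = np.asarray(values, dtype=np.float64)
    f = netcdf_file(path, "w")  # default version=1, i.e. the classic netCDF format
    f.createDimension("n", int(values.size))
    var = f.createVariable("data", "d", ("n",))
    var[:] = values
    f.close()


def read_series(path):
    """Read variable 'data' back; copy so the array does not depend on the open file."""
    f = netcdf_file(path, "r", mmap=False)
    out = f.variables["data"][:].copy()
    f.close()
    return out
```

Something like write_series("series.nc", [1.0, 2.5, 4.0]) followed by read_series("series.nc") should then return the same three values, with no HDF5 library involved.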

