[Numpy-discussion] Designing a new storage format for numpy recarrays

Zachary Pincus zachary.pincus@yale....
Fri Oct 30 09:26:21 CDT 2009


Unless I've misread your request or the documentation, h5py already
supports pulling specific fields out of "compound data types":

http://h5py.alfven.org/docs-1.1/guide/hl.html#id3

> For compound data, you can specify multiple field names alongside  
> the numeric slices:
> >>> dset["FieldA"]
> >>> dset[0,:,4:5, "FieldA", "FieldB"]
> >>> dset[0, ..., "FieldC"]

Is this latter style of access what you were asking for? (Or is the
problem that it's not fast enough in hdf5, even with the shuffle
filter, etc.?)
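
To make the question concrete, here is a minimal sketch of that access
pattern against a small compound dataset. The field names simply echo the
documentation quoted above, while the record layout, the dataset name
"table", and the temporary file path are assumptions made up for
illustration. It is written against the current h5py API rather than the
1.1 release linked above, so details may differ between versions.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical record layout; only the field names come from the quoted docs.
record_dtype = np.dtype([("FieldA", "f8"), ("FieldB", "i4"), ("FieldC", "S8")])

records = np.zeros(100, dtype=record_dtype)
records["FieldA"] = np.arange(100) * 0.5
records["FieldB"] = np.arange(100)
records["FieldC"] = b"label"

# Throwaway location for the example file (an assumption, not a convention).
path = os.path.join(tempfile.mkdtemp(), "records.h5")

# Store the recarray as one compound dataset, with LZF and the shuffle filter.
with h5py.File(path, "w") as f:
    f.create_dataset("table", data=records, compression="lzf", shuffle=True)

with h5py.File(path, "r") as f:
    dset = f["table"]
    # A single field name returns a plain array of that field's type.
    field_a = dset["FieldA"]
    # A slice plus several field names returns a structured array
    # containing only the requested fields.
    first_ten = dset[0:10, "FieldA", "FieldB"]

print(field_a.dtype, field_a.shape)
print(first_ten.dtype.names, first_ten.shape)
```

Note that this only shows the interface. Whether selecting a subset of
fields actually saves disk I/O is a separate question, since a compressed
chunk presumably has to be decompressed as a whole before the fields are
picked out.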

So then the issue is the dependency on hdf5 and h5py? (Or, if you want
to access LZF-compressed files without h5py, a dependency on hdf5 and
the C LZF compressor?) That is pretty lightweight, especially since
you're proposing to write new code that would itself be a dependency.
So your new code couldn't depend on *anything* else if you wanted it to
be a fewer-dependencies option than hdf5+h5py, right?

Zach

