[Numpy-discussion] checksum on numpy float array

Andrew Collette h5py@alfven....
Fri Dec 5 14:28:43 CST 2008


> Another possibility would be to use HDF5 as a data container.  It 
> supports the fletcher32 filter [1], which computes a checksum 
> for every data chunk written to disk and then always checks that the data 
> read satisfies the checksum kept on disk.  So, if the HDF5 layer doesn't 
> complain, you are basically safe.
> 
> There are at least two usable HDF5 interfaces for Python and NumPy: 
> PyTables [2] and h5py [3].  PyTables supports that right 
> out of the box.  Not sure about h5py, though (a quick search in the docs 
> doesn't reveal anything).
> 
> [1] http://rfc.sunsite.dk/rfc/rfc1071.html
> [2] http://www.pytables.org
> [3] http://h5py.alfven.org
> 
> Hope it helps,
> 
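For anyone unfamiliar with the filter mentioned above, here is a minimal sketch of the textbook Fletcher-32 checksum in plain Python. Assumptions: it reads the input as little-endian 16-bit words and zero-pads odd-length input; HDF5's own implementation may differ in byte order and padding, so this is not guaranteed to reproduce the value HDF5 stores on disk. The helper `float_array_bytes` is hypothetical and stands in for the raw buffer of a little-endian float64 NumPy array (the same bytes `arr.tobytes()` would give for dtype `'<f8'`).

```python
import struct

def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32 over little-endian 16-bit words (odd input zero-padded)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    sum1 = sum2 = 0
    for (word,) in struct.iter_unpack("<H", data):
        sum1 = (sum1 + word) % 65535  # running sum of the words
        sum2 = (sum2 + sum1) % 65535  # sum of running sums, so word order matters
    return (sum2 << 16) | sum1

def float_array_bytes(values):
    """Hypothetical helper: pack floats as little-endian float64, like a '<f8' buffer."""
    return struct.pack(f"<{len(values)}d", *values)
```

Usage: store `fletcher32(float_array_bytes([1.0, 2.5, -3.25]))` next to the data, recompute it after reading, and compare. Flipping any single bit changes one 16-bit word by a power of two, which always changes `sum1`, so a one-bit error is always detected. As a sanity check, `hex(fletcher32(b"abcde"))` gives `'0xf04fc729'`, the commonly quoted test vector for this word order.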

Just to confirm: h5py does in fact support fletcher32. It's one of the
options you can specify when creating a dataset, although it could use
better documentation:

http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create_dataset
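A minimal sketch of what that looks like, assuming a current h5py release (the linked page documents the 2008 `h5py.highlevel` layout; today the same option is the `fletcher32=True` keyword of `create_dataset` on an `h5py.File` or group). The function names `save_with_fletcher32` and `load_checked` are hypothetical. Filters need a chunked layout, and h5py picks a chunk shape automatically when a filter is requested without an explicit `chunks=`.

```python
import h5py
import numpy as np

def save_with_fletcher32(path, name, arr):
    """Write arr to an HDF5 file with per-chunk Fletcher-32 checksums enabled."""
    with h5py.File(path, "w") as f:
        # No chunks= given: h5py chooses a chunk shape because a filter is requested.
        f.create_dataset(name, data=arr, fletcher32=True)

def load_checked(path, name):
    """Read the dataset back; HDF5 verifies each chunk's checksum as it is read."""
    with h5py.File(path, "r") as f:
        dset = f[name]
        # Report whether the filter is on, the chunk shape, and the data itself.
        return dset.fletcher32, dset.chunks, dset[...]
```

Usage: `save_with_fletcher32("checked.h5", "data", np.arange(1000.0))`, then `load_checked("checked.h5", "data")` should report the filter as enabled and return the same array. No extra verification call is needed on the Python side, since the check happens inside the HDF5 read.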

Like other checksums, fletcher32 provides error detection but not
error correction.  You'll still need to throw away data that can't be
read.  However, I believe you can still read the parts of the dataset
whose chunks aren't corrupted, since the checksum is kept per chunk.
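To illustrate detection without correction at chunk granularity, here is a toy model in plain Python. It is not how HDF5 or h5py work internally: `zlib.crc32` stands in for fletcher32, and the chunk length `CHUNK` and the functions `write_chunks`/`read_salvage` are hypothetical. In HDF5 a read that touches a bad chunk fails with an error; the sketch simply records which chunk failed and keeps the values from chunks that still verify.

```python
import struct
import zlib

CHUNK = 4  # hypothetical chunk length, in float64 elements

def write_chunks(values, chunk=CHUNK):
    """Split values into chunks and store (payload, checksum) pairs, one per chunk."""
    stored = []
    for start in range(0, len(values), chunk):
        part = values[start:start + chunk]
        payload = struct.pack(f"<{len(part)}d", *part)
        # bytearray so a test can corrupt bytes in place, as bad media would
        stored.append((bytearray(payload), zlib.crc32(payload)))
    return stored

def read_salvage(stored, chunk=CHUNK):
    """Return (recovered, bad): {index: value} from chunks that verify, and failed chunk ids."""
    recovered, bad = {}, []
    for i, (payload, checksum) in enumerate(stored):
        if zlib.crc32(bytes(payload)) != checksum:
            bad.append(i)  # the error is detected, but there is nothing to repair it with
            continue
        n = len(payload) // 8
        for j, v in enumerate(struct.unpack(f"<{n}d", bytes(payload))):
            recovered[i * chunk + j] = v
    return recovered, bad
```

Usage: with ten values and `CHUNK = 4`, corrupting one byte of the second chunk (`stored[1][0][3] ^= 0xFF`) makes `read_salvage` report `bad == [1]`, while elements 0-3 and 8-9 are still returned intact; elements 4-7 have to be thrown away.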

Andrew Collette
