[Numpy-discussion] checksum on numpy float array
Fri Dec 5 11:42:00 CST 2008
On Friday 05 December 2008, Brennan Williams wrote:
> Robert Kern wrote:
> > On Thu, Dec 4, 2008 at 18:54, Brennan Williams
> > <firstname.lastname@example.org> wrote:
> >> Thanks
> >> email@example.com wrote:
> >>> I didn't check what this does behind the scenes, but try this
> >>> import hashlib  # standard Python library
> >>> import numpy as np
> >>> m = hashlib.md5()
> >>> m.update(np.array(range(100)))
> >>> m.update(np.array(range(200)))
> > I would recommend doing this on the strings before you make arrays
> > from them. You don't know if the network cut out in the middle of
> > an 8-byte double.
> > Of course, sending the lengths and other metadata first, then the
> > data would let you check without needing to do expensivish hashes
> > or checksums. If truncation is your problem rather than corruption,
> > then that would be sufficient. You may also consider using the NPY
> > format in numpy 1.2 to implement that.
> Thanks for the ideas. I'm definitely going to add some more basic
> checks on lengths etc as well.
> Unfortunately the problem is happening at a client site so (a) I
> can't reproduce it and (b) most of the
> time they can't reproduce it either. This is a Windows Python app
> running on Citrix reading/writing data
> to a Linux networked drive.
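The two suggestions quoted above (hash the raw bytes before building
arrays, and send the length first so truncation is caught cheaply) could
be combined roughly as follows. This is only a sketch: the framing
layout (8-byte length, then a 16-byte MD5 digest, then the data) and the
names pack_array and unpack_array are made up for illustration and are
not part of NumPy or of anything discussed in the thread.

```python
import hashlib
import struct

import numpy as np

HEADER_SIZE = 8 + 16  # 8-byte length field plus 16-byte MD5 digest


def pack_array(arr):
    """Frame an array as: payload length, MD5 digest, raw bytes (hypothetical layout)."""
    payload = np.ascontiguousarray(arr).tobytes()
    header = struct.pack("<Q", len(payload)) + hashlib.md5(payload).digest()
    return header + payload


def unpack_array(blob, dtype=np.float64):
    """Check the length first (cheap), then the digest, before building the array."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("truncated header")
    (expected_len,) = struct.unpack("<Q", blob[:8])
    digest = blob[8:HEADER_SIZE]
    payload = blob[HEADER_SIZE:]
    if len(payload) != expected_len:
        raise ValueError(
            "truncated payload: got %d of %d bytes" % (len(payload), expected_len)
        )
    if hashlib.md5(payload).digest() != digest:
        raise ValueError("checksum mismatch")
    # Only now is it safe to interpret the bytes as 8-byte doubles.
    return np.frombuffer(payload, dtype=dtype)
```

If truncation is the only failure mode, the length check alone would
catch it and the digest could be dropped.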
Another possibility would be to use HDF5 as a data container. It
supports the fletcher32 filter, which computes a checksum for every
data chunk written to disk and then always checks that the data read
back satisfies the checksum kept on disk. So, if the HDF5 layer doesn't
complain, you are basically safe.
There are at least two usable HDF5 interfaces for Python and NumPy:
PyTables and h5py. PyTables supports the filter right out of the box.
I'm not sure about h5py, though (a quick search of the docs doesn't
reveal anything).
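For the PyTables route, a minimal sketch might look like the code below.
It assumes the snake_case API of PyTables 3.x (open_file, create_carray),
and the file name and the node name "data" are arbitrary choices made for
this example.

```python
import os
import tempfile

import numpy as np
import tables  # PyTables; the calls below assume the 3.x API

data = np.random.rand(1000)

# Hypothetical file location, chosen only for this example.
path = os.path.join(tempfile.mkdtemp(), "checked.h5")

# fletcher32=True asks HDF5 to store a checksum with every chunk.
filters = tables.Filters(fletcher32=True)

with tables.open_file(path, mode="w") as h5:
    # CArray is a chunked dataset, which HDF5 filters require.
    h5.create_carray(h5.root, "data", obj=data, filters=filters)

with tables.open_file(path, mode="r") as h5:
    node = h5.root.data
    restored = node.read()  # HDF5 verifies each chunk's checksum on read
    checksummed = node.filters.fletcher32
```

If a chunk were damaged on disk, the read would be expected to fail with
an HDF5 error instead of silently returning bad values.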
Hope it helps,