[Numpy-discussion] checksum on numpy float array

Francesc Alted faltet@pytables....
Mon Dec 8 12:01:36 CST 2008


On Sunday 07 December 2008, Brennan Williams wrote:
> OK so maybe I should....
>
> (1) not add some sort of checksum-type functionality to my read/write
> methods
>
>       these read/write methods simply read/write numpy arrays from/to a
> binary file which contains one or more numpy arrays (and nothing
> else).
>
> (2) replace my binary files with either HDF5 or PyTables
>
> But....
>
> my app is being used by clients on existing projects - in one case
> there are over 900 of these numpy binary files in just one project,
> albeit each file is pretty small (200KB or so)
>
> so.. questions.....
>
> How can I transparently (or at least with minimum user pain) replace
> my existing read/write methods with PyTables or HDF5?
>
> My initial thoughts are...
>
> (a) have an app version number and a data format version number which
> I can check against.
>
> (b) if data format version < 1.0 then read from old binary files
>
> (c) if app version number > 1.0 then write to new PyTables or HDF5
> files
>
> (d) get clients to open existing project and then save existing
> project to semi-transparently convert from old to new formats.
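Steps (a) to (d) amount to a one-time migration that runs when a project is re-saved. A minimal sketch, assuming the app and data-format checks are collapsed into a single format version stored as a tuple; `migrate_project`, `NEW_FORMAT`, `read_legacy` and `write_hdf5` are hypothetical names, the last two standing in for the existing reader and a new PyTables or h5py writer:

```python
from typing import Callable, Iterable, List, Tuple

Version = Tuple[int, int]
NEW_FORMAT: Version = (1, 0)  # hypothetical first HDF5-based format version


def migrate_project(
    data_files: Iterable[str],
    format_version: Version,
    read_legacy: Callable[[str], list],
    write_hdf5: Callable[[str, list], str],
) -> Tuple[Version, List[str]]:
    """Convert a project's old binary files once, when it is re-saved.

    read_legacy and write_hdf5 are hypothetical callables: the existing
    binary reader and a new PyTables/h5py writer returning the new path.
    """
    if format_version >= NEW_FORMAT:
        # Already in the new format: nothing to convert.
        return format_version, []
    converted = []
    for path in data_files:
        arrays = read_legacy(path)                  # (b) old reader for old files
        converted.append(write_hdf5(path, arrays))  # (c) write only the new format
    return NEW_FORMAT, converted                    # (d) caller saves the new version
```

Comparing version tuples rather than floats avoids surprises such as 1.10 comparing less than 1.9.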

Yeah.  That would work perfectly.  Also, PyTables has a function
named 'isHDF5File(filename)' that tells you whether a file is in
HDF5 format or not.  You might want to use it and avoid having to
deal with data format/app version issues.
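Once PyTables is a dependency, 'isHDF5File' is the simpler route. As a rough, stdlib-only illustration of the same idea, the sketch below sniffs for the 8-byte HDF5 format signature, which the HDF5 file format specification places at offset 0 or, when the file starts with a user block, at 512, 1024, 2048, ... bytes. `stream_is_hdf5`, `file_is_hdf5` and the dispatcher `load_project_file` (with its `read_legacy`/`read_hdf5` callables) are hypothetical names, not PyTables API:

```python
import io

# 8-byte format signature defined by the HDF5 file format specification.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def stream_is_hdf5(f) -> bool:
    """Return True if the binary stream f carries the HDF5 signature.

    The signature sits at offset 0, or after a user block at 512, 1024,
    2048, ... bytes, so each of those offsets is probed in turn.
    """
    size = f.seek(0, io.SEEK_END)  # seek() returns the new absolute position
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= size:
        f.seek(offset)
        if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            return True
        offset = 512 if offset == 0 else offset * 2
    return False


def file_is_hdf5(path: str) -> bool:
    with open(path, "rb") as f:
        return stream_is_hdf5(f)


def load_project_file(path, read_legacy, read_hdf5):
    """Hypothetical dispatcher: pick a reader by sniffing the file itself."""
    reader = read_hdf5 if file_is_hdf5(path) else read_legacy
    return reader(path)
```

Because the choice is made per file, a project can hold a mix of converted and unconverted files during the transition.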

Cheers,

Francesc

>
> Francesc Alted wrote:
> > On Friday 05 December 2008, Andrew Collette wrote:
> >>> Another possibility would be to use HDF5 as a data container.  It
> >>> supports the fletcher32 filter [1], which computes a checksum for
> >>> every data chunk written to disk and then always checks that the
> >>> data read satisfies the checksum kept on disk.  So, if the HDF5
> >>> layer doesn't complain, you are basically safe.
> >>>
> >>> There are at least two usable HDF5 interfaces for Python and
> >>> NumPy: PyTables [2] and h5py [3].  PyTables supports that right
> >>> out of the box.  Not sure about h5py, though (a quick search in
> >>> the docs didn't reveal anything).
> >>>
> >>> [1] http://rfc.sunsite.dk/rfc/rfc1071.html
> >>> [2] http://www.pytables.org
> >>> [3] http://h5py.alfven.org
> >>>
> >>> Hope it helps,
> >>
> >> Just to confirm that h5py does in fact have fletcher32; it's one
> >> of the options you can specify when creating a dataset, although
> >> it could use better documentation:
> >>
> >> http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create_dataset
> >
> > My bad.  I searched for 'fletcher' instead of 'fletcher32'.  I
> > naively thought that the search tool in Sphinx allowed partial
> > name matching.  It is a pity it does not.
> >
> > Cheers,
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
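To make the fletcher32 idea concrete, here is the textbook Fletcher-32 checksum: two running sums modulo 65535 over 16-bit words, the second summing the first so that reordered data changes the result. This is an illustration under stated assumptions, not HDF5's code: HDF5 applies its filter per chunk and may order bytes differently, so these values will not match what HDF5 stores. The name `fletcher32` is hypothetical; for a NumPy array `a`, one could checksum `a.tobytes()` on write and compare on read.

```python
def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32 checksum over 16-bit little-endian words.

    Illustration only: HDF5's fletcher32 filter works per chunk and may
    order bytes differently, so its stored values will not match this.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)  # little-endian 16-bit word
        sum1 = (sum1 + word) % 65535         # running sum of the words
        sum2 = (sum2 + sum1) % 65535         # sum of sums: order-sensitive
    return (sum2 << 16) | sum1


print(hex(fletcher32(b"abcde")))  # 0xf04fc729
```

A single changed byte, or two swapped words, yields a different checksum, which is what lets a reader flag corrupted data.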



-- 
Francesc Alted

