[Numpy-discussion] RFC: Detecting array changes (NumPy 2.0?)
Charles R Harris
Fri Mar 11 12:47:58 CST 2011
On Fri, Mar 11, 2011 at 11:41 AM, Dag Sverre Seljebotn <
> There are a few libraries out there that need to know whether or not an
> array changed since the last time it was used: joblib and pymc come to
> mind. I believe joblib computes a SHA1 or md5 hash of array contents,
> while pymc simply assumes you never change an array and uses the id().
> The pymc approach is fragile, while in my case the joblib approach is
> too expensive since I'll call the function again many times in a row
> with the same large array (yes, I can code around it, but the code gets
> less streamlined).
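
A minimal sketch of the trade-off described above, for illustration only. It
assumes a plain SHA-1 over the raw array bytes as a stand-in for content
hashing (joblib's real hashing is more involved and is not reproduced here);
the helper name content_hash is hypothetical.

```python
import hashlib

import numpy as np


def content_hash(arr):
    """Hash the raw bytes of an array (hypothetical helper).

    The cost is O(n) in the array size, which is what makes this
    expensive when the same large array is checked many times in a row.
    """
    return hashlib.sha1(arr.tobytes()).hexdigest()


a = np.zeros(1000)
id_before = id(a)
hash_before = content_hash(a)

a[0] = 1.0  # in-place modification of the same object

# Identity does not notice the change; the content hash does.
id_unchanged = id(a) == id_before
hash_changed = content_hash(a) != hash_before
```

The id() check is essentially free but misses in-place writes, while the hash
is reliable but has to read every byte on every call.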
> So, would it be possible to very quickly detect whether a NumPy array is
> guaranteed to not have changed? Here's a revision counter approach:
> 1) Introduce a new 64-bit int field "modification_count" in the array
> object struct.
> 2) modification_count is incremented any time it is possible that an
> array changes. In particular, PyArray_DATA would increment the counter.
> 3) A new PyArray_READONLYDATA is introduced that does not increment
> the counter, which can be used in strategic spots. However, the point is
> simply to rule out *most* sources of having to recompute a checksum for
> the array -- a non-matching modification_count is not a guarantee the
> array has changed, but a matching modification_count is a guarantee of
> an unchanged array.
> 4) The counter can be ignored for readonly (base) arrays.
> 5a) A method is introduced Python-side,
> arr.checksum(algorithm="md5"|"sha1"), that uses this machinery to cache
> checksum computation and that can be plugged into joblib.
> 5b) Alternatively, the modification count is exposed directly to
> Python-side, and it is up to users to store the modification count (e.g.
> in a WeakKeyDictionary indexed by the array's base array).
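
The steps above could be sketched in pure Python roughly as follows. This is
only an illustration under stated assumptions: the TrackedArray wrapper and
its data() / readonly_data() methods are hypothetical stand-ins for the
proposed PyArray_DATA / PyArray_READONLYDATA behaviour, and nothing here is
an existing NumPy API.

```python
import hashlib

import numpy as np


class TrackedArray:
    """Hypothetical wrapper emulating the proposed modification counter."""

    def __init__(self, data):
        self._data = np.asarray(data)
        self.modification_count = 0
        self._checksum_cache = {}  # algorithm -> (count, digest)

    def data(self):
        """Writable access: pessimistically count it as a modification."""
        self.modification_count += 1
        return self._data

    def readonly_data(self):
        """Read-only view; the counter is left untouched."""
        view = self._data.view()
        view.flags.writeable = False
        return view

    def checksum(self, algorithm="sha1"):
        """Recompute the digest only if the counter moved since last time."""
        cached = self._checksum_cache.get(algorithm)
        if cached is not None and cached[0] == self.modification_count:
            return cached[1]
        digest = hashlib.new(algorithm, self._data.tobytes()).hexdigest()
        self._checksum_cache[algorithm] = (self.modification_count, digest)
        return digest


t = TrackedArray(np.arange(10.0))
first = t.checksum()
total = t.readonly_data().sum()  # reading does not bump the counter
second = t.checksum()            # served from the cache
t.data()[0] = 42.0               # writable access bumps the counter
third = t.checksum()             # recomputed
```

Note that data() bumps the counter even if the caller never writes, which
matches the one-sided guarantee: a changed count may be a false alarm, an
unchanged count is not.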
> Another solution to the problem would be to allow registering event
> handlers. Main reason I'm not proposing that is because I don't want to
> spend the time to implement it (sounds a lot more difficult), it appears
> to be considerably less backwards-compatible, and so on.
> Why not a simple dirty flag? Because you'd need one for every possible
> application of this (e.g., md5 and sha1 would need separate dirty flags,
> and other uses than hashing would need yet more flags, and so on).
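
To illustrate the dirty-flag argument, here is a small sketch in which two
independent consumers each remember the last counter value they saw. The
Counted and Consumer classes are hypothetical and exist only for this example.

```python
class Counted:
    """Hypothetical stand-in for an array carrying a modification counter."""

    def __init__(self):
        self.modification_count = 0

    def touch(self):
        """Simulate a possible modification."""
        self.modification_count += 1


class Consumer:
    """Remembers the last counter value it observed."""

    def __init__(self):
        self._seen = None

    def needs_update(self, obj):
        stale = self._seen != obj.modification_count
        self._seen = obj.modification_count
        return stale


obj = Counted()
md5_user, sha1_user = Consumer(), Consumer()

initial = (md5_user.needs_update(obj), sha1_user.needs_update(obj))

obj.touch()
md5_first = md5_user.needs_update(obj)   # sees the change
md5_again = md5_user.needs_update(obj)   # already up to date
sha1_first = sha1_user.needs_update(obj)  # still sees the change
```

With a single shared dirty flag, the first consumer would clear it and the
second would never learn about the change; with a counter, each consumer
keeps its own state and the array only needs one field.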
What about views? Wouldn't it be easier to write another object wrapping an