[Numpy-discussion] numarray: Possible hash collision problem

David M. Cooke cookedm at physics.mcmaster.ca
Wed Sep 28 11:46:40 CDT 2005

"Edward C. Jones" <edcjones at comcast.net> writes:

> hash(numarray.arange(1000)) == hash(numarray.arange(10000))
> The hash value changes each time I enter the Python interpreter. I have 
> always assumed that hashing was deterministic. Is it?

Not surprising: I also get True for this:

hash(object()) == hash(object())

Looking through the source, I think arrays don't define their own hash.
They inherit the one from the object base class, which is just the id()
of the array. The code above can be written longhand as

a = numarray.arange(1000)
ha = hash(a) # in this case, hash(a) == id(a)
del a
b = numarray.arange(10000)
hb = hash(b) # in this case, hash(b) == id(b)
del b
ha == hb

It's those (implicit) del statements that let a and b be stored at the
same location in memory, so they end up with the same id(). The first
array is freed before the second is created, and no other object is
allocated in the interpreter in between.
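numarray is no longer maintained, so here is a minimal sketch of the same effect in Python 3. It assumes CPython, where reference counting frees an object as soon as its last reference goes away. `Array` and `hash_of_temporary` are hypothetical names used only for illustration. Note that in current CPython the default hash is computed from the object's address, but it is not necessarily equal to id() itself.

```python
class Array:
    """Hypothetical stand-in for a numarray array.

    It defines neither __eq__ nor __hash__, so it inherits object.__hash__,
    which is based on object identity (the address, on CPython).
    """

    def __init__(self, n):
        self.data = list(range(n))


def hash_of_temporary(n):
    # The Array is referenced only inside this call. Once hash() returns,
    # its refcount drops to zero and CPython frees it immediately, just like
    # the implicit `del` in the longhand version.
    return hash(Array(n))


ha = hash_of_temporary(1000)
hb = hash_of_temporary(10000)
# Implementation-dependent: frequently True on CPython, because the second
# Array can be allocated in the memory the first one just released.
print(ha == hb)
```

Whether the two hashes coincide depends on the allocator, which is exactly the point: equal hashes here say nothing about the contents of the arrays.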

Basically, the id() of an object is guaranteed to be unique *amongst all
active objects*. It is _not_ guaranteed to differ from the ids of objects
that have already been created and destroyed.

This, on the other hand, will return False:
a = numarray.arange(1000)
b = numarray.arange(10000)
hash(a) == hash(b)

as a and b still both exist.
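The "unique among live objects" rule can be checked directly. This is a small sketch using plain object() instances as stand-ins for the arrays; the first two checks hold in any Python, while the count in the last part is an assumption about CPython's allocator and will vary.

```python
# Hold references so every object stays alive at the same time.
live = [object() for _ in range(10_000)]
live_ids = {id(o) for o in live}
print(len(live_ids) == len(live))  # True: live objects never share an id

# Two objects alive at once, as in the example above.
a = object()
b = object()
print(id(a) != id(b))  # True

# Each object below dies as soon as its id() has been read, so its
# memory (and therefore its id) is free to be handed out again.
dead_ids = {id(object()) for _ in range(10_000)}
print(len(dead_ids))  # implementation-dependent; usually far below 10000 on CPython
```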

Since arrays are mutable, there's no good way to get a content-based hash.
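To see why a content-based hash is a trap for mutable objects, here is a sketch with a hypothetical `MutableVector` class that hashes its current contents. Once a key is mutated after being stored in a dict, lookups by content stop finding it. Python's built-in list is unhashable for the same reason (`hash([1, 2])` raises TypeError).

```python
class MutableVector:
    """Hypothetical mutable container that (unwisely) hashes its contents."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return isinstance(other, MutableVector) and self.values == other.values

    def __hash__(self):
        # Content-based hash: it changes whenever self.values changes.
        return hash(tuple(self.values))


v = MutableVector([1, 2, 3])
table = {v: "stored"}

v.values[0] = 99  # mutate the key after it has been inserted

# v was filed under hash((1, 2, 3)). A lookup with the old contents reaches
# that entry but fails the equality check, because v now holds [99, 2, 3].
print(MutableVector([1, 2, 3]) in table)  # False

# The entry is still physically in the dict; it has just become
# unreachable through ordinary lookups by content.
print(any(k is v for k in table))  # True
```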

|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca
