[Numpy-discussion] type 'numpy.int64' unhashable

Robert Kern robert.kern@gmail....
Fri Oct 30 11:44:11 CDT 2009

On Fri, Oct 30, 2009 at 08:11, James Bergstra <bergstrj@iro.umontreal.ca> wrote:
> On Fri, Oct 30, 2009 at 7:23 AM, Gael Varoquaux
> <gael.varoquaux@normalesup.org> wrote:
>> On Fri, Oct 30, 2009 at 08:21:16PM +0900, David Cournapeau wrote:
>>> On Fri, Oct 30, 2009 at 8:04 PM, Sebastian Haase <seb.haase@gmail.com> wrote:
>>> > I understand where this error comes from, however what I was trying to
>>> > do seems so "intuitive" that I would like to ask for suggestions:
>>> > "What should I do if the "number" 2636 becomes unhashable ?"
>>> In your example, it is the array that is unhashable; the numbers
>>> themselves should be hashable. Arrays are mutable, so I don't think
>>> you can easily make them hashable. You could transform everything
>>> into a tuple of tuples of... if you need to use set, though.
>> Use md5's of their .data attribute. This works quite well (you might want
>> to hash a pickled string of the dtype in addition).
>> Gaël
> Careful... if your data is not contiguous in memory then you could be
> adding lots of random noise to your hash key by doing this.  This
> could cause equal ndarrays to hash to different values -- not good.
> Make sure memory is contiguous before hashing the .data.  flatten()
> does this, I think, as do copy(), array(), and many others.

.data doesn't work for non-contiguous arrays anyways. :-)

But all of this is irrelevant to the OP. First, I cannot replicate his problem.

In [12]: chainsA = np.arange(10, dtype=np.int64)

In [13]: set(chainsA)
Out[13]: set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Second, he seems to be interested in scalar objects, not arrays. The
scalar objects should all be hashable and comparable out of the box and
ready to be used in sets and as dict keys. We will need a complete,
self-contained example that demonstrates the problem to get any
further with this.
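
A minimal sketch of the scalar case, assuming a current NumPy on Python 3
rather than the 2009 versions in the session above. The value 2636 is taken
from the original question; the names `unique` and `lookup` are only
illustrative.

```python
import numpy as np

# Iterating over an integer array yields np.int64 scalars, which are hashable.
chains = np.arange(10, dtype=np.int64)
unique = set(chains)

# A NumPy integer scalar hashes and compares like the equal Python int,
# so the two can be mixed in sets and used interchangeably as dict keys.
assert np.int64(2636) in {2636}
assert hash(np.int64(2636)) == hash(2636)

lookup = {np.int64(2636): "found"}
assert lookup[2636] == "found"
```

If a snippet like this fails on a given installation, that would be the kind
of self-contained example needed to track the problem down.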

Third, even if he wanted to use arrays as set elements, he couldn't,
because such objects not only need to have __hash__ defined, they also
need __eq__ to return a bool. Comparing two arrays with == returns a
boolean array, which cannot be used as a truth value.
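
Both obstacles can be seen directly. This is a small sketch, again assuming
a current NumPy; the flag names `ambiguous` and `hashable` are made up for
the illustration.

```python
import numpy as np

a = np.arange(3)
b = np.arange(3)

# == on arrays is elementwise: the result is a boolean array, not a bool.
eq = (a == b)

# Asking for the truth value of a multi-element array raises ValueError.
try:
    bool(eq)
    ambiguous = False
except ValueError:
    ambiguous = True

# ndarray does not define a usable __hash__, so hash() raises TypeError.
try:
    hash(a)
    hashable = True
except TypeError:
    hashable = False
```

A set needs both a hash and a yes/no equality answer to resolve collisions,
which is why neither half alone would be enough.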

Fourth, even if arrays could be compared, you couldn't replace their
__hash__ method or tell set to use a different function in place of
the __hash__ method.

Fifth, even if you could tell set to use a different hash function,
you wouldn't use cryptographic hashes. You would just
hash(buffer(arr)) for contiguous arrays and hash(arr.tostring()) for
the rest.
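
For readers on Python 3: buffer() was a Python 2 built-in, and tostring() is
the older spelling of tobytes(). One way around the fourth point is to wrap
the array rather than change set. The sketch below is only an illustration
under those assumptions; `HashableArray` is a hypothetical helper, not
something NumPy provides.

```python
import numpy as np


class HashableArray:
    """Hypothetical wrapper (not part of NumPy) that hashes by content."""

    def __init__(self, arr):
        # Copy so later changes to the caller's array cannot alter the key.
        self.arr = np.array(arr, copy=True)
        self.arr.setflags(write=False)

    def __hash__(self):
        # An ordinary, non-cryptographic hash of the raw bytes.
        # tobytes() produces a C-ordered copy, so strided views work too.
        return hash((self.arr.shape, self.arr.dtype.str, self.arr.tobytes()))

    def __eq__(self, other):
        if not isinstance(other, HashableArray):
            return NotImplemented
        # Reduce the elementwise comparison to a single bool.
        return (self.arr.shape == other.arr.shape
                and self.arr.dtype == other.arr.dtype
                and bool(np.array_equal(self.arr, other.arr)))


base = np.arange(10, dtype=np.int64)
view = base[::2]                                   # non-contiguous view
same = np.array([0, 2, 4, 6, 8], dtype=np.int64)   # equal contents

seen = {HashableArray(view), HashableArray(same)}
```

Here `seen` ends up with a single element, because the two wrappers hash and
compare equal. Including the shape and dtype in the hash keeps arrays with
identical bytes but different layouts from colliding needlessly.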

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
