[Numpy-discussion] dtype comparison, hash

Robert Kern robert.kern@gmail....
Tue Jan 17 08:28:21 CST 2012


On Tue, Jan 17, 2012 at 05:11, Andreas Kloeckner
<lists@informa.tiker.net> wrote:
> Hi Robert,
>
> On Fri, 30 Dec 2011 20:05:14 +0000, Robert Kern <robert.kern@gmail.com> wrote:
>> On Fri, Dec 30, 2011 at 18:57, Andreas Kloeckner
>> <lists@informa.tiker.net> wrote:
>> > Hi Robert,
>> >
>> > On Tue, 27 Dec 2011 10:17:41 +0000, Robert Kern <robert.kern@gmail.com> wrote:
>> >> On Tue, Dec 27, 2011 at 01:22, Andreas Kloeckner
>> >> <lists@informa.tiker.net> wrote:
>> >> > Hi all,
>> >> >
>> >> > Two questions:
>> >> >
>> >> > - Are dtypes supposed to be comparable (i.e. implement '==', '!=')?
>> >>
>> >> Yes.
>> >>
>> >> > - Are dtypes supposed to be hashable?
>> >>
>> >> Yes, with caveats. Strictly speaking, we violate the condition that
>> >> objects that equal each other should hash equal since we define == to
>> >> be rather free. Namely,
>> >>
>> >>   np.dtype(x) == x
>> >>
>> >> for all objects x that can be converted to a dtype.
>> >>
>> >>   np.dtype(float) == np.dtype('float')
>> >>   np.dtype(float) == float
>> >>   np.dtype(float) == 'float'
>> >>
>> >> Since hash(float) != hash('float') we cannot implement
>> >> np.dtype.__hash__() to follow the stricture that objects that compare
>> >> equal should hash equal.
>> >>
>> >> However, if you restrict the domain of objects to just dtypes (i.e.
>> >> only consider dicts that use only actual dtype objects as keys instead
>> >> of arbitrary mixtures of objects), then the stricture is obeyed. This
>> >> is a useful domain that is used internally in numpy.
>> >>
>> >> Is this the problem that you found?
>> >
>> > Thanks for the reply.
>> >
>> > It doesn't seem like this is our issue--instead, we're encountering two
>> > different dtype objects that claim to be float64, compare as equal, but
>> > don't hash to the same value.
>> >
>> > I've asked the user who encountered the problem to investigate, and I'll
>> > be back with more detail in a bit.
>>
>> I think we've run into this before and tried to fix it. Try to find
>> the version of numpy the user has and a minimal example, if you can.
>
> This is what Thomas found:
>
> http://projects.scipy.org/numpy/ticket/2017
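
The hashing caveat quoted above (`np.dtype(x) == x` for anything convertible to a dtype, while `hash(float) != hash('float')`) can be illustrated without NumPy. The `LooseDType` class below is hypothetical and is not NumPy's implementation. Its `__eq__` accepts aliases the way `np.dtype` does, but its hash can only agree with other `LooseDType` objects. In CPython a dict calls `__eq__` only on entries whose stored hash matches, so lookups succeed for dtype-only keys and miss for string aliases.

```python
class LooseDType:
    """Hypothetical stand-in for np.dtype: equal to its aliases, hashed by name only."""

    def __init__(self, name, aliases):
        self.name = name
        self.aliases = tuple(aliases)

    def __eq__(self, other):
        if isinstance(other, LooseDType):
            return self.name == other.name
        # Liberal comparison, in the spirit of np.dtype(float) == float == 'float'
        return any(other == alias for alias in self.aliases)

    def __hash__(self):
        # Consistent with __eq__ only when both sides are LooseDType objects
        return hash(("LooseDType", self.name))


f64 = LooseDType("float64", (float, "float", "float64"))
print(f64 == float, f64 == "float")        # True True
table = {f64: "double"}
print(LooseDType("float64", ()) in table)  # True: dtype-only keys obey the rule
print("float" in table)                    # False: compares equal, but hash differs
```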

It looks like the .flags attribute is different between np.uintp and
np.uint32. The .flags attribute forms part of the hashed information
about the dtype (or PyArray_Descr at the C-level).

[~]
|15> np.dtype(np.uintp).flags
1536

[~]
|16> np.dtype(np.uint32).flags
2048

The same goes for np.intp and np.int32 in numpy 1.6.1 on OS X, so
unlike the comment in the ticket, they do have different hashes for
me.
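
The failure reported in the ticket (two dtypes that compare equal but hash differently) follows whenever a field feeds the hash but not the equality test. The hypothetical `Descr` class below is not NumPy's code. Its `__eq__` looks only at kind and item size, while `__hash__` also mixes in a `flags` field, roughly mirroring how `.flags` is part of the hashed information. The flag values are arbitrary.

```python
class Descr:
    """Hypothetical descriptor: equality ignores flags, hashing does not."""

    def __init__(self, kind, itemsize, flags):
        self.kind = kind
        self.itemsize = itemsize
        self.flags = flags

    def __eq__(self, other):
        if not isinstance(other, Descr):
            return NotImplemented
        return (self.kind, self.itemsize) == (other.kind, other.itemsize)

    def __hash__(self):
        # The inconsistency being illustrated: flags affects the hash only
        return hash((self.kind, self.itemsize, self.flags))


a = Descr("u", 4, flags=1)  # arbitrary, differing flag values
b = Descr("u", 4, flags=2)
print(a == b)              # True
print(hash(a) == hash(b))  # False
print(b in {a})            # False: the set never reaches __eq__ on a hash mismatch
```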

However, diving through the source a bit, I'm not entirely sure I
trust the values being given at the Python level. It appears that the
flags member of the PyArray_Descr struct is declared as a char.
However, it is exposed as a T_INT member in the PyMemberDef table by
direct addressing. That is, a Python descriptor gets added to the
np.dtype type that reads sizeof(int) bytes starting at the
position of the flags member in the struct, which includes 3 bytes of
the following type_num member. Obviously, 2048 does not fit into a
char. Nonetheless, the type_num is also part of the hash, so either
the flags member or the type_num member is different between the two.
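
The byte-overlap argument can be checked with Python's `struct` module. This sketch assumes a simplified little-endian layout: four chars (`kind`, `type`, `byteorder`, `flags`) immediately followed by a 4-byte `int type_num`. It ignores the object header and the other struct fields, and it has not been checked against the actual NumPy headers. Under those assumptions, reading an `int` at the `flags` byte puts the real flags in the low byte and the low 3 bytes of `type_num` above it.

```python
import struct

def fake_descr_bytes(flags, type_num):
    # Assumed simplified layout: kind, type, byteorder, flags (1 byte each),
    # then a 4-byte little-endian int type_num, no padding.
    return struct.pack("<4ci", b"f", b"d", b"=", bytes([flags]), type_num)

def read_flags_as_int(raw):
    # Mimic a T_INT member pointed at the flags byte: read 4 bytes from
    # offset 3, i.e. the flags byte plus 3 bytes of type_num.
    return struct.unpack_from("<i", raw, 3)[0]

def split_reading(value):
    # Undo the overlap: low byte is flags, the rest comes from type_num.
    return value & 0xFF, value >> 8

print(read_flags_as_int(fake_descr_bytes(0, 6)))  # 1536
print(split_reading(1536), split_reading(2048))   # (0, 6) (0, 8)
```

If the assumed layout is right, both values from the session above have a zero flags byte and differ only in the bytes borrowed from type_num (6 versus 8). That agrees with type_num being the member that differs.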

Two bugs for the price of one!

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

