[Numpy-discussion] dtype comparison and hashing
Geoffrey Irving
irving@naml...
Sat Oct 18 17:43:38 CDT 2008
On Wed, Oct 15, 2008 at 12:56 PM, Robert Kern <robert.kern@gmail.com> wrote:
> On Wed, Oct 15, 2008 at 02:20, Geoffrey Irving <irving@naml.us> wrote:
>> Hello,
>>
>> Currently in numpy comparing dtypes for equality with == does an
>> internal PyArray_EquivTypes check, which means that the dtypes NPY_INT
>> and NPY_LONG compare as equal in python. However, the hash function
>> for dtypes reduces to id(), which is therefore inconsistent with ==.
>> Unfortunately I can't produce a python snippet showing this since I
>> don't know how to create a NPY_INT dtype in pure python.
>>
>> Based on the source it looks like hash should raise a TypeError,
>> since tp_hash is null but tp_richcompare is not. Does the following
>> snippet throw an exception for others?
>>
>> >>> import numpy
>> >>> hash(numpy.dtype('int'))
>> 5708736
>>
>> This might be the problem:
>>
>> /* Macro to get the tp_richcompare field of a type if defined */
>> #define RICHCOMPARE(t) (PyType_HasFeature((t), Py_TPFLAGS_HAVE_RICHCOMPARE) \
>> ? (t)->tp_richcompare : NULL)
>>
>> I'm using the default Mac OS X 10.5 installation of python 2.5 and
>> numpy, so maybe those weren't compiled correctly. Has anyone else
>> seen this issue?
>
> Actually, the problem is that we provide a hash function explicitly.
> In multiarraymodule.c:
>
> PyArrayDescr_Type.tp_hash = (hashfunc)_Py_HashPointer;
>
> That is a violation of the hashing protocol (objects which compare
> equal and are hashable need to hash equal), and should be fixed.
Thanks for finding that.
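
Since I couldn't build an NPY_INT dtype from pure python, here is a rough
sketch of the same inconsistency using stand-in classes. Nothing below is
numpy code: IdHashedType, KeyHashedType, and the kind/itemsize fields are
hypothetical names made up for illustration. The first class compares by
equivalence but hashes by identity, roughly like the _Py_HashPointer setup
described above; the second shows one possible fix, hashing the same key
that == compares.

```python
class IdHashedType(object):
    """Hypothetical stand-in for a dtype: == by equivalence, hash by identity."""

    def __init__(self, kind, itemsize):
        self.kind = kind
        self.itemsize = itemsize

    def _key(self):
        # The fields that decide equivalence (an assumption for this sketch).
        return (self.kind, self.itemsize)

    def __eq__(self, other):
        if not isinstance(other, IdHashedType):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Identity-based hash: inconsistent with __eq__ above.
        return id(self)


class KeyHashedType(IdHashedType):
    """Same comparison, but the hash is derived from the compared key."""

    def __hash__(self):
        return hash(self._key())


if __name__ == "__main__":
    a, b = IdHashedType("i", 4), IdHashedType("i", 4)
    print(a == b, b in {a: "found"})   # equal, yet the dict lookup misses

    c, d = KeyHashedType("i", 4), KeyHashedType("i", 4)
    print(c == d, d in {c: "found"})   # equal, and the dict lookup succeeds
```

With the identity hash, two equal objects land in different dict slots, so
using such objects as dictionary keys or set members silently misbehaves.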
Geoffrey