[Numpy-discussion] dtype comparison and hashing

Robert Kern robert.kern@gmail....
Wed Oct 15 14:56:52 CDT 2008


On Wed, Oct 15, 2008 at 02:20, Geoffrey Irving <irving@naml.us> wrote:
> Hello,
>
> Currently in numpy comparing dtypes for equality with == does an
> internal PyArray_EquivTypes check, which means that the dtypes NPY_INT
> and NPY_LONG compare as equal in Python.  However, the hash function
> for dtypes reduces to id(), which is therefore inconsistent with ==.
> Unfortunately I can't produce a Python snippet showing this since I
> don't know how to create an NPY_INT dtype in pure Python.
>
> Based on the source it looks like hash should raise a TypeError,
> since tp_hash is null but tp_richcompare is not.  Does the following
> snippet throw an exception for others?
>
> >>> import numpy
> >>> hash(numpy.dtype('int'))
> 5708736
>
> This might be the problem:
>
> /* Macro to get the tp_richcompare field of a type if defined */
> #define RICHCOMPARE(t) (PyType_HasFeature((t), Py_TPFLAGS_HAVE_RICHCOMPARE) \
>                         ? (t)->tp_richcompare : NULL)
>
> I'm using the default Mac OS X 10.5 installation of python 2.5 and
> numpy, so maybe those weren't compiled correctly.  Has anyone else
> seen this issue?

Actually, the problem is that we provide a hash function explicitly.
In multiarraymodule.c:

    PyArrayDescr_Type.tp_hash = (hashfunc)_Py_HashPointer;

That is a violation of the hashing protocol (objects which compare
equal and are hashable need to hash equal), and should be fixed.

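To make the protocol violation concrete, here is a minimal pure-Python
sketch. It does not use numpy; the classes PointerHashed and
ConsistentlyHashed and their kind/itemsize fields are hypothetical
stand-ins for a dtype whose == checks equivalence while its hash is
derived from the object's address. It is an illustration of the general
problem, not the actual numpy code or the eventual fix.

```python
class PointerHashed:
    """Hypothetical stand-in for a dtype: == compares by value,
    but hash() is based on object identity (like hashing the pointer)."""

    def __init__(self, kind, itemsize):
        self.kind = kind
        self.itemsize = itemsize

    def __eq__(self, other):
        if not isinstance(other, PointerHashed):
            return NotImplemented
        return (self.kind, self.itemsize) == (other.kind, other.itemsize)

    def __hash__(self):
        # Identity-based hash: inconsistent with __eq__ above.
        return id(self)


class ConsistentlyHashed(PointerHashed):
    """Same equality, but the hash is built from the compared fields."""

    def __hash__(self):
        return hash((self.kind, self.itemsize))


a, b = PointerHashed("i", 4), PointerHashed("i", 4)
broken_equal = (a == b)            # equal by value
broken_lookup = b in {a: "int32"}  # lookup misses: hashes differ

c, d = ConsistentlyHashed("i", 4), ConsistentlyHashed("i", 4)
fixed_equal = (c == d)
fixed_lookup = d in {c: "int32"}   # lookup succeeds: hashes agree

print(broken_equal, broken_lookup, fixed_equal, fixed_lookup)
```

With the identity-based hash, two objects that compare equal cannot
stand in for each other as dict keys or set members. Deriving the hash
from the same fields that == compares restores the invariant.
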
-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

