[Numpy-discussion] Multiplicity of an entry

Michael Droettboom mdroe@stsci....
Tue Oct 27 16:07:33 CDT 2009

Travis Oliphant wrote:
> On Oct 27, 2009, at 2:31 PM, Michael Droettboom wrote:
>> Christopher Barker wrote:
>>> Nadav Horesh wrote:
>>>> np.equal(a,a).sum(0)
>>>> but, for unknown reason, np.equal operates only on "normal" arrays.
>>> true:
>>> In [25]: a
>>> Out[25]:
>>> array(['abc', 'def', 'abc', 'ghij'],
>>>       dtype='|S4')
>>> In [27]: np.equal(a,a)
>>> Out[27]: NotImplemented
>>> however:
>>> In [28]: a == a
>>> Out[28]: array([ True,  True,  True,  True], dtype=bool)
>>> don't they use the same code? or is "==" reverting to plain old generic
>>> python sequence comparison, which would partly explain why it is so slow.
>> It looks as if "a == a" (that is array_richcompare) is triggering
>> special case code for strings, so it is fast.  However, IMHO np.equal
>> should be made to work as well.  Can you file a bug and assign it to me
>> (I'm dealing with a number of other string-related things, so I might as
>> well take this too).
> The array_richcompare special-cased strings not for speed but for actual
> functionality.
> Making np.equal work with strings requires changes to the ufunc code itself
> which was never written to work with "variable-length" data-types (like
> strings, unicode, and records).  There are several things that would have
> to be fixed.  Some of the changes we made to allow for date-time data-types
> also made it possible to support variable-length strings, but this is
> non-trivial to implement.  It's certainly possible, but I would want to
> look at any changes you make before committing them to make sure all the
> issues are being understood.
Yeah -- I'm realizing this is a bigger project than I initially 
suspected.  I'll keep you posted if I find the time to do this right.
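
In the meantime, for the original question (the multiplicity of each entry),
the "==" operator already handles string arrays, so something like the
following may work as a stopgap.  This is only a rough sketch, not a
substitute for fixing np.equal: the variable names are made up for
illustration, the pairwise comparison needs O(n^2) memory, and the
return_counts argument to np.unique assumes a reasonably recent NumPy.

```python
import numpy as np

# Same data as in the session quoted above (fixed-width byte strings).
a = np.array(['abc', 'def', 'abc', 'ghij'], dtype='S4')

# Pairwise comparison through the == operator rather than np.equal.
# Broadcasting a column against a row gives an n-by-n boolean matrix.
matches = a[:, None] == a[None, :]

# Summing each column counts how often that entry occurs in the array.
multiplicity = matches.sum(axis=0)

# Alternative without the n-by-n intermediate (assumes a NumPy version
# whose np.unique accepts return_counts): sorted unique values and counts.
values, counts = np.unique(a, return_counts=True)
```

The first form keeps the counts aligned with the original order of a; the
second returns one count per distinct value, in sorted order.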


Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA
