Re: [Numpy-discussion] Massive differences in numpy vs. numeric string handling

Travis Oliphant oliphant at ee.byu.edu
Wed Apr 12 15:47:04 CDT 2006


Tim Hochberg wrote:

>
> It seems a little wacky that 'S2' and 'S1' would have vastly different 
> behaviour.

True.   Much better is a compatibility function such as the one you gave.


>> This is a known missing feature due to the fact that comparisons use
>> ufuncs but ufuncs are not supported for variable-length arrays.
>> Currently, however, you can use the chararray class, which does allow
>> comparisons of strings.
>
>
> It seems like this should be easy to worm around in __cmp__ (or 
> array_compare or however it's spelled). Since the strings really have 
> a fixed length, they're more or less equivalent to byte arrays with 
> one extra dimension. Writing a little lexicographic comparison thing on
> top of the results of a ufunc operating on the result of a compare of
> these byte arrays should be a piece of cake; in theory at least.

Yes, indeed it could be handled there as well.   It's the rich_compare 
function (all the cases are handled there...).   Right now, equality 
testing is special-cased a bit (inheriting behavior from Numeric). 
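
The byte-array idea quoted above can be sketched roughly as follows.
This is only an illustration written against present-day NumPy
(np.take_along_axis did not exist in 2006); it is not the rich_compare
code, and the helper name string_less is made up for the example. It
assumes both inputs are byte-string ('S') arrays of the same shape.

```python
import numpy as np

def string_less(a, b):
    """Elementwise a < b for two same-shape byte-string ('S') arrays.

    Hypothetical helper: views each fixed-length string as a row of
    bytes and compares the first byte at which the two strings differ.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # Pad both arrays to a common item size (shorter strings get NUL bytes).
    n = max(a.dtype.itemsize, b.dtype.itemsize)
    a = np.ascontiguousarray(a.astype('S%d' % n))
    b = np.ascontiguousarray(b.astype('S%d' % n))
    # Treat each n-byte string as n unsigned bytes: one extra dimension.
    ab = a.view(np.uint8).reshape(a.shape + (n,))
    bb = b.view(np.uint8).reshape(b.shape + (n,))
    differs = ab != bb
    any_diff = differs.any(axis=-1)
    # Position of the first differing byte (0 when the strings are equal).
    first = differs.argmax(axis=-1)[..., None]
    a_byte = np.take_along_axis(ab, first, axis=-1)[..., 0]
    b_byte = np.take_along_axis(bb, first, axis=-1)[..., 0]
    # Equal strings are never "less"; otherwise the first difference decides.
    return any_diff & (a_byte < b_byte)

a = np.array([b'ab', b'b', b'abc', b'zz'])
b = np.array([b'ac', b'a', b'ab', b'zz'])
print(string_less(a, b))
```

The other comparisons follow the same pattern: only the final test on
the first differing byte changes.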

I've gone back and forth on whether I should put effort into handling 
variable-length arrays with ufuncs (which might be better long-term --- 
or just an example of feature bloat as I can't think of many use cases 
except this one),  or just special-case the needed comparisons (which 
would take less thought to implement).

I'm leaning towards the latter --- special-casing comparison of
string arrays in the rich_compare function.   The next thing to think
about is then Unicode arrays.  The problem with comparisons on Unicode
arrays, though, is how to compare Unicode strings in a meaningful
way (i.e. what counts as alphabetical?).
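
To make the "what is alphabetical?" question concrete, here is a small
sketch in plain Python. It contrasts ordering by raw code point with a
crude accent- and case-insensitive key. The key function rough_key is
made up for this example and is only an approximation; real collation
is language-dependent, and nothing here is a proposal for how numpy
should behave.

```python
import unicodedata

def rough_key(s):
    """Hypothetical sort key: drop combining accents and fold case."""
    decomposed = unicodedata.normalize('NFKD', s)
    stripped = ''.join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

# "\u00e9clair" is "eclair" with an acute accent on the first letter.
words = ["zebra", "Apple", "\u00e9clair", "banana"]

# Plain comparison orders by code point: 'A' < 'b' < 'z' < U+00E9.
by_code_point = sorted(words)

# The rough key gives an order closer to a dictionary's.
by_rough_key = sorted(words, key=rough_key)

print(by_code_point)
print(by_rough_key)
```

A byte-wise or code-point comparison is cheap and well defined, but as
the first result shows it is not what most readers would call
alphabetical.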

-Travis





