Re: [Numpy-discussion] Massive differences in numpy vs. numeric string handling
oliphant at ee.byu.edu
Wed Apr 12 15:47:04 CDT 2006
Tim Hochberg wrote:
> It seems a little wacky that 'S2' and 'S1' would have vastly different
True. Much better is a compatibility function such as the one you gave.
>> This is a known missing feature due to the fact that comparisons use
>> ufuncs but ufuncs are not supported for variable-length arrays.
>> Currently, however, you can use the chararray class, which does allow
>> comparisons of strings.
> It seems like this should be easy to work around in __cmp__ (or
> array_compare or however it's spelled). Since the strings really have
> a fixed length, they're more or less equivalent to byte arrays with
> one extra dimension. Writing a little lexicographic comparison thing on
> top of the results of a ufunc operating on the result of a compare of
> these byte arrays should be a piece of cake; in theory at least.
Yes, indeed it could be handled there as well. It's the rich_compare
function (all the cases are handled there...). Right now, equality
testing is special-cased a bit (inheriting behavior from Numeric).
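For illustration, the byte-array idea quoted above might look roughly like the sketch below. It is written at the Python level, not in rich_compare itself; the name lex_less is hypothetical, and it assumes two byte-string ('S') arrays of the same shape.

```python
import numpy as np

def lex_less(a, b):
    """Elementwise a < b for two equal-shape byte-string ('S') arrays.

    Hypothetical helper for illustration; not part of numpy.
    """
    # Pad both inputs to a common item size so the byte views line up.
    width = max(a.dtype.itemsize, b.dtype.itemsize)
    a = np.ascontiguousarray(a, dtype=f"S{width}")
    b = np.ascontiguousarray(b, dtype=f"S{width}")
    # View each string as a row of bytes: one extra trailing dimension.
    ab = a.view(np.uint8).reshape(a.shape + (width,))
    bb = b.view(np.uint8).reshape(b.shape + (width,))
    differs = ab != bb
    # Position of the first differing byte (0 when the strings are equal).
    first = differs.argmax(axis=-1)[..., None]
    a_first = np.take_along_axis(ab, first, axis=-1)[..., 0]
    b_first = np.take_along_axis(bb, first, axis=-1)[..., 0]
    # Equal strings are never "less"; otherwise the first differing byte decides.
    return differs.any(axis=-1) & (a_first < b_first)

x = np.array([b"ab", b"b", b"abc", b"x"])
y = np.array([b"abc", b"a", b"abc", b"y"])
result = lex_less(x, y)
```

The other orderings follow from this one and an equality test, so a special case in the comparison code would only need the one primitive.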
I've gone back and forth on whether I should put effort into handling
variable-length arrays with ufuncs (which might be better long-term ---
or just an example of feature bloat as I can't think of many use cases
except this one), or just special-case the needed comparisons (which
would take less thought to implement).
I'm leaning towards the latter case --- special-case comparison of
string arrays in the rich_compare function. The next thing to think
about is then Unicode arrays. The problem with comparisons on unicode
arrays though is "how do you compare unicode strings" in a meaningful
way (i.e. what is alphabetical?).
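To make the unicode question concrete, here is a small plain-Python sketch (no numpy involved) showing how a raw code-point comparison differs from something closer to alphabetical order. The rough_key function is a hypothetical, crude stand-in for real collation, which depends on language and locale.

```python
import unicodedata

# "\u00e9clair" is "eclair" with a precomposed e-acute as its first letter.
words = ["zebra", "\u00e9clair", "apple", "Zebra"]

# Default string comparison is by code point: uppercase letters sort
# before lowercase ones, and accented letters sort after 'z'.
by_code_point = sorted(words)

def rough_key(s):
    # Crude approximation only: decompose, drop combining marks, fold case.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

# Closer to dictionary order for this small sample, but not real collation.
by_rough_key = sorted(words, key=rough_key)
```

Here by_code_point puts "Zebra" first and the accented word last, while by_rough_key starts with "apple". Neither is right for every language, which is the difficulty.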