[SciPy-dev] chararray array method
oliphant at ee.byu.edu
Wed Jan 4 13:19:08 CST 2006
Perry Greenfield wrote:
>On Dec 29, 2005, at 5:00 PM, Travis Oliphant wrote:
>>So, this is taking a buffer and chopping it into string bits.
>>Currently, the chararray array function does not take a buffer input.
>Yes, this is common for us as we usually create these from tables
>from files where some columns of the tables contain fixed width strings.
>It would be uncommon for the data buffer to contain only strings, but we
>generally need to create such arrays from data buffers.
Well, the new chararray function actually does support this (it was easy
enough to just do it).
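For readers who want to see the use case Perry describes, here is a minimal sketch of chopping a mixed-content buffer into fixed-width strings. It uses `np.frombuffer` with a structured dtype from current NumPy rather than the chararray function discussed in this thread, and the record layout (`name`, `count`) is made up for illustration.

```python
import numpy as np

# Hypothetical table layout: each 12-byte record holds an 8-byte
# fixed-width name followed by a little-endian 32-bit integer.
record = np.dtype([("name", "S8"), ("count", "<i4")])

# Two records packed back to back, as they might be read from a file.
buf = (b"alpha   " + (1).to_bytes(4, "little")
       + b"beta    " + (2).to_bytes(4, "little"))

table = np.frombuffer(buf, dtype=record)
names = table["name"]    # fixed-width byte-string column (dtype 'S8')
counts = table["count"]  # integer column sharing the same buffer
```

The string column is a view on the original buffer, so no copy is made when the strings are only one column among several.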
Right now, the chararrays are essentially string and/or unicode
ndarrays with added methods for rich comparisons, plus the same methods
as string and unicode objects. It's also a nice example of how to
do broadcasting in Python alone....
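As a rough illustration of what broadcasting at the Python level can look like, the sketch below compares two string arrays element by element using `np.broadcast`. It is not the chararray implementation; the helper name `str_equal` is hypothetical.

```python
import numpy as np

def str_equal(a, b):
    """Elementwise equality of two string arrays, broadcast in Python."""
    a = np.asarray(a)
    b = np.asarray(b)
    it = np.broadcast(a, b)             # pairs up items under broadcasting rules
    out = np.empty(it.shape, dtype=bool)
    out.flat = [x == y for x, y in it]  # the comparison runs in a Python loop
    return out

# A (2, 1) array against a (3,) array gives a (2, 3) result.
result = str_equal([["a"], ["b"]], ["a", "b", "c"])
```

Because the loop runs in Python, this is slow compared with a compiled comparison, which matches the speed caveat about numarray later in this message.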
I would like to move the rich comparisons into the ndarray object at
some point (either by having ufuncs support extended types or by
special-casing the richcompare for string and unicode type ndarrays),
so that any string or unicode type can use them...
>I suppose this points to the fact that I'm not clear on what different
>roles the string array (and unicode) and character arrays play. In
>it was thought that eventually that character arrays would support all
>string methods (within reason considering the constraints of fixed
>that made it different enough from numeric arrays. Is this detailed
The string and unicode arrays are separate data-types for ndarrays.
They are supported at a fundamental level throughout the code base. In
other words you can have an ndarray of type (string, 30) (i.e. 'S30') or
(unicode, 45) (i.e. 'U45'). However, because ufuncs do not support
extended types at this time, and the richcompare for the ndarray
defaults to use ufuncs, rich comparisons don't work on them.
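A short sketch of the two data-types named above, written against current NumPy. The sample values are arbitrary. Note that in current NumPy releases `==` on such arrays does return an elementwise result, unlike the state of the code described in this message.

```python
import numpy as np

# Fixed-width byte strings: every item occupies 30 bytes.
s = np.array([b"spam", b"eggs"], dtype="S30")

# Fixed-width unicode: 45 code points per item, 4 bytes each.
u = np.array(["spam", "eggs"], dtype="U45")

s_width = s.dtype.itemsize  # bytes per item for 'S30'
u_width = u.dtype.itemsize  # bytes per item for 'U45'

# Elementwise comparison (works in current NumPy).
matches = s == b"spam"
```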
Now, it would be possible to make it so that the ndarray supported the
string methods for string and unicode arrays, but it also makes sense to
subclass for that kind of special support, which is what is done now.
>I tried finding it in the latest version of the Guide but it seems that
>topic of string arrays isn't discussed a lot. So a brief outline of how
>you see this working might help (e.g., should we really be working on
>enhancing the string array instead of focusing on character arrays?)
My thinking is that we should get at least the rich comparisons working
for string/unicode arrays. The immediate question is whether to do this
by expanding the ufuncs or by simply special-casing support for them in
the richcompare function. I can see how it would be possible (but not
trivial) to do it in the ufuncs, which would make the ufunc interface
more flexible, but maybe too flexible... I'm not sure I know the use
case beyond the comparisons.
Whether or not we should look at overriding the getattribute function
to add string and unicode methods for all string/unicode arrays is
another question, but that could also be done...
Then again, it is an easy enough thing to wrap a string array in a
subclass if you really want to call the string methods on all the items
in the array... So, what is there now is workable and is essentially the
same as what was in numarray (I think numarray string comparison
functions are faster though --- they were compiled).
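To make the subclass idea concrete, here is a minimal sketch of an ndarray subclass that forwards unknown attribute lookups to the matching str or bytes method on every item. The class name `StrMethodArray` is hypothetical, and this is not how chararray itself is written; it only handles methods that return one scalar per item.

```python
import numpy as np

class StrMethodArray(np.ndarray):
    """Hypothetical wrapper: apply str/bytes methods to every item."""

    def __getattr__(self, name):
        # Only reached when normal ndarray attribute lookup fails.
        item_type = bytes if self.dtype.kind == "S" else str
        if (self.dtype.kind not in "SU" or name.startswith("__")
                or not hasattr(item_type, name)):
            raise AttributeError(name)

        def method(*args, **kwargs):
            # Call the method on each item, then restore the array shape.
            items = self.ravel().tolist()
            results = [getattr(x, name)(*args, **kwargs) for x in items]
            return np.array(results).reshape(self.shape)

        return method

# Wrap an existing string array without copying its data.
words = np.array([["ab", "cd"]]).view(StrMethodArray)
upper = words.upper()
starts = words.startswith("a")
```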