[Numpy-discussion] Re: Massive differences in numpy vs. numeric string handling

Robert Kern robert.kern at gmail.com
Wed Apr 12 15:57:06 CDT 2006


Tim Hochberg wrote:
> Jeremy Gore wrote:
> 
>> In Numeric:
>>
>> Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,)
>> Numeric.array(['test','two']) ->
>> array([[t, e, s, t],
>>        [t, w, o,  ]],'c')
>>
>> but in numpy:
>>
>> numpy.array('test') -> array('test', dtype='|S4'); shape = ()
>> numpy.array('test','S1') -> array('t', dtype='|S1'); shape = ()
>>
>> in fact you have to do an extra list cast:
>>
>> numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1'); shape = (4,)
> 
> The creation of arrays from python objects is full of all kinds of weird
> special cases. For numerical arrays this works pretty well, but for
> other sorts of arrays, like strings and even worse, objects, it's
> impossible to always guess the correct kind of thing to return. I'll
> leave it to the various string array users to battle it out over what's
> the right way to convert strings. However, in the meantime or if you do
> not prevail in this debate, I suggest you slap an appropriate three line
> function into your code somewhere.

I would suggest this way of thinking about it: numpy.array() shouldn't have to
handle every possible way to construct an array. People building less-common
arrays from less-common Python objects may have to use a different constructor
if they want to do so in a natural way. Implementing every possible combination
in numpy.array() *and* making it intuitive and readable are incompatible
goals, in my opinion.

> If all you care about is the interface issues use:
> 
>    def chararray(astring):
>        return numpy.array(list(astring), 'S1')
> 
> If you are worried about the performance of this, you could use the more
> cryptic, but more efficient:
> 
>    def chararray(astring):
>        a = numpy.array(astring)
>        return numpy.ndarray([len(astring)], 'S1', a.data)

Better:

In [31]: fromstring('test', dtype('S1'))
Out[31]:
array([t, e, s, t],
      dtype='|S1')
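
For readers on a current NumPy: the binary mode of fromstring() has since been
deprecated in favor of frombuffer(). A minimal sketch of the same idea,
assuming Python 3 and a bytes input (the name chararray is borrowed from Tim's
helper above; the bytes-only restriction is an assumption of this sketch):

```python
import numpy as np

def chararray(data: bytes) -> np.ndarray:
    """Return a 1-D array of single bytes (dtype 'S1') backed by `data`."""
    # frombuffer does not copy; for an immutable bytes object the
    # resulting array is read-only. Call .copy() if you need to modify it.
    return np.frombuffer(data, dtype='S1')

a = chararray(b'test')
print(a.shape)  # (4,)
```

Each element is a length-1 bytes object, so a str has to be encoded first,
for example with 'test'.encode('ascii').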

There's still the issue of N-D arrays of characters, though.
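
One possible way to cover the N-D case on a current NumPy, offered as a sketch
rather than a settled answer: build a fixed-width 'S' array and view it as
'S1', which gives the row-per-string layout Jeremy showed for Numeric. The
helper name chararray_nd is hypothetical, and the sketch assumes a (possibly
nested) list of ASCII strings; shorter strings end up padded with NUL bytes.

```python
import numpy as np

def chararray_nd(strings):
    """Hypothetical helper: list of ASCII str -> array of dtype 'S1'.

    The result gains one trailing axis as long as the longest string.
    """
    fixed = np.array(strings, dtype='S')   # fixed-width bytes, e.g. 'S4'
    width = fixed.dtype.itemsize
    # Viewing 'S4' as 'S1' stretches the last axis by 4; reshape splits
    # that stretch out into its own trailing axis.
    return fixed.view('S1').reshape(fixed.shape + (width,))

a = chararray_nd(['test', 'two'])
print(a.shape)  # (2, 4)
```

A single bare string is not handled here, since NumPy refuses to change the
item size of a 0-d array through view(); wrap it in a list first.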

-- 
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco




