[Numpy-discussion] Coercing object arrays to string (or unicode) arrays

Michael Droettboom mdroe@stsci....
Thu Sep 24 12:19:09 CDT 2009


On 09/24/2009 01:02 PM, Christopher Barker wrote:
> Michael Droettboom wrote:
>    
>> As I'm looking into fixing a number of bugs in chararray, I'm running
>> into some surprising behavior.
>> In [14]: x = np.array(['abcdefgh', 'ijklmnop'], 'O')
>>
>> # Without specifying the length, it seems to default to sizeof(int)... ???
>> In [15]: np.array(x, 'S')
>> Out[15]:
>> array(['abcd', 'ijkl'],
>>         dtype='|S4')
>>      
> This sure looks like a bug, and I'm no expert, but I suspect that it's
> the size of a pointer (you are on a 32-bit system -- I am), which makes a
> bit of sense, as Object arrays store a pointer to the python objects.
>    
That was my guess, too, but I haven't yet delved into the code.  I'm on 
32-bit as well.
> The question is, what should the array constructor do? perhaps the
> equivalent of:
>
> In [41]: np.array(x.tolist())
> Out[41]:
> array(['abcdefgh', 'ijklmnop'],
>         dtype='|S8')
>
> which you could use as a work around.
>    
Yes, that's the behaviour I was expecting.
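
A minimal sketch of that workaround, for reference. It assumes Python 3,
where str elements map to the unicode dtype ('U8') rather than the '|S8'
shown in the session above; the variable names are illustrative only. The
second variant computes the width explicitly instead of relying on
inference from the list.

```python
import numpy as np

x = np.array(['abcdefgh', 'ijklmnop'], dtype=object)

# Going through a plain list lets NumPy infer the item size from the
# longest element rather than from the object dtype.
via_list = np.array(x.tolist())

# Alternative: measure the longest string and request that width.
# (Assumption: every element is a Python 3 str, hence the 'U' type code.)
width = max(len(s) for s in x)
explicit = x.astype('U%d' % width)

print(via_list.dtype, explicit.dtype)
```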
> Do you need to go through object arrays? could you go straight to a
> string array:
>
> np.array(['abcdefgh', 'ijklmnop'], np.string_)
> Out[35]:
> array(['abcdefgh', 'ijklmnop'],
>         dtype='|S8')
>
> or just keep the strings in a list.
>    
The background here is that I'm fixing/resurrecting chararray, which 
provides vectorized versions of the standard Python string operations, 
such as endswith, ljust, etc.

I was using object arrays when the length of the output string can't be 
determined ahead of time.  For example, the string __mod__ operator.  I 
could probably get away with generating a list of strings instead, but 
it's a little bit inconsistent with how I'm doing things elsewhere, 
which is always to generate an array.
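
The pattern described above might look roughly like the following sketch.
The helper name mod_elementwise is hypothetical and is not chararray's
actual implementation; it assumes Python 3 str elements and sidesteps the
truncation by converting through a list at the end.

```python
import numpy as np

def mod_elementwise(formats, values):
    """Apply the string % operator pairwise (hypothetical helper)."""
    # The result lengths are unknown up front, so collect the formatted
    # strings in an object array first.
    out = np.empty(len(formats), dtype=object)
    for i, (fmt, val) in enumerate(zip(formats, values)):
        out[i] = fmt % val
    # Convert via a list so the item size is taken from the longest
    # result instead of being guessed from the object dtype.
    return np.array(out.tolist())

result = mod_elementwise(np.array(['%d items', 'value=%05d']), [3, 42])
print(result, result.dtype)
```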
> Object arrays are weird, I think there are a lot of corner cases.
>    
Yeah, that's been my experience.  But it would be nice to try to plug 
those corner cases up if possible.  I'll spend some time investigating 
this particular one.

Cheers,
Mike

