[Numpy-discussion] Coercing object arrays to string (or unicode) arrays

Michael Droettboom mdroe@stsci....
Thu Sep 24 13:23:57 CDT 2009


I have filed a bug against this, along with a patch that fixes casting 
to fixed-size string arrays:

http://projects.scipy.org/numpy/ticket/1235

Undefined-size string arrays are a harder problem, which I'm deferring 
for later.
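
In the meantime, one way to sidestep the undefined-size case is to measure the longest element and cast to that explicit width. The following is only a minimal sketch, assuming Python 3 and a recent NumPy (not the versions this thread was written against). The helper name object_to_fixed_width is hypothetical, and it produces a unicode ('U') array rather than an 'S' array:

```python
import numpy as np

def object_to_fixed_width(arr):
    """Cast an object array of str to a fixed-width unicode array.

    Hypothetical helper: the width is taken from the longest element,
    so no element is truncated by the cast.
    """
    arr = np.asarray(arr, dtype=object)
    # Longest string in the array; fall back to 1 for empty input,
    # and never request a zero-width dtype.
    width = max((len(s) for s in arr.ravel()), default=1)
    width = max(width, 1)
    return arr.astype("U%d" % width)

x = np.array(['abcdefgh', 'ijklmnop'], dtype=object)
y = object_to_fixed_width(x)
```

This does one extra pass over the data to find the width, which is roughly what np.array(x.tolist()) has to do as well.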

Mike

On 09/24/2009 01:19 PM, Michael Droettboom wrote:
> On 09/24/2009 01:02 PM, Christopher Barker wrote:
>    
>> Michael Droettboom wrote:
>>
>>      
>>> As I'm looking into fixing a number of bugs in chararray, I'm running
>>> into some surprising behavior.
>>> In [14]: x = np.array(['abcdefgh', 'ijklmnop'], 'O')
>>>
>>> # Without specifying the length, it seems to default to sizeof(int)... ???
>>> In [15]: np.array(x, 'S')
>>> Out[15]:
>>> array(['abcd', 'ijkl'],
>>>          dtype='|S4')
>>>
>>>        
>> This sure looks like a bug, and I'm no expert, but I suspect that it's
>> the size of a pointer (are you on a 32-bit system? I am), which makes a
>> bit of sense, as object arrays store a pointer to the Python objects.
>>
>>      
> That was my guess, too, but I haven't yet delved into the code.  I'm on
> 32-bit as well.
>    
>> The question is, what should the array constructor do? Perhaps the
>> equivalent of:
>>
>> In [41]: np.array(x.tolist())
>> Out[41]:
>> array(['abcdefgh', 'ijklmnop'],
>>          dtype='|S8')
>>
>> which you could use as a workaround.
>>
>>      
> Yes, that's the behaviour I was expecting.
>    
>> Do you need to go through object arrays? Could you go straight to a
>> string array:
>>
>> In [35]: np.array(['abcdefgh', 'ijklmnop'], np.string_)
>> Out[35]:
>> array(['abcdefgh', 'ijklmnop'],
>>          dtype='|S8')
>>
>> or just keep the strings in a list.
>>
>>      
> The background here is that I'm fixing/resurrecting chararray, which
> provides vectorized versions of the standard Python string operations,
> such as endswith and ljust.
>
> I was using object arrays when the length of the output string can't be
> determined ahead of time.  For example, the string __mod__ operator.  I
> could probably get away with generating a list of strings instead, but
> it's a little bit inconsistent with how I'm doing things elsewhere,
> which is always to generate an array.
>    
>> Object arrays are weird, I think there are a lot of corner cases.
>>
>>      
> Yeah, that's been my experience.  But it would be nice to try to plug
> those corner cases up if possible.  I'll spend some time investigating
> this particular one.
>
> Cheers,
> Mike


