[Numpy-discussion] Coercing object arrays to string (or unicode) arrays

Christopher Barker Chris.Barker@noaa....
Thu Sep 24 12:02:39 CDT 2009


Michael Droettboom wrote:
> As I'm looking into fixing a number of bugs in chararray, I'm running 
> into some surprising behavior.
> In [14]: x = np.array(['abcdefgh', 'ijklmnop'], 'O')
> 
> # Without specifying the length, it seems to default to sizeof(int)... ???
> In [15]: np.array(x, 'S')
> Out[15]:
> array(['abcd', 'ijkl'],
>        dtype='|S4')

This sure looks like a bug. I'm no expert, but I suspect that it's
the size of a pointer (you are on a 32-bit system; so am I), which makes a
bit of sense, as object arrays store a pointer to each Python object.
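A quick way to check the pointer-size theory, as a minimal sketch assuming a modern Python 3 NumPy: compare the per-element storage of an object array with the platform's pointer size, which the standard-library struct module reports for the 'P' (void *) format code. On a 32-bit build both come out as 4, which would match the truncation to '|S4' in the quoted session.

```python
import struct

import numpy as np

# Object array: each slot holds a reference (pointer) to a Python str object
x = np.array(['abcdefgh', 'ijklmnop'], dtype=object)

pointer_size = struct.calcsize('P')  # size of a C void * on this platform
slot_size = x.itemsize               # bytes NumPy stores per element

print("object slot size:", slot_size)
print("pointer size:    ", pointer_size)
print("string lengths:  ", [len(s) for s in x])
```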

The question is, what should the array constructor do? Perhaps the
equivalent of:

In [41]: np.array(x.tolist())
Out[41]:
array(['abcdefgh', 'ijklmnop'],
       dtype='|S8')

which you could use as a workaround.
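Under Python 3 the tolist() workaround yields str objects, so np.array(x.tolist()) gives a unicode '<U8' array rather than '|S8'. A minimal sketch covering both targets follows; the helper names object_to_unicode and object_to_bytes are hypothetical, the ASCII encoding is an assumption about the data, and both assume a 1-D array. Sizing the width from the longest element means nothing depends on whatever default width the cast would pick.

```python
import numpy as np

def object_to_unicode(arr):
    """Hypothetical helper: 1-D object array of str -> fixed-width 'U' array."""
    items = arr.tolist()  # plain Python list of the stored str objects
    width = max((len(s) for s in items), default=1)
    # Explicit width, so nothing is silently truncated
    return np.array(items, dtype=f"U{width}")

def object_to_bytes(arr, encoding="ascii"):
    """Hypothetical helper: 1-D object array of str -> fixed-width 'S' array."""
    # Encoding is an assumption; non-ASCII text needs a different codec
    items = [s.encode(encoding) for s in arr.tolist()]
    # NumPy sizes the 'S' dtype from the longest bytes object
    return np.array(items)

x = np.array(['abcdefgh', 'ijklmnop'], dtype=object)
u = object_to_unicode(x)
b = object_to_bytes(x)
print(u.dtype, u)
print(b.dtype, b)
```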

Do you need to go through object arrays? Could you go straight to a
string array instead:

In [35]: np.array(['abcdefgh', 'ijklmnop'], np.string_)
Out[35]:
array(['abcdefgh', 'ijklmnop'],
       dtype='|S8')
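For anyone reading this with a current NumPy, a minimal sketch of building the string array directly: np.string_ was removed in NumPy 2.0, and np.bytes_ names the same '|S' type there (np.str_ is the unicode counterpart). In both cases the width is discovered from the longest input.

```python
import numpy as np

# Byte strings straight into an 'S' array; width comes from the longest item
s = np.array([b'abcdefgh', b'ijklmnop'], dtype=np.bytes_)

# Python 3 str into a unicode 'U' array, also sized automatically
u = np.array(['abcdefgh', 'ijklmnop'], dtype=np.str_)

print(s.dtype, s)
print(u.dtype, u)
```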

or just keep the strings in a list.

Object arrays are weird; I think they have a lot of corner cases.

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov

