[Numpy-discussion] Coercing object arrays to string (or unicode) arrays

Michael Droettboom mdroe@stsci....
Wed Sep 23 14:18:24 CDT 2009


As I'm looking into fixing a number of bugs in chararray, I'm running 
into some surprising behavior.  One of the things chararray needs to do 
occasionally is build up an object array of string objects, and then 
convert that back to a fixed-length string array.  This length is 
sometimes predetermined by a recarray data structure.  Unfortunately, 
I'm not getting what I would expect when coercing or assigning an object 
array to a string array.  Is this a bug, or am I just going about this 
the wrong way?  If a bug, I'm happy to look into it as part of my 
"fixing chararray" task, but I just wanted to confirm that it is a bug 
before proceeding.
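The workflow described above collects Python string objects and then converts them to a fixed-length string type whose width comes from a recarray. It can be sketched roughly as follows. This is a minimal sketch that assumes Python 3 and a current NumPy, where `bytes` objects map to the `'S'` dtype (the session below predates that and uses Python 2 `str`). The record layout and the field names `tag` and `value` are hypothetical. The sketch converts through `tolist()`, the same list detour as the `np.array(list(x))` call in the session below, rather than casting the object array directly.

```python
import numpy as np

# Hypothetical record layout: one fixed-length string field 'tag' (8 bytes)
# plus an unrelated float field, standing in for a real recarray layout.
record_dtype = np.dtype([('tag', 'S8'), ('value', 'f8')])

# Indexing a structured dtype by field name gives that field's dtype (S8),
# so the target length comes from the record layout rather than the data.
tag_dtype = record_dtype['tag']

# Build up the strings as Python objects first (bytes, matching the 'S' kind).
tags = np.empty(2, dtype=object)
tags[0] = b'abcdefgh'
tags[1] = b'ijklmnop'

# Convert via a plain list of Python objects so each element is copied
# into its own fixed-width slot.
fixed_tags = np.array(tags.tolist(), dtype=tag_dtype)

# Store the result in the string field of a record array.
records = np.zeros(2, dtype=record_dtype).view(np.recarray)
records['tag'] = fixed_tags
```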

In [14]: x = np.array(['abcdefgh', 'ijklmnop'], 'O')

# Without specifying a length, the itemsize seems to default to sizeof(int) (4 bytes), truncating both strings... ???
In [15]: np.array(x, 'S')
Out[15]:
array(['abcd', 'ijkl'],
       dtype='|S4')

In [21]: np.array(x, np.string_)
Out[21]:
array(['abcd', 'ijkl'],
       dtype='|S4')
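One way to avoid depending on whatever length a bare `'S'` picks is to measure the longest element and build the dtype explicitly. This is a minimal sketch that assumes Python 3 `bytes` elements and a current NumPy. `infer_string_width` is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def infer_string_width(obj_arr):
    """Return the length of the longest element in an object array of bytes.

    Hypothetical helper: returns 1 for an empty array so that the resulting
    fixed-length string dtype still has a positive itemsize.
    """
    return max((len(s) for s in obj_arr.flat), default=1)

x = np.array([b'abcdefgh', b'ijklmnop'], dtype=object)

# 8 for this input.
width = infer_string_width(x)

# Build the dtype explicitly ('S8' here) instead of relying on the bare 'S'
# code to choose a length, and convert through a list of Python objects.
fixed = np.array(x.tolist(), dtype='S%d' % width)
```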

# Specifying a length gives garbage: bytes from both strings run together, followed by what looks like uninitialized memory
In [16]: np.array(x, 'S8')
Out[16]:
array(['abcdijkl', 'mnop\xe0\x01\x85\x08'],
       dtype='|S8')

# This is what I expected to happen above, but the cast to a list seems
# like it should be unnecessary
In [17]: np.array(list(x))
Out[17]:
array(['abcdefgh', 'ijklmnop'],
       dtype='|S8')

# Assignment also seems broken, with the same kind of corruption as np.array(x, 'S8')
In [18]: y = np.empty(x.shape, dtype='S8')

In [19]: y[:] = x[:]

In [20]: y
Out[20]:
array(['abcdijkl', 'mnop\xc05\xf9\xb7'],
       dtype='|S8')
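When the destination is preallocated, assigning from a list of Python objects instead of from the object array gives the expected result. This is a minimal sketch that assumes Python 3 `bytes` and a current NumPy. It also shows that with a fixed itemsize, longer strings are cut to fit without any error being raised. That behavior matters when the width is dictated by a recarray rather than by the data.

```python
import numpy as np

x = np.array([b'abcdefgh', b'ijklmnop'], dtype=object)

# Preallocate the fixed-length destination, as in the session above.
y = np.empty(x.shape, dtype='S8')

# Assign from a list of Python objects rather than from the object array
# itself, so each element is copied into its own 8-byte slot.
y[...] = x.tolist()

# With a narrower destination, strings longer than the itemsize are
# silently truncated on assignment.
z = np.empty(2, dtype='S4')
z[...] = x.tolist()
```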

Cheers,
Mike

