[Numpy-discussion] chararray behavior

Alan McIntyre alan.mcintyre@gmail....
Wed Jul 9 00:49:59 CDT 2008


On Tue, Jul 8, 2008 at 3:30 PM, Anne Archibald
<peridot.faceted@gmail.com> wrote:
> In particular, the returned type is always "string of length four",
> which is very peculiar - why four?  I realize that variable-length
> strings are a problem (object arrays, I guess?), as is returning
> arrays of varying dtypes (strings of length N), but this definitely
> violates the principle of least surprise...

Hmm... __mul__ calculates the required size of the result array, but the
result of that calculation is a numpy.int32.  So ndarray.__new__ is
given this int32 as the itemsize argument, and it looks like the
itemsize of the argument itself (rather than the value it contains) is
used as the itemsize of the new array:

>>> np.chararray((1,2), itemsize=5)
chararray([[';<f', '\x00\x00\x00@']],
      dtype='|S5')
>>> np.chararray((1,2), itemsize=np.int32(5))
chararray([['{5', '']],
      dtype='|S4')
>>> np.chararray((1,2), itemsize=np.int16(5))
chararray([['{5', '']],
      dtype='|S2')
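The |S4 and |S2 widths line up with the storage size of the scalar
types themselves, not with their value of 5.  A quick check (assuming
any reasonably recent NumPy):

```python
import numpy as np

# A NumPy integer scalar carries both a value and a storage size in bytes.
for scalar in (np.int32(5), np.int16(5)):
    # The value is 5 in both cases, but itemsize is 4 and 2 respectively,
    # matching the |S4 and |S2 dtypes in the session above.
    print(int(scalar), scalar.itemsize)  # 5 4, then 5 2
```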

Is this expected behavior?  I can fix this particular case by forcing
the calculated size to be a Python int, but this treatment of the
itemsize argument seems like it might be an easy way to cause subtle
bugs.
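A minimal sketch of the workaround described above: force the computed
size to a plain Python int before it reaches the constructor.
as_itemsize is a hypothetical helper, not part of NumPy.  It uses
operator.index rather than int() so that a float size raises an error
instead of being silently truncated.

```python
import operator
import numpy as np

def as_itemsize(size):
    """Hypothetical helper: coerce a computed size to a plain Python int.

    operator.index() accepts Python ints and NumPy integer scalars alike
    and raises TypeError for floats instead of truncating them.
    """
    return operator.index(size)

# Stand-in for a size computed with NumPy arithmetic, as in __mul__.
computed = np.int32(1) * np.int32(5)
size = as_itemsize(computed)
print(type(size) is int, size)  # True 5

# With a plain int, the value (not the scalar's 4-byte width) sets the itemsize.
arr = np.char.chararray((1, 2), itemsize=size)
print(arr.itemsize)  # 5
```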

