[NumPy-Tickets] [NumPy] #1892: object arrays converted to string arrays of 'S' dtype have default length.
NumPy Trac
numpy-tickets@scipy....
Mon Aug 15 05:11:42 CDT 2011
#1892: object arrays converted to string arrays of 'S' dtype have default length.
---------------------------------------+------------------------------------
Reporter: ehiggs | Owner: somebody
Type: defect | Status: new
Priority: normal | Milestone: Unscheduled
Component: Other | Version: devel
Keywords: vectorize string truncate |
---------------------------------------+------------------------------------
Comment(by ehiggs):
Replying to [comment:4 charris]:
> so I'm not sure if the bug is in vectorize itself or if in array. Is the
> truncation above desirable?
There appears to be at least one bug in vectorize. If I give vectorize
both an input dtype and an otypes argument, I think any array
manipulation it performs should use one of these dtypes to determine the
length of the strings. However, this information is still lost. For example:
{{{
In [7]: numpy.vectorize(lambda x: x[:10],
   ...:                 otypes=[numpy.dtype('|S10')])(
   ...:     numpy.array(['a'*20], dtype=numpy.dtype('|S20')))
Out[7]:
array(['aaaaaaaa'],
      dtype='|S8')
}}}
> Looks like numpy needs to know how long the string for the object arrays
> is going to be before creation and falls back to a default size instead of
> trying to determine it.
I'm not sure how the interface to array should handle the default case.
However, if I provide an otypes argument to vectorize, I think it should
tell the array machinery how long I expect the strings to be. That does
not appear to happen in the following example, where the requested
'|S10' comes back as '|S8':
{{{
>>> numpy.vectorize(lambda x: x[:10],
...                 otypes=[numpy.dtype('|S10')])(numpy.array(['a'*20]))
array(['aaaaaaaa'],
      dtype='|S8')
}}}
I'm not familiar with the underlying code, but perhaps vectorize could
pick up the otypes argument and offer it as the dtype to any
object->string conversions in the function. As jordigh pointed out, array
creation works fine when the string length is specified, so this would at
least protect vectorize from ignoring the intended output type.
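To illustrate what I mean, here is a minimal sketch of the idea outside
of vectorize. The helper name apply_with_dtype is hypothetical (it is not
part of NumPy), and this is not how vectorize is implemented; it only
shows the requested output dtype being passed explicitly when the result
array is created, so the string length is not left to a default:

```python
import numpy as np


def apply_with_dtype(func, values, out_dtype):
    """Hypothetical helper, not a NumPy API.

    Applies func to each element and builds the result array with an
    explicit dtype, so the string length comes from out_dtype rather
    than from a default.
    """
    values = np.asarray(values)
    # Collect the per-element results as plain Python objects first.
    results = [func(v) for v in values.ravel()]
    # Pass the intended dtype explicitly when creating the output array.
    return np.array(results, dtype=out_dtype).reshape(values.shape)


source = np.array([b'a' * 20], dtype=np.dtype('|S20'))
truncated = apply_with_dtype(lambda x: x[:10], source, np.dtype('|S10'))
print(truncated.dtype, truncated)
```

Here the result keeps the full ten characters because the conversion to a
string array is given the length up front, which is the behaviour I would
expect from otypes.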
Thanks for taking a look.
--
Ticket URL: <http://projects.scipy.org/numpy/ticket/1892#comment:5>
NumPy <http://projects.scipy.org/numpy>