[NumPy-Tickets] [NumPy] #1892: object arrays converted to string arrays of 'S' dtype have default length.

NumPy Trac numpy-tickets@scipy....
Mon Aug 15 05:11:42 CDT 2011


#1892: object arrays converted to string arrays of 'S' dtype have default length.
---------------------------------------+------------------------------------
 Reporter:  ehiggs                     |       Owner:  somebody   
     Type:  defect                     |      Status:  new        
 Priority:  normal                     |   Milestone:  Unscheduled
Component:  Other                      |     Version:  devel      
 Keywords:  vectorize string truncate  |  
---------------------------------------+------------------------------------

Comment(by ehiggs):

 Replying to [comment:4 charris]:
 > so I'm not sure if the bug is in vectorize itself or in array. Is the
 truncation above desirable?

 There appears to be at least one bug in vectorize. If I give both an
 input dtype and an otypes argument to vectorize, I would expect any
 intermediate array construction to use one of these dtypes to determine
 the string length. Instead, that information is lost. For example:

 {{{
 In [7]: numpy.vectorize(lambda x: x[:10], otypes=[numpy.dtype('|S10')])(
    ...:     numpy.array(['a'*20], dtype=numpy.dtype('|S20')))
 Out[7]:
 array(['aaaaaaaa'],
       dtype='|S8')

 }}}

 > Looks like numpy needs to know how long the string for the object arrays
 is going to be before creation and falls back to a default size instead of
 trying to determine it.
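 For contrast, here is a short sketch of the case where the length is stated
 up front. It assumes Python 3, where 'S' arrays hold bytes, so the ticket's
 'a'*20 becomes b"a" * 20. With an explicit sized dtype, NumPy stores exactly
 that many bytes per element and silently truncates longer inputs; the same
 applies when an object array is cast with astype to a sized 'S' dtype.

```python
import numpy as np

# A 20-byte string, mirroring the ticket's 'a' * 20 (Python 3 bytes assumed).
s = b"a" * 20

# Explicit length at creation: each element holds exactly that many bytes,
# and longer inputs are silently truncated to fit.
exact = np.array([s], dtype="S20")
short = np.array([s], dtype="S10")

# Converting an object array: give the target length explicitly
# instead of relying on a default.
obj = np.array([s], dtype=object)
from_obj = obj.astype("S10")

print(exact.dtype, exact[0])
print(short.dtype, short[0])
print(from_obj.dtype, from_obj[0])
```

 Here `exact` keeps all 20 bytes, while `short` and `from_obj` keep the first
 10, which is the result the otypes argument in the first example was meant
 to request.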

 I'm not sure how the array interface should handle the default case.
 However, if I pass an otypes argument to vectorize, I would expect it to
 tell the array interface how long I intend the strings to be. That does
 not happen in the following example:

 {{{
 >>> numpy.vectorize(lambda x: x[:10], otypes=[numpy.dtype('|S10')])(
 ...     numpy.array(['a'*20]))

 array(['aaaaaaaa'],
       dtype='|S8')
 }}}

 I'm not familiar with the underlying code, but perhaps vectorize could
 take the otypes argument and pass it as the dtype to any object->string
 conversion it performs internally. As jordigh pointed out, array creation
 works fine when the string length is specified, so this would at least
 stop vectorize from ignoring the intended output type.
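 One way to express that suggestion, as a minimal sketch outside NumPy:
 `vectorize_with_dtype` is a hypothetical helper, not a NumPy API, and it
 handles only a single array argument and a single output type. It collects
 the results in an object array and then casts with the caller's dtype, so
 the string length comes from the requested output type rather than from a
 default. It assumes Python 3 bytes input.

```python
import numpy as np


def vectorize_with_dtype(func, otype):
    """Hypothetical helper, not part of NumPy.

    Applies func to every element of a single array argument and builds
    the result with the caller-supplied dtype, so the string length comes
    from otype rather than from a default.
    """
    otype = np.dtype(otype)

    def wrapper(arr):
        arr = np.asarray(arr)
        # Collect raw results in an object array; no string length is fixed yet.
        out = np.empty(arr.shape, dtype=object)
        for idx, value in np.ndenumerate(arr):
            out[idx] = func(value)
        # Convert object -> string using the explicit dtype, as the ticket suggests.
        return out.astype(otype)

    return wrapper


truncate10 = vectorize_with_dtype(lambda x: x[:10], "S10")
result = truncate10(np.array([b"a" * 20]))
print(result, result.dtype)
```

 With 'S10' requested, `result` holds the full 10-byte prefix and has an
 'S10' dtype, instead of the 8-character truncation shown in the examples.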

 Thanks for taking a look.

-- 
Ticket URL: <http://projects.scipy.org/numpy/ticket/1892#comment:5>
NumPy <http://projects.scipy.org/numpy>

