[Numpy-tickets] [NumPy] #516: Error transposing arrays with many dimensions [fix included]

NumPy numpy-tickets@scipy....
Thu May 10 14:39:37 CDT 2007


#516: Error transposing arrays with many dimensions [fix included]
------------------------+---------------------------------------------------
 Reporter:  gcross      |       Owner:  somebody
     Type:  defect      |      Status:  new     
 Priority:  highest     |   Milestone:          
Component:  numpy.core  |     Version:  devel   
 Severity:  critical    |    Keywords:          
------------------------+---------------------------------------------------
 == Symptom ==

 Numpy incorrectly reports the error "dimensions too large" when
 transposing arrays with many dimensions.  For example, the following code
 produces an error:

 {{{
 from numpy.random import rand
 arr1 = rand(*(2,)*16)
 arr2 = arr1.transpose(range(16))
 }}}

 Clearly the above should cause no problems: arr2 cannot be too big, since
 we could already create arr1, and in this particular case the permutation
 is the identity, so we are not even reordering the axes of arr2!


 == Cause ==

 The problem can be traced to an erroneous assumption made in the C routine
 PyArray_Transpose.  At line 1875 of numpy/core/src/multiarraymodule.c, this
 routine calls PyArray_NewFromDescr to allocate the new array object:

 {{{
         ret = (PyArrayObject *)\
                 PyArray_NewFromDescr(ap->ob_type,
                                      ap->descr,
                                      n, permutation,
                                      NULL, ap->data, ap->flags,
                                      (PyObject *)ap);
 }}}

 To see the problem, look at the fourth argument of the call:
 "permutation".  This passes in an array which contains the permuted
 axis indices, but in an argument which is meant to specify the new
 dimensions of the array!  For arrays with, say, 16 dimensions, this
 confuses numpy into thinking that we are creating an array of size
 16! = 20922789888000, which is clearly not what we want.
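
 To put the two sizes side by side, here is a small arithmetic sketch.  It
 uses only the Python standard library (math.prod needs Python 3.8 or
 later).  The 32-bit index limit is an assumption chosen for illustration;
 the ticket does not say which limit the size check applies on a given
 platform.

```python
import math

ndim = 16
shape = (2,) * ndim  # shape of arr1 from the Symptom section

# Elements the array really holds: 2 * 2 * ... * 2 (16 times).
actual_elements = math.prod(shape)

# Element count quoted in the ticket when the permutation is read as a shape.
assumed_elements = math.factorial(ndim)

# Assumption for illustration only: the largest value of a signed 32-bit index.
INT32_MAX = 2**31 - 1

print(actual_elements)               # 65536
print(assumed_elements)              # 20922789888000
print(actual_elements > INT32_MAX)   # False
print(assumed_elements > INT32_MAX)  # True
```

 Under that assumption the real array fits comfortably, while the size
 derived from the permutation does not.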

 As far as I can tell, this was done to save a little memory.  The
 programmer preferred not to create intermediate arrays with the correct
 values for the dimensions and strides to pass into PyArray_NewFromDescr,
 so instead he passed a dummy array of the correct length into the "dims"
 argument, intending to write the correct dimensions and strides directly
 into the fields of the newly allocated array object afterwards.  However,
 the "dims" argument is also used to check whether the array being
 allocated is too big, so you cannot pass just anything into it and expect
 the call to work.
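
 For reference, the "write the correct dimensions and strides afterwards"
 step amounts to reordering the old array's dimensions and strides by the
 permutation.  The following is a pure-Python sketch of that idea, not the
 C code itself; the helper name transposed_layout and the example strides
 (a C-contiguous array of 8-byte elements) are hypothetical.

```python
def transposed_layout(dims, strides, permutation):
    """Hypothetical helper: permute dims and strides the way a transpose would."""
    if sorted(permutation) != list(range(len(dims))):
        raise ValueError("permutation must contain each axis exactly once")
    # Axis i of the result is axis permutation[i] of the original array.
    new_dims = tuple(dims[axis] for axis in permutation)
    new_strides = tuple(strides[axis] for axis in permutation)
    return new_dims, new_strides


dims = (2, 3, 4)
strides = (96, 32, 8)  # assumed layout: C-contiguous, 8 bytes per element
new_dims, new_strides = transposed_layout(dims, strides, (2, 0, 1))
print(new_dims)     # (4, 2, 3)
print(new_strides)  # (8, 96, 32)
```

 Note that the permutation is only ever used as a list of axis numbers
 here; its values are never meaningful as dimensions.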

 == Solution ==

 I am not sufficiently familiar with the code to know whether passing a
 dummy array into this field is even a good idea, but assuming that it is:
 a better choice would be the "dimensions" field of the old array "ap",
 i.e. "ap->dimensions", so that the call becomes

 {{{
         ret = (PyArrayObject *)\
                 PyArray_NewFromDescr(ap->ob_type,
                                      ap->descr,
                                      n, ap->dimensions,
                                      NULL, ap->data, ap->flags,
                                      (PyObject *)ap);
 }}}

 This seems to fix the problem for me.
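
 One reason "ap->dimensions" looks like a reasonable placeholder: the
 dimensions of the transposed array are just a reordering of the original
 ones, so the total element count seen by any size check is the same as
 for the array that already exists.  A quick standard-library sketch of
 that invariance, using a small made-up shape:

```python
import itertools
import math

dims = (2, 3, 4, 5)  # made-up example shape
total = math.prod(dims)

# Every reordering of the axes yields the same number of elements.
for permutation in itertools.permutations(range(len(dims))):
    permuted_dims = [dims[axis] for axis in permutation]
    assert math.prod(permuted_dims) == total

print(total)  # 120
```

 This only illustrates the arithmetic; whether a placeholder should be
 passed at all is the open question raised above.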

 == DIFF ==

 numpy/core/src/multiarraymodule.c

 {{{
 1878c1878
 <                                    n, permutation,
 ---
 >                                    n, ap->dimensions,
 }}}

-- 
Ticket URL: <http://projects.scipy.org/scipy/numpy/ticket/516>
NumPy <http://projects.scipy.org/scipy/numpy>
The fundamental package needed for scientific computing with Python.