[Numpy-discussion] Nasty bug using pre-initialized arrays

Ryan May rmay@ou....
Mon Jan 7 13:07:35 CST 2008


Zachary Pincus wrote:
>>> For large arrays, it makes sense to do automatic
>>> conversions, as is also the case in functions taking output arrays,
>>> because the typecast can be pushed down into C where it is time and
>>> space efficient, whereas explicitly converting the array uses up
>>> temporary space. However, I can imagine an explicit typecast function,
>>> something like
>>>
>>> a[...] = typecast(b)
>>>
>>> that would replace the current behavior. I think the typecast function
>>> could be implemented by returning a view of b with a castable flag set
>>> to true, that should supply enough information for the assignment
>>> operator to do its job. This might be a good addition for Numpy 1.1.
>> While that seems like an ok idea, I'm still not sure what's wrong with
>> raising an exception when there will be information loss.  The exception
>> is already raised with standard python complex objects.  I can think of
>> many times in my code where explicit looping is a necessity, so
>> pre-allocating the array is the only way to go.
> 
> The issue Charles is dealing with here is how to *suppress* the  
> proposed exception in cases (such as the several that I described) where  
> the information loss is explicitly desired.
> 
> With what's currently in numpy, you would have to do something  
> like this:
> A[...] = B.astype(A.dtype)
> to set a portion of A to B, unless you are *certain* that A and B are  
> of compatible types.
> 
> This is ugly and also bug-prone, seeing as how there's some violation  
> of the don't-repeat-yourself principle. (I.e. A is mentioned twice,  
> and to change the code to use a different array, you need to change  
> the variable name A twice.)
> 
> Moreover, and worse, the phrase 'A[...] = B.astype(A.dtype)' creates and  
> destroys a temporary array the same size as B. It's equivalent to:
> temp = B.astype(A.dtype)
> A[...] = temp
> 
> Which is no good if B is very large. Currently, the type conversion  
> in 'A[...] = B' cases is handled implicitly, deep down in the C code  
> where it is very very fast, and no temporary array is made.
> 
> Charles suggests a 'typecast' operator that would set a flag on the  
> array B so that trying to convert it would *not* raise an exception,  
> allowing for the fast, C-level conversion. (This assumes your  
> proposed change in which by default such conversions would raise  
> exceptions.) This 'typecast' operation wouldn't do anything but set a  
> flag, so it doesn't create a temporary array or do any extra work.
> 
> But still, every time that you are not *certain* what the type of a  
> result from a given operation is, any code that looks like:
> A[i] = calculate(...)
> will need to look like this instead:
> A[i] = typecast(calculate(...))
> 
> I agree with others that such assignments aren't highly common, but  
> there will be broken code from this. And as Charles demonstrates,  
> getting the details right of how to implement such a change is  
> non-trivial.
> 

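For reference, the two spellings compared above can be sketched as follows.
This is only an illustration: the arrays A and B are made-up examples, and
np.copyto with casting="unsafe" comes from later NumPy releases and is shown
as one way to state the lossy cast explicitly without an astype temporary.
It is not the 'typecast' function proposed in this thread.

```python
import numpy as np

# Made-up example data: a preallocated int array and a float source.
A = np.zeros(4, dtype=np.int32)
B = np.array([1.9, 2.1, -3.7, 4.0])

# Explicit conversion: astype() allocates a full-size temporary,
# which is then copied into A.
A[...] = B.astype(A.dtype)
explicit = A.copy()

# Implicit conversion: the cast happens during the assignment itself.
A[...] = 0
A[...] = B
implicit = A.copy()

# Assumption: a later NumPy where np.copyto is available. The casting
# argument makes the intent to lose information visible in the code.
A[...] = 0
np.copyto(A, B, casting="unsafe")
copied = A.copy()

print(explicit, implicit, copied)
```

All three forms truncate the fractional part toward zero; they differ only
in whether a temporary is created and in how visible the cast is.
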
I agree that some of the other options with typecast/astype look
horrible, and that unnecessary temporaries are bad.  Maybe I'm
too focused on just the complex case, where all you really need is to
use .real to get only the real part (which also happens to work with float
and int arrays).  The nice part about using .real is that it explicitly
states what you're looking for.  I'm not personally interested in
anything other than this silent complex->float conversion.
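
A minimal sketch of the .real approach, assuming a preallocated float array.
The helper name fill_real and the sample values are hypothetical and only
there for illustration:

```python
import numpy as np

def fill_real(out, src):
    """Hypothetical helper: store the real part of src in preallocated out."""
    # Taking .real states explicitly that the imaginary part is dropped.
    # It is also valid on float and int arrays, so the same line works
    # for every input type.
    out[...] = src.real
    return out

out = np.zeros(3)  # preallocated float64 array

for src in (np.array([1 + 2j, 3 - 4j, 5 + 0j]),   # complex input
            np.array([1.5, 2.5, 3.5]),            # float input
            np.array([1, 2, 3])):                 # int input
    fill_real(out, src)
    print(src.dtype, out)
```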

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

