[Numpy-discussion] Nasty bug using pre-initialized arrays

Zachary Pincus zpincus@stanford....
Mon Jan 7 12:47:52 CST 2008


>> For large arrays, it makes sense to do automatic
>> conversions, as is also the case in functions taking output arrays,
>> because the typecast can be pushed down into C where it is time and
>> space efficient, whereas explicitly converting the array uses up
>> temporary space. However, I can imagine an explicit typecast  
>> function,
>> something like
>>
>> a[...] = typecast(b)
>>
>> that would replace the current behavior. I think the typecast  
>> function
>> could be implemented by returning a view of b with a castable flag  
>> set
>> to true, that should supply enough information for the assignment
>> operator to do its job. This might be a good addition for Numpy 1.1.
>
> While that seems like an ok idea, I'm still not sure what's wrong with
> raising an exception when there will be information loss.  The  
> exception
> is already raised with standard python complex objects.  I can  
> think of
> many times in my code where explicit looping is a necessity, so
> pre-allocating the array is the only way to go.

The issue Charles is dealing with here is how to *suppress* the  
proposed exception in cases (as the several that I described) where  
the information loss is explicitly desired.

With what's currently in numpy now, you would have to do something  
like this:
A[...] = B.astype(A.dtype)
to set a portion of A to B, unless you are *certain* that A and B are  
of compatible types.

This is ugly and also bug-prone, since it violates the
don't-repeat-yourself principle. (That is, A is mentioned twice, so to
change the code to use a different array, you need to change the
variable name A in two places.)

Moreover, and worse, the phrase 'A[...] = B.astype(A.dtype)' creates
and destroys a temporary array the same size as B. It's equivalent to:
temp = B.astype(A.dtype)
A[...] = temp

Which is no good if B is very large. Currently, the type conversion  
in 'A[...] = B' cases is handled implicitly, deep down in the C code  
where it is very very fast, and no temporary array is made.
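A minimal sketch of the two patterns side by side (array values are my own illustration, not from the thread):

```python
import numpy as np

A = np.zeros(5, dtype=np.int32)
B = np.array([0.5, 1.7, 2.2, 3.9, 4.1])  # float64

# Explicit cast: safe, but allocates a temporary int32 array the size
# of B before copying it into A.
A[...] = B.astype(A.dtype)

# Implicit cast: the float64 -> int32 conversion (truncation toward
# zero) happens element-by-element down in C, with no temporary.
A[...] = B
```

Both leave A holding [0, 1, 2, 3, 4]; the difference is only in speed and in the extra allocation the first form makes.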

Charles suggests a 'typecast' operator that would set a flag on the  
array B so that trying to convert it would *not* raise an exception,  
allowing for the fast, C-level conversion. (This assumes your  
proposed change in which by default such conversions would raise  
exceptions.) This 'typecast' operation wouldn't do anything but set a  
flag, so it doesn't create a temporary array or do any extra work.
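The proposed semantics could be sketched in pure Python roughly as follows. This is only an illustration of the idea, not numpy API: `typecast`, `_Castable`, and `checked_assign` are hypothetical names, and the real proposal would set a flag on a C-level view so that plain `A[...] = ...` assignment stays fast and allocation-free.

```python
import numpy as np

class _Castable:
    """Hypothetical wrapper marking an array as 'lossy cast permitted'.
    In the actual proposal this would be a view with a castable flag,
    not a Python-level wrapper object."""
    def __init__(self, arr):
        self.arr = arr

def typecast(b):
    """Mark b so that assigning it into a narrower array is allowed."""
    return _Castable(b)

def checked_assign(a, value):
    """Assignment that raises on lossy casts unless typecast() was used."""
    if isinstance(value, _Castable):
        a[...] = value.arr  # silent C-level cast, as numpy does today
    elif np.can_cast(value.dtype, a.dtype):
        a[...] = value      # safe (non-lossy) cast, always allowed
    else:
        raise TypeError("lossy cast from %s to %s"
                        % (value.dtype, a.dtype))
```

Under these rules `checked_assign(A, B)` raises for float64-into-int32, while `checked_assign(A, typecast(B))` performs the truncating cast, and widening casts (say int16 into int32) pass through untouched.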

But still, every time that you are not *certain* what the type of a  
result from a given operation is, any code that looks like:
A[i] = calculate(...)
will need to look like this instead:
A[i] = typecast(calculate(...))

I agree with others that such assignments aren't highly common, but
there will be broken code from this. And as Charles demonstrates,
getting the details right of how to implement such a change is
non-trivial.

Zach
