[Numpy-discussion] Nasty bug using pre-initialized arrays
Zachary Pincus
zpincus@stanford....
Mon Jan 7 12:47:52 CST 2008
>> For large arrays, it makes sense to do automatic
>> conversions, as is also the case in functions taking output arrays,
>> because the typecast can be pushed down into C where it is time and
>> space efficient, whereas explicitly converting the array uses up
>> temporary space. However, I can imagine an explicit typecast function,
>> something like
>>
>> a[...] = typecast(b)
>>
>> that would replace the current behavior. I think the typecast function
>> could be implemented by returning a view of b with a castable flag set
>> to true; that should supply enough information for the assignment
>> operator to do its job. This might be a good addition for Numpy 1.1.
>
> While that seems like an ok idea, I'm still not sure what's wrong with
> raising an exception when there will be information loss. The exception
> is already raised with standard python complex objects. I can think of
> many times in my code where explicit looping is a necessity, so
> pre-allocating the array is the only way to go.
The issue Charles is dealing with here is how to *suppress* the
proposed exception in cases (such as the several I described) where
the information loss is explicitly desired.
With what's currently in numpy, you would have to do something
like this:
    A[...] = B.astype(A.dtype)
to set a portion of A to B, unless you are *certain* that A and B are
of compatible types.
This is ugly and also bug-prone, since it violates the
don't-repeat-yourself principle. (I.e., A is mentioned twice,
and to change the code to use a different array, you need to change
the variable name A in two places.)
Moreover, and worse, the statement 'A[...] = B.astype(A.dtype)' creates
and destroys a temporary array the same size as B. It's equivalent to:
    temp = B.astype(A.dtype)
    A[...] = temp
Which is no good if B is very large. Currently, the type conversion
in 'A[...] = B' cases is handled implicitly, deep down in the C code
where it is very very fast, and no temporary array is made.
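To make the two paths concrete, here is a minimal sketch. It uses a float-to-int assignment as the lossy case (an assumption chosen for illustration; the thread itself is mostly about complex-to-real), and the array contents are made up.

```python
import numpy as np

A = np.zeros(4, dtype=np.int32)
B = np.array([1.9, -2.5, 3.0, 4.7])   # float64 source

# Implicit conversion inside the assignment: no full-size intermediate
# array is created at the Python level, and fractions are dropped.
A[...] = B
implicit = A.copy()

# Explicit conversion: astype() first allocates a new array the size of B,
# which is then copied into A and thrown away.
temp = B.astype(A.dtype)
A[...] = temp
explicit = A.copy()
```

Both paths leave the same values in A; the only difference is the extra allocation held in `temp`.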
Charles suggests a 'typecast' operator that would set a flag on the
array B so that trying to convert it would *not* raise an exception,
allowing for the fast, C-level conversion. (This assumes your
proposed change in which by default such conversions would raise
exceptions.) This 'typecast' operation wouldn't do anything but set a
flag, so it doesn't create a temporary array or do any extra work.
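A rough pure-Python sketch of the idea follows. Nothing here exists in numpy: `Castable`, `typecast`, and `strict_assign` are hypothetical names, the wrapper object stands in for the proposed flag on a view, and `strict_assign` stands in for an assignment operator that refuses lossy conversions by default. The real proposal would do this check in C, inside the assignment itself.

```python
import numpy as np

class Castable:
    """Hypothetical marker: holds a reference to an array, copies nothing."""
    def __init__(self, array):
        self.array = np.asarray(array)

def typecast(array):
    """Hypothetical: mark a value as 'lossy conversion is intended'."""
    return Castable(array)

def strict_assign(dest, index, value):
    """Hypothetical stand-in for an assignment that rejects lossy casts."""
    if isinstance(value, Castable):
        # The caller opted in, so let numpy convert during the assignment.
        dest[index] = value.array
        return
    value = np.asarray(value)
    if not np.can_cast(value.dtype, dest.dtype, casting="safe"):
        raise TypeError(
            f"lossy conversion from {value.dtype} to {dest.dtype}; "
            "wrap the value in typecast() to allow it"
        )
    dest[index] = value

A = np.zeros(3, dtype=np.int32)
B = np.array([1.5, 2.5, 3.5])

try:
    strict_assign(A, ..., B)          # float64 -> int32 is not a safe cast
    raised = False
except TypeError:
    raised = True

strict_assign(A, ..., typecast(B))    # explicitly allowed
```

Note that `typecast(B)` only wraps a reference to B, which is the point of the proposal: the opt-in costs no temporary array.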
But still, every time that you are not *certain* what the type of a
result from a given operation is, any code that looks like:
    A[i] = calculate(...)
will need to look like this instead:
    A[i] = typecast(calculate(...))
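As an illustration of how easily the result type can be something other than what you expect, consider this sketch. Here `calculate` is a hypothetical stand-in for any routine whose return type depends on its internals; the one below takes integers in but returns a float.

```python
import numpy as np

def calculate(samples):
    """Hypothetical helper: integer input, but mean() returns float64."""
    return samples.mean()

A = np.zeros(3, dtype=np.int64)
result = calculate(np.array([1, 2, 4]))

# Today this assignment silently drops the fractional part.
A[0] = result

# Under the proposed change, this is the kind of check that would fail
# and raise unless the right-hand side were wrapped in typecast().
is_safe = np.can_cast(result.dtype, A.dtype, casting="safe")
```

Here `is_safe` comes out false, so an assignment like this one is exactly the kind that would start raising.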
I agree with others that such assignments aren't highly common, but
there will be broken code from this. And as Charles demonstrates,
getting the implementation details of such a change right is
non-trivial.
Zach