[Numpy-discussion] Overloading numpy's ufuncs for better type coercion?
Hans Meine
meine@informatik.uni-hamburg...
Wed Jul 22 07:06:02 CDT 2009
Hi!
(This mail is a reply to a personal conversation with Ullrich Köthe, but it is
obviously of wider interest. This is about VIGRA's new NumPy-based Python
bindings.) Ulli considers this behaviour of NumPy to be a bug:
In [1]: a = numpy.array([200], numpy.uint8)
In [2]: a + a
Out[2]: array([144], dtype=uint8)
However, this is well-known, often-discussed, and IMHO not really unexpected
for computer programmers who ever worked with C-like languages (even Java has
many such problems). Christian even said this is what he wants.
OTOH, I agree that NumPy is missing a feature here: it should perform
"coercion before the operation" (to be more precise: the temporary data type
should be promoted from the operand types, and /then/ the coercion - which can
also reduce the number of bits - should happen). That would fix this strange
behaviour:
In [3]: numpy.add(a, a, numpy.empty((1, ), dtype = numpy.uint32))
Out[3]: array([144], dtype=uint32)
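For illustration, here is a minimal sketch of the behaviour above together with two
ways to get the non-wrapped result. It assumes a reasonably recent NumPy in which
ufuncs accept a dtype= keyword; the variable names are made up for this example.

```python
import numpy as np

a = np.array([200], np.uint8)

# The inner loop is selected from the input types (uint8), so the sum wraps
# modulo 256 before it is stored in the uint32 output array.
out = np.empty((1,), dtype=np.uint32)
np.add(a, a, out)

# Workaround 1: convert the operands explicitly before adding.
wide_by_cast = a.astype(np.uint32) + a.astype(np.uint32)

# Workaround 2: ask the ufunc to compute in a given type via dtype=.
wide_by_dtype = np.add(a, a, dtype=np.uint32)

print(out, wide_by_cast, wide_by_dtype)
```

Both workarounds put the burden on the caller, which is exactly the explicit
conversion step under discussion.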
Now, our opinions differ on how to deal with this - Ulli planned to override
(more or less) all ufuncs in vigranumpy in order to return float32 (which is
sort of a common denominator and the type nearly all other vigranumpy
functions should accept). I see two main disadvantages here:
a) Choosing float32 seems arbitrary, and I'd like as much as possible of
vigranumpy to be independent of VIGRA and its particular needs. I have seen
so many requests (e.g. on the c++-sig mailing list) for *good* C++/boost-
python <-> numpy bindings that it would be a pity IMO to add special cases for
VIGRA by overloading __add__ etc.
b) Also, I find it unexpected and undesirable to change the behaviour of such
basic operations as addition on our ndarray-derived image types. IMO this
brings the danger of new users being confused about the different behaviours,
and even experienced vigranumpy users might eventually fall into the trap when
dealing with plain ndarrays and our derived types side-by-side.
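To make point b) concrete, here is a rough sketch of what such an overload could
look like and how the two behaviours then diverge. Float32Image is a hypothetical
class invented for this example, not the actual vigranumpy type, and overriding
only __add__ is a simplification.

```python
import numpy as np

class Float32Image(np.ndarray):
    """Hypothetical ndarray-derived image type whose '+' returns float32."""

    def __add__(self, other):
        # Force the addition to be computed (and returned) as float32.
        return np.add(self, other, dtype=np.float32)

plain = np.array([200], np.uint8)
image = plain.view(Float32Image)   # same data, derived type

plain_sum = plain + plain          # stays uint8 and wraps around
image_sum = image + image          # float32, no wrap-around

print(plain_sum, plain_sum.dtype)
print(image_sum, image_sum.dtype)
```

The same expression now gives different values and dtypes depending only on
which array type happens to be involved.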
Ideally, I'd like numpy to be "fixed" - I hope that the "only" obstacle is
that someone needs to do it, but I am afraid of someone mentioning the term
"backward-compatibility" (Ulli would surely call it "bug-compatibility" here
;-) ).
But in the end, I wonder how bad this really is for VIGRA. AFAICS, the
main problem is that one needs to decide upon the pixel types for which to
export algorithms (in most cases, we'll use just float32, at least for a
start), and that when one loads images into arrays of the data type used in
the image file, one will often end up with uint8 arrays which cannot be passed
into many algorithms without an explicit conversion. However, is this really
a bad problem? For example, the conversion would typically have to be
performed only once (after loading), no? Then, why not simplify things
further by adding a dtype= parameter to importImage()? This could even
default to float32 then.
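A sketch of the dtype= idea, under loud assumptions: import_image() below is a
hypothetical stand-in that takes already-decoded pixel data instead of a file
name, and treating dtype=None as "keep the type stored in the file" is my own
invention for the example, not an existing vigranumpy convention.

```python
import numpy as np

def import_image(pixels, dtype=np.float32):
    """Hypothetical stand-in for importImage().

    `pixels` plays the role of the decoded file contents. The conversion
    happens exactly once, here, right after "loading".
    """
    raw = np.asarray(pixels)
    if dtype is None:
        # Assumed convention: None keeps the pixel type found in the file.
        return raw
    return raw.astype(dtype)

file_data = np.array([[200, 100], [50, 255]], np.uint8)

image = import_image(file_data)               # float32 by default
native = import_image(file_data, dtype=None)  # stays uint8

print(image.dtype, native.dtype)
print(image + image)
```

With the default, arithmetic on the loaded image no longer wraps, and plain
ndarray semantics stay untouched.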
Looking forward to your opinions,
Hans