[Numpy-discussion] Type of 1st argument in Numexpr where()
tim.hochberg at ieee.org
Wed Dec 20 15:29:57 CST 2006
Ivan Vilata i Balaguer wrote:
> Tim Hochberg (on 2006-12-20 at 09:20:01 -0700) wrote:
>> Actually, this is on purpose. Numpy.where (and most other switching
>> constructs in Python) will switch on almost anything. In particular, any
>> number that is nonzero is considered True, zero is considered False. By
>> changing the signature, you're restricting where to accept only
>> booleans. Since booleans and ints can be freely cast to doubles in
>> numexpr, always using float for the condition saves us a couple of opcodes.
> Yes, I understand the reasons you give here. Now that you've brought the
> topic up, I'm curious what "always using float for the
> condition saves us a couple of opcodes" means. Could you explain this?
> Just out of curiosity. :)
Let's look at something simpler than where, which is a confusing function. How
about *sin*? Also, let's pretend complex numbers don't exist to make
things still simpler. There is only a single *sin* function defined in
the numexpr interpreter, and it operates on floats. This works because
the numexpr compiler is smart enough to insert cast opcodes to convert
boolean or integer types to floats before operating on them with the
*sin* opcode which strictly works on floats (remember we are pretending
complex numbers don't exist).
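
As a rough illustration of the cast insertion described above, here is a toy sketch in Python. It is not numexpr's compiler: the function compile_sin and every opcode name in it are made up for this example, and it only models the idea that one float-only *sin* opcode suffices once casts are inserted in front of it.

```python
# Hypothetical toy compiler: the opcode names below are illustrative
# only and are not taken from numexpr.
CAST_TO_FLOAT = {
    "bool": "cast_float_from_bool",
    "int": "cast_float_from_int",
}

def compile_sin(arg_kind):
    """Return the opcode sequence for sin(x), given the kind of x."""
    program = []
    if arg_kind != "float":
        # Promote bools and ints to float first, so that a single
        # float-only sin opcode can serve every input kind.
        program.append(CAST_TO_FLOAT[arg_kind])
    program.append("sin_float")
    return program

print(compile_sin("float"))
print(compile_sin("int"))
```

The float case compiles to the *sin* opcode alone, while the int and bool cases pay for one extra cast opcode each.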
The situation with the first argument to where is analogous. Booleans
and ints are automagically promoted to floats. Since the opcode is
designed to work on floats everything works great. And, we only need a
single opcode to treat bools, ints and floats. That is where "saving a
couple of opcodes" comes in. However:
1. Booleans are probably more common than floats as the argument to
where. At present floats are the most efficient case; other cases
incur some extra overhead due to casting.
2. It doesn't work for complex values.
Problem #2 is easily fixable, should we so desire, simply by adding
another opcode. Problem #1 is not so easy.
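
Both problems can be seen in a small pure-Python sketch. This is only a model of the behaviour, assuming a where that tests a float condition against zero; where_float and where_any are hypothetical names, and plain lists stand in for arrays.

```python
def where_float(cond, a, b):
    """Element-wise select; cond is assumed to already hold floats."""
    return [x if c != 0.0 else y for c, x, y in zip(cond, a, b)]

def where_any(cond, a, b):
    """Cast the condition to float first, then reuse the float-only where."""
    # This cast is the extra overhead that bools and ints incur (problem 1).
    return where_float([float(c) for c in cond], a, b)

picked = where_any([True, False, 2, 0.0], [1, 1, 1, 1], [9, 9, 9, 9])
print(picked)

# Problem 2: a complex condition cannot be cast to float at all.
try:
    where_any([1j], [1], [9])
    complex_ok = True
except TypeError:
    complex_ok = False
print(complex_ok)
```

Any nonzero condition value selects from the first list, which is the "switch on almost anything" behaviour, but only the float input skips the cast.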
It would be possible to adapt your original idea. We could do the following:
1. Add a function boolean() to the numexpr namespace. This would cast
its argument to an array of bools.
2. Tweak the compiler (actually, probably where_func in
expressions.py) to compile where(x,a,b) as where(boolean(x),a,b).
3. Change where to take bools as the first argument.
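
The rewrite in step 2 could look something like the following sketch. It does not use numexpr's real expression classes: expressions are modelled as nested tuples of the form (name, arg, ...), and rewrite_where is a hypothetical helper standing in for whatever where_func would actually do.

```python
def rewrite_where(expr):
    """Wrap the first argument of every where(...) in boolean(...).

    Expressions are nested tuples ("name", arg, ...); anything that is
    not a tuple is treated as a leaf (a variable name or a constant).
    """
    if not isinstance(expr, tuple):
        return expr
    name, *args = expr
    args = [rewrite_where(arg) for arg in args]
    if name == "where" and len(args) == 3:
        cond = args[0]
        already_bool = isinstance(cond, tuple) and cond[0] == "boolean"
        if not already_bool:
            # where(x, a, b) -> where(boolean(x), a, b)
            args[0] = ("boolean", cond)
    return (name, *args)

print(rewrite_where(("where", "x", "a", "b")))
```

The check for an existing boolean(...) wrapper keeps the rewrite idempotent, so applying it twice does not stack casts.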
Or, maybe it would be cleaner to instead change the casting rules so
that casting to bool happens automagically. Having cycles in the casting
rules frightens me a bit, but it could probably be made to work.
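
To show what a cycle in the casting rules means, here is a small sketch that treats the rules as a directed graph and looks for a loop. The rule tables are assumptions made up for the example (bool to int to float, plus an optional float to bool edge), not numexpr's actual casting table, and has_cycle is a hypothetical helper.

```python
def has_cycle(rules):
    """Return True if the implicit-cast graph contains a cycle.

    rules maps each kind to the kinds it may be implicitly cast to.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, done
    color = {kind: WHITE for kind in rules}

    def visit(kind):
        color[kind] = GRAY
        for target in rules.get(kind, ()):
            state = color.get(target, WHITE)
            if state == GRAY:
                return True  # reached a kind already on the current path
            if state == WHITE and visit(target):
                return True
        color[kind] = BLACK
        return False

    return any(color[kind] == WHITE and visit(kind) for kind in list(rules))

# Assumed one-way promotion chain.
promote_only = {"bool": ["int"], "int": ["float"], "float": []}
# The same chain with an automatic float -> bool cast added.
with_bool_cast = {"bool": ["int"], "int": ["float"], "float": ["bool"]}

print(has_cycle(promote_only))
print(has_cycle(with_bool_cast))
```

With a cycle present, a compiler that keeps applying implicit casts until the kinds match needs some extra rule to decide when to stop, which is the part that would have to be made to work.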
So, in summary, I think that the general idea you proposed could be made
to work with some more effort. Conceptually, it's cleaner and it could
be made more efficient for the common case. On the downside, this would
require three new opcodes, as opposed to a single new opcode to do the
simple-minded fix. So, I'm still a bit up in the air as to whether it's
a good idea.