[Numpy-discussion] numexpr thoughts
tim.hochberg at cox.net
Tue Mar 7 10:35:02 CST 2006
Tim Hochberg wrote:
>> 3. Reduction. I figure this could be done at the end of the program in
>> each loop: sum or multiply the output register. Downcasting the
>> output could be done here too.
> Do you mean that sum() and friends could only occur as the outermost
> function? That is:
> would work, but:
> "where(a, sum(a+3*b), c)"
> would not? Or am I misunderstanding you here? I don't think I like
> that limitation if that's the case. I don't think it should be
> necessary either.
OK, I thought about this some more and I think I was mostly wrong. I
suspect that reduction does need to happen as the last step. Still it
would be nice for "where(a, sum(a+3*b), c)" to work. This could be done
by doing the following transformation:
    a = evaluate("where(a, sum(a+3*b), c)")  =>
    temp = evaluate("sum(a+3*b)"); a = evaluate("where(a, temp, c)")
I suspect that this would be fairly easy to do automagically as
long as it was at the Python level. That is, numexpr would return a
python object that would call the lower level interpreter.numexpr
appropriately. This would have some other advantages as well -- it would
make it easy to deal with keyword arguments for one. It would also make
it easy to do the bytecode rewriting if we decide to go that route. It
does add more call overhead, but if that turns out to be a problem we can move
stuff into C later.
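
To make the transformation above concrete, here is a minimal Python-level sketch of hoisting reductions into temporaries. Everything in it is an assumption for illustration: the helper names (split_reductions, evaluate_split), the temporary naming scheme (_temp0, _temp1, ...), the set of reduction names, and the evaluate(expr, variables) call signature are hypothetical and are not part of numexpr. It uses the standard ast module (ast.unparse needs Python 3.9 or later) only to rewrite the expression string; the actual evaluation is delegated to whatever evaluate callable is passed in.

```python
import ast

# Assumed set of function names that count as reductions.
REDUCTIONS = {"sum", "prod"}


class _Hoist(ast.NodeTransformer):
    """Replace each reduction call with a temporary name, innermost first."""

    def __init__(self):
        self.steps = []  # list of (temp_name, expression_source) pairs

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested reductions before this one
        if isinstance(node.func, ast.Name) and node.func.id in REDUCTIONS:
            name = "_temp%d" % len(self.steps)  # hypothetical naming scheme
            self.steps.append((name, ast.unparse(node)))
            return ast.copy_location(ast.Name(id=name, ctx=ast.Load()), node)
        return node


def split_reductions(expr):
    """Return (steps, final_expr) so that reductions are only ever outermost."""
    tree = ast.parse(expr, mode="eval")
    hoister = _Hoist()
    body = hoister.visit(tree.body)
    steps = hoister.steps
    if isinstance(body, ast.Name) and steps and body.id == steps[-1][0]:
        # The whole expression was a reduction: keep it as the final step.
        _, final = steps.pop()
        return steps, final
    return steps, ast.unparse(body)


def evaluate_split(expr, evaluate, variables):
    """Evaluate each hoisted reduction, then the rewritten outer expression.

    `evaluate` is any callable taking (expression_string, variables_dict);
    this signature is an assumption made for the sketch.
    """
    steps, final = split_reductions(expr)
    scope = dict(variables)
    for name, sub_expr in steps:
        scope[name] = evaluate(sub_expr, scope)
    return evaluate(final, scope)
```

With this, split_reductions("where(a, sum(a+3*b), c)") yields one temporary for the sum and a rewritten outer expression that refers to it, which is the two-step evaluation described above. Each hoisted step costs one extra call into the evaluator, which is the call overhead mentioned.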
I'm still not excited about summing over the whole output buffer though.
That ends up allocating and scanning through a whole extra buffer which
may result in a significant speed and memory hit for large arrays. Since
we're only doing this on the way out, there should be no problem just
allocating a single double (or complex) to do the sum in. On the way
in, this could be set to zero or one based on what the last opcode is
(sum or product). Then the SUM opcode could simply do something like:
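
As a rough illustration of the accumulator idea, here is a sketch in Python rather than the interpreter's C. The function name reduce_in_blocks, the "sum"/"prod" opcode strings, and the representation of the output register as a sequence of blocks are all assumptions made for the example, not the actual numexpr opcode. The point it shows is that only a single scalar is allocated, initialized to the identity of the final opcode, and each block of the output register is folded into it as it is produced.

```python
def reduce_in_blocks(blocks, op):
    """Fold blocks of output values into one scalar accumulator.

    `blocks` stands in for the successive contents of the output register,
    one block per pass of the interpreter loop (an assumption of this sketch).
    """
    # On the way in: pick the identity element based on the last opcode.
    if op == "sum":
        acc = 0.0
    elif op == "prod":
        acc = 1.0
    else:
        raise ValueError("unknown reduction opcode: %r" % (op,))

    # On the way out of each loop pass: fold the block into the accumulator
    # instead of writing it to a full-size output buffer.
    for block in blocks:
        for value in block:
            if op == "sum":
                acc += value
            else:
                acc *= value
    return acc
```

For example, reduce_in_blocks([[1.0, 2.0], [3.0]], "sum") accumulates across both blocks without ever holding more than one block plus the scalar. A complex accumulator would work the same way, starting from 0j or 1+0j.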
BTW, the cleanup of the interpreter looks pretty slick. I'm going to
look at timings for using COPY_C versus using add directly and see about
reducing the number of opcodes. If this works out OK, the number of
comparison opcodes could be reduced a lot. Sorry about munging the line