# [Numpy-discussion] Numpy and PEP 343

Tim Hochberg tim.hochberg at cox.net
Tue Feb 28 15:15:02 CST 2006

```
David M. Cooke wrote:

>Tim Hochberg <tim.hochberg at cox.net> writes:
>
>
>
>><pie-in-the-sky>
>>
>>An idea that has popped up from time to time is delaying evaluation
>>of complicated expressions so that the result can be computed more
>>efficiently. For instance, the matrix expression:
>>
>>a = b*c + d*e
>>
>>results in the creation of two, potentially large, temporary matrices
>>and also performs a couple more loops at the C level than the
>>equivalent expression implemented in C would.
>>
>>The general idea has been to construct some sort of pseudo-object,
>>when the numerical operations are indicated, then do the actual
>>numerical operations at some later time. This would be very
>>problematic if implemented for all arrays since it would quickly
>>become impossible to figure out what was going on, particularly with
>>view semantics. However, it could result in large performance
>>improvements without becoming incomprehensible if implemented in small
>>enough chunks.
>>
>>A "straightforward" approach would look something like:
>>
>>   numpy.begin_defer()   # Now all numpy operations (in this thread)
>>                         # are deferred
>>   a = b*c + d*e         # 'a' is a special object that holds pointers to
>>                         # 'b', 'c', 'd' and 'e' and knows what ops to
>>                         # perform.
>>   numpy.end_defer()     # 'a' performs the operations and now looks like
>>                         # an array
>>
>>Since 'a' knows the whole series of operations in advance it can
>>perform them more efficiently than would be possible using the basic
>>numpy machinery. Ideally, the space for 'a' could be allocated up
>>front, and all of the operations could be done in a single loop. In
>>practice the optimization might be somewhat less ambitious, depending
>>on how much energy people put into this. However, this approach has
>>some problems. One is the syntax, which is clunky and a bit unsafe (a
>>missing end_defer in a function could cause stuff to break very far
>>away). The other is that I suspect that this sort of deferred
>>evaluation makes multiple views of an array even more likely to bite
>>the unwary.
>>
>>
>
>This is a good idea; probably a bit difficult.
>
It's not original. I think this idea comes around periodically, but dies
from a combination of it being nontrivial and the resulting syntax being
too heavyweight.
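
For comparison, part of the payoff is available today without any deferral machinery: the ufunc output argument lets the final addition reuse an existing buffer instead of allocating a third array. A minimal sketch (array names as in the example, sizes illustrative):

```python
import numpy as np

b, c, d, e = (np.random.rand(1000) for _ in range(4))

# Naive form: two temporaries (b*c and d*e) plus a freshly allocated result.
a_naive = b * c + d * e

# With explicit output arguments: the add writes in place, saving one
# allocation and one pass over the data.
a = np.multiply(b, c)      # this allocation becomes the result
tmp = np.multiply(d, e)    # one temporary is still needed here
np.add(a, tmp, out=a)      # a += d*e, no further allocation

assert np.allclose(a, a_naive)
```

A fused loop, as proposed above, would go further and eliminate `tmp` as well.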

>I don't like the global
>defer context though. That could get messy, especially if you start
>calling functions.
>
>
I'm not crazy about it either. You could localize it with an appropriate
(ab)use of sys._getframe, but that's another potential can of worms.
Something like:

import sys

class deferral:
    frames = set()
    def __enter__(self):
        self.frame = sys._getframe(1)
        deferral.frames.add(self.frame)
    def __exit__(self, *args):
        deferral.frames.discard(self.frame)
        self.frame = None

def should_defer():
    return (sys._getframe(1) in deferral.frames)

Then:

with deferral():
    # stuff

Deferral should be localized to just 'stuff', even if it calls other
functions[1]. The details might be sticky though...
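
One of the sticky details is that a module-level frame set is shared across threads. A rough sketch of a per-thread variant using threading.local (all names here are illustrative, not a proposal from the thread):

```python
import sys
import threading

_state = threading.local()  # per-thread deferral bookkeeping

class deferral:
    def __enter__(self):
        if not hasattr(_state, 'frames'):
            _state.frames = set()
        # Record the frame that entered the 'with' block.
        self.frame = sys._getframe(1)
        _state.frames.add(self.frame)
        return self
    def __exit__(self, *args):
        _state.frames.discard(self.frame)
        self.frame = None

def should_defer():
    # True only while the caller's frame is inside a 'with deferral()'
    # block in this thread.
    return sys._getframe(1) in getattr(_state, 'frames', set())

def demo():
    before = should_defer()
    with deferral():
        inside = should_defer()
    after = should_defer()
    return [before, inside, after]
```

Here `demo()` returns `[False, True, False]`: deferral is active only inside the `with` block, and other threads are unaffected.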

>>The syntax issue can be cleanly addressed now that PEP 343 (the 'with'
>>statement) is going into Python 2.5. Thus the above would look like:
>>
>>with numpy.deferral():
>>   a = b*c + d*e
>>
>>Just removing the extra allocation of temporary variables can result
>>in 30% speedup for this case[1], so the payoff would likely be large.
>>On the down side, it could be quite a can of worms, and would likely
>>require a lot of work to implement.
>>
>>
>
>Alternatively, make some sort of expression type:
>
>ex = VirtualExpression()
>
>ex.a = ex.b * ex.c + ex.d * ex.e
>
>then,
>
>compute = ex.compile(a=(shape_of_a, dtype_of_a), etc.....)
>
>This could return a function that would look like
>
>def compute(b, c, d, e):
>    a = empty(shape_of_a, dtype=dtype_of_a)
>    multiply(b, c, a)
>    # ok, I'm making this one up :-)
>    fused_multiply_add(d, e, a)
>    return a
>
>a = compute(b, c, d, e)
>
>
The syntax seems too heavy to me. It would be significantly lighter if
the explicit compile step were made optional, allowing:

ex = VirtualExpression()

ex.a = ex.b * ex.c + ex.d * ex.e

a = ex(b=b, c=c, d=d, e=e)

'ex' could then figure out all of the sizes and types itself, create the
function, and compute the result. The created function would be cached
and reused whenever the input parameters matched, so there shouldn't be
much more overhead than with the compiled version you suggest. The
syntax is still heavy relative to the 'with' version, though.
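
A minimal sketch of how 'ex' might work: operator overloading builds an expression graph, and calling the expression evaluates it against keyword arguments. Caching of a compiled function, as discussed above, is omitted for brevity, and every name below is made up for illustration:

```python
import numpy as np

class Expr:
    # Operators build graph nodes instead of computing immediately.
    def __add__(self, other): return Op(np.add, self, other)
    def __mul__(self, other): return Op(np.multiply, self, other)

class Sym(Expr):
    # A named placeholder, looked up at evaluation time.
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]

class Op(Expr):
    # A binary operation on two sub-expressions.
    def __init__(self, ufunc, lhs, rhs):
        self.ufunc, self.lhs, self.rhs = ufunc, lhs, rhs
    def eval(self, env):
        return self.ufunc(self.lhs.eval(env), self.rhs.eval(env))

class VirtualExpression:
    def __getattr__(self, name):
        return Sym(name)            # ex.b gives a placeholder for 'b'
    def __setattr__(self, name, value):
        object.__setattr__(self, '_root', value)   # ex.a = ... stores the graph
    def __call__(self, **arrays):
        return self._root.eval(arrays)

ex = VirtualExpression()
ex.a = ex.b * ex.c + ex.d * ex.e
a = ex(b=np.arange(3.), c=np.arange(3.), d=np.ones(3), e=np.ones(3))
```

A real implementation would inspect the shapes and dtypes on the first call, generate and cache a specialized function, and only fall back to graph walking when the signature changes.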

>Or, it could use some sort of numeric-specific bytecode that can be
>interpreted quickly in C. With some sort of optimizing compiler for
>that bytecode it could be really fun (it could use BLAS when
>appropriate, for instance!).
>
>or ... use weave :-)
>
>
I'll have to look at weave again. Last time I looked at it (quite a
while ago) it didn't work for me. I can't recall if it was a licensing
issue or it didn't work with my compiler or what, but I decided I
couldn't use it.
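
A toy sketch of the numeric-bytecode idea: compile the expression to a postfix program once, then evaluate it with a small stack machine that walks the arrays in fixed-size blocks, so intermediates stay cache-sized. This is roughly the approach the later numexpr package took; the program encoding below is made up:

```python
import numpy as np

# Postfix "bytecode" for a = b*c + d*e: push operands, apply operators.
PROGRAM = ['b', 'c', 'mul', 'd', 'e', 'mul', 'add']
OPS = {'mul': np.multiply, 'add': np.add}

def run(program, env, blocksize=4096):
    """Evaluate a postfix program over equal-length 1-d arrays, block by
    block, so no intermediate exceeds blocksize elements."""
    n = len(next(iter(env.values())))
    out = np.empty(n)
    for start in range(0, n, blocksize):
        sl = slice(start, start + blocksize)
        stack = []
        for instr in program:
            if instr in OPS:
                rhs = stack.pop()
                lhs = stack.pop()
                stack.append(OPS[instr](lhs, rhs))
            else:
                stack.append(env[instr][sl])   # push a block of an operand
        out[sl] = stack.pop()
    return out

b, c, d, e = (np.random.rand(10000) for _ in range(4))
a = run(PROGRAM, {'b': b, 'c': c, 'd': d, 'e': e})
```

An optimizing pass over such a program could fuse operations or dispatch to BLAS where the pattern allows, as suggested above.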

-tim

[1] Here's an example that fakes the 'with' statement and tests deferral:

import sys

class deferral:
    frames = set()
    def __enter__(self):
        self.frame = sys._getframe(1)
        deferral.frames.add(self.frame)
    def __exit__(self, *args):
        deferral.frames.discard(self.frame)
        self.frame = None

def should_defer():
    return (sys._getframe(1) in deferral.frames)

def f(n):
    if not n:
        return
    if n % 4:
        print "should_defer() =", should_defer(), "for n =", n
        f(n-1)
    else:
        # This is a rough translation of:
        # with deferral():
        #     print "should_defer() =", should_defer(), "for n =", n
        #     f(n-1)
        d = deferral()
        d.__enter__()
        try:
            print "should_defer() =", should_defer(), "for n =", n
            f(n-1)
        finally:
            d.__exit__(None, None, None)

f(10)

```