[Numpy-discussion] Efficient operator overloading
Sturla Molden
sturla@molden...
Wed Apr 18 13:14:10 CDT 2007
On 4/18/2007 7:33 AM, Anne Archibald wrote:
> copying. And the scope of improvement would be very limited; an
> expression like A*B+C*D would be much more efficient, probably, if the
> whole expression were evaluated at once for each element (due to
> memory locality and temporary allocation). But it is impossible for
> numpy, sitting inside python as it does, to do that.
Most numerical array/matrix libraries that depend on operator
overloading generate temporaries. That is why Fortran is usually
perceived as superior to C++ for scientific programming. The Fortran
compiler knows about arrays and can avoid allocating three temporary
arrays to evaluate an expression like

    y = a * b + c * d
If this expression is evaluated by a Fortran 90/95 compiler, it will
automatically generate code like
    do i = 1,n
       y(i) = a(i) * b(i) + c(i) * d(i)
    enddo
On the other hand, conventional use of overloaded operators would result
in something like this:
    allocate(tmp1,n)
    do i = 1,n
       tmp1(i) = a(i) * b(i)
    enddo
    allocate(tmp2,n)
    do i = 1,n
       tmp2(i) = c(i) * d(i)
    enddo
    allocate(tmp3,n)
    do i = 1,n
       tmp3(i) = tmp1(i) + tmp2(i)
    enddo
    deallocate(tmp1)
    deallocate(tmp2)
    do i = 1,n
       y(i) = tmp3(i)
    enddo
    deallocate(tmp3)
Traversing memory is one of the most expensive things a CPU can do.
This approach is therefore extremely inefficient compared with what a
Fortran compiler can do.
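NumPy today behaves just like the overloaded-operator version above:
each binary operator allocates a fresh temporary array. A partial
remedy that already exists is the out= argument of the ufuncs, which
at least reuses preallocated buffers. A sketch (it removes the fresh
allocations but still makes several passes over memory):

```python
import numpy as np

a, b, c, d = (np.arange(1.0, 4.0) for _ in range(4))

# Eager evaluation: a*b and c*d each allocate a temporary array,
# and memory is traversed once per operator.
y = a * b + c * d

# The same computation with out= reuses preallocated buffers,
# avoiding fresh temporaries on every call (though not the
# extra passes over memory):
tmp = np.empty_like(a)
out = np.empty_like(a)
np.multiply(a, b, out=out)   # out <- a * b
np.multiply(c, d, out=tmp)   # tmp <- c * d
np.add(out, tmp, out=out)    # out <- out + tmp
```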
This does not mean that all use of operator overloading is inherently
bad. Notably, there is a C++ numerical library called Blitz++ which can
avoid these temporaries for small fixed-size arrays. As it depends on
template metaprogramming, the size must be known at compile time. But
when that is the case, it can be just as efficient as Fortran, provided
the optimizer is smart enough to remove the redundant operations; most
modern C++ compilers are. Note that this only works for fixed-size
arrays, whereas Fortran compilers can do it on a more general basis. It
is therefore advisable to have array syntax built into the language
itself, as in Fortran 90/95 and Ada 95.
But if we implement the operator overloading a bit more intelligently,
it should be possible to get rid of most of the temporary arrays. We
could replace temporary arrays with an "unevaluated expression class"
and let the library pretend it is a compiler.
Let us assume again we have an expression like
    y = a * b + c * d
where a, b, c and d are all arrays or matrices. In this case, the
overloaded * and + operators would not return a temporary array but an
unevaluated expression of class Expr. Thus we would get
    tmp1 = Expr('__mul__', a, b)        # symbolic representation of 'a * b'
    tmp2 = Expr('__mul__', c, d)        # symbolic representation of 'c * d'
    tmp3 = Expr('__add__', tmp1, tmp2)  # symbolic 'a * b + c * d'
    del tmp1
    del tmp2
    y = tmp3  # y becomes a reference to an unevaluated expression
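As a sketch, the overloaded operators could build such symbolic nodes
instead of arrays. All class and attribute names below are hypothetical
illustrations, not an existing API:

```python
class Lazy:
    """Mixin: overloaded operators build symbolic nodes, not arrays."""
    def __mul__(self, other):
        return Expr('__mul__', self, other)
    def __add__(self, other):
        return Expr('__add__', self, other)

class Arr(Lazy):
    """A concrete array operand (a plain Python list here)."""
    def __init__(self, data):
        self.data = list(data)

class Expr(Lazy):
    """An unevaluated binary expression: just (op, left, right)."""
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

a, b, c, d = Arr([1, 2]), Arr([3, 4]), Arr([5, 6]), Arr([7, 8])
y = a * b + c * d   # no arithmetic happens: y is Expr('__add__', a*b, c*d)
```

Note that no element-wise arithmetic runs here at all; the operators
only record the shape of the computation.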
Finally, we need a mechanism to 'flush' the unevaluated expression.
Python does not allow the assignment operator to be overloaded, so we
cannot depend on that. But we could use a 'flush on read/write'
mechanism, and let an Expr object exist in different states (e.g.
unevaluated and evaluated). If anyone tries to read an element from y,
or to change any of the objects it involves, the expression gets
evaluated without temporaries. Before that, there is no need to
evaluate the expression at all! We can just keep a symbolic
representation of it. Procrastination is good!
Thus later on...
    x = y[i]    # The expression 'a * b + c * d' gets evaluated. The
                # object referred to by y now holds an actual array.

or

    a[i] = 2.0  # The expression 'a * b + c * d' gets evaluated. The
                # object referred to by y now holds an actual array.
                # Finally, 2.0 is written to a[i].

or

    y[i] = 2.0  # The expression 'a * b + c * d' gets evaluated. The
                # object referred to by y now holds an actual array.
                # Finally, 2.0 is written to y[i].
When the expression 'a * b + c * d' is finally evaluated, symbolic
manipulation should give us something at least close to a single
efficient loop:
    do i = 1,n
       y(i) = a(i) * b(i) + c(i) * d(i)
    enddo
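A toy pure-Python sketch of the whole mechanism, with 'flush on first
read' implemented in __getitem__ (all names hypothetical; a real
implementation would of course generate this loop in C or Fortran):

```python
class Lazy:
    """Overloaded operators return unevaluated Expr nodes."""
    def __mul__(self, other):
        return Expr('__mul__', self, other)
    def __add__(self, other):
        return Expr('__add__', self, other)

class Arr(Lazy):
    """A concrete, already evaluated 1-D array."""
    def __init__(self, data):
        self.data = list(data)
    def __len__(self):
        return len(self.data)
    def __getitem__(self, i):
        return self.data[i]

class Expr(Lazy):
    """Unevaluated expression; evaluated in one pass on first read."""
    _ops = {'__mul__': lambda x, y: x * y,
            '__add__': lambda x, y: x + y}

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
        self._result = None   # state: unevaluated

    def __len__(self):
        return len(self.left)

    def _elem(self, i):
        # Evaluate element i of the whole tree: the analogue of the
        # fused Fortran loop body, with no temporary arrays.
        lhs = (self.left._elem(i) if isinstance(self.left, Expr)
               else self.left[i])
        rhs = (self.right._elem(i) if isinstance(self.right, Expr)
               else self.right[i])
        return self._ops[self.op](lhs, rhs)

    def __getitem__(self, i):
        if self._result is None:   # flush on first read
            self._result = [self._elem(j) for j in range(len(self))]
        return self._result[i]

a, b, c, d = Arr([1, 2]), Arr([3, 4]), Arr([5, 6]), Arr([7, 8])
y = a * b + c * d   # nothing is computed yet
print(y[0])         # triggers the single evaluation pass: 1*3 + 5*7 = 38
```

If y is never read, the computation never runs at all, which is
exactly the postponed-evaluation payoff described below.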
I'd really like to see a Python extension library do this one day. It
would be very cool and almost as efficient as plain Fortran - though
not quite, as we would still create some small temporary objects. But
that is a price I am willing to pay to use Python. We would even gain
some efficiency over Fortran by indefinitely postponing the evaluation
of computations that turn out not to be needed, when this is not known
at compile time.
Any comments?
Sturla Molden
Ph.D.