[Numpy-discussion] optimizing power() for complex and real cases

Tim Hochberg tim.hochberg at cox.net
Sun Feb 19 19:36:12 CST 2006

David M. Cooke wrote:

>On Sat, Feb 18, 2006 at 06:17:47PM -0700, Tim Hochberg wrote:
>>OK, I now have a faily clean implementation in C of:
>>def __pow__(self, p):
>>   if p is not a scalar:
>>      return power(self, p)
>>   elif p == 1:
>>      return p
>>   elif p == 2:
>>      return square(self)
>>#    elif p == 3:
>>#       return cube(self)
>>#    elif p == 4:
>>#       return power_4(self)
>>#    elif p == 0:
>>#       return ones(self.shape, dtype=self.dtype)
>>#    elif p == -1:
>>#       return 1.0/self
>>   elif p == 0.5:
>>      return sqrt(self)
>>First a couple of technical questions, then on to the philosophical portion 
>>of this note.
>>1. Is there a nice fast way to get a matrix filled with ones from C. I've 
>>been tempted to write a ufunc 'ones_like', but I'm afraid that might be 
>>considered inappropriate.
>>2. Are people aware that array_power is sometimes passed non arrays as its 
>>first argument? Despite having the signature:
>>array_power(PyArrayObject *a1, PyObject *o2) 
>>This caused me almost no end of headaches, not to mention crashes during 
>Yes; because it's the implementation of __pow__, the second argument can
>be anything.
No, you misunderstand.. What I was talking about was that the *first* 
argument can also be something that's not a PyArrayObject, despite the 
functions signature.

>>I'll check this into the power_optimization branch RSN, hopefully with a 
>>fix for the zero power case. Possibly also after extending it to inplace 
>>power as well.
>>OK, now on to more important stuff. As I've been playing with this my 
>>opinion has gone in circles a couple of times. I now think the issue of 
>>optimizing integer powers of complex numbers and integer powers of floats 
>>are almost completely different. Because complex powers are quite slow and 
>>relatively inaccurate, it is appropriate to optimize them for integer 
>>powers at the level of nc_pow. This should be just a matter of liberal 
>>borrowing from complexobject.c, but I haven't tried it yet.
>>On the other hand, real powers are fast enough that doing anything at the 
>>single element level is unlikely to help. So in that case we're left with 
>>either optimizing the cases where the dimension is zero as David has done, 
>>or optimizing at the __pow__ (AKA array_power) level as I've done now based 
>>on David's original suggestion. This second approach is faster because it 
>>avoids the mysterious python scalar -> zero-D array conversion overhead. 
>>However, it suffers if we want to optimize lots of different powers since 
>>one needs a ufunc for each one. So the question becomes, which powers 
>>should we optimize?
>Hmm, ufuncs are passed a void* argument for passing info to them. Now,
>what that argument is defined when the ufunc is created, but maybe
>there's a way to piggy-back on it.
Yeah, I really felt like I was fighting the ufuncs when I was playing 
with this. On the one hand, you really want to use the ufunc machinery. 
On the other hand that forces you into using the same types for both 
arguments. That really wouldn't be a problem, since we could just define 
an integer_power that took doubles, but did integer powers, except for 
the conversion overhead of Python_Integers into arrays. It looks like 
you started down this road and I played with this as well.   I can think 
a of at least one (horrible) way around the matrix overhead, but the 
real fix would be to dig into PyArray_EnsureArray and see why it's slow 
for Python_Ints. It is much faster for numarray scalars.

Another approach is to actually compute (x*x)*(x*x) for pow(x,4) at the 
level of array_power. I think I could make this work. It would probably 
work well for medium size arrays, but might well make things worse for 
large arrays that are limited by memory bandwidth since it would need to 
move the array from memory into the cache multiple times.

>>My latest thinking on this is that we should optimize only those cases 
>>where the optimized result is no less accurate than that produced by pow. 
>>I'm going to assume that all C operations are equivalently accurate, so 
>>pow(x,2) has roughly the same amount of error as x*x. (Something on the 
>>order of 0.5 ULP I'd guess). In that case:
>>  pow(x, -1) -> 1 / x
>>  pow(x, 0) -> 1
>>  pow(x, 0.5) -> sqrt(x)
>>  pow(x, 1) -> x
>>  pow(x, 2) -> x*x
>>can all be implemented in terms of multiply or divide with the same 
>>accuracy as the original power methods. Once we get beyond these, the error 
>>will go up progressively.
>>The minimal set described above seems like it should be relatively 
>>uncontroversial and it's what I favor. Once we get beyond this basic set, 
>>we would need to reach some sort of consensus on how much additional error 
>>we are willing to tolerate for optimizing these extra cases. You'll notice 
>>that I've changed my mind, yet again, over whether to optimize A**0.5. 
>>Since the set of additional ufuncs needed in this case is relatively small, 
>>just square and inverse (==1/x), this minimal set works well if optimizing 
>>in pow as I've done.

Just to add a little more confusion to the mix. I did a little testing 
to see how close pow(x,n) and x*x*... actually are. They are slightly 
less close for small values of N and slightly closer for large values of 
N than I would have expected. The upshot of this is that integer powers 
between  -2 and +4 all seem to vary by the same amount when computed 
using pow(x,n) versus multiplies. I'm including the test code at the 
end. Assuming that this result is not a fluke that expands the 
noncontroversial set by at least 3 more values. That's starting to 
strain the ufunc aproach, so perhaps optimizing in @TYP at _power is the 
way to go after all. Or,  more likely, adding @TYP at _int_power or maybe 
@TYP at _fast_power (so as to be able to include some half integer powers) 
and dispatching appropriately from array_power.

The problem here, of course, is the overhead that PyArray_EnsureArray 
runs into. I'm not sure if the ufuncs actually call that, but I was 
using that to convert things to arrays at one point and I saw the 
slowdown, so I suspect that the slowdown is in something 
PyArray_EnsureArray calls if not in that routine itself. I'm afraid to 
dig into that stuff though.. On the other hand, it would probably speed 
up all kinds of stuff if that was sped up.

>Ok. I'm still not happy with the speed of pow(), though. I'll have to
>sit and look at. We may be able to optimize integer powers better.
>And there's another area: integer powers of integers. Right now that
>uses pow(), whereas we might be able to do better. 
So that's why that's so slow. I assumed it was doing some sort of 
sucessive multiplication. For this, the code that complexobject uses for 
integer powers might be helpful.

>I'm looking into
>that. A framework for that could be helpful for the complex powers too.
>Too bad we couldn't make a function generator :-) [Well, we could using


def check(n=100000):
    import math
    sqrt = math.sqrt
    failures = {}
    for x in [math.pi, math.e, 1.1]+[1.0+1.0/y for y in range(1,1+n)]:
        for e, expr in [
                        (-5, "1/((x*x)*(x*x)*x)"), 
(-4,"1/((x*x)*(x*x))"), (3, "1/((x*x)*x)"), (1, "x"),
                        (-2, "1/(x*x)"), (-1,"1/x"), (0, "1"), (1, "x"),
                        (2, "x*x"), (3, "x*x*x"), (4, "(x*x)*(x*x)"),
                        (4, "x*x*x*x"),
                        (5, '(x*x)*(x*x)*x'),
                        (6, '(x*x)*(x*x)*(x*x)'),
                        (7, '(x*x)*(x*x)*(x*x)*x'),
                        (8, '((x*x)*(x*x))*((x*x)*(x*x))'),
                        (-1.5, "1/(sqrt(x)*x)"), (-0.5, "1/sqrt(x)"),
                        (0.5, "sqrt(x)"), (1.5, "x*sqrt(x)")]:
            delta =  abs(pow(x,e) - eval(expr, locals())) / pow(x,e)
            if delta:
                key = (e, expr)
                if key not in failures:
                    failures[key] = [(delta, x)]
                failures[key].append((delta, x))
    for key in sorted(failures.keys()):
        e, expr = key
        fstring = ', '.join(str(x) for x in 
        if len(failures[key]) > 1:
            fstring += ', ...'
        print "Failures for x**%s (%s): %s" % (e, expr, fstring)

More information about the Numpy-discussion mailing list