[Numpy-discussion] optimizing power() for complex and real cases
tim.hochberg at cox.net
Wed Feb 22 19:56:03 CST 2006
David M. Cooke wrote:
>Tim Hochberg <tim.hochberg at cox.net> writes:
>>David M. Cooke wrote:
>I've gone through your code you checked in, and fixed it up. Looks
>good. One side effect is that
> a = ones_like(x)
> a[:] = 0
> return a
>is now faster than zeros_like(x) :-)
I noticed that ones_like was faster than zeros_like, but I didn't think
to try that. That's pretty impressive considering how ridiculously easy
it was to write.
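Spelled out as a self-contained snippet (the wrapper name here is just
illustrative, not anything in numpy):

```python
import numpy as np

def zeros_like_via_fill(x):
    # The trick from the thread: allocate with ones_like, then
    # overwrite every element with 0 via a sliced assignment.
    a = np.ones_like(x)
    a[:] = 0
    return a

x = np.arange(6, dtype=float).reshape(2, 3)
z = zeros_like_via_fill(x)
```

The result matches zeros_like(x) in shape and dtype; the speed claim
above is about the 2006 code paths, so don't expect it to hold today.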
>One problem I had is that in PyArray_SetNumericOps, the "copy" method
>wasn't picked up on. It may be due to the order of initialization of
>the ndarray type, or something (since "copy" isn't a ufunc, it's
>initialized in a different place). I couldn't figure out how to fiddle
>that, so I replaced the x.copy() call with a call to PyArray_Copy().
Interesting. It worked fine here.
>>>Yes; because it's the implementation of __pow__, the second argument can
>>No, you misunderstand. What I was talking about was that the *first*
>>argument can also be something that's not a PyArrayObject, despite the
>Ah, I suppose that's because the power slot in the number protocol
>also handles __rpow__.
That makes sense. It was giving me fits whatever the cause.
>>but the real fix would be to dig into
>>PyArray_EnsureArray and see why it's slow for Python_Ints. It is much
>>faster for numarray scalars.
>Right; that needs to be looked at.
It doesn't look too bad. But I haven't had a chance to try to do
anything about it yet.
>>Another approach is to actually compute (x*x)*(x*x) for pow(x,4) at
>>the level of array_power. I think I could make this work. It would
>>probably work well for medium size arrays, but might well make things
>>worse for large arrays that are limited by memory bandwidth since it
>>would need to move the array from memory into the cache multiple times.
>I don't like that; I think it would be better memory-wise to do it
>elementwise. Not just speed, but size of intermediate arrays.
Yeah, for a while I was really hot on the idea since I could do
everything without messing with ufuncs. But then I decided not to
pursue it because I thought it would be slow because of memory usage --
it would be pulling data into the cache over and over again, and I
think that would slow things down a lot.
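For concreteness, computing pow(x, 4) as (x*x)*(x*x) is just
exponentiation by squaring applied to whole arrays; a rough sketch
(illustrative only, not the code under discussion):

```python
import numpy as np

def int_power(x, n):
    # x**n for a non-negative integer n by repeated squaring:
    # O(log n) array multiplies, but each one walks the whole
    # array, which is the cache/memory-bandwidth concern above.
    result = np.ones_like(x)
    base = x.copy()
    while n > 0:
        if n & 1:
            result = result * base
        base = base * base
        n >>= 1
    return result
```

So int_power(x, 4) costs three array multiplies, versus a per-element
pow() call for every entry.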
>'int_power' we could do; that would be the next step, I think. The half
>integer powers we could maybe leave; if you want x**(-3/2), for
>instance, you could do y = x**(-1)*sqrt(x) (or do y = x**(-1);
>sqrt(y,y) if you're worried about temporaries).
>Or, 'fast_power' could be documented as doing the optimizations for
>integer and half-integer _scalar_ exponents (up to a certain size,
>like 100), and falling back on pow() if necessary. I think we could do
>a precomputation step to split the exponent into appropriate squarings
>and such that'll make the elementwise loop faster.
There's a clever implementation of this in complexobject.c. Speaking of
complexobject.c, I did implement fast integer powers for complex objects
at the nc_pow level. For small powers at least, it's over 10 times as
fast. And, since it's at the nc_pow level, it works for matrix-matrix
powers as well. My implementation is arguably slightly faster than
what's in complexobject, but I won't have a chance to check it in till
next week -- I'm off for some snowboarding tomorrow.
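That precomputation step -- splitting a scalar exponent into squarings
and multiplies read off its binary expansion -- might look like this
(a sketch; 'squaring_plan' is a made-up name):

```python
def squaring_plan(n):
    # Left-to-right binary exponentiation schedule for x**n, n >= 1.
    # Start from x itself; for each remaining bit of n, square, and
    # multiply by x when the bit is set. The elementwise loop can
    # then just replay these steps for every array element.
    ops = []
    for bit in bin(n)[3:]:  # bin(n) is '0b1...'; skip the leading 1
        ops.append('square')
        if bit == '1':
            ops.append('multiply')
    return ops
```

For n = 13 (binary 1101) this gives square, multiply, square, square,
multiply: x -> x**2 -> x**3 -> x**6 -> x**12 -> x**13.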
I kind of like power and scalar_power. Then ** could be advertised as
calling scalar_power for scalars and power for arrays. Scalar power
would do optimizations on integer and half-integer powers. Of course
there's no real way to enforce that scalar power is passed scalars,
since presumably it would be a ufunc, short of making _scalar_power a
ufunc instead and doing something like:
def scalar_power(x, y):
    """Compute x**y, where y is a scalar, optimizing integer and
    half-integer powers, possibly at some minor loss of accuracy."""
    if not is_scalar(y):
        raise ValueError("Naughty!!")
>exponents are exactly representable as doubles (up to some number of
>course), so there's no chance of decimal-to-binary conversions making
>things look different. That might work out ok. Although, at that point
>I'd suggest we make it 'power', and have 'rawpower' (or ????) as the
>version that just uses pow().
>Another point is to look at __div__, and use reciprocal if the
>dividend is 1.
That would be easy, but wouldn't it be just as easy to optimize __div__
for scalar divisions in general? Worth checking whether that isn't just
as fast, since it would be a lot more general.
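The reciprocal idea in sketch form (np.reciprocal is real; the
dispatching wrapper here is hypothetical):

```python
import numpy as np

def div(a, b):
    # Hypothetical __div__-style dispatch: if the dividend is the
    # scalar 1, use the reciprocal ufunc instead of true division.
    if np.isscalar(a) and a == 1:
        return np.reciprocal(b)
    return a / b
```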
>I've added a page to the developer's wiki at
>to keep a list of areas like that to look into if someone has time :-)
Ah, good plan.