[Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit?
Francesc Alted
francesc@continuum...
Tue Apr 10 11:57:04 CDT 2012
On 4/10/12 9:55 AM, Henry Gomersall wrote:
> On 10/04/2012 16:36, Francesc Alted wrote:
>> In [10]: timeit c = numpy.complex64(numpy.abs(numpy.complex128(b)))
>> 100 loops, best of 3: 12.3 ms per loop
>>
>> In [11]: timeit c = numpy.abs(b)
>> 100 loops, best of 3: 8.45 ms per loop
>>
>> in your windows box and see if they raise similar results?
>>
> No, the results are somewhat the same as before - ~40ms for the first
> (upcast/downcast) case and ~150ms for the direct case (both *much*
> slower than yours!). This is versus ~28ms for operating directly on
> double precisions.
Okay, so it seems that something is going wrong with the performance
of pure complex64 abs() on Windows.
>
> I'm using numexpr in the end, but this is slower than numpy.abs under linux.
Oh, you mean the Windows version of abs(complex64) in numexpr is slower
than a pure numpy.abs(complex64) under Linux? That's weird, because
numexpr implements its complex operations independently of the NumPy
machinery. Here is how abs() is implemented in numexpr:
static void
nc_abs(cdouble *x, cdouble *r)
{
    r->real = sqrt(x->real*x->real + x->imag*x->imag);
    r->imag = 0;
}
[as I said, only the double precision version is implemented, so you
have to add the cost of the cast here too]
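For reference, here is a rough NumPy sketch of the same computation, including the upcast and downcast mentioned above. The function name abs_via_sqrt and the sample values are made up for illustration; this is not how numexpr or NumPy implement abs() internally, only the same formula written out.

```python
import numpy as np

def abs_via_sqrt(x):
    """Magnitude of a complex array as sqrt(re**2 + im**2), the same formula as nc_abs."""
    # Upcast to complex128 first, mirroring the double-precision-only C routine.
    x = np.asarray(x).astype(np.complex128)
    return np.sqrt(x.real * x.real + x.imag * x.imag)

# Hypothetical sample data in single-precision complex.
b = np.array([3 + 4j, 1 - 1j, 0 + 2j], dtype=np.complex64)

r = abs_via_sqrt(b)          # float64 magnitudes
r32 = r.astype(np.float32)   # cast back down to single precision
print(r, r32.dtype)
```

Note that this plain formula can overflow when the components are very large, since the squares are formed before the square root is taken; a hypot-style routine avoids that.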
Hmm, considering all of this, could it be that sqrtf() on Windows
is under-performing? Can you try this:
In [68]: a = numpy.linspace(0, 1, int(1e6))
In [69]: b = numpy.float32(a)
In [70]: timeit c = numpy.sqrt(a)
100 loops, best of 3: 5.64 ms per loop
In [71]: timeit c = numpy.sqrt(b)
100 loops, best of 3: 3.77 ms per loop
and tell us the results for windows?
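If IPython is not at hand, the same comparison can be run as a plain script. This is only a sketch: the helper name time_sqrt and the repeat/number settings are my own choices, and the numbers will of course depend on the machine and the NumPy build.

```python
import timeit
import numpy as np

def time_sqrt(dtype, n=1000000, repeat=3, number=20):
    """Return the best time in ms per call of numpy.sqrt on n values of dtype."""
    a = np.linspace(0, 1, n).astype(dtype)
    # Best of `repeat` runs, each run calling sqrt `number` times.
    best = min(timeit.repeat(lambda: np.sqrt(a), repeat=repeat, number=number))
    return best / number * 1e3

for dtype in (np.float64, np.float32):
    print("%s: %.2f ms per loop" % (np.dtype(dtype).name, time_sqrt(dtype)))
```

If the float32 line comes out much slower than the float64 one, that would point to the single precision sqrt being the problem.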
PS: if you are using numexpr on Windows, you may want to use the
MKL-linked version, which uses MKL's abs and should perform
considerably better.
--
Francesc Alted