[Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit?
Tue Apr 10 11:57:04 CDT 2012
On 4/10/12 9:55 AM, Henry Gomersall wrote:
> On 10/04/2012 16:36, Francesc Alted wrote:
>> In : timeit c = numpy.complex64(numpy.abs(numpy.complex128(b)))
>> 100 loops, best of 3: 12.3 ms per loop
>> In : timeit c = numpy.abs(b)
>> 100 loops, best of 3: 8.45 ms per loop
>> in your windows box and see if they raise similar results?
> No, the results are somewhat the same as before - ~40ms for the first
> (upcast/downcast) case and ~150ms for the direct case (both *much*
> slower than yours!). This is versus ~28ms for operating directly on
> double precision.
Okay, so it seems that something is going wrong with the performance
of pure complex64 abs() on Windows.
> I'm using numexpr in the end, but this is slower than numpy.abs under linux.
Oh, you mean the Windows version of abs(complex64) in numexpr is slower
than a pure numpy.abs(complex64) under Linux? That's weird, because
numexpr has its own implementation of the complex operations, independent
of the NumPy machinery. Here is how abs() is implemented in numexpr:
static void
nc_abs(cdouble *x, cdouble *r)
{
    r->real = sqrt(x->real*x->real + x->imag*x->imag);
    r->imag = 0;
}
[as I said, only the double precision version is implemented, so you
have to add here the cost of the cast too]
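For readers who want to poke at the formula without building numexpr, here is a minimal Python sketch of the same computation. It is only an illustration of the arithmetic in the C routine above, not numexpr code; the Python function name and signature are my own choice.

```python
import math


def nc_abs(x: complex) -> complex:
    """Illustrative mirror of the C routine: the magnitude goes in the
    real part and the imaginary part is set to zero."""
    magnitude = math.sqrt(x.real * x.real + x.imag * x.imag)
    return complex(magnitude, 0.0)


if __name__ == "__main__":
    z = 3 + 4j
    # Compare the direct formula with Python's built-in abs() for complex.
    print(nc_abs(z).real, abs(z))
```

Note that the result is still a complex value with a zero imaginary part, as in the C version, so a caller wanting a real array has to take the real component afterwards.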
Hmm, considering all of these facts, might it be that sqrtf() on Windows
is under-performing? Can you try this:
In : a = numpy.linspace(0, 1, 1e6)
In : b = numpy.float32(a)
In : timeit c = numpy.sqrt(a)
100 loops, best of 3: 5.64 ms per loop
In : timeit c = numpy.sqrt(b)
100 loops, best of 3: 3.77 ms per loop
and tell us the results for windows?
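If it is easier to run outside IPython, the same comparison can be sketched as a standalone script using the standard timeit module. The helper names (best_ms, compare_sqrt) and the repeat/number settings are assumptions of mine, not anything from NumPy, and the array size is passed as an integer.

```python
import timeit

import numpy as np


def best_ms(func, arg, repeat=3, number=10):
    """Best per-call time in milliseconds over `repeat` runs of `number` calls."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(timings) / number * 1e3


def compare_sqrt(n=1_000_000):
    """Time numpy.sqrt on the same data held as float64 and as float32."""
    a = np.linspace(0, 1, n)      # float64 input
    b = a.astype(np.float32)      # single-precision copy of the same values
    return {
        "float64": best_ms(np.sqrt, a),
        "float32": best_ms(np.sqrt, b),
    }


if __name__ == "__main__":
    for dtype, ms in compare_sqrt().items():
        print(f"sqrt({dtype}): {ms:.2f} ms per call")
```

If the float32 figure comes out much larger than the float64 one on the Windows box, that would point at the single-precision sqrt rather than at abs() itself.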
PS: if you are using numexpr on Windows, you may want to use the MKL
linked version, which uses the abs of MKL, that should have considerably