[Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit?

Francesc Alted francesc@continuum...
Tue Apr 10 13:13:01 CDT 2012


On 4/10/12 11:43 AM, Henry Gomersall wrote:
> On 10/04/2012 17:57, Francesc Alted wrote:
>>> I'm using numexpr in the end, but this is slower than numpy.abs under linux.
>> Oh, you mean the windows version of abs(complex64) in numexpr is slower
>> than a pure numpy.abs(complex64) under linux?  That's weird, because
>> numexpr has an independent implementation of the complex operations from
>> NumPy machinery.  Here is how abs() is implemented in numexpr:
>>
>> static void
>> nc_abs(cdouble *x, cdouble *r)
>> {
>>        r->real = sqrt(x->real*x->real + x->imag*x->imag);
>>        r->imag = 0;
>> }
>>
>> [as I said, only the double precision version is implemented, so you
>> have to add here the cost of the cast too]
> hmmm, I can't seem to reproduce that assertion, so ignore it.
>
>> Hmm, considering all of these facts, it might be that sqrtf() on windows
>> is under-performing?  Can you try this:
>>
>> In [68]: a = numpy.linspace(0, 1, 1e6)
>>
>> In [69]: b = numpy.float32(a)
>>
>> In [70]: timeit c = numpy.sqrt(a)
>> 100 loops, best of 3: 5.64 ms per loop
>>
>> In [71]: timeit c = numpy.sqrt(b)
>> 100 loops, best of 3: 3.77 ms per loop
>>
>> and tell us the results for windows?
> In [18]: timeit c = numpy.sqrt(a)
> 100 loops, best of 3: 21.4 ms per loop
>
> In [19]: timeit c = numpy.sqrt(b)
> 100 loops, best of 3: 12.5 ms per loop
>
> So, all sensible there it seems.
>
> Taking this to the next stage...
>
> In [95]: a = numpy.random.randn(256,2048) + 1j*numpy.random.randn(256,2048)
>
> In [96]: b = numpy.complex64(a)
>
> In [97]: timeit numpy.sqrt(a*numpy.conj(a))
> 10 loops, best of 3: 61.9 ms per loop
>
> In [98]: timeit numpy.sqrt(b*numpy.conj(b))
> 10 loops, best of 3: 27.2 ms per loop
>
> In [99]: timeit numpy.abs(a)  # for comparison
> 10 loops, best of 3: 30 ms per loop
>
> In [100]: timeit numpy.abs(b)  # and again (slow slow slow)
> 1 loops, best of 3: 153 ms per loop
>
> I can confirm the results are correct. So, it really is in numpy.abs.

Yup, this definitely seems to be an issue with numpy.abs for complex64 on
Windows.  Could you file a ticket on this, please?
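
For a ticket, a self-contained script is easier to rerun than an IPython session. The following is only a sketch of how the comparison above could be packaged: the array shape is taken from the session quoted above, while the helper name `bench`, the fixed seed, and the repeat counts are assumptions made for illustration.

```python
import timeit

import numpy as np


def bench(func, arg, number=10, repeat=3):
    """Return the best observed time per call, in milliseconds."""
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return 1e3 * min(timings) / number


# Same shape as in the quoted session; the seed is arbitrary.
rng = np.random.RandomState(0)
a = rng.randn(256, 2048) + 1j * rng.randn(256, 2048)  # complex128
b = a.astype(np.complex64)                            # complex64 copy

results = {}
for name, arr in (("complex128", a), ("complex64", b)):
    results[name] = bench(np.abs, arr)
    print("numpy.abs(%s): %.2f ms per call" % (name, results[name]))
```

On an affected build one would expect the complex64 line to be several times slower than the complex128 one, as in the timings quoted above; elsewhere the two should be comparable.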

>> PS: if you are using numexpr on windows, you may want to use the MKL
>> linked version, which uses the abs of MKL, that should have considerably
>> better performance.
> I'd love to - I presume this would mean me buying an MKL license? If
> not, where do I find the MKL linked version?

Well, depending on what you do, you may want to use Gohlke's version:

http://www.lfd.uci.edu/~gohlke/pythonlibs/

where some of the packages come with MKL included (in particular
NumPy and numexpr).

However, after having a look at the numexpr sources, I found that its
abs() implementation does not use MKL (apparently because of some
malfunction in the MKL routine; this may have been fixed since).
So, don't expect a speedup from using MKL in this case.
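
In the meantime, one possible workaround in plain NumPy is to compute the magnitude from the real and imaginary parts, mirroring the formula in nc_abs() quoted above. This is a sketch only: the function name `abs_via_parts` is hypothetical, and the naive formula can overflow or underflow in the intermediate squares for extreme magnitudes, so it is not a drop-in replacement for numpy.abs in every case.

```python
import numpy as np


def abs_via_parts(x):
    """Magnitude of a complex array computed as sqrt(re**2 + im**2).

    Same formula as numexpr's nc_abs(), but keeping the input precision:
    complex64 input yields float32 output. The intermediate squares may
    overflow or underflow for very large or very small values.
    """
    x = np.asarray(x)
    re, im = x.real, x.imag
    return np.sqrt(re * re + im * im)


b = (np.random.randn(256, 2048)
     + 1j * np.random.randn(256, 2048)).astype(np.complex64)
m = abs_via_parts(b)

# Check the result against numpy.abs before relying on it.
print(m.dtype, np.allclose(m, np.abs(b)))
```

Whether this is actually faster than numpy.abs on a given platform has to be measured there; the timings quoted above only suggest that the sqrt-based route avoided the slowdown on the machine in question.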

-- 
Francesc Alted


