[Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit?
Tue Apr 10 12:01:45 CDT 2012
On Tue, Apr 10, 2012 at 12:57 PM, Francesc Alted <email@example.com> wrote:
> On 4/10/12 9:55 AM, Henry Gomersall wrote:
> > On 10/04/2012 16:36, Francesc Alted wrote:
> >> In : timeit c = numpy.complex64(numpy.abs(numpy.complex128(b)))
> >> 100 loops, best of 3: 12.3 ms per loop
> >> In : timeit c = numpy.abs(b)
> >> 100 loops, best of 3: 8.45 ms per loop
> >> in your windows box and see if they raise similar results?
> > No, the results are somewhat the same as before - ~40ms for the first
> > (upcast/downcast) case and ~150ms for the direct case (both *much*
> > slower than yours!). This is versus ~28ms for operating directly on
> > double precisions.
> Okay, so it seems that something is going on wrong with the performance
> of pure complex64 abs() for Windows.
> > I'm using numexpr in the end, but this is slower than numpy.abs under
> > linux.
> Oh, you mean the windows version of abs(complex64) in numexpr is slower
> than a pure numpy.abs(complex64) under linux? That's weird, because
> numexpr has an independent implementation of the complex operations from
> NumPy machinery. Here is how abs() is implemented in numexpr:
> static void
> nc_abs(cdouble *x, cdouble *r)
> {
>     r->real = sqrt(x->real*x->real + x->imag*x->imag);
>     r->imag = 0;
> }
> [as I said, only the double precision version is implemented, so you
> have to add here the cost of the cast too]
> Hmm, considering all of these facts, it might be that sqrtf() on windows
> is under-performing? Can you try this:
> In : a = numpy.linspace(0, 1, 10**6)
> In : b = numpy.float32(a)
> In : timeit c = numpy.sqrt(a)
> 100 loops, best of 3: 5.64 ms per loop
> In : timeit c = numpy.sqrt(b)
> 100 loops, best of 3: 3.77 ms per loop
> and tell us the results for windows?
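
For anyone who wants to run the sqrt comparison above outside IPython, here is a minimal standalone sketch using the standard-library timeit module. The helper name time_sqrt and its default parameters are my own choices for illustration, not anything from NumPy or numexpr, and the numbers it prints will of course depend on the machine and the NumPy build.

```python
import timeit

import numpy as np


def time_sqrt(dtype, n=10**6, number=20, repeat=3):
    """Return the best per-call time in seconds for numpy.sqrt on n values of dtype."""
    # Same input as in the quoted session: n evenly spaced points in [0, 1].
    x = np.linspace(0, 1, n).astype(dtype)
    timer = timeit.Timer(lambda: np.sqrt(x))
    # Take the minimum over the repeats, as IPython's "best of 3" does.
    return min(timer.repeat(repeat=repeat, number=number)) / number


for dtype in (np.float64, np.float32):
    best = time_sqrt(dtype)
    print(f"{np.dtype(dtype).name}: {best * 1e3:.2f} ms per call")
```

If the float32 line comes out much slower than the float64 line on a given box, that would point at the single-precision sqrt path rather than at abs() itself.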
> PS: if you are using numexpr on windows, you may want to use the MKL
> linked version, which uses the abs of MKL, that should have considerably
> better performance.
> Francesc Alted
Just a quick aside, wouldn't the above have overflow issues? Isn't this
why hypot() is available?
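
To make the concern concrete, here is a small sketch comparing the sqrt(re*re + im*im) formula with numpy.hypot. The function names naive_abs and hypot_abs are just mine for illustration. I use complex64 so the overflow is easy to trigger; the quoted nc_abs works on doubles, where the same thing would only happen for components whose squares exceed the double range (roughly above 1e154).

```python
import numpy as np


def naive_abs(z):
    """|z| as sqrt(re^2 + im^2), the same formula as the quoted nc_abs."""
    return np.sqrt(z.real * z.real + z.imag * z.imag)


def hypot_abs(z):
    """|z| via numpy.hypot, which is meant to avoid intermediate overflow."""
    return np.hypot(z.real, z.imag)


# The true magnitude is about 5e20, which fits easily in float32, but
# (3e20)**2 is about 9e40, beyond the float32 maximum of roughly 3.4e38.
z = np.array([3e20 + 4e20j], dtype=np.complex64)

with np.errstate(over="ignore"):  # silence the expected overflow warning
    naive = naive_abs(z)

safe = hypot_abs(z)

print("naive:", naive)  # the squared terms overflow, so this is inf
print("hypot:", safe)   # stays finite, close to 5e20
```

So the result is representable, but the intermediate squares are not, which is exactly the case hypot() is there for.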