[Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit?
Francesc Alted
francesc@continuum...
Tue Apr 10 10:36:56 CDT 2012
On 4/10/12 6:44 AM, Henry Gomersall wrote:
> Here is the body of a post I made on stackoverflow, but it seems to be
> a non-obvious issue. I was hoping someone here might be able to shed
> light on it...
>
> On my 32-bit Windows Vista machine I notice a significant (5x)
> slowdown when taking the absolute values of a fairly large
> |numpy.complex64| array when compared to a |numpy.complex128| array.
>
> >>> import numpy
> >>> a = numpy.random.randn(256, 2048) + 1j*numpy.random.randn(256, 2048)
> >>> b = numpy.complex64(a)
> >>> timeit c = numpy.float32(numpy.abs(a))
> 10 loops, best of 3: 27.5 ms per loop
> >>> timeit c = numpy.abs(b)
> 1 loops, best of 3: 143 ms per loop
>
> Obviously, the outputs in both cases are the same (to operating
> precision).
>
> I do not notice the same effect on my Ubuntu 64-bit machine (indeed,
> as one might expect, the double precision array operation is a bit
> slower).
>
> Is there a rational explanation for this?
>
> Is this something that is common to all Windows machines?
>
I cannot tell for sure, but it looks like the Windows version of NumPy
is casting complex64 to complex128 internally. This is just a guess,
but numexpr lacks the complex64 type, so it has to do the upcast
internally, and I'm seeing much the same slowdown:
In [6]: timeit numpy.abs(a)
100 loops, best of 3: 10.7 ms per loop
In [7]: timeit numpy.abs(b)
100 loops, best of 3: 8.51 ms per loop
In [8]: timeit numexpr.evaluate("abs(a)")
100 loops, best of 3: 1.67 ms per loop
In [9]: timeit numexpr.evaluate("abs(b)")
100 loops, best of 3: 4.96 ms per loop
In my case I'm seeing only a 3x slowdown, but this is because numexpr is
not re-casting the outcome back to complex64, while the Windows build
might be doing that. Just to make sure, can you run this:
In [10]: timeit c = numpy.complex64(numpy.abs(numpy.complex128(b)))
100 loops, best of 3: 12.3 ms per loop
In [11]: timeit c = numpy.abs(b)
100 loops, best of 3: 8.45 ms per loop
on your Windows box and see whether you get similar results?
> In a related note of confusion, the times above are notably (and
> consistently) different (shorter) from those I get doing a naive `st =
> time.time(); numpy.abs(a); print time.time()-st`. Is this to be expected?
>
This happens a lot, yes, especially when your code is memory-bottlenecked
(a very common situation). The explanation is simple: when your
dataset is small enough to fit in the CPU cache, the first iteration of
the timing loop brings the whole working set into cache, so subsequent
iterations do not have to fetch data from main memory, and by the time
you have run the loop 10 times or more, any memory effect has been
discarded. However, when you run the statement only once, you are
measuring the memory fetch time too (which is often much more
realistic).
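The cold-versus-warm effect described above can be demonstrated directly. This is only an illustrative sketch (the array matches the shape from the post, but the absolute timings on any given machine will differ): a single `time.time()` measurement includes the first, cold pass over the data, while taking the best of several repeated runs, as IPython's timeit does, discards it.

```python
import time
import timeit
import numpy as np

# Same shape as in the original post: 256 x 2048 complex doubles.
a = np.random.randn(256, 2048) + 1j * np.random.randn(256, 2048)

# One-shot timing: includes the cost of the first, cold pass over the
# data (memory fetches, lazy initialization, etc.).
start = time.time()
np.abs(a)
cold = time.time() - start

# Repeated timing: after the first iteration the caches are as warm as
# they will get, and taking the minimum over several repeats discards
# the cold runs -- this is essentially what %timeit reports.
warm = min(timeit.repeat(lambda: np.abs(a), number=10, repeat=3)) / 10

print("single cold run: %.3f ms" % (cold * 1e3))
print("best warm run:   %.3f ms" % (warm * 1e3))
```

On a memory-bound operation like this, the warm figure is typically noticeably smaller than the cold one, which matches the discrepancy reported above.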
--
Francesc Alted