[Numpy-discussion] numpy speed question
Francesc Alted
faltet@pytables....
Fri Nov 26 12:03:03 CST 2010
On Thursday 25 November 2010 11:13:49 Jean-Luc Menut wrote:
> Hello all,
>
> I have a little question about the speed of numpy vs IDL 7.0. I did a
> very simple little check by computing just a cosine in a loop. I was
> quite surprised to see an order of magnitude of difference between
> numpy and IDL; I would have thought that for such a basic function
> the speed would be approximately the same.
>
> I suppose that some of the difference may come from the default data
> type of 64 bits in numpy and 32 bits in IDL. Is there a way to change
> the numpy default data type (without recompiling)?
>
> And I'm not an expert at all, maybe there is a better explanation,
> like a better use of the multiple CPU cores by IDL?
As others have already pointed out, you should make sure that you use
numpy.cos with arrays in order to get good performance.
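To illustrate the point about calling numpy.cos on whole arrays, here is a
minimal sketch that contrasts a Python-level loop with a single vectorized
call. The array size and the helper names (cos_loop, cos_vectorized) are
illustrative assumptions, not taken from the original benchmark. It also
shows that, although there is no switch shown here for changing the default
data type globally, a 32-bit array can be requested explicitly through the
dtype argument.

```python
import math
import timeit

import numpy as np

n = 100_000  # illustrative size, chosen so the loop finishes quickly
i = np.arange(n, dtype=np.float64)


def cos_loop():
    # One Python-level function call per element: interpreter overhead dominates.
    return [math.cos(2 * math.pi * k / 100.0) for k in range(n)]


def cos_vectorized():
    # A single call; the loop over elements runs in compiled code.
    return np.cos(2 * np.pi * i / 100.0)


loop_result = np.array(cos_loop())
vec_result = cos_vectorized()

t_loop = min(timeit.repeat(cos_loop, number=1, repeat=3))
t_vec = min(timeit.repeat(cos_vectorized, number=1, repeat=3))
print(f"loop: {t_loop * 1e3:.1f} ms, vectorized: {t_vec * 1e3:.1f} ms")

# Requesting 32-bit floats explicitly instead of relying on the default.
i32 = np.arange(n, dtype=np.float32)
cos32 = np.cos(i32)
print(cos32.dtype)
```

The exact timings depend on the machine, so none are quoted here; the
vectorized call is the form the rest of this message assumes.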
I don't know whether IDL uses multiple cores or not, but if you are
looking for ultimate performance, you can always use Numexpr, which makes
use of multiple cores. For example, using a machine with 8 cores (w/
hyperthreading), we have:
>>> from math import pi
>>> import numpy as np
>>> import numexpr as ne
>>> i = np.arange(1e6)
>>> %timeit np.cos(2*pi*i/100.)
10 loops, best of 3: 85.2 ms per loop
>>> %timeit ne.evaluate("cos(2*pi*i/100.)")
100 loops, best of 3: 8.28 ms per loop
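The %timeit lines above only work inside IPython. As a rough sketch, the
same comparison can be written as a plain script with the standard timeit
module. The helper names (with_numpy, with_numexpr, best_ms) are
hypothetical, and the import guard is an assumption so that the script still
runs where Numexpr is not installed.

```python
import timeit
from math import pi

import numpy as np

try:
    import numexpr as ne
except ImportError:  # assumption: fall back to NumPy only if Numexpr is missing
    ne = None

i = np.arange(1e6)


def with_numpy():
    return np.cos(2 * pi * i / 100.0)


def with_numexpr():
    # Variables used in the expression are passed explicitly via local_dict.
    return ne.evaluate("cos(2*pi*i/100.)", local_dict={"i": i, "pi": pi})


def best_ms(func, repeat=3, number=10):
    # Best-of-`repeat` average time per call, in milliseconds.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number * 1e3


print(f"numpy:   {best_ms(with_numpy):.2f} ms per loop")
if ne is not None:
    print(f"numexpr: {best_ms(with_numexpr):.2f} ms per loop")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that
%timeit reports.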
If you don't have a machine with a lot of cores, but still want to get
good performance, you can link Numexpr against Intel's VML (Vector
Math Library). For example, using Numexpr+VML with only one core (on
another machine):
>>> %timeit np.cos(2*pi*i/100.)
10 loops, best of 3: 66.7 ms per loop
>>> ne.set_vml_num_threads(1)
>>> %timeit ne.evaluate("cos(2*pi*i/100.)")
100 loops, best of 3: 9.1 ms per loop
which also gives a pretty good speedup. Curiously, Numexpr+VML does not
scale well across multiple cores in this case:
>>> ne.set_vml_num_threads(2)
>>> %timeit ne.evaluate("cos(2*pi*i/100.)")
10 loops, best of 3: 14.7 ms per loop
I don't really know why Numexpr+VML takes more time with 2 threads
than with only one, but it is probably because Numexpr needs better
fine-tuning in combination with VML :-/
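Since the best thread count is not obvious in advance, one way to find it is
simply to measure a few settings. The following is a sketch only: the
function name sweep_threads and the thread counts tried are hypothetical,
and it assumes that ne.use_vml reports whether this Numexpr build is linked
against VML. When VML is absent it varies Numexpr's own thread pool instead.

```python
import timeit
from math import pi

import numpy as np

try:
    import numexpr as ne
except ImportError:  # assumption: skip the sweep if Numexpr is missing
    ne = None

i = np.arange(1e6)


def sweep_threads(thread_counts=(1, 2, 4)):
    """Return {thread_count: best time in ms}; empty if Numexpr is unavailable."""
    timings = {}
    if ne is None:
        return timings
    # With a VML build, vary VML's threads as in the session above;
    # otherwise vary Numexpr's own worker threads.
    set_threads = ne.set_vml_num_threads if ne.use_vml else ne.set_num_threads
    for n in thread_counts:
        set_threads(n)
        best = min(
            timeit.repeat(
                lambda: ne.evaluate("cos(2*pi*i/100.)",
                                    local_dict={"i": i, "pi": pi}),
                repeat=3,
                number=10,
            )
        )
        timings[n] = best / 10 * 1e3
    return timings


for n, ms in sweep_threads().items():
    print(f"{n} thread(s): {ms:.2f} ms per loop")
```

The sweep leaves the last thread setting in place, so set the preferred
value again afterwards.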
--
Francesc Alted