[SciPy-User] numexpr.evaluate slower than eval, why?
Francesc Alted
faltet@pytables....
Mon Nov 1 16:26:55 CDT 2010
On Monday 01 November 2010 20:24:51, Gerrit Holl wrote:
> Hi,
>
> (since I couldn't find any numexpr mailing-list, I ask the question
> here)
>
> I am working with pytables and numexpr. I use pytables' .where()
> method to select fields from my data. Sometimes I can't do that and I
> need to select them "by hand", but to keep the interface constant and
> avoid the need to parse things myself, I evaluate the same strings to
> sub-select fields from my data. To my surprise, numexpr.evaluate is
> about two times slower than eval. Why?
>
> In [130]: %timeit numexpr.evaluate('MEAN>1000', recs)
> 10000 loops, best of 3: 117 us per loop
>
> In [131]: %timeit eval('MEAN>1000', {}, {'MEAN': recs['MEAN']})
> 10000 loops, best of 3: 55.4 us per loop
>
> In [132]: %timeit recs['MEAN']>1000
> 10000 loops, best of 3: 42.1 us per loop
There are several causes for this. First, numexpr is not always faster
than numpy; it basically only wins when temporaries enter the equation
(that is, when you are evaluating complex expressions). The expression
above is a simple one, with no temporaries at all, so you cannot expect
a large speed-up from numexpr.
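To make the role of temporaries concrete, here is a sketch in plain
NumPy (array sizes and names chosen for illustration) of what happens
when a compound expression is evaluated eagerly:

```python
import numpy as np

a = np.arange(1e6)
b = np.arange(1e6)

# Evaluating 2*a + 3*b with plain NumPy materializes two full-size
# temporary arrays before the final sum is computed:
t1 = 2 * a          # temporary #1 (~8 MB for 1e6 float64 elements)
t2 = 3 * b          # temporary #2 (another ~8 MB)
result = t1 + t2    # final array

# numexpr compiles the whole expression and evaluates it in
# cache-sized blocks, so no full-size temporaries are allocated:
#   import numexpr as ne
#   result = ne.evaluate('2*a + 3*b')
```

For a simple comparison like `MEAN>1000` there is nothing to fuse, so
this advantage never kicks in.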
Second, the 2x slowdown you are seeing in the above expression is
probably due to the fact that you are using small inputs (i.e.
len(recs) is small) and that numexpr is using several threads
automatically. It happens that, for such small arrays, the current
threading code introduces a significant overhead.
Consider this (using a 2-core machine here):
>>> ne.set_num_threads(2)
>>> a = np.arange(1e3)
>>> timeit ne.evaluate('a>1000')
10000 loops, best of 3: 31.5 µs per loop
>>> timeit eval('a>1000')
100000 loops, best of 3: 19.5 µs per loop
>>> timeit a>1000
100000 loops, best of 3: 4.35 µs per loop
i.e. for small arrays, eval+numpy is faster. To prove that this is
mainly due to the overhead of the internal threading code, let's force
numexpr to use a single thread:
>>> ne.set_num_threads(1)
>>> timeit ne.evaluate('a>1000')
100000 loops, best of 3: 18.8 µs per loop
which is very close to eval + numpy performance. Finally, by shrinking
the array down to a single element we can see that almost all of the
remaining evaluation time is spent in the compilation phase:
>>> a = np.arange(1e0)
>>> timeit ne.evaluate('a>1000')
100000 loops, best of 3: 16.4 µs per loop
>>> timeit eval('a>1000')
100000 loops, best of 3: 17.5 µs per loop
[Incidentally, one can see that numexpr's compiler is slightly faster
than Python's. Wow, what a welcome surprise!]
Interestingly enough, things change dramatically for larger arrays:
>>> ne.set_num_threads(2)
>>> b = np.arange(1e5)
>>> timeit ne.evaluate('b>1000')
10000 loops, best of 3: 97.5 µs per loop
>>> timeit eval('b>1000')
10000 loops, best of 3: 138 µs per loop
>>> timeit b>1000
10000 loops, best of 3: 123 µs per loop
In this case, numexpr is faster than numpy by about 25%. This speed-up
is mostly due to the automatic use of several threads (2 cores and 2
threads above). Forcing the use of a single thread, we have:
>>> ne.set_num_threads(1)
>>> timeit ne.evaluate('b>1000')
10000 loops, best of 3: 112 µs per loop
which is closer to numpy performance (but still about 10% faster; I
don't know exactly why).
So, the lesson to learn here is that, if you work with small arrays
and want to attain at least the same performance as Python's `eval`,
you should set the number of threads in numexpr to 1.
Hmm, now that I think about it, it would be interesting if numexpr
could automatically disable the multi-threading code for small arrays.
Added the ticket:
http://code.google.com/p/numexpr/issues/detail?id=36
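A minimal sketch of what such an automatic fallback might look like
(the threshold value and the function name are hypothetical, not part
of numexpr's API; the real cutoff would have to be measured per
machine):

```python
import numpy as np

# Hypothetical cutoff below which threading overhead dominates; the
# real value depends on the machine and the expression being run.
SMALL_ARRAY_THRESHOLD = 10_000

def pick_num_threads(operands, max_threads=2):
    """Choose 1 thread when every operand is small (so the threading
    overhead would dominate), otherwise the configured maximum."""
    largest = max((op.size for op in operands), default=0)
    return 1 if largest < SMALL_ARRAY_THRESHOLD else max_threads

# A 1000-element array would run single-threaded, a 100000-element
# one would use both cores:
print(pick_num_threads([np.arange(1e3)]))   # -> 1
print(pick_num_threads([np.arange(1e5)]))   # -> 2
```

Something along these lines could be called at the top of `evaluate`
before dispatching work to the thread pool.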
> (on a side-note: what is python/evals definition of a mapping?
> numexpr evaluates recs (a numpy.recarray) as a mapping, but eval
> does not)
Numexpr comes with special machinery to recognize many NumPy features,
like automatic detection of strided or unaligned arrays. In
particular, structured arrays / recarrays are also recognized, and
computations are optimized based on all this metadata; that is why
numexpr can resolve bare field names like `MEAN` directly against a
recarray. Python's compiler, on the other hand, knows nothing about
NumPy objects and hence has no way to apply such optimizations.
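To mirror what numexpr does when you pass a recarray as the namespace,
with plain `eval` the field-name-to-column mapping has to be built
explicitly (a small sketch; the data and field name here are made up
to resemble the original `recs`):

```python
import numpy as np

# A small structured array with a 'MEAN' field, standing in for recs.
recs = np.array([(500.0,), (1500.0,), (2500.0,)],
                dtype=[('MEAN', 'f8')])

# numexpr resolves bare names against the recarray's fields:
#   import numexpr as ne
#   mask = ne.evaluate('MEAN > 1000', recs)
# With eval, the same lookup must be spelled out as a real dict:
mask = eval('MEAN > 1000', {}, {'MEAN': recs['MEAN']})
print(mask)   # -> [False  True  True]
```

This is essentially what the question's `eval` call already does by
hand; numexpr just performs that field lookup for you.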
--
Francesc Alted