[Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator

Andrew Collette h5py@alfven....
Wed Jan 21 12:55:39 CST 2009


Hi,

I get identical results for both shapes now; I manually removed the
"numexpr-1.1.1.dev-py2.5-linux-i686.egg" folder in site-packages and
reinstalled.  I suppose there must have been a stale set of files
somewhere.
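For anyone hitting the same stale-install problem, the fix amounts to removing the old egg directory and reinstalling. A minimal sketch (the egg name matches the one above, but the path is illustrative; a scratch directory stands in for the real site-packages so the steps are safe to copy):

```shell
# Scratch directory standing in for site-packages (illustrative).
SITE_PACKAGES=$(mktemp -d)
mkdir -p "$SITE_PACKAGES/numexpr-1.1.1.dev-py2.5-linux-i686.egg"

# Remove any stale numexpr egg, then reinstall from a fresh checkout:
rm -rf "$SITE_PACKAGES"/numexpr-*.egg
# python setup.py install   # (run from the numexpr svn checkout)

ls -A "$SITE_PACKAGES"   # prints nothing: the stale egg is gone
```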

Andrew Collette

On Wed, Jan 21, 2009 at 3:41 AM, Francesc Alted <faltet@pytables.org> wrote:
> On Tuesday 20 January 2009, Andrew Collette wrote:
>> Works much, much better with the current svn version. :) Numexpr now
>> outperforms everything except the "simple" technique, and then only
>> for small data sets.
>
> Correct.  This is because of the cost of parsing the expression and
> initializing the virtual machine.  However, as soon as the sizes of the
> operands exceed the cache of your processor, you start to see the
> improvement in performance.
>
>> Along the lines you mentioned, I noticed that simply changing from a
>> shape of (100*100*100,) to (100, 100, 100) results in nearly a factor
>> of 2 worse performance, a factor which seems constant as the size of
>> the data set changes.
>
> Sorry, but I cannot reproduce this.  When using the expression:
>
> "63 + (a*b) + (c**2) + b"
>
> I get on my machine (Core2@3 GHz, running openSUSE Linux 11.1):
>
> 1000000 f8 (average of 10 runs)
> Simple:  0.0278068065643
> Numexpr:  0.00839750766754
> Chunked:  0.0266514062881
>
> (100, 100, 100) f8 (average of 10 runs)
> Simple:  0.0277318000793
> Numexpr:  0.00851640701294
> Chunked:  0.0346593856812
>
> and these are the expected results (i.e. no change in performance due to
> multidimensional arrays).  Even for larger arrays, I don't see anything
> unexpected:
>
> 10000000 f8 (average of 10 runs)
> Simple:  0.334054994583
> Numexpr:  0.110022115707
> Chunked:  0.29678030014
>
> (100, 100, 100, 10) f8 (average of 10 runs)
> Simple:  0.339299607277
> Numexpr:  0.111632704735
> Chunked:  0.375299096107
>
> Can you tell us which platform you are using?
>
>> Is this related to the way numexpr handles
>> broadcasting rules?  It would seem the memory contents should be
>> identical for these two cases.
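The intuition that the memory contents are identical can be checked directly with NumPy, independently of numexpr: reshaping a 1-D array to (100, 100, 100) returns a view over the very same buffer (a quick sketch):

```python
import numpy as np

a = np.arange(100 * 100 * 100, dtype="f8")   # flat shape (1000000,)
b = a.reshape(100, 100, 100)                 # same buffer, 3-D view

# The reshape is a view, not a copy: both share one C-contiguous buffer.
assert b.base is a
assert b.flags["C_CONTIGUOUS"]
assert a.ctypes.data == b.ctypes.data        # identical memory address
```

So any performance gap between the two shapes has to come from how the evaluator walks the array, not from the data layout itself.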
>>
>> Andrew
>>
>> On Tue, Jan 20, 2009 at 6:13 AM, Francesc Alted <faltet@pytables.org>
> wrote:
>> > On Tuesday 20 January 2009, Andrew Collette wrote:
>> >> Hi Francesc,
>> >>
>> >> Looks like a cool project!  However, I'm not able to achieve the
>> >> advertised speed-ups.  I wrote a simple script to try three
>> >> approaches to this kind of problem:
>> >>
>> >> 1) Native Python code (i.e. trying to do everything at once
>> >> using temporary arrays)
>> >> 2) Straightforward numexpr evaluation
>> >> 3) Simple "chunked" evaluation using array.flat views.  (This
>> >> solves the memory problem and allows the use of arbitrary Python
>> >> expressions.)
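The chunked approach (3) can be sketched with NumPy alone; the function name, chunk size, and the use of ravel-style flat slicing are illustrative, not the attached script's exact code:

```python
import numpy as np

def chunked_eval(a, b, c, chunk=65536):
    """Evaluate 63 + a*b + c**2 + sin(b) over flat chunks so that
    temporary arrays stay small (a sketch of approach 3 above)."""
    out = np.empty(a.size, dtype="f8")
    for i in range(0, a.size, chunk):
        s = slice(i, i + chunk)
        # a.flat[s] pulls a small 1-D chunk out of the (possibly 3-D) array.
        out[s] = 63 + a.flat[s] * b.flat[s] + c.flat[s] ** 2 + np.sin(b.flat[s])
    return out.reshape(a.shape)

a = b = c = np.ones((100, 100, 100))
full = 63 + a * b + c ** 2 + np.sin(b)       # approach 1: all at once
assert np.allclose(chunked_eval(a, b, c), full)
```

Because each chunk's temporaries fit in cache, this avoids the large intermediate arrays that the all-at-once version allocates.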
>> >>
>> >> I've attached the script; here's the output for the expression
>> >> "63 + (a*b) + (c**2) + sin(b)"
>> >> along with a few combinations of shapes/dtypes.  As expected,
>> >> using anything other than "f8" (double) results in a performance
>> >> penalty. Surprisingly, it seems that using chunks via array.flat
>> >> results in similar performance for f8, and even better performance
>> >> for other dtypes.
>> >
>> > [clip]
>> >
>> > Well, there were two issues there.  The first one is that when
>> > transcendental functions are used (like sin() above), the
>> > bottleneck is the CPU instead of memory bandwidth, so numexpr
>> > speedups are not as high as usual.  The other issue was an actual
>> > bug in the numexpr code that forced a copy of all multidimensional
>> > arrays (I normally only use one-dimensional arrays for doing
>> > benchmarks).  This has been fixed in trunk (r39).
>> >
>> > So, with the fix on, the timings are:
>> >
>> > (100, 100, 100) f4 (average of 10 runs)
>> > Simple:  0.0426136016846
>> > Numexpr:  0.11350851059
>> > Chunked:  0.0635252952576
>> > (100, 100, 100) f8 (average of 10 runs)
>> > Simple:  0.119254398346
>> > Numexpr:  0.10092959404
>> > Chunked:  0.128384995461
>> >
>> > The speed-up is now a mere 20% (for f8), but at least it is not
>> > slower.  With the patches that Georg recently contributed for using
>> > Intel's VML, the acceleration is a bit better:
>> >
>> > (100, 100, 100) f4 (average of 10 runs)
>> > Simple:  0.0417867898941
>> > Numexpr:  0.0944641113281
>> > Chunked:  0.0636183023453
>> > (100, 100, 100) f8 (average of 10 runs)
>> > Simple:  0.120059680939
>> > Numexpr:  0.0832288980484
>> > Chunked:  0.128114104271
>> >
>> > i.e. the speed-up is around 45% (for f8).
>> >
>> > Moreover, if I get rid of the sin() function and use the expression:
>> >
>> > "63 + (a*b) + (c**2) + b"
>> >
>> > I get:
>> >
>> > (100, 100, 100) f4 (average of 10 runs)
>> > Simple:  0.0119329929352
>> > Numexpr:  0.0198570966721
>> > Chunked:  0.0338240146637
>> > (100, 100, 100) f8 (average of 10 runs)
>> > Simple:  0.0255623102188
>> > Numexpr:  0.00832500457764
>> > Chunked:  0.0340095996857
>> >
>> > which has a 3.1x speedup (for f8).
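Those speed-up figures follow from the quoted f8 timings; a quick check of the arithmetic (simple time divided by numexpr time):

```python
# Speed-ups implied by the quoted f8 timings (simple / numexpr).
with_sin_vml = 0.120059680939 / 0.0832288980484   # sin() expression, VML build
without_sin = 0.0255623102188 / 0.00832500457764  # expression without sin()

assert round(with_sin_vml, 2) == 1.44             # ≈ 45% faster, as stated
assert round(without_sin, 1) == 3.1               # the quoted 3.1x speed-up
```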
>> >
>> >> FYI, the current tar file (1.1-1) has a glitch related to the
>> >> VERSION file; I added it to the bug report at Google Code.
>> >
>> > Thanks.  Will focus on that asap.  Mmm, seems like there is enough
>> > stuff for another release of numexpr.  I'll try to do it soon.
>> >
>> > Cheers,
>> >
>> > --
>> > Francesc Alted
>> > _______________________________________________
>> > Numpy-discussion mailing list
>> > Numpy-discussion@scipy.org
>> > http://projects.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
>
> --
> Francesc Alted
>
>
