[Numpy-discussion] the difference between "+" and np.add?

Francesc Alted francesc@continuum...
Wed Nov 28 05:34:38 CST 2012


On 11/23/12 8:00 PM, Chris Barker - NOAA Federal wrote:
> On Thu, Nov 22, 2012 at 6:20 AM, Francesc Alted <francesc@continuum.io> wrote:
>> As Nathaniel said, there is not a difference in terms of *what* is
>> computed.  However, the methods that you suggested actually differ on
>> *how* they are computed, and that has dramatic effects on the time
>> used.  For example:
>>
>> In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)]
>>
>> In []: %time arr1 + arr2 + arr3 + arr4 + arr5
>> CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s
>> Wall time: 0.15 s
>> There are also ways to minimize the size of temporaries, and numexpr is
>> one of the simplest:
> but you can also use np.add (and friends) to reduce the number of
> temporaries. It can make a difference:
>
> In [11]: def add_5_arrays(arr1, arr2, arr3, arr4, arr5):
>     ....:     result = arr1 + arr2
>     ....:     np.add(result, arr3, out=result)
>     ....:     np.add(result, arr4, out=result)
>     ....:     np.add(result, arr5, out=result)
>     ....:     return result
>
> In [13]: timeit arr1 + arr2 + arr3 + arr4 + arr5
> 1 loops, best of 3: 528 ms per loop
>
> In [17]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
> 1 loops, best of 3: 293 ms per loop
>
> (don't have numexpr on this machine for a comparison)

Yes, you are right.  However, numexpr can still beat this:

In [8]: timeit arr1 + arr2 + arr3 + arr4 + arr5
10 loops, best of 3: 138 ms per loop

In [9]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
10 loops, best of 3: 74.3 ms per loop

In [10]: timeit ne.evaluate("arr1 + arr2 + arr3 + arr4 + arr5")
10 loops, best of 3: 20.8 ms per loop
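
For anyone who wants to repeat this comparison outside IPython, here is a minimal sketch as a standalone script. The helper names (make_arrays, compare) and the smaller default array size are my own choices for illustration, not part of the session above, and the numexpr variant is skipped when numexpr is not installed.

```python
import timeit

import numpy as np

try:
    import numexpr as ne
except ImportError:  # numexpr is optional for this sketch
    ne = None


def add_5_arrays(arr1, arr2, arr3, arr4, arr5):
    # One temporary is allocated for arr1 + arr2; the remaining
    # additions accumulate into it in place via out=.
    result = arr1 + arr2
    np.add(result, arr3, out=result)
    np.add(result, arr4, out=result)
    np.add(result, arr5, out=result)
    return result


def make_arrays(n=10**6):
    # Hypothetical helper: five equal float64 arrays, as in the session.
    return [np.arange(n, dtype=np.float64) for _ in range(5)]


def compare(n=10**6, repeat=3):
    # Returns the best wall-clock time (seconds) for each variant.
    arrs = make_arrays(n)
    a1, a2, a3, a4, a5 = arrs
    candidates = {
        "chained +": lambda: a1 + a2 + a3 + a4 + a5,
        "np.add with out=": lambda: add_5_arrays(*arrs),
    }
    if ne is not None:
        names = {"a1": a1, "a2": a2, "a3": a3, "a4": a4, "a5": a5}
        candidates["numexpr"] = lambda: ne.evaluate(
            "a1 + a2 + a3 + a4 + a5", local_dict=names
        )
    return {
        label: min(timeit.repeat(fn, number=1, repeat=repeat))
        for label, fn in candidates.items()
    }


if __name__ == "__main__":
    for label, seconds in compare().items():
        print("%-18s %.1f ms" % (label, seconds * 1e3))
```

Absolute numbers depend on the machine and the array size; in the timings above, the numexpr variant is the fastest of the three.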

The reason is that numexpr is multithreaded (using 6 cores above), and 
for memory-bound problems like this one, fetching data in different 
threads is more efficient than using a single thread:

In [12]: timeit arr1.copy()
10 loops, best of 3: 41 ms per loop

In [13]: ne.set_num_threads(1)
Out[13]: 6

In [14]: timeit ne.evaluate("arr1")
10 loops, best of 3: 30.7 ms per loop

In [15]: ne.set_num_threads(6)
Out[15]: 1

In [16]: timeit ne.evaluate("arr1")
100 loops, best of 3: 13.4 ms per loop

I.e., the joy of multi-threading is that it not only buys you CPU speed, 
but can also bring your data from memory faster.  So yeah, modern 
applications *do* need multi-threading to get good performance.
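
The single-thread versus multi-thread copy experiment above can also be written as a script. This is only a sketch under a few assumptions: the helper names (best_time, copy_timings) are mine, the array is smaller than in the session, and the numexpr part is skipped when numexpr is not installed. It relies on ne.set_num_threads() returning the previous setting, as seen in Out[13] and Out[15] above.

```python
import timeit

import numpy as np

try:
    import numexpr as ne
except ImportError:  # numexpr is optional for this sketch
    ne = None


def best_time(fn, repeat=3):
    # Best wall-clock time in seconds over a few single runs.
    return min(timeit.repeat(fn, number=1, repeat=repeat))


def copy_timings(n=10**6):
    # Hypothetical helper: time a plain copy, then the same copy through
    # numexpr with one thread and with the previously configured count.
    arr = np.arange(n, dtype=np.float64)
    timings = {"arr.copy()": best_time(arr.copy)}
    if ne is None:
        return timings

    def ne_copy():
        return ne.evaluate("arr", local_dict={"arr": arr})

    previous = ne.set_num_threads(1)  # returns the old thread count
    try:
        timings["numexpr, 1 thread"] = best_time(ne_copy)
    finally:
        ne.set_num_threads(previous)  # always restore the old setting
    timings["numexpr, %d threads" % previous] = best_time(ne_copy)
    return timings


if __name__ == "__main__":
    for label, seconds in copy_timings().items():
        print("%-22s %.1f ms" % (label, seconds * 1e3))
```

Whether the multi-threaded copy wins, and by how much, will depend on the memory subsystem and the number of cores available.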

-- 
Francesc Alted


