[Numpy-discussion] the difference between "+" and np.add?
Francesc Alted
francesc@continuum...
Wed Nov 28 05:34:38 CST 2012
On 11/23/12 8:00 PM, Chris Barker - NOAA Federal wrote:
> On Thu, Nov 22, 2012 at 6:20 AM, Francesc Alted <francesc@continuum.io> wrote:
>> As Nathaniel said, there is not a difference in terms of *what* is
>> computed. However, the methods that you suggested actually differ on
>> *how* they are computed, and that has dramatic effects on the time
>> used. For example:
>>
>> In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)]
>>
>> In []: %time arr1 + arr2 + arr3 + arr4 + arr5
>> CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s
>> Wall time: 0.15 s
>> There are also ways to minimize the size of temporaries, and numexpr is
>> one of the simplest:
> but you can also use np.add (and friends) to reduce the number of
> temporaries. It can make a difference:
>
> In [11]: def add_5_arrays(arr1, arr2, arr3, arr4, arr5):
>    ....:     result = arr1 + arr2
>    ....:     np.add(result, arr3, out=result)
>    ....:     np.add(result, arr4, out=result)
>    ....:     np.add(result, arr5, out=result)
>    ....:     return result
>
> In [13]: timeit arr1 + arr2 + arr3 + arr4 + arr5
> 1 loops, best of 3: 528 ms per loop
>
> In [17]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
> 1 loops, best of 3: 293 ms per loop
>
> (don't have numexpr on this machine for a comparison)
Yes, you are right. However, numexpr can still beat this:
In [8]: timeit arr1 + arr2 + arr3 + arr4 + arr5
10 loops, best of 3: 138 ms per loop
In [9]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
10 loops, best of 3: 74.3 ms per loop
In [10]: timeit ne.evaluate("arr1 + arr2 + arr3 + arr4 + arr5")
10 loops, best of 3: 20.8 ms per loop
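Before comparing speeds, it is worth checking that the three approaches really do compute the same thing. The following is a minimal sketch rather than the exact session above: it uses smaller arrays so that it runs quickly, treats numexpr as optional, and passes the operands through `local_dict` (my choice here, to avoid relying on numexpr picking the names up from the calling frame).

```python
import numpy as np

try:
    import numexpr as ne  # optional: the numexpr check is skipped if missing
except ImportError:
    ne = None


def add_5_arrays(arr1, arr2, arr3, arr4, arr5):
    """Sum five arrays while allocating only one temporary."""
    result = arr1 + arr2              # the only new array that gets allocated
    np.add(result, arr3, out=result)  # accumulate into the same buffer
    np.add(result, arr4, out=result)
    np.add(result, arr5, out=result)
    return result


# Assumption: 1e5 elements instead of 1e7, just to keep the check fast.
arr1, arr2, arr3, arr4, arr5 = [np.arange(1e5) for _ in range(5)]

plain = arr1 + arr2 + arr3 + arr4 + arr5  # creates a temporary per "+"
in_place = add_5_arrays(arr1, arr2, arr3, arr4, arr5)
assert np.array_equal(plain, in_place)

if ne is not None:
    operands = {"arr1": arr1, "arr2": arr2, "arr3": arr3,
                "arr4": arr4, "arr5": arr5}
    fused = ne.evaluate("arr1 + arr2 + arr3 + arr4 + arr5",
                        local_dict=operands)
    assert np.allclose(plain, fused)
```

Note that the inputs are left untouched in all three cases; only the intermediate `result` buffer is overwritten by the `out=` calls.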
The reason numexpr is faster here is that it is multithreaded (using 6
cores above), and for memory-bound problems like this one, fetching data
in different threads is more efficient than using a single thread:
In [12]: timeit arr1.copy()
10 loops, best of 3: 41 ms per loop
In [13]: ne.set_num_threads(1)
Out[13]: 6
In [14]: timeit ne.evaluate("arr1")
10 loops, best of 3: 30.7 ms per loop
In [15]: ne.set_num_threads(6)
Out[15]: 1
In [16]: timeit ne.evaluate("arr1")
100 loops, best of 3: 13.4 ms per loop
I.e., the joy of multi-threading is that it not only buys you CPU speed,
but can also bring your data from memory faster. So yeah, modern
applications *do* need multi-threading to get good performance.
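For anyone who wants to repeat the thread-count comparison outside IPython, here is a rough sketch using the standard `timeit` module. The helper `best_ms` and the `timings` dict are hypothetical names of mine, the array is smaller than the one timed above, and the numbers will depend heavily on the machine, so no expected output is shown. numexpr is again treated as optional.

```python
import timeit

import numpy as np

try:
    import numexpr as ne  # optional: only the copy timing runs without it
except ImportError:
    ne = None


def best_ms(func, repeat=3, number=5):
    """Best-of-`repeat` time per call, in milliseconds."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number * 1e3


# Assumption: 1e6 elements so the sketch finishes quickly.
arr1 = np.arange(1e6)
timings = {"arr1.copy()": best_ms(arr1.copy)}

if ne is not None:
    def ne_copy():
        # Evaluating a bare name makes numexpr copy the array.
        return ne.evaluate("arr1", local_dict={"arr1": arr1})

    # set_num_threads returns the previous setting, so it can be restored.
    previous = ne.set_num_threads(1)
    timings["numexpr, 1 thread"] = best_ms(ne_copy)
    ne.set_num_threads(previous)
    timings["numexpr, default threads (%d)" % previous] = best_ms(ne_copy)

for label, ms in timings.items():
    print("%s: %.2f ms" % (label, ms))
```

On a machine with a single core, or with an array small enough to fit in cache, the multi-threaded case may show little or no gain.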
--
Francesc Alted