[Numpy-discussion] the difference between "+" and np.add?

Francesc Alted francesc@continuum...
Thu Nov 22 08:20:39 CST 2012


On 11/22/12 1:41 PM, Chao YUE wrote:
> Dear all,
>
> if I have two ndarrays arr1 and arr2 (with the same shape), is there 
> any difference when I do:
>
> arr = arr1 + arr2
>
> and
>
> arr = np.add(arr1, arr2),
>
> and then if I have more than 2 arrays: arr1, arr2, arr3, arr4, arr5, 
> then I cannot use np.add anymore as it only receives 2 arguments.
> then what's the best practice to add these arrays? should I do
>
> arr = arr1 + arr2 + arr3 + arr4 + arr5
>
> or I do
>
> arr = np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)?
>
> because I just noticed recently that there are functions like np.add, 
> np.divide, np.subtract... Until now I have been writing everything 
> directly, like arr1/arr2, rather than np.divide(arr1,arr2).

As Nathaniel said, there is no difference in terms of *what* is 
computed.  However, the methods that you suggested differ in *how* 
the result is computed, and that has a dramatic effect on the time 
used.  For example:

In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)]

In []: %time arr1 + arr2 + arr3 + arr4 + arr5
CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s
Wall time: 0.15 s
Out[]:
array([  0.00000000e+00,   5.00000000e+00,   1.00000000e+01, ...,
          4.99999850e+07,   4.99999900e+07,   4.99999950e+07])

In []: %time np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)
CPU times: user 2.98 s, sys: 0.15 s, total: 3.13 s
Wall time: 3.14 s
Out[]:
array([  0.00000000e+00,   5.00000000e+00,   1.00000000e+01, ...,
          4.99999850e+07,   4.99999900e+07,   4.99999950e+07])

The difference is in how memory is used.  In the first case, each 
addition only needs a temporary the size of one operand, while in the 
second case np.array() first has to copy all five arrays into one big 
temporary, so the difference in speed is pretty large.
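
To make the size of that big temporary concrete, here is a minimal sketch 
(not part of the timings above). It assumes smaller arrays so that it runs 
quickly, and it uses the out= argument of np.add to accumulate into a single 
buffer, which is just one possible way of avoiding intermediates:

```python
import numpy as np

# Smaller arrays than in the timings above, only to keep the sketch cheap.
arrays = [np.arange(1e5) for _ in range(5)]

# Stacking copies all five inputs into one new 2-D array before summing.
stacked = np.array(arrays)
print(stacked.shape)                        # (number of arrays, length)
print(stacked.nbytes // arrays[0].nbytes)   # size relative to one operand

# Accumulating in place: one output buffer, no further temporaries.
total = arrays[0].copy()
for a in arrays[1:]:
    np.add(total, a, out=total)

# Both approaches compute the same values.
same = np.array_equal(total, stacked.sum(axis=0))
print(same)
```

The stacked array is five times the size of a single operand, whereas the 
in-place loop only ever allocates one array of the operand size.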

There are also ways to minimize the size of temporaries, and numexpr is 
one of the simplest:

In []: import numexpr as ne

In []: %time ne.evaluate('arr1 + arr2 + arr3 + arr4 + arr5')
CPU times: user 0.04 s, sys: 0.04 s, total: 0.08 s
Wall time: 0.04 s
Out[]:
array([  0.00000000e+00,   5.00000000e+00,   1.00000000e+01, ...,
          4.99999850e+07,   4.99999900e+07,   4.99999950e+07])

Again, the computations are the same, but how you manage memory is critical.
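
As a rough illustration of what "small temporaries" can mean, the sketch 
below sums the arrays one block at a time in plain NumPy, so that the 
working set stays small. This is only an assumption-laden toy: it is not 
how numexpr is implemented, and the helper name blockwise_sum and its 
block parameter are hypothetical names made up for this example.

```python
import numpy as np

def blockwise_sum(arrays, block=4096):
    """Sum equal-length 1-D arrays block by block (illustrative helper)."""
    out = np.empty_like(arrays[0])
    for start in range(0, out.shape[0], block):
        stop = start + block
        chunk = out[start:stop]                  # view into out, no copy
        np.copyto(chunk, arrays[0][start:stop])  # seed with the first array
        for a in arrays[1:]:
            # Add the matching slice of each remaining array in place.
            np.add(chunk, a[start:stop], out=chunk)
    return out

# Smaller arrays than in the timings above, only to keep the sketch cheap.
arrays = [np.arange(1e5) for _ in range(5)]
result = blockwise_sum(arrays)
print(np.array_equal(result, 5 * arrays[0]))
```

The only full-size allocation here is the output itself; everything else 
works on slices of at most block elements.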

-- 
Francesc Alted


