[Numpy-discussion] the difference between "+" and np.add?
Francesc Alted
francesc@continuum...
Thu Nov 22 08:20:39 CST 2012
On 11/22/12 1:41 PM, Chao YUE wrote:
> Dear all,
>
> if I have two ndarray arr1 and arr2 (with the same shape), is there
> some difference when I do:
>
> arr = arr1 + arr2
>
> and
>
> arr = np.add(arr1, arr2),
>
> and then if I have more than 2 arrays: arr1, arr2, arr3, arr4, arr5,
> then I cannot use np.add anymore as it only receives 2 arguments.
> then what's the best practice to add these arrays? should I do
>
> arr = arr1 + arr2 + arr3 + arr4 + arr5
>
> or I do
>
> arr = np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)?
>
> I ask because I only recently noticed that there are functions like np.add,
> np.divide, np.subtract... Until now I have been writing the operators
> directly, e.g. arr1/arr2, rather than np.divide(arr1, arr2).
As Nathaniel said, there is no difference in terms of *what* is
computed. However, the methods that you suggested do differ in
*how* it is computed, and that has dramatic effects on the time
taken. For example:
In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)]
In []: %time arr1 + arr2 + arr3 + arr4 + arr5
CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s
Wall time: 0.15 s
Out[]:
array([ 0.00000000e+00, 5.00000000e+00, 1.00000000e+01, ...,
4.99999850e+07, 4.99999900e+07, 4.99999950e+07])
In []: %time np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)
CPU times: user 2.98 s, sys: 0.15 s, total: 3.13 s
Wall time: 3.14 s
Out[]:
array([ 0.00000000e+00, 5.00000000e+00, 1.00000000e+01, ...,
4.99999850e+07, 4.99999900e+07, 4.99999950e+07])
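Both expressions print the same values. The sketch below checks that equivalence on smaller arrays (10**6 elements instead of 10**7, an arbitrary choice so it runs quickly). It also shows that the binary np.add can still be chained over five arrays; using functools.reduce for that is just one option and is not something suggested in the thread.

```python
import functools
import numpy as np

# Five operands, as in the session above but smaller so the check runs quickly.
arrays = [np.arange(1e6) for _ in range(5)]
arr1, arr2 = arrays[0], arrays[1]

# The operators dispatch to the corresponding ufuncs, so the results are identical.
assert np.array_equal(arr1 + arr2, np.add(arr1, arr2))
assert np.array_equal(arr1 - arr2, np.subtract(arr1, arr2))

# np.add takes two inputs, but it can be chained over any number of arrays.
chained = arrays[0] + arrays[1] + arrays[2] + arrays[3] + arrays[4]
reduced = functools.reduce(np.add, arrays)

# The stacked version first copies all five arrays into one (5, N) array.
stacked = np.sum(np.array(arrays), axis=0)

print(np.array_equal(chained, reduced), np.array_equal(chained, stacked))  # True True
```

All values here are integers well below 2**53, so the float64 sums are exact and bitwise equality is a fair check.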
The difference lies in how memory is used. In the first case, each
intermediate result is a temporary the same size as the operands, while in
the second case all five arrays are first copied into one big temporary (five
times the size of a single operand), so the difference in speed is pretty large.
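To put numbers on this, here is a rough sketch, again with hypothetical 10**6-element arrays. It measures the stacked temporary with `nbytes`. It also shows one way to avoid per-step temporaries entirely: accumulate in place through the `out=` argument that ufuncs such as np.add accept. The helper name `add_all` is made up for illustration, and the in-place variant is not from the thread.

```python
import numpy as np

arrays = [np.arange(1e6) for _ in range(5)]
one = arrays[0].nbytes  # size of a single float64 operand in bytes

# np.sum(np.array(...), axis=0) must first build this (5, N) copy.
stacked = np.array(arrays)
print(stacked.nbytes // one)  # 5: the big temporary holds five operands' worth


def add_all(arrs):
    """Hypothetical helper: sum equally shaped arrays into one new buffer."""
    total = arrs[0].copy()           # the only new allocation
    for a in arrs[1:]:
        np.add(total, a, out=total)  # accumulate in place, no temporaries
    return total


result = add_all(arrays)
print(np.array_equal(result, stacked.sum(axis=0)))  # True
```

Writing `total += a` inside the loop would do the same in-place update; the explicit `out=` form just makes the ufunc call visible.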
There are also ways to minimize the size of temporaries, and numexpr is
one of the simplest:
In []: import numexpr as ne
In []: %time ne.evaluate('arr1 + arr2 + arr3 + arr4 + arr5')
CPU times: user 0.04 s, sys: 0.04 s, total: 0.08 s
Wall time: 0.04 s
Out[]:
array([ 0.00000000e+00, 5.00000000e+00, 1.00000000e+01, ...,
4.99999850e+07, 4.99999900e+07, 4.99999950e+07])
Again, the computations are the same, but how you manage memory is critical.
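numexpr keeps temporaries small by evaluating the expression over small blocks of the operands. The sketch below approximates that idea in plain NumPy. It is not numexpr's implementation, which compiles the expression for its own virtual machine and can use several threads. The helper `blocked_sum`, the 65536-element block size, and the restriction to 1-D arrays are all assumptions made for illustration.

```python
import numpy as np


def blocked_sum(arrs, block=65536):
    """Hypothetical helper: add equally shaped 1-D arrays block by block.

    Every operation works on block-sized views of the result buffer, so the
    data touched per step stays small (ideally cache-resident) and no
    full-size temporaries are created.
    """
    out = np.empty_like(arrs[0])
    n = out.shape[0]
    for start in range(0, n, block):
        stop = min(start + block, n)
        chunk = out[start:stop]                  # a view into the result
        np.copyto(chunk, arrs[0][start:stop])    # seed with the first operand
        for a in arrs[1:]:
            np.add(chunk, a[start:stop], out=chunk)
    return out


arrays = [np.arange(1e6) for _ in range(5)]
print(np.array_equal(blocked_sum(arrays), sum(arrays)))  # True
```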
--
Francesc Alted