[Numpy-discussion] Slicing slower than matrix multiplication?
Jasper van de Gronde
Sat Dec 12 05:59:16 CST 2009
Francesc Alted wrote:
> Yeah, I think taking slices here is taking quite a lot of time:
> In : timeit E + Xi2[P/2,:]
> 100000 loops, best of 3: 3.95 µs per loop
> In : timeit E + Xi2[P/2]
> 100000 loops, best of 3: 2.17 µs per loop
> I don't know why the additional ',:' in the slice is taking so much time, but my
> guess is that passing & analyzing the second argument (slice(None,None,None))
> could be responsible for the slowdown (but that is taking too much time).
> Mmh, perhaps it would be worth studying this more carefully so that an
> optimization could be done in NumPy.
This is indeed interesting! And very nice that this actually works the
way you'd expect it to. I guess I've just worked too long with Matlab :)
>> I think the lesson mostly should be that with so little data,
>> benchmarking becomes a very difficult art.
> Well, I think it is not difficult, it is just that you are perhaps
> benchmarking Python/NumPy machinery instead ;-) I'm curious whether Matlab
> can do slicing much faster than NumPy. Jasper?
I had a look. These are the timings for Python for 60x20 (total over
10000 iterations, with the per-iteration time in parentheses):
Dot product: 0.051165 (5.116467e-06 per iter)
Add a row: 0.092849 (9.284860e-06 per iter)
Add a column: 0.082523 (8.252348e-06 per iter)
For Matlab 60x20:
Dot product: 0.029927 (2.992664e-006 per iter)
Add a row: 0.019664 (1.966444e-006 per iter)
Add a column: 0.008384 (8.384376e-007 per iter)
For Python 600x200:
Dot product: 1.917235 (1.917235e-04 per iter)
Add a row: 0.113243 (1.132425e-05 per iter)
Add a column: 0.162740 (1.627397e-05 per iter)
For Matlab 600x200:
Dot product: 1.282778 (1.282778e-004 per iter)
Add a row: 0.107252 (1.072525e-005 per iter)
Add a column: 0.021325 (2.132527e-006 per iter)
If I fit a line through these two data points (60 and 600 rows), I get
the following equations for the total time, where n is the number of rows:
Python, add a row (AR): 3.8e-5 * n + 0.091
Matlab, add a column (AC): 2.4e-5 * n + 0.0069
This would suggest that Matlab performs the vector addition about 1.6
times faster and has a 13 times smaller constant cost!
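For anyone who wants to check the arithmetic, the fit is simply the line
through two (n, time) points. The snippet below is only a minimal sketch:
the helper name line_through is made up for illustration, and n is taken
to be the number of rows (60 and 600), using the total timings listed above.

```python
def line_through(n1, t1, n2, t2):
    """Return (slope, intercept) of the line through (n1, t1) and (n2, t2)."""
    slope = (t2 - t1) / (n2 - n1)
    intercept = t1 - slope * n1
    return slope, intercept

# Total timings from the lists above, at n = 60 and n = 600 rows.
py_slope, py_const = line_through(60, 0.092849, 600, 0.113243)  # Python, add a row
ml_slope, ml_const = line_through(60, 0.008384, 600, 0.021325)  # Matlab, add a column

print("Python, AR: %.1e * n + %.3f" % (py_slope, py_const))
print("Matlab, AC: %.1e * n + %.4f" % (ml_slope, ml_const))
print("slope ratio: %.1f" % (py_slope / ml_slope))
print("constant ratio: %.0f" % (py_const / ml_const))
```

With only two points per fit, the slope and constant are of course very
rough estimates.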
As for the questions about what I'm trying to compute: these tests are
minimized as much as possible to show the bottleneck I encountered. They
are part of a larger loop where it does make sense. In essence I'm
iteratively adjusting w, and E has to keep up (because that's what is
used to determine the next change). E does not have to be recomputed
from E = Xi*w every time; a little linear algebra shows that a vector
addition is sufficient.
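
To illustrate the idea (this is not my actual code, just a sketch under
some assumptions): suppose Xi is an n-by-m array, w has length m, and a
single step changes one component w[j] by some amount delta. Then
Xi*(w + delta*e_j) = Xi*w + delta*Xi[:, j], so E can be kept up to date
with one scaled vector addition. The sizes, the index j and the value of
delta below are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 60, 20                    # assumed sizes, matching the 60x20 case
Xi = rng.standard_normal((n, m))
w = rng.standard_normal(m)
E = Xi.dot(w)                    # full matrix-vector product, computed once

j, delta = 3, 0.5                # hypothetical update: w[j] changes by delta
w[j] += delta
E += delta * Xi[:, j]            # O(n) vector addition instead of an O(n*m) product

# The incrementally updated E matches a full recomputation.
assert np.allclose(E, Xi.dot(w))
```

Note that Xi[:, j] is a strided view for a C-ordered array; storing the
transpose so that the same data is a contiguous row may or may not help,
depending on the sizes involved.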