[Numpy-discussion] Slicing slower than matrix multiplication?

Francesc Alted faltet@pytables....
Fri Dec 11 10:03:21 CST 2009


On Friday 11 December 2009 16:44:29, Dag Sverre Seljebotn wrote:
> Jasper van de Gronde wrote:
> > Dag Sverre Seljebotn wrote:
> >> Jasper van de Gronde wrote:
> >>> I've attached a test file which shows the problem. It also tries adding
> >>> columns instead of rows (in case the memory layout is playing tricks),
> >>> but this seems to make no difference. This is the output I got:
> >>>
> >>>     Dot product: 5.188786
> >>>     Add a row: 8.032767
> >>>     Add a column: 8.070953
> >>>
> >>> Any ideas on why adding a row (or column) of a matrix is slower than
> >>> computing a matrix product with a similarly sized matrix? (Xi has
> >>> fewer columns than Xi2, but just as many rows.)
> >>
> >> I think we need some numbers to put this into context -- how big are the
> >> vectors/matrices? How many iterations was the loop run? If the vectors
> >> are small and the loop is run many times, how fast the operation "ought"
> >> to be is irrelevant as it would drown in Python overhead.
> >
> > Originally I had attached a Python file demonstrating the problem, but
> > apparently this wasn't accepted by the list. In any case, the matrices
> > and vectors weren't too big (60x20), so I tried making them bigger and
> > indeed the "fast" version was now considerably faster.
> 
> 60x20 is "nothing", so a full matrix multiplication or a single
> matrix-vector probably takes the same time (that is, the difference
> between them in itself is likely smaller than the error you make during
> measuring).
> 
> In this context, the benchmarks will be completely dominated by the
> number of Python calls you make (each call, especially taking the slice,
> means allocating Python objects, calling a bunch of functions in C, etc.).
> So it's not that strange: taking a slice isn't free.

Yeah, I think the slicing accounts for quite a lot of the time here:

In [58]: timeit E + Xi2[P/2,:]
100000 loops, best of 3: 3.95 µs per loop

In [59]: timeit E + Xi2[P/2]
100000 loops, best of 3: 2.17 µs per loop

I don't know why the additional ',:' in the index takes so much time, but my 
guess is that passing and analyzing the second argument (slice(None,None,None)) 
is responsible for the slowdown (although that seems like too much time for 
that alone).  Hmm, perhaps it would be worth studying this more carefully so 
that an optimization could be done in NumPy.
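
For anyone who wants to repeat the comparison outside IPython, here is a
minimal sketch using the standard timeit module. The shapes are an assumption
(the thread only mentions 60x20), the data is random, and `P // 2` stands in
for the `P/2` of the Python 2 session above. It also times the indexing alone,
so the cost of the extra ',:' can be separated from the cost of the addition.
The numbers will differ between machines and NumPy versions.

```python
import timeit

import numpy as np

# Assumed sizes: the thread only says the arrays were about 60x20.
P, N = 60, 20
rng = np.random.default_rng(0)
Xi2 = rng.standard_normal((P, N))
E = rng.standard_normal(N)
row = P // 2  # integer division, like P/2 under Python 2

# Names visible to the timed statements.
namespace = {"E": E, "Xi2": Xi2, "row": row}


def best_us(stmt, number=20_000, repeat=3):
    """Best per-call time of `stmt` in microseconds."""
    times = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(times) / number * 1e6


statements = [
    "E + Xi2[row, :]",  # index with an explicit trailing slice
    "E + Xi2[row]",     # index with the integer only
    "Xi2[row, :]",      # indexing alone, no addition
    "Xi2[row]",
]
results = {stmt: best_us(stmt) for stmt in statements}

for stmt, us in results.items():
    print(f"{stmt:<18} {us:6.2f} us per loop")
```

Both index forms return the same row, so any difference in the last two lines
is purely indexing overhead.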

> I think the lesson mostly should be that with so little data,
> benchmarking becomes a very difficult art.

Well, I think it is not difficult; it is just that you are perhaps 
benchmarking the Python/NumPy machinery instead ;-)  I'm curious whether Matlab 
can do slicing much faster than NumPy.  Jasper?
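
One way to check whether a benchmark measures the arithmetic or the machinery
is to time the same operations at two sizes. The sketch below is only an
illustration: the original test file never reached the list, so the sizes, the
random data and the use of a matrix-vector product for the "dot" case are all
assumptions. If the per-call time barely changes while the data grows a
hundredfold, the call overhead is what is being measured.

```python
import timeit

import numpy as np


def per_call_us(func, number=2000, repeat=3):
    """Best per-call time of `func` in microseconds."""
    times = timeit.repeat(func, number=number, repeat=repeat)
    return min(times) / number * 1e6


rng = np.random.default_rng(0)
timings = {}

# Hypothetical sizes: 60x20 as mentioned in the thread, and one 100x larger.
for rows, cols in [(60, 20), (600, 200)]:
    X = rng.standard_normal((rows, cols))
    w = rng.standard_normal(cols)
    E = rng.standard_normal(cols)
    i = rows // 2
    # The lambdas are timed inside this iteration, so they see the current arrays.
    timings[(rows, cols)] = {
        "dot": per_call_us(lambda: np.dot(X, w)),
        "add row": per_call_us(lambda: E + X[i]),
    }

for shape, t in timings.items():
    print(f"{shape}: dot {t['dot']:.2f} us, add row {t['add row']:.2f} us")
```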

-- 
Francesc Alted

