# [Numpy-discussion] Slicing slower than matrix multiplication?

Dag Sverre Seljebotn dagss@student.matnat.uio...
Fri Dec 11 09:44:29 CST 2009

```
Jasper van de Gronde wrote:
> Dag Sverre Seljebotn wrote:
>
>> Jasper van de Gronde wrote:
>>
>>> I've attached a test file which shows the problem. It also tries adding
>>> columns instead of rows (in case the memory layout is playing tricks),
>>> but this seems to make no difference. This is the output I got:
>>>
>>>     Dot product: 5.188786
>>>
>>> Any ideas on why adding a row (or column) of a matrix is slower than
>>> computing a matrix product with a similarly sized matrix? (Xi has fewer
>>> columns than Xi2, but just as many rows.)
>>>
>>>
>> I think we need some numbers to put this into context -- how big are the
>> vectors/matrices? How many iterations was the loop run? If the vectors
>> are small and the loop is run many times, how fast the operation "ought"
>> to be is irrelevant as it would drown in Python overhead.
>>
>
> Originally I had attached a Python file demonstrating the problem, but
> apparently this wasn't accepted by the list. In any case, the matrices
> and vectors weren't too big (60x20), so I tried making them bigger and
> indeed the "fast" version was now considerably faster.
>
60x20 is "nothing", so a full matrix multiplication and a single
matrix-vector product probably take the same time (that is, the
difference between them is likely smaller than your measurement
error).

In this context, the benchmarks will be completely dominated by the
number of Python calls you make. Each call, and taking the slice
especially, means allocating Python objects, calling a bunch of
functions in C, and so on. So the result isn't that strange: taking a
slice isn't free.

I think the lesson mostly should be that with so little data,
benchmarking becomes a very difficult art.

Dag Sverre

> But still, this seems like a very odd difference. I know Python is an
> interpreted language and has a lot of overhead, but still, selecting a
> row/column shouldn't be THAT slow, should it? To be clear, this is the
> code I used for testing:
> --------------------------------------------------------------------
> import timeit
>
> setupCode = """
> import numpy as np
>
> P = 60
> N = 20
>
> Xi = np.random.standard_normal((P,N))
> w = np.random.standard_normal((N))
> Xi2 = np.dot(Xi,Xi.T)
> E = np.dot(Xi,w)
> """
>
> N = 10000
>
> dotProduct = timeit.Timer('E = np.dot(Xi,w)',setupCode)
> # P//2 keeps the index an integer under both Python 2 and Python 3
> additionRow = timeit.Timer('E += Xi2[P//2,:]',setupCode)
> additionCol = timeit.Timer('E += Xi2[:,P//2]',setupCode)
> print("Dot product: %f" % dotProduct.timeit(N))