[Numpy-discussion] Slicing slower than matrix multiplication?
Francesc Alted
faltet@pytables....
Fri Dec 11 10:03:21 CST 2009
On Friday 11 December 2009 16:44:29 Dag Sverre Seljebotn wrote:
> Jasper van de Gronde wrote:
> > Dag Sverre Seljebotn wrote:
> >> Jasper van de Gronde wrote:
> >>> I've attached a test file which shows the problem. It also tries adding
> >>> columns instead of rows (in case the memory layout is playing tricks),
> >>> but this seems to make no difference. This is the output I got:
> >>>
> >>> Dot product: 5.188786
> >>> Add a row: 8.032767
> >>> Add a column: 8.070953
> >>>
> >>> Any ideas on why adding a row (or column) of a matrix is slower than
> >>> computing a matrix product with a similarly sized matrix... (Xi has
> >>> fewer columns than Xi2, but just as many rows.)
> >>
> >> I think we need some numbers to put this into context -- how big are the
> >> vectors/matrices? How many iterations was the loop run? If the vectors
> >> are small and the loop is run many times, how fast the operation "ought"
> >> to be is irrelevant as it would drown in Python overhead.
> >
> > Originally I had attached a Python file demonstrating the problem, but
> > apparently this wasn't accepted by the list. In any case, the matrices
> > and vectors weren't too big (60x20), so I tried making them bigger and
> > indeed the "fast" version was now considerably faster.
>
> 60x20 is "nothing", so a full matrix multiplication or a single
> matrix-vector probably takes the same time (that is, the difference
> between them in itself is likely smaller than the error you make during
> measuring).
>
> In this context, the benchmarks will be completely dominated by the
> number of Python calls you make (each one, especially taking the slice,
> means allocating Python objects, calling a bunch of C functions, etc.).
> So it's not that strange: taking a slice isn't free, since some Python
> objects must be created each time.
Yeah, I think the slicing accounts for quite a lot of the time here:
In [58]: timeit E + Xi2[P/2,:]
100000 loops, best of 3: 3.95 µs per loop
In [59]: timeit E + Xi2[P/2]
100000 loops, best of 3: 2.17 µs per loop
I don't know why the additional ',:' in the index takes so much time, but my
guess is that passing and analyzing the second argument (slice(None,None,None))
could be responsible for the slowdown (although the difference seems too large
for that alone). Hmm, perhaps it would be worth studying this more carefully
so that an optimization could be done in NumPy.
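To separate the cost of the indexing from the cost of the addition, one could time the two index forms on their own as well as inside the sum. The following is only a minimal sketch: the original test file never reached the list, so the shapes (a 60x20 Xi2, a length-20 E) and P = 60 are assumptions based on the sizes mentioned in the thread, and time_stmt is a hypothetical helper. Under Python 3 the index has to be written P // 2, since P/2 would give a float.

```python
import timeit
import numpy as np

# Assumed sizes: the thread only says the matrices were about 60x20.
P, N = 60, 20
rng = np.random.default_rng(0)
Xi2 = rng.random((P, N))
E = rng.random(N)

# Names made visible to the timed statements.
namespace = {"Xi2": Xi2, "E": E, "P": P}

def time_stmt(stmt, number=20_000, repeat=3):
    """Return the best per-loop time of `stmt` in microseconds."""
    timer = timeit.Timer(stmt, globals=namespace)
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

results = {
    stmt: time_stmt(stmt)
    for stmt in (
        "Xi2[P // 2, :]",      # index only, with the explicit full slice
        "Xi2[P // 2]",         # index only, integer index alone
        "E + Xi2[P // 2, :]",  # index plus addition
        "E + Xi2[P // 2]",
    )
}

for stmt, us in results.items():
    print(f"{stmt:22s} {us:8.3f} us per loop")
```

Both index forms return the same row, so any difference between the first two lines is pure indexing overhead. The absolute numbers will depend on the NumPy version and the machine.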
> I think the lesson mostly should be that with so little data,
> benchmarking becomes a very difficult art.
Well, I think it is not difficult; it is just that you are perhaps
benchmarking the Python/NumPy machinery instead ;-) I'm curious whether Matlab
can do slicing much faster than NumPy. Jasper?
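One way to check whether a benchmark is measuring the machinery rather than the arithmetic is to grow the data and see whether the per-call time follows. The sketch below makes that comparison for the row addition; the column counts and the per_call_us helper are illustrative assumptions, not taken from the original test file. If the time per call barely changes between the smallest sizes, the fixed per-call overhead is probably what is being measured.

```python
import timeit
import numpy as np

def per_call_us(n_cols, n_rows=60, number=20_000, repeat=3):
    """Best time in microseconds for one `E + X[i]` with rows of n_cols elements."""
    rng = np.random.default_rng(0)
    X = rng.random((n_rows, n_cols))
    E = rng.random(n_cols)
    i = n_rows // 2
    # The lambda adds a small, constant call overhead to every measurement.
    timer = timeit.Timer(lambda: E + X[i])
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

# Illustrative row lengths, starting from the 20 columns mentioned in the thread.
timings = {n: per_call_us(n) for n in (20, 200, 2000, 20000)}

for n, us in timings.items():
    print(f"{n:6d} columns: {us:8.3f} us per call, "
          f"{us / n * 1e3:10.3f} ns per element")
```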
--
Francesc Alted