# [Numpy-discussion] Optimized sum of squares

Charles R Harris charlesr.harris@gmail....
Sun Oct 18 13:19:32 CDT 2009

On Sun, Oct 18, 2009 at 11:37 AM, <josef.pktd@gmail.com> wrote:

> On Sun, Oct 18, 2009 at 12:06 PM, Skipper Seabold <jsseabold@gmail.com>
> wrote:
> > On Sun, Oct 18, 2009 at 8:09 AM, Gael Varoquaux
> > <gael.varoquaux@normalesup.org> wrote:
> >> On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote:
> >>> Hi Gaël,
> >>
> >>> If you've got a 1D array/vector called "a", I think the normal idiom is
> >>
> >>> np.dot(a,a)
> >>
> >>> For the more general case, I think
> >>> np.tensordot(a, a, axes=something_else)
> >>> should do it, where you should be able to figure out something_else for
> >>
> >> Ha, yes. Good point about the tensordot trick.
> >>
> >> Thank you
> >>
> >> Gaël
> >
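[Editor's note: the two idioms suggested above can be sketched as follows; the array contents are illustrative.]

```python
import numpy as np

a = np.arange(5.0)  # example 1-D vector: [0, 1, 2, 3, 4]

# Sum of squares of a 1-D array as a dot product of the array with itself.
ssq = np.dot(a, a)

# Equivalent, more general form: tensordot contracting axis 0 against axis 0.
ssq_t = np.tensordot(a, a, axes=(0, 0))

assert float(ssq) == 30.0      # 0 + 1 + 4 + 9 + 16
assert float(ssq_t) == float(ssq)
```

For 1-D input the `axes=(0, 0)` contraction is exactly the dot product; tensordot only pays off when the arrays have more dimensions.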
> > in statsmodels and didn't much think about it.
> >
> > There is
> >
> > import numpy as np
> > from scipy.stats import ss
> >
> > a = np.ones(5000)
> >
> > but
> >
> > timeit ss(a)
> > 10000 loops, best of 3: 21.5 µs per loop
> >
> > timeit ss(a)
> > 100000 loops, best of 3: 15 µs per loop
> >
> > timeit np.dot(a,a)
> > 100000 loops, best of 3: 5.38 µs per loop
> >
> > Does the number of loops matter in the timings, and is dot always
> > faster even without the BLAS dot?
>
> David's reply once was that it depends on ATLAS and the version of
> lapack/blas.
>
> I usually switch to using dot for 1d. Using tensordot looks too
> complicated for me to figure out the axes when I quickly want a sum of
> squares.
>
> I never tried timing tensordot for 2d arrays, especially for axis=0 on a
> C-ordered array. If it's faster, this could be useful to rewrite stats.ss.
>
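[Editor's note: for the 2-D case mentioned above, tensordot contracted over axis 0 produces the full `x.T @ x` Gram matrix, so only its diagonal is the column-wise sum of squares; `einsum` (added in later NumPy releases) expresses the reduction directly. A sketch with an illustrative array:]

```python
import numpy as np

x = np.arange(12.0).reshape(3, 4)  # small illustrative 2-D, C-ordered array

# Column-wise (axis=0) sum of squares, three equivalent ways:
ss_naive = (x ** 2).sum(axis=0)
ss_einsum = np.einsum('ij,ij->j', x, x)  # contracts rows without forming x**2
# tensordot over axis 0 builds the full x.T @ x Gram matrix;
# its diagonal holds the per-column sums of squares.
ss_tensor = np.diag(np.tensordot(x, x, axes=(0, 0)))

assert np.allclose(ss_naive, ss_einsum)
assert np.allclose(ss_naive, ss_tensor)
```

Because tensordot computes all the cross-products as well, it does more work than needed here; the einsum form is the closer analogue of `np.dot(a, a)` for the axis-wise case.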
> I don't remember that np.add.reduce is much faster than np.sum. This might
> be
>
>
If you are using numpy from svn, it might be due to the recent optimizations
that Luca Citi did for some of the ufuncs. Now we just need a multiply and