# [SciPy-user] Creating a Matrix from a Sum

David Warde-Farley dwf@cs.toronto....
Sat Jul 4 05:56:27 CDT 2009

```On 3-Jul-09, at 9:18 PM, Tom K. wrote:

> A more
> straight-forward approach in python is just a double for-loop, which
> may be
> the most readable and perfectly sufficient unless your N is huge and/
> or this
> function becomes the most significant item in a profile of your
> application.
>
> sums = np.cumsum(x[::-1])[::-1]
> P = np.zeros((len(x), len(x))
> for i in range(len(x)):
>  for j in  range(len(x)):
>    P[i, j] = sums[np.max((i, j))]
>
> #Here it is with list comprehensions:
>  P = np.array([[sums[np.max((i, j))] for j in range(len(x))] for i in
> range(len(x))])

Before I begin, a quick tip: replace range() with xrange() to do the
same with less memory (range() causes a list allocation, xrange()
provides a constant-size generator).

This is an instance where the intuition gleaned from asymptotic
notation fails to really address one critical piece of the puzzle:
both the implementation above and the implementation I posted are
doing roughly the same amount of work (mine is doing a little more in
the form of copies and allocations), but one is doing it in C whereas
the other is doing it in interpreted Python. It also depends on your
definition of a "huge" problem. Speed wise, Python-vs-C starts to
matter very, very quickly, i.e. on the order of hundreds of elements,
and by 1000 things get really horrible:

In [144]: def python_impl(x):
P = np.zeros((len(x), len(x)))
sums = np.cumsum(x[::-1])[::-1]    for i in range(len(x)):
for j in range(len(x)):
P[i, j] = sums[np.max((i,j))]
.....:

In [150]: def numpy_impl(x):
sums = np.cumsum(x[::-1])[::-1]
idx = np.concatenate([M[:,:,np.newaxis] for M in
np.mgrid[0:len(x), 0:len(x)]],axis=2).max(axis=2)
P = sums[idx] / x.sum()
.....:

In [154]: x = randn(1000)

In [155]: timeit numpy_impl(x)
1 loops, best of 3: 200 ms per loop

In [156]: timeit python_impl(x)
1 loops, best of 3: 37.7 s per loop

In [157]: x = randn(5000)

In [158]: timeit numpy_impl(x)
1 loops, best of 3: 5.41 s per loop

I did start running python_impl with len(x) = 5000 but my back of the
envelope calculation is that it will be taking at least 1 hour per
try, so timeit will be done in about 3 hours. :(

The overhead of the Python nested loops may be a showstopper even for
moderately sized problems. In a lot of these situations I find I run
out of patience far before I run out of memory to throw around. :)

David
```