[SciPy-user] Creating a Matrix from a Sum

David Warde-Farley dwf@cs.toronto....
Sat Jul 4 05:56:27 CDT 2009

On 3-Jul-09, at 9:18 PM, Tom K. wrote:

> A more straightforward approach in Python is just a double for-loop,
> which may be the most readable and perfectly sufficient unless your N
> is huge and/or this function becomes the most significant item in a
> profile of your application.
> sums = np.cumsum(x[::-1])[::-1]
> P = np.zeros((len(x), len(x)))
> for i in range(len(x)):
>     for j in range(len(x)):
>         P[i, j] = sums[np.max((i, j))]
> # Here it is with list comprehensions:
> P = np.array([[sums[np.max((i, j))] for j in range(len(x))]
>               for i in range(len(x))])

Before I begin, a quick tip: replace range() with xrange() to do the
same with less memory (range() allocates a full list up front, while
xrange() yields indices one at a time in constant memory).

This is an instance where the intuition gleaned from asymptotic
notation fails to address one critical piece of the puzzle: both the
implementation above and the one I posted do roughly the same amount
of work (mine does a little more, in the form of copies and
allocations), but one does it in C whereas the other does it in
interpreted Python. It also depends on your definition of a "huge"
problem. Speed-wise, Python-vs-C starts to matter very, very quickly,
i.e. on the order of hundreds of elements, and by 1000 things get
really horrible:

In [144]: def python_impl(x):
     sums = np.cumsum(x[::-1])[::-1]
     P = np.zeros((len(x), len(x)))
     for i in range(len(x)):
         for j in range(len(x)):
             P[i, j] = sums[np.max((i, j))]
     return P

In [150]: def numpy_impl(x):
     sums = np.cumsum(x[::-1])[::-1]
     idx = np.concatenate([M[:, :, np.newaxis]
                           for M in np.mgrid[0:len(x), 0:len(x)]],
                          axis=2).max(axis=2)
     return sums[idx] / x.sum()

In [154]: x = randn(1000)

In [155]: timeit numpy_impl(x)
1 loops, best of 3: 200 ms per loop

In [156]: timeit python_impl(x)
1 loops, best of 3: 37.7 s per loop

In [157]: x = randn(5000)

In [158]: timeit numpy_impl(x)
1 loops, best of 3: 5.41 s per loop

I did start running python_impl with len(x) = 5000, but my
back-of-the-envelope estimate is that it will take at least an hour
per run, so timeit will be done in about 3 hours. :(

The overhead of the Python nested loops may be a showstopper even for  
moderately sized problems. In a lot of these situations I find I run  
out of patience far before I run out of memory to throw around. :)
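As an aside, the mgrid/concatenate dance for building the max(i, j)
index matrix can also be written more directly with np.maximum.outer.
A sketch (same reversed-cumsum trick as above; numpy_outer_impl is
just an illustrative name):

```python
import numpy as np

def numpy_outer_impl(x):
    """Same result as the mgrid-based version, but the index matrix
    is built with a single ufunc.outer call."""
    # reversed cumulative sums: sums[k] = x[k] + x[k+1] + ... + x[-1]
    sums = np.cumsum(x[::-1])[::-1]
    n = len(x)
    # idx[i, j] = max(i, j), no temporaries from concatenate needed
    idx = np.maximum.outer(np.arange(n), np.arange(n))
    return sums[idx] / x.sum()
```

This stays entirely in C as well, and avoids allocating the stacked
(n, n, 2) intermediate that the concatenate approach builds.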


More information about the SciPy-user mailing list