[Numpy-discussion] how to deal with large arrays

Tim Hochberg tim.hochberg at cox.net
Fri Oct 15 15:11:05 CDT 2004

Darren Dale wrote:

>I have two 2D arrays, Q is 3-by-q and R is r-by-3. At each q, I need to sum 
>q(r) over R, so I take a dot product RQ and then sum along one axis to get a 
>1-by-q result.
>I'm doing this with dot products because it is much faster than the equivalent 
>for or while loop. The intermediate r-by-q array can get very large though 
>(200MB in my case), so I was wondering if there is a better way to go about 

I think so. I believe you are doing something like this:

   result_1 = na.sum(na.dot(R,Q), 0)

I'm fairly certain (but I urge you to double-check) that this reduces to:

    result_2 = na.dot(na.sum(R, 0), Q)

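which needs only a length-3 intermediate vector instead of the full r-by-q 
array, and should be faster to boot. In more quasi-mathematical notation, 
with i running over the rows of R, j over the shared axis of length 3, and 
k over the columns of Q:

   result_1[k] = sum_i sum_j R_ij Q_jk
               = sum_j sum_i R_ij Q_jk
               = sum_j Q_jk (sum_i R_ij) = result_2[k]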

A quick test seems to confirm this:

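import numarray as na
from numarray import random_array

q = 10
r = 12

# Random test data with the shapes from the question
R = random_array.random((r,3))   # r-by-3
Q = random_array.random((3,q))   # 3-by-q

# x1 builds the full r-by-q product first; x2 sums R down to 3 values first
x1 = na.sum(na.dot(R,Q), 0)
x2 = na.dot(na.sum(R, 0), Q)

# Prints a true value if the two results agree to within rounding
print na.allclose(x1, x2)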
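
For anyone on NumPy rather than numarray, the same check can be sketched as 
below. This is a translation under assumptions, not the test that was 
actually run: it assumes np.dot, np.sum and np.allclose stand in for the 
numarray calls above, and the function name check_identity and the seed are 
purely illustrative.

```python
import numpy as np

def check_identity(r=12, q=10, seed=0):
    """Compute the column sums of dot(R, Q) in two ways (illustrative helper)."""
    rng = np.random.default_rng(seed)
    R = rng.random((r, 3))   # r-by-3
    Q = rng.random((3, q))   # 3-by-q
    # Direct form: builds the full r-by-q intermediate, then sums over r.
    x1 = np.sum(np.dot(R, Q), axis=0)
    # Reduced form: sum R over r first (length-3 vector), then one small dot.
    x2 = np.dot(np.sum(R, axis=0), Q)
    return x1, x2

x1, x2 = check_identity()
print(np.allclose(x1, x2))
```

The two results can differ in the last few bits because the additions 
happen in a different order, which is why the comparison uses allclose 
rather than exact equality.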


>If not, I can slice up R and deal with it one chunk at a time, then the 
>intermediate arrays fit within the available system resources. Would somebody 
>offer a suggestion of how to do this intelligently? Should the intermediate 
>array be about the size of the processor cache, some fraction of the 
>available memory, or is there something else I need to consider?
>Thank you,
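
If the intermediate array really is needed (for instance because something 
nonlinear is applied to it before summing, in which case the sums cannot be 
reordered), the chunked approach asked about might look roughly like the 
sketch below. It is written against NumPy, not numarray, and the names 
chunked_column_sum and chunk_rows are hypothetical. The default chunk size 
is an arbitrary placeholder, not a recommendation: whether cache size or 
available memory is the better guide is left open here and is best settled 
by timing a few values on the actual data.

```python
import numpy as np

def chunked_column_sum(R, Q, chunk_rows=1024):
    """Sum dot(R, Q) over its first axis, one block of R rows at a time."""
    total = np.zeros(Q.shape[1])
    for start in range(0, R.shape[0], chunk_rows):
        block = R[start:start + chunk_rows]   # slice is a view, not a copy
        # Intermediate is at most chunk_rows-by-q instead of r-by-q.
        total += np.dot(block, Q).sum(axis=0)
    return total

rng = np.random.default_rng(0)
R = rng.random((5000, 3))
Q = rng.random((3, 40))
result = chunked_column_sum(R, Q, chunk_rows=512)
print(np.allclose(result, np.dot(R.sum(axis=0), Q)))
```

The last block is simply shorter when the number of rows is not a multiple 
of chunk_rows, so no special handling of the remainder is needed.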
