[Numpy-discussion] how to deal with large arrays
Tim Hochberg
tim.hochberg at cox.net
Fri Oct 15 15:11:05 CDT 2004
Darren Dale wrote:
>Hello,
>
>I have two 2D arrays, Q is 3-by-q and R is r-by-3. At each q, I need to sum
>q(r) over R, so I take a dot product RQ and then sum along one axis to get a
>1-by-q result.
>
>I'm doing this with dot products because it is much faster than the equivalent
>for or while loop. The intermediate r-by-q array can get very large though
>(200MB in my case), so I was wondering if there is a better way to go about
>it?
>
>
I think so. I believe you are doing something like this:
result_1 = na.sum(na.dot(R,Q), 0)
I'm fairly certain (but I urge you to double check) that this reduces to:
result_2 = na.dot(na.sum(R, 0), Q)
which will take up much less intermediate storage and be faster to boot.
In more quasi-mathematical notation:
result_1 => sum_i sum_j R_ij Q_jk = sum_j sum_i R_ij Q_jk
         = sum_j Q_jk sum_i R_ij => result_2
A quick test seems to confirm this:
import numarray as na
from numarray import random_array
q = 10
r = 12
R = random_array.random((r,3))
Q = random_array.random((3,q))
x1 = na.sum(na.dot(R,Q), 0)
x2 = na.dot(na.sum(R, 0), Q)
print na.allclose(x1, x2)
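
For readers without numarray, the same check can be sketched with NumPy. This is an assumption on my part: the message above uses numarray, and the `@` operator, `np.random.default_rng`, and the fixed seed are not part of the original test. The point is only that the second form never builds the r-by-q intermediate.

```python
import numpy as np

# Seeded generator so the check is repeatable (the seed is arbitrary).
rng = np.random.default_rng(0)

r, q = 12, 10
R = rng.random((r, 3))   # r-by-3
Q = rng.random((3, q))   # 3-by-q

# Form 1: build the full r-by-q product, then sum over the r axis.
x1 = (R @ Q).sum(axis=0)

# Form 2: sum R over its r axis first (length-3 vector), then multiply by Q.
x2 = R.sum(axis=0) @ Q

print(np.allclose(x1, x2))
```

Both results have length q; the rearrangement only works because the sum is applied directly to the product, with no nonlinear function in between.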
-tim
>If not, I can slice up R and deal with it one chunk at a time, then the
>intermediate arrays fit within the available system resources. Would somebody
>offer a suggestion of how to do this intelligently? Should the intermediate
>array be about the size of the processor cache, some fraction of the
>available memory, or is there something else I need to consider?
>
>Thank you,
>Darren
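
If the sum cannot be moved inside the dot product, the chunked approach described in the question could look roughly like the following. This is a minimal sketch in NumPy rather than numarray; the function name `chunked_column_sum` and the `chunk_rows` parameter are hypothetical, and the default chunk size is a placeholder to be tuned by measurement, not a recommendation.

```python
import numpy as np

def chunked_column_sum(R, Q, chunk_rows=4096):
    """Sum dot(R, Q) over its first axis, one block of R rows at a time.

    Only a (chunk_rows)-by-q intermediate exists at any moment,
    instead of the full r-by-q array.
    """
    total = np.zeros(Q.shape[1], dtype=np.result_type(R, Q))
    for start in range(0, R.shape[0], chunk_rows):
        block = R[start:start + chunk_rows]   # a view, no copy of R
        total += (block @ Q).sum(axis=0)      # partial sum for this block
    return total
```

For example, `chunked_column_sum(R, Q, chunk_rows=1000)` should agree with `(R @ Q).sum(axis=0)` up to floating-point rounding, since the blocks are summed in a different order.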