[SciPy-user] Dealing with Large Data Sets

lechtlr lechtlr@yahoo....
Sat May 10 09:14:18 CDT 2008

I am trying to create an array called 'results' as shown in the example below.  Is there a way to do this operation more efficiently when the number of 'data_x' arrays gets large?  I am also looking for pointers on eliminating the intermediate 'data_x' arrays while creating 'results' in the following procedure.


from numpy import *
from numpy.random import *

# what is the best way to create an array named 'results' below
# when the number of 'data_x' arrays (i.e., x = 1, 2, ..., 1000) is large?
# Also, nrows and ncolumns can go up to 10000

nrows = 5
ncolumns = 10

data_1 = zeros([nrows, ncolumns], 'd')
data_2 = zeros([nrows, ncolumns], 'd')
data_3 = zeros([nrows, ncolumns], 'd')

# to store squared sum of each column from the arrays above
results = zeros([3,ncolumns], 'd')

# loop to store raw data from a numerical operation; 
# rand() is given as an example here
for i in range(nrows):
    for j in range(ncolumns):
        data_1[i,j] = rand()
        data_2[i,j] = rand()
        data_3[i,j] = rand()

# store squared sum of each column from data_x
for k in range(ncolumns):
    results[0,k] = dot(data_1[:,k], data_1[:,k])
    results[1,k] = dot(data_2[:,k], data_2[:,k])
    results[2,k] = dot(data_3[:,k], data_3[:,k])

print results

