[Numpy-discussion] numpy large arrays?

Søren Dyrsting sorendyrsting@gmail....
Wed Dec 12 08:29:57 CST 2007


Hi all

I need to perform computations involving large arrays: a lot of rows and no
more than, say, 34 columns. My first choice is Python/NumPy because I'm
already used to coding in MATLAB.

However, I'm experiencing memory problems even though there are still 500 MB
available (2 GB total). I have boiled my code down to the following
meaningless snippet. It shares some of the same structure and calls as my
real program and shows the same behaviour.

********************************************************
import numpy as N
import scipy as S

def stress():
    x = S.randn(200000, 80)           # ~122 MB of float64
    for i in range(8):
        print i                       # iteration counter
        s = N.dot(x.T, x)             # 80 x 80 cross-product matrix
        sd = N.array([s.diagonal()])  # the diagonal as a 1 x 80 row
        # replicate sd down every row: allocates a full 200000 x 80 array
        r = N.dot(N.ones((N.size(x, 0), 1), 'd'), sd)
        x = x + r                     # another full-size temporary
        x = x / 1.01

********************************************************
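
For scale, a back-of-the-envelope estimate of the array sizes involved (my
own arithmetic, not part of the snippet above, assuming float64):

********************************************************
# One full-size 200000 x 80 float64 array:
bytes_per_array = 200000 * 80 * 8
print bytes_per_array / (1024.0 * 1024.0)    # ~122 MB

# The line 'x = x + r' briefly holds x, r and the new result
# at the same time, i.e. roughly three such arrays (~366 MB).
********************************************************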


Two different symptoms appear, depending on how big x is:
1) The program becomes extremely slow after a few iterations.
2) If the size of x is increased a little, the program fails with
"MemoryError", for example at the line 'x = x + r', but at different places
in the code depending on the matrix size and on which computer I'm testing.
This can also happen after several iterations, not just during the first pass.

I'm using Windows XP, ActivePython 2.5.1.1, NumPy 1.0.4, SciPy 0.6.0.

- Is there an error under the hood in NumPy?
- Am I pushing the limits of what Python/NumPy can handle, and should I
consider other environments (Fortran, C, BLAS, LAPACK, etc.)?
- Am I misusing NumPy? Would a different coding style be a good workaround,
and would it even let me handle larger datasets without errors? (One idea is
sketched below.)
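
To make the last question concrete, this is the kind of coding-style change
I have in mind (an untested sketch of my own; it uses broadcasting and
in-place operations instead of building the full-size matrix r):

********************************************************
import numpy as N

def stress_inplace():
    x = N.random.randn(200000, 80)
    for i in range(8):
        print i
        s = N.dot(x.T, x)   # 80 x 80 cross-product, as before
        sd = s.diagonal()   # 1-D diagonal, length 80
        x += sd             # broadcasts over rows; no 200000 x 80 'r'
        x /= 1.01           # divides in place; no extra temporary
********************************************************

If I understand broadcasting correctly, this should keep peak usage near one
full-size array instead of three.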

Thanks in advance
/Søren