[Numpy-discussion] corrcoeff on big matrix
Todd Miller
jmiller at stsci.edu
Tue Mar 16 14:29:03 CST 2004
On Tue, 2004-03-16 at 10:41, CL WU wrote:
> Hi, group,
> I have a big "Float64" matrix (42x22300) and I want to get
> its correlation coefficient matrix, but I get the following error:
>
> >>> data.shape
> (42, 22300)
> >>> mlab.corrcoef(data)
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
>   File "C:\Python23\Lib\site-packages\numarray\linear_algebra\mlab.py", line 300, in corrcoef
>     c = cov(x, y)
>   File "C:\Python23\Lib\site-packages\numarray\linear_algebra\mlab.py", line 294, in cov
>     val = squeeze(dot(transpose(m),conjugate(y)) / fact)
>   File "C:\Python23\Lib\site-packages\numarray\numarraycore.py", line 1150, in dot
>     return ufunc.innerproduct(array1, _gen.swapaxes(array2, -1, -2))
>   File "C:\Python23\Lib\site-packages\numarray\ufunc.py", line 2047, in innerproduct
>     r = a.__class__(shape=adots+bdots, type=rtype)
> ValueError: new_memory: invalid region size: -633294592.
>
> I suspect the corrcoef function cannot handle such a big matrix. If so,
> what is the upper limit for array size?
The memory limit appears to be driven by numarray.memory and is 2G.
Your function call winds up creating a dot product output array with
22300**2 elements. That is ~500M elements * 8 bytes per Float64, or
roughly 4G just for the dot product output, hence the exception.
I think 16384**2 elements (exactly 2G of Float64) is the theoretical
limit of what you can achieve with numarray, and in practice I think
you'll get considerably less, depending on how many arrays are needed
at once to complete your computation.
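To make the size arithmetic concrete, here is a rough sketch in plain
Python (no numarray needed). The names LIMIT_BYTES and output_bytes are
made up for illustration, and the 2G cap is taken to mean 2**31 bytes,
which is an assumption on my part:

```python
import math

# Assumption: the "2G" cap means 2**31 bytes for one array buffer.
LIMIT_BYTES = 2**31
FLOAT64_BYTES = 8  # size of one Float64 element


def output_bytes(n, itemsize=FLOAT64_BYTES):
    """Bytes needed for an n x n result array (hypothetical helper)."""
    return n * n * itemsize


n = 22300  # number of columns in the (42, 22300) data array
needed = output_bytes(n)
print(needed, needed <= LIMIT_BYTES)

# Largest square Float64 result that still stays within the cap.
max_n = math.isqrt(LIMIT_BYTES // FLOAT64_BYTES)
print(max_n)
```

This only counts the dot product output. Any temporaries created along
the way come on top of it.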
> How can I get around this
> problem in numarray?
One possibility is to use Float32, which halves the storage needed per
element. I don't know whether that's numerically viable for your data.
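For what it's worth, here is a back-of-the-envelope check for the
Float32 case, in plain Python with made-up names and with the 2G cap
again assumed to be 2**31 bytes. It covers only the result array, not
the temporaries corrcoef needs, and it says nothing about numerical
accuracy:

```python
# Assumption: the "2G" cap means 2**31 bytes for one array buffer.
LIMIT_BYTES = 2**31
FLOAT32_BYTES = 4  # size of one Float32 element, half of a Float64

n = 22300  # number of columns in the (42, 22300) data array
float32_result = n * n * FLOAT32_BYTES

# Room left under the cap once the result array alone is counted.
headroom = LIMIT_BYTES - float32_result
print(float32_result, headroom)
```

By this estimate the result array on its own would squeeze in under the
cap, but with very little to spare for anything else that has to be in
memory at the same time.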
Another way is 64-bit computing. That is largely unexplored territory,
and Python itself has issues there. It will likely take some work
because we haven't done it yet ourselves.
I hope this at least sheds some light on the problem, if not the actual
solution.
Regards,
Todd
> BTW, I am using numarray 0.9/python 2.3.3 on win2kSP4
>
> Thanks.
>
> Chunlei
--
Todd Miller <jmiller at stsci.edu>