[Numpy-discussion] Numpy Memory Error with corrcoef

Pierre Haessig pierre.haessig@crans....
Tue Mar 27 04:38:06 CDT 2012

Hi Nicole,

On 27/03/2012 11:12, Nicole Stoffels wrote:
> if __name__ == '__main__':
>     data_records = random.random((459375, 24))
>     correlation = corrcoef(data_records)

May I assume that data_records holds 24 different variables (the columns),
with 459375 observations of each (the rows)?

If so, and if you expect corrcoef to return a 24x24 matrix, you need to
either transpose data_records:

>>> correlation = corrcoef(data_records.T)

or pass the rowvar=0 argument (see the np.corrcoef or np.cov docstring):

>>> correlation = corrcoef(data_records, rowvar=0)
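Here is a self-contained sketch of both fixes. It assumes a reasonably recent NumPy and swaps in a smaller random stand-in (1000 x 24 instead of 459375 x 24) so it runs quickly. rowvar=False is simply the boolean spelling of rowvar=0:

```python
import numpy as np

# Smaller stand-in for the original data: rows are observations,
# columns are the 24 variables.
np.random.seed(0)
data_records = np.random.random((1000, 24))

# Option 1: transpose so that each row holds one variable.
corr_t = np.corrcoef(data_records.T)

# Option 2: tell corrcoef that the variables are stored in columns.
corr_rv = np.corrcoef(data_records, rowvar=False)

print(corr_t.shape)                  # (24, 24)
print(np.allclose(corr_t, corr_rv))  # True
```

Both calls return the same 24x24 matrix. Which one to use is a matter of taste, though rowvar=False states the intent without an extra transpose in the call.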

Both work on my computer, whereas your example does lead to a
MemoryError: with the default rowvar, each of the 459375 rows is treated
as a variable, so the result would be a 459375x459375 matrix (about 1.7 TB
in float64...)
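A few lines of arithmetic confirm the size estimate. This counts only the final result matrix, so it is a lower bound on what corrcoef would actually request, since intermediate arrays are allocated too. The names n_obs and n_vars are illustrative:

```python
import numpy as np

n_obs, n_vars = 459375, 24  # shape of data_records in the original example
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per float64 element

# Default rowvar=True: every row is a variable, so the result is n_obs x n_obs.
bytes_default = n_obs * n_obs * itemsize
# rowvar=0 (or transposing): every column is a variable, so the result is n_vars x n_vars.
bytes_columns = n_vars * n_vars * itemsize

print(f"{bytes_default / 1e12:.2f} TB")  # 1.69 TB
print(bytes_columns, "bytes")            # 4608 bytes
```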

I don't know whether this applies to you, but for those used to the Matlab
(and textbook) convention of storing variables in columns, the default
behaviour of numpy's covariance function (one variable per row) is a bit
surprising. I guess historical reasons lie behind this choice. It's just a
matter of getting used to it!
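To make the two conventions concrete, the sketch below uses made-up random data. It computes the textbook sample covariance of column variables by hand, (X - mean)^T (X - mean) / (n - 1), and compares it with np.cov. The variable names are illustrative:

```python
import numpy as np

# Textbook / Matlab layout: 500 observations (rows) of 3 variables (columns).
np.random.seed(1)
X = np.random.random((500, 3))
n_obs = X.shape[0]

# Textbook sample covariance for column variables, normalised by n - 1.
Xc = X - X.mean(axis=0)
cov_textbook = Xc.T @ Xc / (n_obs - 1)

# NumPy's default (rowvar=True) reads each row as a variable: 500 x 500.
print(np.cov(X).shape)  # (500, 500)

# With rowvar=False, np.cov matches the column-convention formula.
print(np.allclose(np.cov(X, rowvar=False), cov_textbook))  # True
```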
