[Numpy-discussion] Numpy Memory Error with corrcoef

Olivier Delalleau shish@keba...
Tue Mar 27 08:30:53 CDT 2012


On 27 March 2012 06:04, Nicole Stoffels <nicole.stoffels@forwind.de> wrote:

> Hi Pierre,
>
> thanks for the fast answer!
>
> I actually have timeseries of 24 hours for 459375 gridpoints in Europe.
> The timeseries of every grid point is stored in a column. That's why in my
> real program I already transposed the data, so that the correlation is made
> column by column. What I finally need is the correlation of each gridpoint
> with every other gridpoint. I'm afraid that this results in a 459375*459375
> matrix.
>
> The correlation is actually just an interim result. So I'm currently
> trying to loop over every gridpoint to get single correlations which will
> then be processed further. Is this the right approach?
>
> for column in range(len(data_records)):
>     for columnnumber in range(len(data_records)):
>         correlation = corrcoef(data_records[column], data_records[columnnumber])
>
> Best wishes,
> Nicole
>

It may be painfully slow... You should make sure you don't compute each
off-diagonal element twice: the correlation matrix is symmetric, so the
inner loop only needs to start at the current column.
Also, if all your computations can be vectorized, you'll probably get a
significant performance boost by computing your matrix by blocks instead of
element-by-element. Take blocks as big as can fit in memory.
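
Here is a rough sketch of what I mean by blocks, not a tuned implementation.
It assumes the data is a 2-D array of shape (n_time, n_points) with one grid
point per column; the names standardize_columns, blockwise_corrcoef and
block_size are made up for this example. Each column is centered and scaled
to unit norm once, so every block of correlations is a single matrix product,
and only blocks on or above the diagonal are computed.

```python
import numpy as np


def standardize_columns(data):
    """Center each column and scale it to unit Euclidean norm.

    Assumption: no column is constant (a constant column has zero norm,
    and its correlations would come out as NaN).
    """
    centered = data - data.mean(axis=0)
    norms = np.sqrt((centered ** 2).sum(axis=0))
    return centered / norms


def blockwise_corrcoef(data, block_size):
    """Yield (i, j, block) for every block on or above the diagonal.

    block[a, b] is the correlation between columns i + a and j + b.
    Blocks below the diagonal are skipped because they are just the
    transposes of the ones above it.
    """
    z = standardize_columns(data)
    n_points = z.shape[1]
    for i in range(0, n_points, block_size):
        zi = z[:, i:i + block_size]
        for j in range(i, n_points, block_size):
            zj = z[:, j:j + block_size]
            # For unit-norm, centered columns the dot product is the
            # Pearson correlation coefficient.
            yield i, j, np.dot(zi.T, zj)
```

You would loop over the generator, process each block, and let it go before
the next one is computed, so the full matrix never has to exist at once. With
block_size = 4096 and float64, one block takes 4096 * 4096 * 8 bytes, i.e.
128 MiB.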

-=- Olivier