# [Numpy-discussion] Numpy Memory Error with corrcoef

Nicole Stoffels nicole.stoffels@forwind...
Tue Mar 27 05:04:01 CDT 2012

Hi Pierre,

I actually have timeseries of 24 hours for 459375 gridpoints in Europe.
The timeseries of every grid point is stored in a column. That's why in
my real program I already transposed the data, so that the correlation
is made column by column. What I finally need is the correlation of each
gridpoint with every other gridpoint. I'm afraid that this results in a
459375*459375 matrix.
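
To put a number on that, here is a rough back-of-the-envelope estimate. It assumes the result is stored as float64, which is what corrcoef returns for float input; the variable names are only illustrative.

```python
import numpy as np

n_gridpoints = 459375                      # number of grid points, from above
itemsize = np.dtype(np.float64).itemsize   # 8 bytes per float64 element

# Size of a dense n_gridpoints x n_gridpoints correlation matrix
n_bytes = n_gridpoints ** 2 * itemsize
print(f"{n_bytes / 1e12:.2f} TB ({n_bytes / 2**40:.2f} TiB)")
```

That is roughly 1.7 TB for the full matrix, so it cannot be held in memory at once on an ordinary machine.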

The correlation is actually just an interim result. So I'm currently
trying to loop over every gridpoint to get single correlations which
will then be processed further. Is this the right approach?

for column in range(len(data_records)):
    for columnnumber in range(len(data_records)):
        correlation = corrcoef(data_records[column],
                               data_records[columnnumber])
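
Note that corrcoef called with two 1-D series returns a 2x2 matrix, so the single coefficient is the [0, 1] entry, and the double loop above makes about 2.1e11 such calls. One possible alternative, given here only as a sketch, is to standardize every series once and then compute the correlations block by block with a matrix product, processing each block before moving on. It assumes data_records has one time series per row, none of them constant (a constant row would give a division by zero). The helper names standardize_rows and correlation_blocks and the block_size parameter are hypothetical, not part of NumPy.

```python
import numpy as np

def standardize_rows(data):
    """Centre each row and scale it to unit Euclidean norm."""
    centered = data - data.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / norms

def correlation_blocks(data, block_size=100):
    """Yield (start, stop, block), where block[i, j] is the Pearson
    correlation of row start + i with row j of data."""
    z = standardize_rows(data)
    for start in range(0, z.shape[0], block_size):
        stop = min(start + block_size, z.shape[0])
        # Rows have zero mean and unit norm, so the dot product of two
        # rows is their correlation coefficient.
        yield start, stop, z[start:stop] @ z.T

# Small random stand-in for the real data (hypothetical shape)
rng = np.random.default_rng(0)
data_records = rng.random((500, 24))

for start, stop, block in correlation_blocks(data_records, block_size=128):
    # block has shape (stop - start, 500); process it here, then discard it
    pass
```

With all 459375 grid points, a block of 100 rows holds 100 x 459375 float64 values, about 370 MB, so block_size would need to be chosen to fit the available memory.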

Best wishes,
Nicole

On 27.03.2012 11:38, Pierre Haessig wrote:
> Hi Nicole,
>
> Le 27/03/2012 11:12, Nicole Stoffels a écrit :
>> if __name__ == '__main__':
>>
>>     data_records = random.random((459375, 24))
>>     correlation = corrcoef(data_records)
>
> May I assume that your data_records is made of 24 different variables
> of which you have 459375 observations?
>
> If this is so and if you expect corrcoef to return a 24*24 matrix,
> you need to either transpose data_records:
>
> >>> correlation = corrcoef(data_records.T)
>
> or use the rowvar=0 argument (see np.corrcoef or np.cov docstring)
>
> >>> correlation = corrcoef(data_records, rowvar = 0)
>
> Both work on my computer, while your example indeed leads to a
> MemoryError (because shape 459375*459375 would be a decently big
> matrix...)
>
> I don't know if it's your case, but for those used to the Matlab (and
> textbooks) convention of having variables stored in columns, the
> default behaviour of numpy's covariance function is a bit surprising.
> I guess historical reasons are involved in this choice. Just a matter
> of getting used to it !
>
> Best,
> Pierre

--

Dipl.-Met. Nicole Stoffels

Wind Power Forecasting and Simulation

ForWind - Center for Wind Energy Research
Institute of Physics
Carl von Ossietzky University Oldenburg

Ammerländer Heerstr. 136
D-26129 Oldenburg

Tel: +49(0)441 798 - 5079
Fax: +49(0)441 798 - 5099

Web  : www.ForWind.de
Email: nicole.stoffels@forwind.de
