[SciPy-User] Correlation coefficient of large arrays

josef.pktd@gmai... josef.pktd@gmai...
Tue Mar 16 00:56:49 CDT 2010


On Tue, Mar 16, 2010 at 1:39 AM, Vincent Davis <vincent@vincentdavis.net> wrote:

> @Josef
>
> how much memory does a
>
> >>> 230000**2 = 52900000000L  float (double) array take ?
>
>
>
> I guess I don't have a real appreciation for how large this is. I can do
> numpy.ones((100000,50000),dtype=np.float64) and it uses about 85% of
> the memory I have available. But that's a long way from 230,000 x 230,000.
> Of course, the array is symmetric.
>
> Is it feasible to write it to disk?
> The end goal is to find the difference between two correlation arrays and
> then calculate the mean of each column, which leaves me with a
> 1 x 230,000 array.
>
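
For scale, the arithmetic behind the question can be checked directly. This is only a back-of-the-envelope sketch, assuming 8 bytes per float64 element and a dense matrix with no use of symmetry; the variable names are made up for illustration.

```python
# Size of a dense 230000 x 230000 float64 correlation matrix.
n_vars = 230_000
bytes_per_float64 = 8  # assumption: double precision, no overhead counted

total_bytes = n_vars ** 2 * bytes_per_float64
total_gb = total_bytes / 1e9       # decimal gigabytes
total_gib = total_bytes / 2 ** 30  # binary gibibytes

print(total_bytes, total_gb, total_gib)
```

That comes to 423,200,000,000 bytes, a little over 423 GB. Storing only one triangle of the symmetric matrix would roughly halve it, which is still far more than fits in memory on a typical machine.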

If you don't care about the correlation matrix itself and only need the
column (or row) sums, I would loop over it in batches and never construct
the full matrix.
For example, take the first 1000 variables, calculate their correlation with
all variables (a 1000 x 230000 block), reduce that block to 1000 sums, and
then move on to the next 1000 variables.
Not using np.corrcoef would avoid some duplicate calculations, but several
intermediate arrays are still necessary. Storing intermediate results on
disk with pytables or similar might still be the better way to avoid
duplicate calculations.
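
The batching idea could be sketched roughly as follows. This is a minimal sketch, not a tested recipe for the full-size problem. It assumes variables are in rows and observations in columns (as in the corrcoef call quoted below), that no variable is constant (a constant row would give a zero norm and NaN), and that both data sets fit in memory even though the correlation matrices do not. The function names and the batch_size parameter are made up for illustration.

```python
import numpy as np

def standardize_rows(data):
    """Center each row and scale it to unit length.

    After this, the dot product of two rows equals their Pearson
    correlation coefficient.
    """
    data = np.asarray(data, dtype=np.float64)
    centered = data - data.mean(axis=1, keepdims=True)
    norms = np.sqrt((centered ** 2).sum(axis=1, keepdims=True))
    return centered / norms

def corr_diff_column_means(data_a, data_b, batch_size=1000):
    """Column means of corrcoef(data_a) - corrcoef(data_b), in batches.

    Only a (batch_size x n_vars) block of each correlation matrix
    exists at any one time.
    """
    za = standardize_rows(data_a)
    zb = standardize_rows(data_b)
    n_vars = za.shape[0]
    col_sums = np.zeros(n_vars)
    for start in range(0, n_vars, batch_size):
        stop = min(start + batch_size, n_vars)
        # Rows start:stop of each correlation matrix, differenced.
        block = za[start:stop] @ za.T - zb[start:stop] @ zb.T
        # Accumulate this block's contribution to every column sum.
        col_sums += block.sum(axis=0)
    return col_sums / n_vars

# Small usage example with random data (50 variables, 10 observations).
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 10))
b = rng.normal(size=(50, 10))
means = corr_diff_column_means(a, b, batch_size=7)
```

With 230000 variables, each 1000-row block of float64 values takes about 1.8 GB, and the loop holds a few such blocks at once, so a smaller batch_size may be needed depending on available memory.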

Josef



> On Mon, Mar 15, 2010 at 11:16 PM, <josef.pktd@gmail.com> wrote:
>
>>
>>
>> On Tue, Mar 16, 2010 at 1:04 AM, Vincent Davis <vincent@vincentdavis.net>wrote:
>>
>>> I have an array of 10 observations of 230,000 variables and want to find
>>> the correlation coefficient between each pair of variables.
>>> numpy.corrcoef(data) works, except I can only do it with about 30,000
>>> variables at a time: numpy.corrcoef(data[:30000]). It uses up a lot of
>>> memory.
>>> Is there a better way?
>>>
>>
>>
>> how much memory does a
>> >>> 230000**2
>> 52900000000L
>>
>> float (double) array take ?
>>
>> Josef
>> (I'm not going to try)
>>
>>
>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>>