[SciPy-user] Dealing with Large Data Sets

Damian Eads eads@soe.ucsc....
Sun May 11 02:38:50 CDT 2008


Anne Archibald wrote:
> 2008/5/10 Damian Eads <eads@soe.ucsc.edu>:
>> Damian Eads wrote:
>>
>>> which perform the operations in an in-place fashion. If data.sum(axis =
>>> 2) is large, preallocate an array to store the sum,
>>>
>>>    # for summing over columns
>>>    sum_result = numpy.zeros(data.shape[0:2])
>> I meant to include
>>
>>    data **= 2
>>    np.sum(data, axis=2, out=sum_result)
>>
>> which squares each element in place, sums over the last axis
>> (axis=2), and stores the result in sum_result.
> 
> What is the advantage to preallocating the result rather than letting
> sum() do the allocation?

If the computation is repeated millions of times and the sum array is
large (hundreds of MB), it is certainly advantageous to allocate the
sum array once rather than anew for each computation.
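
A minimal sketch of that pattern, assuming NumPy is imported as np. The helper name sum_of_squares_into, the small shapes, and the fill() call standing in for loading the next data set are all made up for illustration:

```python
import numpy as np

def sum_of_squares_into(data, out):
    # Square in place (this overwrites data), then reduce over the last
    # axis straight into the caller's buffer; no new result array is made.
    data **= 2
    return np.sum(data, axis=2, out=out)

# Hypothetical small shapes; the real arrays would be far larger.
data = np.empty((4, 5, 6))
sum_result = np.zeros(data.shape[0:2])   # allocated once, outside the loop

for i in range(3):
    data.fill(i + 1.0)                   # stand-in for loading the next data set
    result = sum_of_squares_into(data, sum_result)
    assert result is sum_result          # same buffer on every iteration
```

Because out= is passed, np.sum returns the very array it was given, so the loop never allocates a fresh result. Note that data is destroyed by the in-place square, which is only acceptable if the original values are not needed afterwards.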

Damian

