[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

Pauli Virtanen pav@iki...
Thu Jul 9 04:07:45 CDT 2009


Thu, 09 Jul 2009 09:54:26 +0200, Matthieu Brucher wrote:
> 2009/7/9 Pauli Virtanen <pav+sp@iki.fi>:
[clip]
>> I'm still kind of hoping that it's possible to make some minimal
>> assumptions about CPU caches in general, and have a rule that decides a
>> code path that is good enough, if not optimal.
> 
> Unfortunately, this is not possible. We've been playing with blocking
> loops for a long time in finite difference schemes, and it is always
> compiler-dependent (that is, the optimal block size is bandwidth-
> dependent and even operation-dependent).

I'm not completely sure about this: the data access pattern in a reduce 
operation is in principle relatively simple, and the main focus would be 
on improving the worst cases rather than being completely optimal. This 
could perhaps be achieved with a generic rule that tries to maximize data 
locality.

Of course, I may be wrong here...
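
To illustrate the kind of rule I have in mind (this is only a rough 
sketch, not what NumPy actually does internally; the function name 
`blocked_sum_axis0` and the default `block` size are made up for the 
example), one can reduce a C-contiguous 2-D array along axis 0 in column 
blocks, so that the accumulator slice stays in cache while the input is 
streamed in contiguous chunks:

    import numpy as np

    def blocked_sum_axis0(a, block=4096):
        # Reduce a 2-D C-contiguous array along axis 0 in column blocks:
        # each element of `a` is read exactly once, each read touches a
        # contiguous chunk of `block` items, and the accumulator slice
        # of the same size stays hot in cache across the row loop.
        n_rows, n_cols = a.shape
        out = np.zeros(n_cols, dtype=a.dtype)
        for start in range(0, n_cols, block):
            stop = min(start + block, n_cols)
            acc = out[start:stop]
            for i in range(n_rows):
                acc += a[i, start:stop]   # contiguous chunk: good locality
        return out

    a = np.random.rand(1000, 100000)
    assert np.allclose(blocked_sum_axis0(a), a.sum(axis=0))

The block size here is just a guess; the point is that almost any block 
that fits in L1/L2 avoids the pathological strided access, without 
needing to know the exact cache sizes.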

-- 
Pauli Virtanen
