[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)
Pauli Virtanen
pav@iki...
Thu Jul 9 04:07:45 CDT 2009
Thu, 09 Jul 2009 09:54:26 +0200, Matthieu Brucher wrote:
> 2009/7/9 Pauli Virtanen <pav+sp@iki.fi>:
[clip]
>> I'm still kind of hoping that it's possible to make some minimal
>> assumptions about CPU caches in general, and have a rule that decides a
>> code path that is good enough, if not optimal.
>
> Unfortunately, this is not possible. We've been playing with blocking
> loops for a long time in finite difference schemes, and it is always
> compiler dependent (that is, the optimal size of the block is bandwidth
> dependent and even operation dependent).
I'm not completely sure about this: the data access pattern in a reduce
operation is in principle relatively simple, and the main focus would be
on improving the worst cases rather than on being completely optimal. This
could perhaps be achieved with a generic rule that tries to maximize data
locality.
Of course, I may be wrong here...
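
To make the idea concrete, here is a minimal sketch of one such generic
rule: order the loops so that the axis with the smallest stride is walked
in the inner loop, whichever axis is being reduced. This is only an
illustration in pure Python, not how NumPy implements reductions; the
names loop_order and reduce_sum_2d are hypothetical, the array is assumed
to be 2-D, and strides are counted in elements rather than bytes.

```python
def loop_order(strides):
    """Return axis indices ordered outermost to innermost.

    Assumed heuristic: the axis with the smallest absolute stride goes
    innermost, so consecutive inner iterations touch nearby memory.
    """
    return sorted(range(len(strides)),
                  key=lambda ax: abs(strides[ax]),
                  reverse=True)


def reduce_sum_2d(data, shape, strides, axis):
    """Sum a flat buffer, viewed as a 2-D array, along `axis`.

    `strides` are in elements (an assumption made for simplicity).
    The loop nesting is chosen by `loop_order`, not by `axis`.
    """
    keep = 1 - axis                  # the axis that survives the reduction
    out = [0] * shape[keep]
    outer, inner = loop_order(strides)
    for i in range(shape[outer]):
        base = i * strides[outer]
        for j in range(shape[inner]):
            value = data[base + j * strides[inner]]
            # The output index is the coordinate along the kept axis.
            out[i if outer == keep else j] += value
    return out


# The logical array [[0, 1, 2], [3, 4, 5]] stored in C order and in
# Fortran order; both layouts are read with unit stride in the inner loop.
c_data, c_strides = [0, 1, 2, 3, 4, 5], (3, 1)
f_data, f_strides = [0, 3, 1, 4, 2, 5], (1, 2)
col_sums_c = reduce_sum_2d(c_data, (2, 3), c_strides, axis=0)
col_sums_f = reduce_sum_2d(f_data, (2, 3), f_strides, axis=0)
row_sums_c = reduce_sum_2d(c_data, (2, 3), c_strides, axis=1)
```

When the reduced axis ends up as the outer loop, the sketch accumulates
into the whole output buffer on each pass, so this rule only helps as long
as that buffer stays reasonably small. It says nothing about block sizes,
which is the part that is hard to choose generically.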
--
Pauli Virtanen