[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)
Thu Jul 9 02:54:26 CDT 2009
2009/7/9 Pauli Virtanen <email@example.com>:
> On 2009-07-08, Stéfan van der Walt <firstname.lastname@example.org> wrote:
>> I know very little about cache optimality, so excuse the triviality of
>> this question: Is it possible to design this loop optimally (taking
>> into account certain build-time measurable parameters), or is it the
>> kind of thing that can only be discovered by tuning at compile-time?
>> ATNumPy... scary :-)
> I'm still kind of hoping that it's possible to make some minimal
> assumptions about CPU caches in general, and have a rule that
> decides a code path that is good enough, if not optimal.
Unfortunately, this is not possible. We have been experimenting with
loop blocking in finite difference schemes for a long time, and the
result is always compiler dependent (that is, the optimal block size
depends on the bandwidth and even on the operation).
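
To make the point concrete, here is a minimal sketch of a blocked
reduction in which the block width is a free parameter. It is not
NumPy's actual reduction code: the names blocked_sum_axis0 and
time_block_sizes are hypothetical, the candidate block sizes are
arbitrary, and the input is assumed to be a 2-D floating-point array.

```python
import timeit

import numpy as np


def blocked_sum_axis0(a, block=256):
    """Sum a 2-D array over axis 0, handling `block` columns at a time."""
    n_cols = a.shape[1]
    # Assumption: floating-point input, so the output keeps the input dtype.
    out = np.empty(n_cols, dtype=a.dtype)
    for start in range(0, n_cols, block):
        stop = min(start + block, n_cols)
        # Reduce one column block; the working set per step is limited
        # to `block` columns of the input.
        out[start:stop] = a[:, start:stop].sum(axis=0)
    return out


def time_block_sizes(a, sizes=(16, 64, 256, 1024), repeat=3):
    """Return the best wall-clock time in seconds for each block size."""
    return {
        b: min(timeit.repeat(lambda: blocked_sum_axis0(a, b),
                             number=1, repeat=repeat))
        for b in sizes
    }


if __name__ == "__main__":
    data = np.random.rand(2000, 2000)
    for size, seconds in time_block_sizes(data).items():
        print(size, seconds)
```

Which block size comes out fastest is expected to change with the
machine, the array shape and the operation, so the timings have to be
measured rather than predicted.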
Information System Engineer, Ph.D.
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92