[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)
Matthieu Brucher
matthieu.brucher@gmail....
Thu Jul 9 02:54:26 CDT 2009
2009/7/9 Pauli Virtanen <pav+sp@iki.fi>:
> On 2009-07-08, Stéfan van der Walt <stefan@sun.ac.za> wrote:
>> I know very little about cache optimality, so excuse the triviality of
>> this question: Is it possible to design this loop optimally (taking
>> into account certain build-time measurable parameters), or is it the
>> kind of thing that can only be discovered by tuning at compile-time?
>> ATNumPy... scary :-)
>
> I'm still kind of hoping that it's possible to make some minimal
> assumptions about CPU caches in general, and have a rule that
> decides a code path that is good enough, if not optimal.
Unfortunately, this is not possible. We've been experimenting with
loop blocking in finite difference schemes for a long time, and the
result is always compiler dependent (that is, the optimal block size
is bandwidth dependent and even operation dependent).
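
To make "blocking" concrete, here is a minimal sketch, for illustration
only and not code from NumPy or from this thread. It sums a row-major
2D list over its first axis one block of columns at a time, and picks
the block size by timing a few candidates, since no fixed rule is
assumed to work everywhere. The names column_sums_blocked and
pick_block_size, and the candidate sizes, are hypothetical. Pure Python
shows only the loop structure; cache effects would only become visible
in a compiled loop.

```python
import time


def column_sums_blocked(rows, block):
    """Sum a row-major 2D list over axis 0, `block` columns at a time."""
    if block < 1:
        raise ValueError("block must be >= 1")
    n_cols = len(rows[0]) if rows else 0
    out = [0.0] * n_cols
    # The outer loop walks over column blocks. For each block, every row
    # is visited, so the same short slice of `out` is reused repeatedly
    # before the loop moves on to the next block.
    for start in range(0, n_cols, block):
        stop = min(start + block, n_cols)
        for row in rows:
            for j in range(start, stop):
                out[j] += row[j]
    return out


def pick_block_size(rows, candidates, repeats=3):
    """Time each candidate block size and return the fastest one measured."""
    best_block, best_time = None, float("inf")
    for block in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            column_sums_blocked(rows, block)
            # Keep the best of several runs to reduce timing noise.
            elapsed = min(elapsed, time.perf_counter() - t0)
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block


# Small hypothetical test data: 50 rows of 64 columns.
rows = [[float(i * 7 + j) for j in range(64)] for i in range(50)]
reference = [sum(col) for col in zip(*rows)]
```

The result does not depend on the block size, only the running time
does, so pick_block_size(rows, [4, 16, 64]) may return a different
value on each machine, or even from one run to the next.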
Matthieu
--
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher