[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)
Wed Jul 8 18:02:47 CDT 2009
On 2009-07-08, Stéfan van der Walt <email@example.com> wrote:
> I know very little about cache optimality, so excuse the triviality of
> this question: Is it possible to design this loop optimally (taking
> into account certain build-time measurable parameters), or is it the
> kind of thing that can only be discovered by tuning at compile-time?
> ATNumPy... scary :-)

I'm still kind of hoping that it's possible to make some minimal
assumptions about CPU caches in general, and have a rule that
decides a code path that is good enough, if not optimal.
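
To make the idea of such a rule concrete, here is a minimal sketch in Python. It is only an illustration: the function names and the "smallest stride goes in the innermost loop" heuristic are assumptions made for this example, not NumPy's actual implementation. The one assumption about caches is that touching memory in small, regular steps is cheaper than jumping around.

```python
def choose_inner_axis(shape, strides):
    """Pick the axis to run in the innermost loop (hypothetical rule).

    shape   -- tuple of dimension lengths (at least one dimension)
    strides -- tuple of byte strides, one per dimension
    """
    # Axes of length 1 are never really iterated, so skip them.
    candidates = [ax for ax, n in enumerate(shape) if n > 1]
    if not candidates:
        return len(shape) - 1
    # Assumption: the smallest absolute stride gives the most
    # cache-friendly walk through memory.
    return min(candidates, key=lambda ax: abs(strides[ax]))


def plan_reduction(shape, strides, reduce_axis):
    """Decide between two code paths for a reduction such as sum()."""
    inner = choose_inner_axis(shape, strides)
    if inner == reduce_axis:
        # The reduced axis is the cheap one to walk: accumulate into
        # a single scalar in the inner loop.
        return "accumulate-scalar"
    # Otherwise walk a cheap axis in the inner loop and accumulate
    # into a whole output row, looping over the reduced axis outside.
    return "accumulate-row"


# A C-ordered 1000x1000 array of 8-byte floats has strides (8000, 8).
print(plan_reduction((1000, 1000), (8000, 8), reduce_axis=0))
print(plan_reduction((1000, 1000), (8000, 8), reduce_axis=1))
```

Nothing here depends on cache sizes or line lengths measured at build time; the decision uses only the shape and strides available at run time.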

I don't think we want to go the ATNumPy route, or even have
tunable parameters chosen at build or compile time. (Unless, of
course, we want to bring a monster into the world -- think about
cross-breeding distutils with the ATLAS build system :)