[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)
Charles R Harris
Wed Jul 8 18:24:52 CDT 2009
On Wed, Jul 8, 2009 at 5:02 PM, Pauli Virtanen <email@example.com> wrote:
> On 2009-07-08, Stéfan van der Walt <firstname.lastname@example.org> wrote:
> > I know very little about cache optimality, so excuse the triviality of
> > this question: Is it possible to design this loop optimally (taking
> > into account certain build-time measurable parameters), or is it the
> > kind of thing that can only be discovered by tuning at compile-time?
> > ATNumPy... scary :-)
> I'm still kind of hoping that it's possible to make some minimal
> assumptions about CPU caches in general, and have a rule that
> decides a code path that is good enough, if not optimal.
> I don't think we want to go the ATNumPy route, or even have
> tunable parameters chosen at build or compile time. (Unless, of
> course, we want to bring a monster into the world -- think about
> cross-breeding distutils with the ATLAS build system :)
Sort of the software version of the Human Fly. Sounds like next summer's
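Pauli's "minimal assumptions about CPU caches" idea can be illustrated with a small sketch. It assumes the only cache knowledge needed is that visiting nearby elements in sequence is cheap. The rule picks a loop order from the array's strides instead of from a tuned build-time parameter. The function names `choose_inner_axis` and `sum_2d` are hypothetical, and this is not NumPy's actual reduction code. It is written in pure Python, so it shows the decision structure rather than any real cache speedup.

```python
from typing import List, Sequence


def choose_inner_axis(strides: Sequence[int]) -> int:
    """Hypothetical rule: run the innermost loop over the axis whose
    stride (in elements) is smallest, so consecutive accesses stay close
    together in memory whatever the exact cache size is."""
    return min(range(len(strides)), key=lambda ax: abs(strides[ax]))


def sum_2d(data: List[float], shape: tuple, strides: tuple, axis: int) -> List[float]:
    """Sum a 2-D array, stored in the flat list `data`, along `axis`.

    `strides` are given in elements, not bytes. The loop order comes from
    choose_inner_axis rather than from a tuned build-time parameter.
    """
    out_axis = 1 - axis
    out = [0.0] * shape[out_axis]
    inner = choose_inner_axis(strides)
    if inner == axis:
        # The reduced axis has the smallest stride: accumulate along it in
        # the inner loop, producing one output element at a time.
        for j in range(shape[out_axis]):
            acc = 0.0
            base = j * strides[out_axis]
            for i in range(shape[axis]):
                acc += data[base + i * strides[axis]]
            out[j] = acc
    else:
        # The reduced axis has the larger stride: loop over it on the
        # outside and add each slice into the output buffer, so the inner
        # loop walks memory with the small stride.
        for i in range(shape[axis]):
            base = i * strides[axis]
            for j in range(shape[out_axis]):
                out[j] += data[base + j * strides[out_axis]]
    return out
```

For example, `sum_2d([1, 2, 3, 4, 5, 6], (2, 3), (3, 1), 0)` sums a C-ordered 2x3 array down its columns and returns `[5.0, 7.0, 9.0]`. Because the reduced axis has the larger stride there, the rule chooses the accumulate-into-output path. The same data laid out in Fortran order, `sum_2d([1, 4, 2, 5, 3, 6], (2, 3), (1, 2), 0)`, gives the same result through the other path.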