[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

Robert Kern robert.kern@gmail....
Wed Jul 8 18:06:55 CDT 2009

On Wed, Jul 8, 2009 at 18:02, Pauli Virtanen <pav+sp@iki.fi> wrote:
> On 2009-07-08, Stéfan van der Walt <stefan@sun.ac.za> wrote:
>> I know very little about cache optimality, so excuse the triviality of
>> this question: Is it possible to design this loop optimally (taking
>> into account certain build-time measurable parameters), or is it the
>> kind of thing that can only be discovered by tuning at compile-time?
>> ATNumPy... scary :-)
> I'm still kind of hoping that it's possible to make some minimal
> assumptions about CPU caches in general, and have a rule that
> decides a code path that is good enough, if not optimal.
> I don't think we want to go the ATNumPy route, or even have
> tunable parameters chosen at build or compile time. (Unless, of
> course, we want to bring a monster into the world -- think about
> cross-breeding distutils with the ATLAS build system :)

I imagine that we could do some or all of this configuration at
runtime. We have a dynamic language. ATLAS does not.

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
