[Numpy-discussion] [OT] Starving CPUs article featured in IEEE's ComputingNow portal
Francesc Alted
faltet@pytables....
Thu Mar 18 12:53:00 CDT 2010
On Thursday 18 March 2010 16:26:09, Anne Archibald wrote:
> Speak for your own CPUs :).
>
> But seriously, congratulations on the wide publication of the article;
> it's an important issue we often don't think enough about. I'm just a
> little snarky because this exact issue came up for us recently - a
> visiting astro speaker put it as "flops are free" - and so I did some
> tests and found that even without optimizing for memory access, our
> tasks are already CPU-bound:
> http://lighthouseinthesky.blogspot.com/2010/03/flops.html
Well, I thought that my introduction was enough to convince anybody of the
problem, but I forgot that you scientists always try to demonstrate things
experimentally :-/
Seriously, your example is a clear case of what I'm recommending in the
article, i.e. always try to use libraries that already leverage the
blocking technique (that is, taking advantage of both temporal and spatial
locality). I don't know about FFTW (never used it, sorry), but after having a
look at its home page, I'm pretty convinced that its authors are well
aware of these techniques.
That said, it seems that, in addition, you are also applying the blocking
technique yourself: get the data in bunches (256 floating point elements,
which fit perfectly well in modern L1 caches), apply your computation (in
this case, FFTW) and put the result back in memory. A perfect example of what
I wanted to show to the readers so, congratulations! You made it without the
need to read my article (so perhaps the article was not so necessary after all
:-)
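
A minimal sketch of that get/compute/put-back pattern, under some loud
assumptions: NumPy's np.fft.fft stands in for FFTW (I have not used FFTW),
the function name fft_in_blocks is made up for illustration, and the input is
a 1-D float64 array. A 256-element float64 block takes 2 KiB, so both the
block and its complex result should sit comfortably in a typical L1 cache.

```python
import numpy as np

def fft_in_blocks(data, block=256):
    """Transform `data` one block at a time (hypothetical helper).

    Each block is read, transformed, and written back before the next
    one is touched, so the working set stays small.
    """
    n_blocks = data.size // block          # any trailing partial block is ignored
    out = np.empty(n_blocks * block, dtype=np.complex128)
    for i in range(n_blocks):
        lo, hi = i * block, (i + 1) * block
        # data[lo:hi] is a view, so no extra copy of the input is made here
        out[lo:hi] = np.fft.fft(data[lo:hi])
    return out

signal = np.arange(1024, dtype=np.float64)
spectra = fft_in_blocks(signal)            # four independent 256-point FFTs
```

This is not necessarily how your code is organised; it only illustrates the
access pattern described above.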
> In terms of specifics, I was a little surprised you didn't mention
> FFTW among your software tools that optimize memory access. FFTW's
> planning scheme seems ideal for ensuring memory locality, as much as
> possible, during large FFTs. (And in fact I also found that for really
> large FFTs, reducing padding - memory size - at the cost of a
> non-power-of-two size was also worth it.)
I must say that I'm quite ignorant of many great existing tools for scientific
computing. What I do know is that when I need to do something, I always look
for good existing tools first. So this is basically why I spoke about numexpr
and BLAS/LAPACK: I know them well.
> Heh. Indeed numexpr is a good tool for this sort of thing; it's an
> unfortunate fact that simple use of numpy tends to do operations in
> the pessimal order...
Well, to be fair, NumPy has no control over the order of the operations in
expressions or over how temporaries are managed: it is Python that decides
that. NumPy can only do what Python asks it to do, and do it as well as
possible. NumPy plays its role reasonably well here, but of course this is
not enough to deliver performance. In fact, this problem probably affects all
interpreted languages out there, unless they implement a JIT compiler
optimised for evaluating expressions, and this is basically what numexpr is.
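
To make the temporaries issue concrete, here is a hand-written sketch of the
idea, not numexpr's actual implementation. In plain NumPy, a*b + c first
materialises a full-size temporary for a*b and then walks memory again to add
c. The hypothetical helper blocked_eval below evaluates the same expression
in cache-sized pieces, reusing one small scratch buffer. It assumes 1-D
arrays of equal length and dtype, and the block size of 4096 elements is an
arbitrary choice.

```python
import numpy as np

def blocked_eval(a, b, c, block=4096):
    """Compute a*b + c block by block (illustrative helper, not a real API)."""
    out = np.empty_like(a)
    scratch = np.empty(block, dtype=a.dtype)   # the only temporary, reused
    for lo in range(0, a.size, block):
        hi = min(lo + block, a.size)
        t = scratch[:hi - lo]                  # shorter view for the last block
        np.multiply(a[lo:hi], b[lo:hi], out=t)
        np.add(t, c[lo:hi], out=out[lo:hi])    # write straight into the result
    return out

rng = np.random.default_rng(0)
a, b, c = (rng.random(10_000) for _ in range(3))
result = blocked_eval(a, b, c)
```

With numexpr installed, the equivalent call would be
numexpr.evaluate("a*b + c"), which does this kind of blocking for you.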
Anyway, thanks for the constructive criticism, I really appreciate it!
--
Francesc Alted