[Numpy-discussion] [OT] Starving CPUs article featured in IEEE's ComputingNow portal

Anne Archibald peridot.faceted@gmail....
Sat Mar 20 09:20:38 CDT 2010


On 20 March 2010 06:32, Francesc Alted <faltet@pytables.org> wrote:
> On Friday 19 March 2010 18:13:33, Anne Archibald wrote:
> [clip]
>> What I didn't go into in detail in the article was that there's a
>> trade-off of processing versus memory access available: we could
>> reduce the memory load by a factor of eight by doing interpolation on
>> the fly instead of all at once in a giant FFT. But that would cost
>> cache space and flops, and we're not memory-dominated.
>>
>> One thing I didn't try, and should: running four of these jobs at once
>> on a four-core machine. If I correctly understand the architecture,
>> that won't affect the cache issues, but it will effectively quadruple
>> the memory bandwidth needed, without increasing the memory bandwidth
>> available. (Which, honestly, makes me wonder what the point is of
>> building multicore machines.)
>>
>> Maybe I should look into that interpolation stuff.
>
> Please do.  Although you may be increasing the data rate by 4x, your program
> is already very efficient in how it handles data, so chances are that you
> will still get a good speed-up.  I'd be glad to hear back about your experience.

The thing is, it reduces the data rate from memory, but at the cost of
additional FFTs (to implement the convolutions). If my program is already
spending all its time doing FFTs, and the loads from memory happen while
the CPU is busy with FFTs, then reducing the memory load gains nothing in
runtime, while doing those convolutions costs runtime: not just more
flops, but also more cache pressure (to hold the interpolated array and
the convolution kernels). One could go a step further and do the
interpolation directly, without convolution, but that adds a great many
flops, which translates directly into runtime.
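
To make that last point concrete, here is a toy illustration (with
made-up sizes, nothing like the real arrays) of why direct interpolation
is so expensive: evaluating the spectrum at fractional bins gives exactly
the same numbers as the giant zero-padded FFT, but the naive direct
evaluation costs O(N^2) flops rather than O(N log N):

import numpy as np

# Toy sizes only; the real arrays are vastly larger.
N, over = 512, 8
x = np.random.standard_normal(N)

# "Giant FFT": zero-pad to 8*N, i.e. oversample the spectrum 8x.
big = np.fft.fft(x, n=over * N)

# Direct interpolation: evaluate the DFT at fractional bins without ever
# forming the padded array.  Same answer, but ~8*N**2 complex
# multiply-adds instead of ~O(8*N*log2(8*N)).
k = np.arange(over * N)
n = np.arange(N)
direct = np.exp(-2j * np.pi * np.outer(k, n) / (over * N)) @ x

print(np.allclose(big, direct))   # True

(The convolution-based approach sits between these two extremes: a short
kernel applied around each output bin instead of the full dense matrix,
paid for with the extra FFTs and cache pressure mentioned above.)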

On the other hand, if it doesn't completely blow out the cache, we do
have non-interpolated FFTs already on disk (with red noise adjustments
already applied), so we might save on the relatively minor cost of the
giant FFT. I'll have to do some time trials.
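
For those time trials, even something as crude as the sketch below
(placeholder sizes, and ignoring the interpolation step entirely) would
put a floor under what the on-the-fly version has to beat:

import timeit
import numpy as np

# Placeholder length; the real time series is much longer.
N, over = 2**20, 8
x = np.random.standard_normal(N)

# Cost of the giant zero-padded FFT versus a plain unpadded one.  The
# on-the-fly scheme still has to add its convolutions on top of t_small.
t_big = timeit.timeit(lambda: np.fft.fft(x, n=over * N), number=3) / 3
t_small = timeit.timeit(lambda: np.fft.fft(x), number=3) / 3
print("padded 8N FFT: %.3f s, plain N FFT: %.3f s" % (t_big, t_small))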


Anne

> --
> Francesc Alted

