[Numpy-discussion] [OT] Starving CPUs article featured in IEEE's ComputingNow portal
Sat Mar 20 14:22:57 CDT 2010
On 20 March 2010 14:56, Dag Sverre Seljebotn
> Pauli Virtanen wrote:
>> Anne Archibald wrote:
>>> I'm not knocking numpy; it does (almost) the best it can. (I'm not
>>> sure of the optimality of the order in which ufuncs are executed; I
>>> think some optimizations there are possible.)
>> Ufuncs and reductions are not performed in a cache-optimal fashion, IIRC
>> dimensions are always traversed in order from left to right. Large
>> speedups are possible in some cases, but in a quick try I didn't manage to
>> come up with an algorithm that would always improve the speed (there was a
>> thread about this last year or so, and there's a ticket). Things varied
>> between computers, so this probably depends a lot on the actual cache
>> But perhaps numexpr has such heuristics, and we could steal them?
> At least in MultiIter (and I always assumed ufuncs too, but perhaps not)
> there's functionality to remove the largest dimension so that it can be
> put innermost in a loop. In many situations, removing the dimension with
> the smallest stride from the iterator would probably work much better.
> It's all about balancing iterator overhead and memory overhead. Something
> simple like "select the dimension with length > 200 which has smallest
> stride, or the dimension with largest length if none are above 200" would
> perhaps work well?
There's more to it than that: there's no point (I think) going with
the smallest stride if that stride is more than a cache line (usually
64 bytes). There are further optimizations - often
stride[i]*shape[i]==shape[j] for some (i,j), and in that case these
two dimensions can be treated as one with stride stride[i] and length
shape[i]*shape[j]. Applying this would "unroll" all C- or
Fortran-contiguous arrays to a single non-nested loop.
Of course, reordering ufuncs is potentially hazardous in terms of
breaking user code: the effect of
depends on whether you run through A in index order or reverse index
order (which might be the order it's stored in memory, potentially
faster from a cache point of view).
> Dag Sverre
> NumPy-Discussion mailing list
More information about the NumPy-Discussion