[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

Pauli Virtanen pav@iki...
Sat Feb 19 11:50:02 CST 2011


On Sat, 19 Feb 2011 18:13:44 +0100, Sebastian Haase wrote:
> Thanks a lot. Very informative. I guess what you say about "cache line
> is dirtied" is related to the info I got with valgrind (see my email in
> this thread: L1 Data Write Miss 3636). Can one assume that the cache
> line is always a few mega bytes ?

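No, cache lines are much smaller, typically 16-512 bytes (64 bytes is
common on current x86 CPUs). It is the cache as a whole that is kilobytes
to megabytes in size.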

In this specific case, since the stride of the `i` loop is only
2*sizeof(float) = 8 bytes (16 bytes for doubles) << cache line size,
threads running with different `i` tend to write to the same cache lines
(so-called false sharing).
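
To put rough numbers on this, here is a minimal sketch of the arithmetic. The
64-byte line size and the helper names `cache_line_of` and
`iterations_per_line` are assumptions made for illustration; they are not
taken from the code discussed in this thread.

```c
#include <stddef.h>

/* Assumed cache line size in bytes. 64 is common on x86, but this is an
   assumption for illustration; check the actual CPU. */
#define ASSUMED_CACHE_LINE 64

/* Hypothetical helper: index of the cache line that contains the byte at
   `offset` from a line-aligned base address. */
size_t cache_line_of(size_t offset, size_t line_size)
{
    return offset / line_size;
}

/* Hypothetical helper: how many consecutive loop iterations write into the
   same cache line when each iteration advances by `stride` bytes. */
size_t iterations_per_line(size_t stride, size_t line_size)
{
    if (stride >= line_size) {
        return 1;  /* each iteration already starts in a new line */
    }
    return line_size / stride;
}
```

With 4-byte floats, `iterations_per_line(2 * sizeof(float), ASSUMED_CACHE_LINE)`
gives 8, and a 16-byte stride gives 4. Either way, several neighbouring values
of `i` share one line, so threads that are handed neighbouring `i` values keep
invalidating each other's copy of it.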

-- 
Pauli Virtanen


