[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?
Sat Feb 19 11:50:02 CST 2011
On Sat, 19 Feb 2011 18:13:44 +0100, Sebastian Haase wrote:
> Thanks a lot. Very informative. I guess what you say about "cache line
> is dirtied" is related to the info I got with valgrind (see my email in
> this thread: L1 Data Write Miss 3636). Can one assume that the cache
> line is always a few megabytes?
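Cache lines are typically much smaller, 16-512 bytes; 64 bytes is the usual size on current x86 CPUs. A few megabytes is the size of a whole cache (e.g. L2 or L3), not of a single line.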
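To check the line size on a given machine, here is a minimal C sketch. It assumes Linux with glibc: `_SC_LEVEL1_DCACHE_LINESIZE` is a glibc extension to `sysconf()`, not POSIX. The helper name `l1_dcache_line_size` is hypothetical, and the 64-byte fallback is an assumption.

```c
#include <assert.h>
#include <unistd.h>

/* Assumed fallback when the C library cannot report the value:
   64 bytes is common on current x86-64 CPUs, but it is not guaranteed. */
#define FALLBACK_CACHE_LINE 64L

/* Hypothetical helper: returns the L1 data-cache line size in bytes.
   _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension (not POSIX); where the
   macro is missing, or sysconf() reports 0 or -1, the fallback is used. */
long l1_dcache_line_size(void)
{
    long size = 0;
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif
    if (size <= 0)
        size = FALLBACK_CACHE_LINE;
    assert(size > 0); /* always holds after the fallback */
    return size;
}
```

On Linux the same value is also available from the shell with `getconf LEVEL1_DCACHE_LINESIZE`.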
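In this specific case, the stride of the `i` loop is only
2*sizeof(float) = 8 bytes, much smaller than a cache line, so threads
working on different values of `i` tend to write to the same cache lines.
This is known as false sharing: the writes keep invalidating the line in
the other cores' caches, so the cores spend their time passing the line
back and forth instead of computing.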
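The effect can be sketched in C with OpenMP. This is an illustration, not the code discussed in this thread. The `point2f` type, the `scale_interleaved`/`scale_chunked` functions and the 64-byte `CACHE_LINE` are hypothetical. Each point holds two `float`s, so with the usual 4-byte `float` neighbouring iterations are 8 bytes apart. The first loop hands out iterations round-robin, which puts neighbouring points on different threads. The second gives each thread runs of whole cache lines.

```c
#include <assert.h>

/* Hypothetical point layout: two floats per point, so consecutive
   iterations of the i loop touch memory 2*sizeof(float) bytes apart. */
typedef struct {
    float x, y;
} point2f;

/* Assumed cache-line size in bytes (typical on current x86-64). */
#define CACHE_LINE 64
/* How many consecutive points share one cache line (8 with the values above). */
#define POINTS_PER_LINE (CACHE_LINE / sizeof(point2f))

/* Round-robin schedule: iteration i goes to thread i % nthreads, so
   neighbouring points, which sit in the same cache line, are written by
   different threads and the line keeps being invalidated in the other
   cores' caches (false sharing). */
void scale_interleaved(point2f *p, long n, float s)
{
#pragma omp parallel for schedule(static, 1)
    for (long i = 0; i < n; ++i) {
        p[i].x *= s;
        p[i].y *= s;
    }
}

/* Chunked schedule: each thread gets runs of 16 cache lines' worth of
   points, so threads share a line only at chunk boundaries, and not at
   all when p starts on a cache-line boundary. */
void scale_chunked(point2f *p, long n, float s)
{
    assert(CACHE_LINE % sizeof(point2f) == 0);
#pragma omp parallel for schedule(static, (int)(POINTS_PER_LINE * 16))
    for (long i = 0; i < n; ++i) {
        p[i].x *= s;
        p[i].y *= s;
    }
}
```

Both functions compute the same result; only the memory-access pattern differs, so timing them on a large array is the way to see the difference on a particular machine. Compile with `-fopenmp` (GCC/Clang); without it the pragmas are ignored and the loops run serially. The plain `schedule(static)` default, which gives each thread one contiguous block, also avoids most of the sharing.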