[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
Charles R Harris
Sun Mar 23 01:18:34 CDT 2008
On Sat, Mar 22, 2008 at 10:59 PM, David Cournapeau <
> Charles R Harris wrote:
> > It looks like memory access is the bottleneck, otherwise running 4
> > floats through in parallel should go a lot faster. I need to modify
> > the program a bit and see how it works for doubles.
> I am not sure the benchmark is really meaningful: it does not use
> aligned buffers (16-byte alignment), and because of that, does not
> give a good idea of what can be expected from SSE. It shows why it is
> not so easy to get good performance, and why just throwing in a few
> optimized loops won't work, though. Using sse/sse2 from unaligned
> buffers is a waste of time. Without this alignment, you need to take
> the misalignment into account (using _mm_loadu_ps vs _mm_load_ps), and
> that's extremely slow, basically killing most of the speed increase you
> can expect from using sse.
Yep, but I expect the compilers to take care of alignment, say by inserting
a few single ops when needed. So I would just as soon leave vectorization to
the compilers and wait until they get that good. The only thing needed then
is contiguous data.
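
For illustration, here is roughly what "inserting a few single ops when
needed" looks like when written by hand. This is only a sketch, not NumPy
code and not any particular compiler's output: add_f32 is a hypothetical
helper, and it assumes contiguous float buffers. Scalar ops run until the
output pointer reaches a 16-byte boundary, then the main loop uses aligned
stores. Peeling can only align one pointer, so the two inputs are still read
with _mm_loadu_ps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper (not part of NumPy): out[i] = a[i] + b[i]. */
void add_f32(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Prologue: scalar ops until out + i sits on a 16-byte boundary. */
    while (i < n && ((uintptr_t)(out + i) & 15u) != 0) {
        out[i] = a[i] + b[i];
        i++;
    }

#if defined(__SSE__)
    /* Main loop: 4 floats per iteration. The store is aligned thanks to
     * the prologue; a and b may be aligned differently from out, so
     * they are read with unaligned loads. */
    for (; i + 4 <= n; i += 4) {
        assert(((uintptr_t)(out + i) & 15u) == 0);
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb));
    }
#endif

    /* Epilogue: remaining elements (all of them if SSE is unavailable). */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

If all three buffers happen to share the same offset from a 16-byte
boundary, the loads could use _mm_load_ps as well; in general that cannot be
assumed, which is part of why contiguous data alone does not guarantee the
fully aligned case.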