[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?
Matthieu Brucher
matthieu.brucher@gmail....
Tue Feb 15 10:25:00 CST 2011
Use directly restrict in C99 mode (__restrict does not have exactly the same
semantics).
For a valgrind profil, you can check my blog (
http://matt.eifelle.com/2009/04/07/profiling-with-valgrind/)
Basically, if you have a python script, you can valgrind --optionsinmyblog
python myscript.py
For PAPI, you have to install several packages (perf module for kernel for
instance) and a GUI to analyze the results (in Eclispe, it should be
possible).
Matthieu
2011/2/15 Sebastian Haase <seb.haase@gmail.com>
> Thanks Matthieu,
> using __restrict__ with g++ did not change anything. How do I use
> valgrind with C extensions?
> I don't know what "PAPI profil" is ...?
> -Sebastian
>
>
> On Tue, Feb 15, 2011 at 4:54 PM, Matthieu Brucher
> <matthieu.brucher@gmail.com> wrote:
> > Hi,
> > My first move would be to add a restrict keyword to dist (i.e. dist is
> the
> > only pointer to the specific memory location), and then declare dist_
> inside
> > the first loop also with a restrict.
> > Then, I would run valgrind or a PAPI profil on your code to see what
> causes
> > the issue (false sharing, ...)
> > Matthieu
> >
> > 2011/2/15 Sebastian Haase <seb.haase@gmail.com>
> >>
> >> Hi,
> >> I assume that someone here could maybe help me, and I'm hoping it's
> >> not too much off topic.
> >> I have 2 arrays of 2d point coordinates and would like to calculate
> >> all pairwise distances as fast as possible.
> >> Going from Python/Numpy to a (Swigged) C extension already gave me a
> >> 55x speedup.
> >> (.9ms vs. 50ms for arrays of length 329 and 340).
> >> I'm using gcc on Linux.
> >> Now I'm wondering if I could go even faster !?
> >> My hope that the compiler might automagically do some SSE2
> >> optimization got disappointed.
> >> Since I have a 4 core CPU I thought OpenMP might be an option;
> >> I never used that, and after some playing around I managed to get
> >> (only) 50% slowdown(!) :-(
> >>
> >> My code in short is this:
> >> (My SWIG typemaps use obj_to_array_no_conversion() from numpy.i)
> >> -------<Ccode> ----------
> >> void dists2d(
> >> double *a_ps, int nx1, int na,
> >> double *b_ps, int nx2, int nb,
> >> double *dist, int nx3, int ny3) throw (char*)
> >> {
> >> if(nx1 != 2) throw (char*) "a must be of shape (n,2)";
> >> if(nx2 != 2) throw (char*) "b must be of shape (n,2)";
> >> if(nx3 != nb || ny3 != na) throw (char*) "c must be of shape
> (na,nb)";
> >>
> >> double *dist_;
> >> int i, j;
> >>
> >> #pragma omp parallel private(dist_, j, i)
> >> {
> >> #pragma omp for nowait
> >> for(i=0;i<na;i++)
> >> {
> >> //num_threads=omp_get_num_threads(); --> 4
> >> dist_ = dist+i*nb; // dists_ is only
> >> introduced for OpenMP
> >> for(j=0;j<nb;j++)
> >> {
> >> *dist_++ = sqrt( sq(a_ps[i*nx1] - b_ps[j*nx2])
> +
> >>
> sq(a_ps[i*nx1+1]
> >> - b_ps[j*nx2+1]) );
> >> }
> >> }
> >> }
> >> }
> >> -------</Ccode> ----------
> >> There is probably a simple mistake in this code - as I said I never
> >> used OpenMP before.
> >> It should be not too difficult to use OpenMP correctly here
> >> or - maybe better -
> >> is there a simple SSE(2,3,4) version that might be even better than
> >> OpenMP... !?
> >>
> >> I supposed, that I did not get the #pragma omp lines right - any idea ?
> >> Or is it in general not possible to speed this kind of code up using
> >> OpenMP !?
> >>
> >> Thanks,
> >> Sebastian Haase
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
> >
> > --
> > Information System Engineer, Ph.D.
> > Blog: http://matt.eifelle.com
> > LinkedIn: http://www.linkedin.com/in/matthieubrucher
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20110215/778e7856/attachment.html
More information about the NumPy-Discussion
mailing list