# [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

Sebastian Haase seb.haase@gmail....
Wed Feb 16 03:19:59 CST 2011

```Chris,
OK, sorry -- I miss read (cdist doc says A and B must have same number
of "columns"(!) not "rows").
On my machine I got the exact same timing as my (non OpenMP) C code.
That is really got, compared to normal ufunc based numpy code.
But my question in this thread is, how to get better than that,  using either
SSE or OpenMP (or CUDA?)

Thanks again for telling me about scipy.distance,
Sebastian

On Wed, Feb 16, 2011 at 2:28 AM, Chris Colbert <sccolbert@gmail.com> wrote:
>
> The `cdist` function in scipy spatial does what you want, and takes ~ 1ms on
> my machine.
>
> In [1]: import numpy as np
> In [2]: from scipy.spatial.distance import cdist
> In [3]: a = np.random.random((340, 2))
> In [4]: b = np.random.random((329, 2))
> In [5]: c = cdist(a, b)
> In [6]: c.shape
> Out[6]: (340, 329)
> In [7]: %timeit cdist(a, b)
> 1000 loops, best of 3: 1.18 ms per loop
>
>
> On Tue, Feb 15, 2011 at 4:42 PM, Sebastian Haase <seb.haase@gmail.com>
> wrote:
>>
>> Hi Eat,
>> I will surely try these routines tomorrow,
>> but I still think that neither scipy function does the complete
>> distance calculation of all possible pairs as done by my C code.
>> For 2 arrays, X and Y, of nX and nY 2d coordinates respectively, I
>> need to get nX times nY distances computed.
>> >From the online documentation I understand that
>> pdist calculates something like nX square distances for a single array
>> X of nX coordinates,
>> and cdist would calculate nX distances, where nX is required to equal nY.
>>
>> Thanks for looking into this,
>> Sebastian
>>
>> On Tue, Feb 15, 2011 at 9:11 PM, eat <e.antero.tammi@gmail.com> wrote:
>> > Hi,
>> >
>> > On Tue, Feb 15, 2011 at 5:50 PM, Sebastian Haase <seb.haase@gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >> I assume that someone here could maybe help me, and I'm hoping it's
>> >> not too much off topic.
>> >> I have 2 arrays of 2d point coordinates and would like to calculate
>> >> all pairwise distances as fast as possible.
>> >> Going from Python/Numpy to a (Swigged) C extension already gave me a
>> >> 55x speedup.
>> >> (.9ms vs. 50ms for arrays of length 329 and 340).
>> >
>> > With my very modest machine (Intel Dual CPU E2200, 2.2 Ghz) utilizing
>> > scipy.spatial.distance.pdist will take some 1.5 ms for such arrays. Even
>> > the
>> > simple linear algebra based full matrix calculations can be done less
>> > than 5
>> > ms.
>> >
>> > My two cents,
>> > eat
>> >>
>> >> I'm using gcc on Linux.
>> >> Now I'm wondering if I could go even faster !?
>> >> My hope that the compiler might automagically do some SSE2
>> >> optimization got disappointed.
>> >> Since I have a 4 core CPU I thought OpenMP might be an option;
>> >> I never used that, and after some playing around I managed to get
>> >> (only) 50% slowdown(!) :-(
>> >>
>> >> My code in short is this:
>> >> (My SWIG typemaps use obj_to_array_no_conversion() from numpy.i)
>> >> -------<Ccode> ----------
>> >> void dists2d(
>> >>                   double *a_ps, int nx1, int na,
>> >>                   double *b_ps, int nx2, int nb,
>> >>                   double *dist, int nx3, int ny3)  throw (char*)
>> >> {
>> >>  if(nx1 != 2)  throw (char*) "a must be of shape (n,2)";
>> >>  if(nx2 != 2)  throw (char*) "b must be of shape (n,2)";
>> >>  if(nx3 != nb || ny3 != na)    throw (char*) "c must be of shape
>> >> (na,nb)";
>> >>
>> >>  double *dist_;
>> >>  int i, j;
>> >>
>> >> #pragma omp parallel private(dist_, j, i)
>> >>  {
>> >> #pragma omp for nowait
>> >>        for(i=0;i<na;i++)
>> >>          {
>> >>                dist_ = dist+i*nb;                 // dists_  is  only
>> >> introduced for OpenMP
>> >>                for(j=0;j<nb;j++)
>> >>                  {
>> >>                        *dist_++  = sqrt( sq(a_ps[i*nx1]   -
>> >> b_ps[j*nx2]) +
>> >>
>> >>  sq(a_ps[i*nx1+1]
>> >> - b_ps[j*nx2+1]) );
>> >>                  }
>> >>          }
>> >>  }
>> >> }
>> >> -------</Ccode> ----------
>> >> There is probably a simple mistake in this code - as I said I never
>> >> used OpenMP before.
>> >> It should be not too difficult to use OpenMP correctly here
>> >>  or -  maybe better -
>> >> is there a simple SSE(2,3,4) version that might be even better than
>> >> OpenMP... !?
>> >>
>> >> I supposed, that I did not get the #pragma omp lines right - any idea ?
>> >> Or is it in general not possible to speed this kind of code up using
>> >> OpenMP !?
>> >>
>> >> Thanks,
>> >> Sebastian Haase
>> >> _______________________________________________
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
```