[Numpy-discussion] Profiling numpy ? (parts written in C)

Francesc Altet faltet at carabos.com
Wed Dec 20 13:09:01 CST 2006


On Wednesday 20 December 2006 19:32, Andrew Straw wrote:
> I added a ticket for Francesc's enhancement:
> http://projects.scipy.org/scipy/numpy/ticket/403

Thanks Andrew, but I realized that my patch is not safe for unaligned
arrays (Sun machines would segfault). After considering several
alternatives, I've ended up modifying the iter_subscript_* functions
instead (see the new patch below).
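
To illustrate the alignment problem, here is a minimal sketch (not code
from the patch). The names is_aligned() and load_i32() are hypothetical.
It assumes that a direct typed load is only safe at an address that is a
multiple of the item size, and that memcpy() is safe at any address:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a NumPy API): is p a multiple of `size` bytes? */
static int
is_aligned(const void *p, size_t size)
{
    assert(size != 0);
    return ((uintptr_t)p % size) == 0;
}

/* Read a 32-bit value from any address, aligned or not. */
static int32_t
load_i32(const void *p)
{
    int32_t v;

    if (is_aligned(p, sizeof v))
        return *(const int32_t *)p;   /* direct load: fine when aligned */
    memcpy(&v, p, sizeof v);          /* byte-wise copy: safe anywhere */
    return v;
}
```

On strict-alignment machines the direct load is the operation that would
fault if it were applied to an unaligned pointer.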

For this, I've created a small function named assign_behaved() that
is only called when both the source and destination arrays are well
behaved (i.e. aligned and in native byteorder). It is small enough
that optimizers can easily inline it, which is key to achieving the
new speed-up.
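
The idea can be sketched outside NumPy as follows. This is a simplified
illustration under stated assumptions, not the patch itself: items are
fixed at 4 bytes, copyswap4() is a hypothetical stand-in for the generic
copyswap function, and "well behaved" is reduced to "both buffers are
4-byte aligned and no byte swap is needed". The check runs once, outside
the loop:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a generic copyswap(): copies one 4-byte
 * item between arbitrary addresses, reversing its bytes if `swap`. */
static void
copyswap4(char *dst, const char *src, int swap)
{
    char tmp[4];

    memcpy(tmp, src, 4);
    if (swap) {
        char t = tmp[0]; tmp[0] = tmp[3]; tmp[3] = t;
        t = tmp[1]; tmp[1] = tmp[2]; tmp[2] = t;
    }
    memcpy(dst, tmp, 4);
}

/* Gather n 4-byte items, src[idx[i]], into consecutive slots of dst. */
static void
gather4(void *dst, const void *src, const size_t *idx, size_t n, int swap)
{
    char *out = dst;
    const char *in = src;
    /* Decided once: both buffers 4-byte aligned and no swap requested. */
    int behaved = !swap && (uintptr_t)dst % 4 == 0 && (uintptr_t)src % 4 == 0;
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++) {
        const char *item = in + idx[i] * 4;

        if (behaved)
            *(int32_t *)out = *(const int32_t *)item;  /* plain assignment */
        else
            copyswap4(out, item, swap);                /* generic fallback */
        out += 4;
    }
}
```

The fast path is a single load and store that a compiler can inline,
while the fallback keeps unaligned or byte-swapped data correct.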

The results are quite good: the patch gives almost a 2x speedup over
the original functions (3.482 s down to 1.815 s in the profiles below).

Here is the time for a.flat[b] (in original numpy):

         862 function calls in 3.482 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    3.325    3.325    3.482    3.482 prova.py:31(bench_take)
        1    0.133    0.133    0.133    0.133 {numpy.core.multiarray.array}
      257    0.017    0.000    0.017    0.000 {map}


and here with the new patch applied:

         862 function calls in 1.815 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.662    1.662    1.815    1.815 prova.py:31(bench_take)
        1    0.131    0.131    0.131    0.131 {numpy.core.multiarray.array}
      257    0.016    0.000    0.016    0.000 {map}


We can compare this against the original take(a, b):

         2862 function calls in 7.030 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    6.792    0.007    6.792    0.007 {method 'take' of 'numpy.ndarray' objects}
        1    0.142    0.142    0.142    0.142 {numpy.core.multiarray.array}
        1    0.063    0.063    7.030    7.030 prova.py:31(bench_take)

So, the iterator approach plus the patch is more than 4x faster than
take() (1.662 s vs. 6.792 s of internal time).

Given these results, the iterator provided by Travis is becoming very
useful for dealing with a wider range of situations without losing
performance (or even while gaining a great deal, as in the example
above).

Below is the patch. I've checked that it passes all the tests in
numpy, but still, maybe Travis could check whether I forgot something
important. Also, it would be nice to look into other places in the
code that could benefit from the new assign_behaved() function.

Index: numpy/core/src/arrayobject.c
===================================================================
--- numpy/core/src/arrayobject.c	(revision 3487)
+++ numpy/core/src/arrayobject.c	(working copy)
@@ -8988,6 +8988,19 @@
         return self->size;
 }
 
+/* Specific function that accelerates the copy of some types through assignments */
+static void
+assign_behaved(void *dest, void *src, size_t itemsize)
+{
+  switch (itemsize) {
+  case 1: *((npy_int8 *)dest) = *((npy_int8 *)src); break;
+  case 2: *((npy_int16 *)dest) = *((npy_int16 *)src); break;
+  case 4: *((npy_int32 *)dest) = *((npy_int32 *)src); break;
+    /* npy_float64 is more efficient than npy_int64 in assignments */
+  case 8: *((npy_float64 *)dest) = *((npy_float64 *)src); break;
+  default: memcpy(dest, src, itemsize); break;
+  }
+}
 
 static PyObject *
 iter_subscript_Bool(PyArrayIterObject *self, PyArrayObject *ind)
@@ -8996,7 +9009,7 @@
         intp count=0;
         char *dptr, *optr;
         PyObject *r;
-        int swap;
+        int swap, isbehaved;
         PyArray_CopySwapFunc *copyswap;
 
 
@@ -9038,6 +9051,10 @@
         swap = (PyArray_ISNOTSWAPPED(self->ao) != PyArray_ISNOTSWAPPED(r));
+        isbehaved = PyArray_ISBEHAVED(r) && PyArray_ISBEHAVED_RO(self->ao);
         while(index--) {
                 if (*((Bool *)dptr) != 0) {
+                        if (isbehaved)
+                                assign_behaved(optr, self->dataptr, itemsize);
+                        else
                         copyswap(optr, self->dataptr, swap, self->ao);
                         optr += itemsize;
                 }
@@ -9055,9 +9072,9 @@
         PyObject *r;
         PyArrayIterObject *ind_it;
         int itemsize;
-        int swap;
+        int swap, isbehaved;
         char *optr;
-        int index;
+        int index;              /* shouldn't this be intp? */
         PyArray_CopySwapFunc *copyswap;
 
         itemsize = self->ao->descr->elsize;
@@ -9092,6 +9109,7 @@
         index = ind_it->size;
         copyswap = PyArray_DESCR(r)->f->copyswap;
         swap = (PyArray_ISNOTSWAPPED(r) != PyArray_ISNOTSWAPPED(self->ao));
+        isbehaved = PyArray_ISBEHAVED(r) && PyArray_ISBEHAVED_RO(self->ao);
         while(index--) {
                 num = *((intp *)(ind_it->dataptr));
                 if (num < 0) num += self->size;
@@ -9106,7 +9124,10 @@
                         return NULL;
                 }
                 PyArray_ITER_GOTO1D(self, num);
-                copyswap(optr, self->dataptr, swap, r);
+                if (isbehaved)
+                        assign_behaved(optr, self->dataptr, itemsize);
+                else
+                        copyswap(optr, self->dataptr, swap, r);
                 optr += itemsize;
                 PyArray_ITER_NEXT(ind_it);
         }
----------------------------------------------------------------------

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"


