[Numpy-discussion] Speeding up Numeric

Francesc Altet faltet at carabos.com
Thu Jan 27 12:48:40 CST 2005


Hi,

After waiting a while for some free time, I've been playing with
the excellent oprofile to try to help reduce numarray's array
creation time.

For that goal, I chose the following small benchmark:

import numarray
a = numarray.arange(2000)
a.shape=(1000,2)
for j in xrange(1000):
    for i in range(len(a)):
        row=a[i]

I know that it mixes creation cost with indexing cost, but since
numarray's indexing is only a bit slower than Numeric's (perhaps
40%), while its array creation is 5 to 10 times slower, I think this
benchmark is a good starting point for seeing what's going on.
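
To compare the two libraries on the same loop outside the profiler, a
small timing harness along these lines could be used. This is only a
sketch: row_loop and time_row_loop are hypothetical helper names, and a
plain list of lists stands in for the array, so the number it prints
says nothing about numarray or Numeric until one of their arrays is
passed in instead.

```python
import timeit


def row_loop(a):
    # Same inner loop as the benchmark: fetch every row once.
    for i in range(len(a)):
        row = a[i]


def time_row_loop(a, repeats=5, number=100):
    # Best-of-N wall-clock time for one pass over all rows, in seconds.
    timings = timeit.repeat(lambda: row_loop(a), repeat=repeats, number=number)
    return min(timings) / number


# Stand-in data (an assumption): 1000 rows of 2 elements each.
rows = [[2 * i, 2 * i + 1] for i in range(1000)]
per_pass = time_row_loop(rows)
print("%.3f us per row" % (per_pass / len(rows) * 1e6))
```

Taking the minimum over several repeats reduces the influence of other
activity on the machine.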

For numarray, I got the following results:

samples  %        image name               symbol name
902       7.3238  python                   PyEval_EvalFrame
835       6.7798  python                   lookdict_string
408       3.3128  python                   PyObject_GenericGetAttr
384       3.1179  python                   PyDict_GetItem
383       3.1098  libc-2.3.2.so            memcpy
358       2.9068  libpthread-0.10.so       __pthread_alt_unlock
293       2.3790  python                   _PyString_Eq
273       2.2166  libnumarray.so           NA_updateStatus
273       2.2166  python                   PyType_IsSubtype
271       2.2004  python                   countformat
252       2.0461  libc-2.3.2.so            memset
249       2.0218  python                   string_hash
248       2.0136  _ndarray.so              _universalIndexing

while for Numeric I've got this:

samples  %        image name               symbol name
279      15.6478  libpthread-0.10.so       __pthread_alt_unlock
216      12.1144  libc-2.3.2.so            memmove
187      10.4879  python                   lookdict_string
162       9.0858  python                   PyEval_EvalFrame
144       8.0763  libpthread-0.10.so       __pthread_alt_lock
126       7.0667  libpthread-0.10.so       __pthread_alt_trylock
56        3.1408  python                   PyDict_SetItem
53        2.9725  libpthread-0.10.so       __GI___pthread_mutex_unlock
45        2.5238  _numpy.so                PyArray_FromDimsAndDataAndDescr
39        2.1873  libc-2.3.2.so            __malloc
36        2.0191  libc-2.3.2.so            __cfree

One preliminary result is that numarray spends a lot more time in
Python space than Numeric does, as Todd already said here. The problem
is that, as I have not yet patched my kernel, I can't get the call
tree, so I can't track down what is ultimately responsible for that.

So I tried running the profile module from the standard library to
see where the hot spots are in Python:

$ time ~/python.nobackup/Python-2.4/python -m profile -s time create-numarray.py
         1016105 function calls (1016064 primitive calls) in 25.290 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   19.220   19.220   25.290   25.290 create-numarray.py:1(?)
   999999    5.530    0.000    5.530    0.000 numarraycore.py:514(__del__)
     1753    0.160    0.000    0.160    0.000 :0(eval)
        1    0.060    0.060    0.340    0.340 numarraycore.py:3(?)
        1    0.050    0.050    0.390    0.390 generic.py:8(?)
        1    0.040    0.040    0.490    0.490 numarrayall.py:1(?)
     3455    0.040    0.000    0.040    0.000 :0(len)
        1    0.030    0.030    0.190    0.190 ufunc.py:1504(_makeCUFuncDict)
       51    0.030    0.001    0.070    0.001 ufunc.py:184(_nIOArgs)
     3572    0.030    0.000    0.030    0.000 :0(has_key)
     2582    0.020    0.000    0.020    0.000 :0(append)
     1000    0.020    0.000    0.020    0.000 :0(range)
        1    0.010    0.010    0.010    0.010 generic.py:510(_stridesFromShape)
     42/1    0.010    0.000   25.290   25.290 <string>:1(?)

but, to tell the truth, I can't really see exactly where the time is
spent. Perhaps somebody with more experience can shed more light on
this?
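
One way to dig further into a listing like this is to drive the
profiler from code and ask pstats for a cumulative-time ordering and
for the callers of a given function. The sketch below is only an
illustration under stated assumptions: it uses cProfile, which is not
available in Python 2.4 (the profile module offers the same Profile and
Stats workflow), and Row and benchmark are hypothetical stand-ins for
an array type with a Python-level __del__ like the one at
numarraycore.py:514. Time spent in C code reached through indexing
will most likely still be charged to the calling Python function,
which may be why most of the time lands on the module's top-level
line.

```python
import cProfile
import io
import pstats


class Row(object):
    # Hypothetical stand-in for an array object with a Python-level finalizer.
    def __del__(self):
        pass


def benchmark(n=10000):
    for _ in range(n):
        # Rebinding the name frees the previous Row, which runs __del__.
        row = Row()


profiler = cProfile.Profile()
profiler.enable()
benchmark()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(5)  # top entries by cumulative time
stats.print_callers("__del__")                 # who triggers the finalizer
report = out.getvalue()
print(report)
```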

Another thing that I find intriguing has to do with the oprofile
output for Numeric. Let me repeat it:

samples  %        image name               symbol name
279      15.6478  libpthread-0.10.so       __pthread_alt_unlock
216      12.1144  libc-2.3.2.so            memmove
187      10.4879  python                   lookdict_string
162       9.0858  python                   PyEval_EvalFrame
144       8.0763  libpthread-0.10.so       __pthread_alt_lock
126       7.0667  libpthread-0.10.so       __pthread_alt_trylock
56        3.1408  python                   PyDict_SetItem
53        2.9725  libpthread-0.10.so       __GI___pthread_mutex_unlock
45        2.5238  _numpy.so                PyArray_FromDimsAndDataAndDescr
39        2.1873  libc-2.3.2.so            __malloc
36        2.0191  libc-2.3.2.so            __cfree

We can see that a lot of the time in the Numeric benchmark is
spent in libc space (37% or so). However, only 16% is used in
memory-related tasks (memmove, malloc and free), while the rest seems
to go to threading issues (??). Again, can anyone explain why the
pthread* routines take so much time, or why they appear here at all?
Perhaps getting rid of these calls might improve Numeric's
performance even further.
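
The per-library shares can be checked by summing the percentage column
of the Numeric listing by image name. The sketch below does just that;
percent_by_image is a hypothetical helper name, and only the rows shown
in the listing are included, so the totals need not match figures taken
from the full oprofile report.

```python
from collections import defaultdict

# Rows copied from the Numeric oprofile listing: (percent, image, symbol).
NUMERIC_ROWS = [
    (15.6478, "libpthread-0.10.so", "__pthread_alt_unlock"),
    (12.1144, "libc-2.3.2.so", "memmove"),
    (10.4879, "python", "lookdict_string"),
    (9.0858, "python", "PyEval_EvalFrame"),
    (8.0763, "libpthread-0.10.so", "__pthread_alt_lock"),
    (7.0667, "libpthread-0.10.so", "__pthread_alt_trylock"),
    (3.1408, "python", "PyDict_SetItem"),
    (2.9725, "libpthread-0.10.so", "__GI___pthread_mutex_unlock"),
    (2.5238, "_numpy.so", "PyArray_FromDimsAndDataAndDescr"),
    (2.1873, "libc-2.3.2.so", "__malloc"),
    (2.0191, "libc-2.3.2.so", "__cfree"),
]


def percent_by_image(rows):
    # Add up the sample percentages attributed to each binary image.
    totals = defaultdict(float)
    for percent, image, _symbol in rows:
        totals[image] += percent
    return dict(totals)


totals = percent_by_image(NUMERIC_ROWS)
for image, percent in sorted(totals.items(), key=lambda kv: -kv[1]):
    print("%-20s %6.2f%%" % (image, percent))
```

For the rows shown, this gives roughly 33.8% for libpthread and 16.3%
for the libc memory routines.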

Cheers,

-- 
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""




