[Numpy-discussion] Speeding up Numeric
Francesc Altet
faltet at carabos.com
Thu Jan 27 12:48:40 CST 2005
Hi,
After waiting a while for some free time, I've been playing with
the excellent oprofile to try to help reduce numarray's array creation
time. To that end, I picked the following small benchmark:
import numarray
a = numarray.arange(2000)
a.shape=(1000,2)
for j in xrange(1000):
    for i in range(len(a)):
        row=a[i]
I know that it mixes creation cost with indexing cost, but since
indexing in numarray is only a bit slower (perhaps 40%) than in Numeric,
while array creation is 5 to 10 times slower, I think this
benchmark provides a good starting point for seeing what's going on.
For numarray, I got the following results:
samples  %        image name          symbol name
902      7.3238   python              PyEval_EvalFrame
835      6.7798   python              lookdict_string
408      3.3128   python              PyObject_GenericGetAttr
384      3.1179   python              PyDict_GetItem
383      3.1098   libc-2.3.2.so       memcpy
358      2.9068   libpthread-0.10.so  __pthread_alt_unlock
293      2.3790   python              _PyString_Eq
273      2.2166   libnumarray.so      NA_updateStatus
273      2.2166   python              PyType_IsSubtype
271      2.2004   python              countformat
252      2.0461   libc-2.3.2.so       memset
249      2.0218   python              string_hash
248      2.0136   _ndarray.so         _universalIndexing
while for Numeric I've got this:
samples  %        image name          symbol name
279      15.6478  libpthread-0.10.so  __pthread_alt_unlock
216      12.1144  libc-2.3.2.so       memmove
187      10.4879  python              lookdict_string
162      9.0858   python              PyEval_EvalFrame
144      8.0763   libpthread-0.10.so  __pthread_alt_lock
126      7.0667   libpthread-0.10.so  __pthread_alt_trylock
56       3.1408   python              PyDict_SetItem
53       2.9725   libpthread-0.10.so  __GI___pthread_mutex_unlock
45       2.5238   _numpy.so           PyArray_FromDimsAndDataAndDescr
39       2.1873   libc-2.3.2.so       __malloc
36       2.0191   libc-2.3.2.so       __cfree
One preliminary result is that numarray spends a lot more time in
Python space than Numeric does, as Todd already said here. The problem
is that, as I have not yet patched my kernel, I can't get the call
tree, so I can't track down what is ultimately responsible for that.
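As a rough check of the Python-space observation, the rows shown in the two oprofile listings can be tallied per image. This is only a sketch: the sample counts are copied from the listings, which are cut off at about 2%, so the totals are lower bounds, and the helper name `samples_by_image` is made up for illustration.

```python
from collections import defaultdict

# (samples, image) pairs copied from the two oprofile listings in this message.
# Only rows above roughly 2% were shown, so these totals are lower bounds.
NUMARRAY = [
    (902, "python"), (835, "python"), (408, "python"), (384, "python"),
    (383, "libc-2.3.2.so"), (358, "libpthread-0.10.so"), (293, "python"),
    (273, "libnumarray.so"), (273, "python"), (271, "python"),
    (252, "libc-2.3.2.so"), (249, "python"), (248, "_ndarray.so"),
]
NUMERIC = [
    (279, "libpthread-0.10.so"), (216, "libc-2.3.2.so"), (187, "python"),
    (162, "python"), (144, "libpthread-0.10.so"), (126, "libpthread-0.10.so"),
    (56, "python"), (53, "libpthread-0.10.so"), (45, "_numpy.so"),
    (39, "libc-2.3.2.so"), (36, "libc-2.3.2.so"),
]


def samples_by_image(rows):
    """Sum the sample counts of all listed symbols belonging to each image."""
    totals = defaultdict(int)
    for samples, image in rows:
        totals[image] += samples
    return dict(totals)


numarray_totals = samples_by_image(NUMARRAY)
numeric_totals = samples_by_image(NUMERIC)

# Samples attributed to the interpreter itself in each run.
print("numarray, python image:", numarray_totals["python"])
print("Numeric,  python image:", numeric_totals["python"])
```

Comparing raw samples rather than percentages matters here, because the two runs collected very different total numbers of samples.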
So I tried running the profile module from the standard
library to see where the hot spots are in Python:
$ time ~/python.nobackup/Python-2.4/python -m profile -s time create-numarray.py
1016105 function calls (1016064 primitive calls) in 25.290 CPU seconds
Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   19.220   19.220   25.290   25.290 create-numarray.py:1(?)
   999999    5.530    0.000    5.530    0.000 numarraycore.py:514(__del__)
     1753    0.160    0.000    0.160    0.000 :0(eval)
        1    0.060    0.060    0.340    0.340 numarraycore.py:3(?)
        1    0.050    0.050    0.390    0.390 generic.py:8(?)
        1    0.040    0.040    0.490    0.490 numarrayall.py:1(?)
     3455    0.040    0.000    0.040    0.000 :0(len)
        1    0.030    0.030    0.190    0.190 ufunc.py:1504(_makeCUFuncDict)
       51    0.030    0.001    0.070    0.001 ufunc.py:184(_nIOArgs)
     3572    0.030    0.000    0.030    0.000 :0(has_key)
     2582    0.020    0.000    0.020    0.000 :0(append)
     1000    0.020    0.000    0.020    0.000 :0(range)
        1    0.010    0.010    0.010    0.010 generic.py:510(_stridesFromShape)
     42/1    0.010    0.000   25.290   25.290 <string>:1(?)
but, to tell the truth, I can't really see exactly where the time is
spent. Perhaps somebody with more experience can shed more light on
this?
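One likely reason the listing is so uninformative is that the Python-level profiler only records function calls. Work done inside C code reached through an operation such as a[i] is not a separate entry and is charged to the calling line, while a Python-level __del__ does show up once per destroyed object. The sketch below illustrates this with the standard cProfile and pstats modules under a modern Python 3. The Row class is a hypothetical stand-in for an array object with a Python __del__, not numarray's actual code.

```python
import cProfile
import pstats


class Row(object):
    """Hypothetical stand-in for an array object that defines __del__."""

    def __del__(self):
        pass


def benchmark(n):
    # Rebinding `row` destroys the previous Row, which invokes __del__.
    for _ in range(n):
        row = Row()


profiler = cProfile.Profile()
profiler.enable()
benchmark(1000)
profiler.disable()

stats = pstats.Stats(profiler)

# stats.stats maps (filename, lineno, funcname) to
# (primitive calls, total calls, tottime, cumtime, callers).
del_calls = sum(
    ncalls
    for (filename, lineno, name), (cc, ncalls, tottime, cumtime, callers)
    in stats.stats.items()
    if name == "__del__"
)
print("__del__ calls recorded:", del_calls)

# Same ordering as `-s time` on the command line.
stats.sort_stats("tottime").print_stats(5)
```

The roughly one __del__ call per loop iteration mirrors the 999999 calls in the listing above. Everything that is not a recorded call ends up in the tottime of the enclosing script line.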
Another thing that I find intriguing has to do with Numeric and
its oprofile output. Here it is again:
samples  %        image name          symbol name
279      15.6478  libpthread-0.10.so  __pthread_alt_unlock
216      12.1144  libc-2.3.2.so       memmove
187      10.4879  python              lookdict_string
162      9.0858   python              PyEval_EvalFrame
144      8.0763   libpthread-0.10.so  __pthread_alt_lock
126      7.0667   libpthread-0.10.so  __pthread_alt_trylock
56       3.1408   python              PyDict_SetItem
53       2.9725   libpthread-0.10.so  __GI___pthread_mutex_unlock
45       2.5238   _numpy.so           PyArray_FromDimsAndDataAndDescr
39       2.1873   libc-2.3.2.so       __malloc
36       2.0191   libc-2.3.2.so       __cfree
We can see that a lot of the time in the Numeric benchmark is
spent in libc space (37% or so). However, only 16% is used in
memory-related tasks (memmove, malloc and free), while the rest seems
to be used in thread issues (??). Again, can anyone explain why the
pthread* routines take so much time, or why they appear here at all?
Perhaps getting rid of these calls might improve Numeric's
performance even further.
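For reference, the memory-related and thread-related shares can be recomputed from the rows shown in the Numeric listing. This is a small sketch that only sums the percentages visible above. Symbols below the roughly 2% cutoff are missing, and the grouping into "memory" and "thread" is an assumption based on symbol and image names.

```python
# (percent, image, symbol) rows copied from the Numeric oprofile listing.
ROWS = [
    (15.6478, "libpthread-0.10.so", "__pthread_alt_unlock"),
    (12.1144, "libc-2.3.2.so", "memmove"),
    (10.4879, "python", "lookdict_string"),
    (9.0858, "python", "PyEval_EvalFrame"),
    (8.0763, "libpthread-0.10.so", "__pthread_alt_lock"),
    (7.0667, "libpthread-0.10.so", "__pthread_alt_trylock"),
    (3.1408, "python", "PyDict_SetItem"),
    (2.9725, "libpthread-0.10.so", "__GI___pthread_mutex_unlock"),
    (2.5238, "_numpy.so", "PyArray_FromDimsAndDataAndDescr"),
    (2.1873, "libc-2.3.2.so", "__malloc"),
    (2.0191, "libc-2.3.2.so", "__cfree"),
]

# Assumed grouping: memory work by symbol name, thread work by image name.
MEMORY_SYMBOLS = {"memmove", "__malloc", "__cfree"}

memory_pct = sum(pct for pct, image, symbol in ROWS if symbol in MEMORY_SYMBOLS)
thread_pct = sum(pct for pct, image, symbol in ROWS if image.startswith("libpthread"))

print("memory-related: %.2f%%" % memory_pct)
print("thread-related: %.2f%%" % thread_pct)
```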
Cheers,
--
>qo< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
""