[off topic] Re: [Numpy-discussion] numarray speed - PySequence_GetItem

Tim Hochberg tim.hochberg at cox.net
Tue Jun 29 10:12:38 CDT 2004

Todd Miller wrote:

>On Mon, 2004-06-28 at 17:14, Sebastian Haase wrote:
>> [SNIP]
>>My original question was just this: Does anyone know why numarray is maybe 10 
>>times slower than Numeric with that particular code segment 
>>(PySequence_GetItem) ?
>Well, the short answer is probably: no.
>Looking at the numarray sequence protocol benchmarks in
>Examples/bench.py, and looking at what wxPython is probably doing
>(fetching a 1x2 element array from an Nx2 and then fetching 2 numerical
>values from that)... I can't fully nail it down.  My benchmarks show
>that numarray is 4x slower for fetching the two element array but only
>1.1x slower for the two numbers;  that makes me expect at most a 4x slowdown.
>Noticing the 50k __del__ calls in your profile,  I eliminated __del__
>(breaking numarray) to see if that was the problem;  the ratios changed
>to 2.5x slower and 0.9x slower (actually faster) respectively.
This reminds me, when profiling bits and pieces of my code I've often 
noticed that __del__ chews up a large chunk of time. Is there any 
prospect of this being knocked down at all, or is it inherent in the 
structure of numarray?
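
For anyone who wants to see how quickly finalizer calls pile up with a fetch-one-row-at-a-time access pattern, here is a rough Python sketch. It is not numarray code: Row and Table are hypothetical stand-ins, and the only assumption is that every item fetch builds a fresh temporary object that has a __del__ method, which is what the profile above suggests.

```python
# Hypothetical stand-ins for illustration only; these are not numarray classes.
class Row(object):
    finalized = 0  # class-level counter of __del__ calls

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __del__(self):
        # Any work done here is paid once per temporary object.
        Row.finalized += 1


class Table(object):
    """Nx2 container that builds a new Row on every item fetch."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __len__(self):
        return len(self._pairs)

    def __getitem__(self, i):
        x, y = self._pairs[i]
        return Row(x, y)


def sum_points(table):
    total = 0
    for i in range(len(table)):
        row = table[i]            # one temporary Row per fetch
        total += row.x + row.y    # two scalar fetches from that Row
    return total


table = Table((i, i + 1) for i in range(1000))
total = sum_points(table)
```

On a reference-counting interpreter such as CPython, each temporary is finalized as soon as it is dropped, so the number of __del__ calls tracks the number of rows fetched. Whether that cost can be reduced in numarray itself is exactly the open question.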

>The large number of "Check" routines preceding the numarray path (I
>count 7 looking at my copy of wxPython) has me a little concerned.  I
>think those checks are  more expensive for numarray because it is a new
>style class.  
If that's really a significant slowdown, the culprits are likely 
PyTuple_Check, PyList_Check and wxPySwigInstance_Check.  
PySequence_Check appears to just be pointer compares and shouldn't 
invoke any new style class machinery. PySequence_Length calls sq_length, 
but also appears not to involve new-style class machinery. Of these, I think 
PyTuple_Check and PyList_Check could be replaced with PyTuple_CheckExact 
and PyList_CheckExact. This would slow down people using subclasses of 
tuple/list, but speed everyone else up since the latter pair of 
functions are just pointer compares. I think the former group is a very 
small, possibly nonexistent, minority, so this would probably 
be acceptable.
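
To make the trade-off concrete, here is a small Python-level sketch of the difference between the two kinds of check. It is only an analogy: isinstance() stands in for PyTuple_Check and a type identity test stands in for PyTuple_CheckExact, and MyTuple is a made-up subclass used purely for illustration.

```python
class MyTuple(tuple):
    """A tuple subclass: the case an exact check would reject."""


def check(o):
    # Rough analogue of PyTuple_Check: accepts tuple and its subclasses.
    return isinstance(o, tuple)


def check_exact(o):
    # Rough analogue of PyTuple_CheckExact: a single identity
    # comparison on the object's type, no subclass lookup.
    return type(o) is tuple


plain = (1, 2)
sub = MyTuple((1, 2))
other = [1, 2]

# One (check, check_exact) pair per object.
results = [(check(o), check_exact(o)) for o in (plain, sub, other)]
```

A plain tuple passes both checks, the subclass instance passes only the first, and the list passes neither. So after switching to the exact variants, a tuple or list subclass would fall through to the later, slower branches rather than being rejected outright, assuming it still supports the sequence protocol.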

I don't see any easy/obvious ways to speed up wxPySwigInstance_Check, 
but I believe that wxPoints now obey the PySequence protocol, so I think 
that the whole wxPySwigInstance_Check branch could be removed. To get 
that into wxPython you'd probably have to convince Robin that it 
wouldn't unduly hurt the speed of lists of wxPoints.

Wait... If the above doesn't work, I think I do have a way that might 
work for speeding the check for a wxPoint. Before the loop starts, get a 
pointer to wx.core.Point (the class for wxPoints these days) and call it 
wxPoint_Type. Since wxPoint_Type is itself a pointer, the check is then just:
        o->ob_type == wxPoint_Type
Worth a try anyway.
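
The shape of that idea, sketched in Python rather than in the wxPython C code: look the class up once before the loop, then use an identity comparison on the type inside the loop. Point and extract_xy are hypothetical names for illustration (Point stands in for wx.core.Point), and the fallback branch assumes that anything else obeys the sequence protocol.

```python
class Point(object):
    """Hypothetical stand-in for wx.core.Point."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def extract_xy(seq):
    point_type = Point  # looked up once, before the loop starts
    out = []
    for o in seq:
        if type(o) is point_type:
            # Fast path: exact type match, a single identity comparison.
            out.append((o.x, o.y))
        else:
            # Fallback: treat the item as a generic 2-element sequence.
            out.append((o[0], o[1]))
    return out


pairs = extract_xy([Point(1, 2), (3, 4), [5, 6]])
```

The catch is the same as with the CheckExact functions: an instance of a Point subclass would miss the fast path and land in the fallback, so this only helps if such subclasses are rare or also behave as sequences.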

Unfortunately, I don't have any time to try any of this out right now.

Chris, are you feeling bored?


>I have a hard time imagining a 10x difference overall, 
>but I think Python does have to traverse the numarray class hierarchy
>rather than do a type pointer comparison so they are more expensive.
