[off topic] Re: [Numpy-discussion] numarray speed - PySequence_GetItem

Todd Miller jmiller at stsci.edu
Tue Jun 29 09:05:49 CDT 2004

On Mon, 2004-06-28 at 20:45, Tim Hochberg wrote:
> Todd Miller wrote:
> >On Mon, 2004-06-28 at 17:14, Sebastian Haase wrote:
> >  
> >
> >> [SNIP]
> >>
> >>My original question was just this: Does anyone know why numarray is maybe 10 
> >>times slower than Numeric with that particular code segment 
> >>(PySequence_GetItem)?
> >>    
> >>
> >
> >Well, the short answer is probably: no.
> >
> >Looking at the numarray sequence protocol benchmarks in
> >Examples/bench.py, and looking at what wxPython is probably doing
> >(fetching a 1x2 element array from an Nx2 and then fetching 2 numerical
> >values from that)... I can't fully nail it down.  My benchmarks show
> >that numarray is 4x slower for fetching the two element array but only
> >1.1x slower for the two numbers;  that makes me expect at most 4x
> >slower.  
> >
> >Noticing the 50k __del__ calls in your profile,  I eliminated __del__
> >(breaking numarray) to see if that was the problem;  the ratios changed
> >to 2.5x slower and 0.9x slower (actually faster) respectively.
> >  
> >
> This reminds me, when profiling bits and pieces of my code I've often 
> noticed that __del__ chews up a large chunk of time. Is there any 
> prospect of this being knocked down at all, or is it inherent in the 
> structure of numarray?

__del__ is IMHO the elegant way to do numarray's shadowing of
"misbehaved arrays".  Misbehaved arrays are ones which don't meet the
requirements of a particular C function;  generally that means
noncontiguous, byte-swapped, misaligned, or of the wrong type, but it
can also mean some other sequence type like a list or tuple.  I think
using the destructor is "necessary" for maintaining Numeric compatibility
in C because existing code can generally be counted on to DECREF its
arrays,  but obviously can't be counted on to call some new API function.

__del__ used to be implemented in C as tp_dealloc,  but I was running
into segfaults which I tracked down to the order in which a new style
class instance is torn down.  The purpose of __del__ is to copy the
contents of a well-behaved working array (the shadow) back onto the
original misbehaved array.  The problem was that, because of the
numarray class hierarchy, critical pieces of the shadow (the instance
dictionary) had already been torn down before the tp_dealloc was
called.  The only way I could think of to fix it was to move the
destructor farther down in the class hierarchy, i.e. from
_numarray.tp_dealloc to NumArray.__del__ in Python.
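
To make the copy-back idea concrete, here is a minimal pure-Python sketch.
It is not numarray's code: the Shadow class and scale_in_place function are
hypothetical names, a plain list stands in for the misbehaved array, and the
immediate __del__ call relies on CPython's reference counting.

```python
class Shadow:
    """Well-behaved working copy of a 'misbehaved' sequence (hypothetical)."""

    def __init__(self, original):
        self.original = original                    # the misbehaved source
        self.data = [float(x) for x in original]    # converted working copy

    def __del__(self):
        # Copy the results back onto the original when the shadow dies.
        self.original[:] = self.data


def scale_in_place(seq, factor):
    """Stand-in for a C function that needs a well-behaved array."""
    shadow = Shadow(seq)
    for i in range(len(shadow.data)):
        shadow.data[i] *= factor
    # Dropping the last reference runs __del__ (immediately on CPython).
    del shadow


values = [1, 2, 3]
scale_in_place(values, 2)
```

The caller never invokes a "copy back" function;  releasing the reference is
enough, which is the property that matters for code written against Numeric.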

If anyone can think of a way to get rid of __del__, I'm all for it.

> >The large number of "Check" routines preceding the numarray path (I
> >count 7 looking at my copy of wxPython) has me a little concerned.  I
> >think those checks are  more expensive for numarray because it is a new
> >style class.  
> >
> If that's really a significant slowdown, the culprits are likely 
> PyTuple_Check, PyList_Check and wxPySwigInstance_Check.  
> PySequence_Check appears to just be pointer compares and shouldn't 
> invoke any new style class machinery. PySequence_Length calls sq_length, 
> but appears also to not involve new class machinery. Of these, I think 
> PyTuple_Check and PyList_Check could be replaced with PyTuple_CheckExact 
> and PyList_CheckExact. This would slow down people using subclasses of 
> tuple/list, but speed everyone else up since the latter pair of 
> functions are just pointer compares. I think the former group is a very 
> small, possibly nonexistent, minority, so this would probably 
> be acceptable.
> I don't see any easy/obvious ways to speed up wxPySwigInstance_Check, 

Why no CheckExact, even if it's hand coded?  Maybe the setup is tedious?
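
For anyone unsure of the Check versus CheckExact distinction, here is a rough
Python-level analogue.  It is only an illustration: the helper names and the
MyList subclass are made up, and the real functions are C macros in the
Python API, so this says nothing about actual timings.

```python
def is_tuple_or_list(obj):
    # Analogue of PyTuple_Check / PyList_Check: subclasses are accepted.
    return isinstance(obj, (tuple, list))


def make_exact_check(cls):
    """Build a hand-coded 'CheckExact' for any type (hypothetical helper)."""
    def check(obj):
        # Analogue of a CheckExact macro: one identity compare on the type.
        return type(obj) is cls
    return check


is_exact_tuple = make_exact_check(tuple)
is_exact_list = make_exact_check(list)


class MyList(list):
    """A list subclass, which only the inexact check recognises."""


sub = MyList([1, 2])
results = (is_tuple_or_list(sub), is_exact_list(sub), is_exact_list([1, 2]))
```

The exact check rejects the subclass instance, which is the trade-off Tim
describes: subclass users fall through to a slower path, everyone else gets a
single type comparison.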

> but I believe that wxPoints now obey the PySequence protocol, so I think 
> that the whole wxPySwigInstance_Check branch could be removed. To get 
> that into wxPython you'd probably have to convince Robin that it 
> wouldn't hurt the speed of list of wxPoints unduly.
> Wait... If the above doesn't work, I think I do have a way that might 
> work for speeding the check for a wxPoint. Before the loop starts, get a 
> pointer to wx.core.Point (the class for wxPoints these days) and call it 
> wxPoint_Type. Then just use this for the check:
>         o->ob_type == wxPoint_Type
> Worth a try anyway.
> Unfortunately, I don't have any time to try any of this out right now.
> Chris, are you feeling bored?
> -tim

What's the chance of adding direct support for numarray to wxPython? 
Our PEP reduces the burden on a package to at worst adding 3 include
files for numarray plus the specialized package code.   With those
files,  the package can be compiled by users without numarray and also
run without numarray, but would receive a real boost for people willing
to install numarray since the sequence protocol could be bypassed.
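
The "works without numarray, faster with it" idea can be sketched at the
Python level roughly as follows.  This is only an analogy to what the PEP
arranges in C.  The to_pairs function is a hypothetical name, and the fast
path assumes numarray exposes NumArray with a tolist() method.

```python
try:
    import numarray  # optional dependency; the code must run without it
except ImportError:
    numarray = None


def to_pairs(points):
    """Return a list of (x, y) tuples from an Nx2 array or nested sequence."""
    if numarray is not None and isinstance(points, numarray.NumArray):
        # Fast path (assumption): convert in bulk rather than fetching each
        # row and element through the sequence protocol.
        return [tuple(row) for row in points.tolist()]
    # Generic path: plain sequence protocol, no numarray required.
    return [(row[0], row[1]) for row in points]


pairs = to_pairs([[1, 2], [3, 4]])
```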

