[off topic] Re: [Numpy-discussion] numarray speed - PySequence_GetItem
jmiller at stsci.edu
Tue Jun 29 09:05:49 CDT 2004
On Mon, 2004-06-28 at 20:45, Tim Hochberg wrote:
> Todd Miller wrote:
> >On Mon, 2004-06-28 at 17:14, Sebastian Haase wrote:
> >> [SNIP]
> >>My original question was just this: Does anyone know why numarray is maybe 10
> >>times slower than Numeric with that particular code segment
> >>(PySequence_GetItem) ?
> >Well, the short answer is probably: no.
> >Looking at the numarray sequence protocol benchmarks in
> >Examples/bench.py, and looking at what wxPython is probably doing
> >(fetching a 1x2 element array from an Nx2 and then fetching 2 numerical
> >values from that)... I can't fully nail it down. My benchmarks show
> >that numarray is 4x slower for fetching the two element array but only
> >1.1x slower for the two numbers; that makes me expect at most a 4x slowdown.
> >Noticing the 50k __del__ calls in your profile, I eliminated __del__
> >(breaking numarray) to see if that was the problem; the ratios changed
> >to 2.5x slower and 0.9x slower (actually faster) respectively.
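The two measurements described above (fetching a two-element row, then fetching the two numbers from it) can be timed separately. The following is only a rough sketch, not the code in Examples/bench.py: it uses a plain list of lists as a stand-in for an Nx2 array, and the function names are hypothetical. Swapping in a real array type for `points` would give ratios comparable to the ones quoted.

```python
import timeit


def time_row_fetch(points, repeat=100):
    """Seconds spent fetching every row of `points`, `repeat` times over."""
    def fetch():
        for i in range(len(points)):
            points[i]  # sequence-protocol item access on the outer object
    return timeit.timeit(fetch, number=repeat)


def time_value_fetch(points, repeat=100):
    """Seconds spent fetching two values from an already fetched row."""
    row = points[0]
    count = len(points)

    def fetch():
        for _ in range(count):
            row[0]
            row[1]
    return timeit.timeit(fetch, number=repeat)


# Stand-in data: a list of lists instead of a real Nx2 array (assumption).
points = [[float(i), float(i + 1)] for i in range(100)]
row_seconds = time_row_fetch(points)
value_seconds = time_value_fetch(points)
print("row fetch: %.6f s, value fetch: %.6f s" % (row_seconds, value_seconds))
```

Running both functions against two array implementations and dividing the results gives one slowdown ratio per step, which is how the row and value costs can be told apart.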
> This reminds me, when profiling bits and pieces of my code I've often
> noticed that __del__ chews up a large chunk of time. Is there any
> prospect of this being knocked down at all, or is it inherent in the
> structure of numarray?
__del__ is IMHO the elegant way to do numarray's shadowing of
"misbehaved arrays". Misbehaved arrays are ones which don't meet the
requirements of a particular C function; generally that means
noncontiguous, byte-swapped, misaligned, or of the wrong type, but it
can also mean some other sequence type like a list or tuple. I think using
the destructor is "necessary" for maintaining Numeric compatibility in C
because you can generally count on arrays being DECREF'd, but obviously
you couldn't count on some new API call being called.
__del__ used to be implemented in C as tp_dealloc, but I was running
into segfaults which I tracked down to the order in which a new style
class instance is torn down. The purpose of __del__ is to copy the
contents of a well-behaved working array (the shadow) back onto the
original misbehaved array. The problem was that, because of the
numarray class hierarchy, critical pieces of the shadow (the instance
dictionary) had already been torn down before the tp_dealloc was
called. The only way I could think of to fix it was to move the
destructor farther down in the class hierarchy, i.e. from
_numarray.tp_dealloc to NumArray.__del__ in Python.
If anyone can think of a way to get rid of __del__, I'm all for it.
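The copy-back-on-destruction idea can be illustrated in pure Python. This is a minimal sketch and not numarray's implementation: the `Shadow` class and `double_in_place` function are hypothetical, a plain list stands in for the misbehaved array, and the example relies on CPython's reference counting to run __del__ as soon as the last reference goes away.

```python
class Shadow:
    """Well-behaved working copy of a 'misbehaved' sequence (hypothetical)."""

    def __init__(self, original):
        self._original = original   # stand-in for the misbehaved array
        self.data = list(original)  # working copy a C function would use

    def __del__(self):
        # Copy the working data back onto the original when the shadow dies.
        self._original[:] = self.data


def double_in_place(original):
    shadow = Shadow(original)
    for i, value in enumerate(shadow.data):
        shadow.data[i] = value * 2
    # No explicit copy-back call: dropping the last reference to `shadow`
    # on return triggers __del__ (on CPython), much like a DECREF in C.


values = [1, 2, 3]
double_in_place(values)
print(values)
```

The caller never has to call a "finish" function; the cost is that every shadow pays for a Python-level __del__ call, which is the overhead showing up in the profiles.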
> >The large number of "Check" routines preceding the numarray path (I
> >count 7 looking at my copy of wxPython) has me a little concerned. I
> >think those checks are more expensive for numarray because it is a new
> >style class.
> If that's really a significant slowdown, the culprits are likely
> PyTuple_Check, PyList_Check and wxPySwigInstance_Check.
> PySequence_Check appears to just be pointer compares and shouldn't
> invoke any new style class machinery. PySequence_Length calls sq_length,
> but appears also to not involve new class machinery. Of these, I think
> PyTuple_Check and PyList_Check could be replaced with PyTuple_CheckExact
> and PyList_CheckExact. This would slow down people using subclasses of
> tuple/list, but speed everyone else up since the latter pair of
> functions are just pointer compares. I think the former group is a very
> small, possibly nonexistent, minority, so this would probably
> be acceptable.
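The trade-off between the two kinds of check has a direct Python-level analogue, sketched below. `isinstance` accepts subclasses, as PyTuple_Check does, while comparing `type(obj)` by identity accepts only the exact type, as PyTuple_CheckExact does. The helper and class names are hypothetical.

```python
class MyTuple(tuple):
    """A tuple subclass, the case the exact check would stop accepting."""


def check(obj):
    # Analogue of PyTuple_Check: true for tuple and any subclass of it.
    return isinstance(obj, tuple)


def check_exact(obj):
    # Analogue of PyTuple_CheckExact: a single identity comparison on the type.
    return type(obj) is tuple


plain = (1, 2)
sub = MyTuple((1, 2))
print(check(plain), check_exact(plain))
print(check(sub), check_exact(sub))
```

Objects that fail the exact check are not necessarily rejected; they simply fall through to whatever slower, more general path comes next.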
> I don't see any easy/obvious ways to speed up wxPySwigInstance_Check,
Why no CheckExact, even if it's hand coded? Maybe the setup is tedious?
> but I believe that wxPoints now obey the PySequence protocol, so I think
> that the whole wxPySwigInstance_Check branch could be removed. To get
> that into wxPython you'd probably have to convince Robin that it
> wouldn't hurt the speed of list of wxPoints unduly.
> Wait... If the above doesn't work, I think I do have a way that might
> work for speeding the check for a wxPoint. Before the loop starts, get a
> pointer to wx.core.Point (the class for wxPoints these days) and call it
> wxPoint_Type. Then just use for the check:
> o->ob_type == wxPoint_Type
> Worth a try anyway.
> Unfortunately, I don't have any time to try any of this out right now.
> Chris, are you feeling bored?
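The cached type comparison suggested above can be sketched in Python. This is only an illustration of the control flow, not wxPython code: `Point` is a hypothetical stand-in for wx.core.Point and `to_xy_pairs` is an invented helper. The type is looked up once before the loop, matched by identity inside it, and anything else falls back to the sequence protocol.

```python
class Point:
    """Hypothetical stand-in for wx.core.Point."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def to_xy_pairs(items):
    point_type = Point  # looked up once, before the loop starts
    pairs = []
    for item in items:
        if type(item) is point_type:
            # Fast path: exact type match, read the attributes directly.
            pairs.append((item.x, item.y))
        else:
            # General path: anything that supports the sequence protocol.
            pairs.append((item[0], item[1]))
    return pairs


print(to_xy_pairs([Point(1, 2), (3, 4), [5, 6]]))
```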
What's the chance of adding direct support for numarray to wxPython?
Our PEP reduces the burden on a package to at worst adding 3 include
files for numarray plus the specialized package code. With those
files, the package can be compiled by users without numarray and also
run without numarray, but would receive a real boost for people willing
to install numarray since the sequence protocol could be bypassed.