[Numpy-discussion] Speeding up wxPython/numarray

Todd Miller jmiller at stsci.edu
Wed Jun 30 14:48:09 CDT 2004


On Wed, 2004-06-30 at 15:57, Tim Hochberg wrote:
> I spent some time seeing what I could do to speed up
> wxPoint_LIST_helper by tweaking the numarray code. My first suspect was
> _universalIndexing by way of _ndarray_item. However, due to some
> new-style machinations, _ndarray_item was never getting called. Instead,
> _ndarray_subscript was being called. So, I added a special case to
> _ndarray_subscript. This sped things up by 50% or so (I don't recall
> exactly). The code for that is at the end of this message; it's not
> guaranteed to be 100% correct; it's all experimental.
> 
> After futzing around some more I figured out a way to trick python into 
> using _ndarray_item. I added "type->tp_as_sequence->sq_item = 
> _ndarray_item;" to _ndarray new.  

I'm puzzled why you had to do this.  You're using Python 2.3.x, right?
There's conditionally compiled code which should already be setting
sq_item statically.  (At least I thought so.)

> I then optimized _ndarray_item (code
> at end). This halved the execution time of my arbitrary benchmark. This
> trick may have horrible, unforeseen consequences, so use at your own risk.

Right now the sq_item hack strikes me as somewhere between completely
unnecessary and too scary for me!  Maybe if python-dev blessed it.

The _ndarray_item optimization itself looks good to me.

> Finally I commented out the __del__ method in numarraycore. This resulted
> in an additional speedup of 64% for a total speedup of 240%. Still not
> close to 10x, but a large improvement. Obviously this is not
> viable for real use, but it's enough of a speedup that I'll try to see
> if there's any way to move the shadow stuff back to tp_dealloc.

FYI, the issue with tp_dealloc may have to do with which mode Python is
compiled in, --with-pydebug, or not.  One approach which seems like it
ought to work (just thought of this!) is to add an extra reference in C
to the NumArray instance __dict__ (from NumArray.__init__ and stashed
via a new attribute in the PyArrayObject struct) and then DECREF it as
the last part of the tp_dealloc.  
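
Roughly what I have in mind, as a sketch in plain C with a toy reference count standing in for the Python C API.  Everything here (ToyDict, ToyArray, shadow_dict, the toy_* functions) is made up for illustration and is not numarray or CPython code.  It also assumes the goal is simply to keep the __dict__ alive until the shadow cleanup in the deallocator has run:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for the instance __dict__ (hypothetical, not a PyObject). */
typedef struct {
    int refcount;
} ToyDict;

/* Toy stand-in for the array object (hypothetical, not PyArrayObject). */
typedef struct {
    ToyDict *dict;         /* the ordinary instance __dict__ reference */
    ToyDict *shadow_dict;  /* extra reference held from C */
} ToyArray;

static int live_dicts = 0;                   /* dicts not yet freed */
static int dict_alive_during_cleanup = -1;   /* recorded by the dealloc */

static ToyDict *toy_dict_new(void)
{
    ToyDict *d = malloc(sizeof *d);
    assert(d != NULL);
    d->refcount = 1;
    live_dicts++;
    return d;
}

static void toy_dict_decref(ToyDict *d)
{
    if (--d->refcount == 0) {
        live_dicts--;
        free(d);
    }
}

/* Plays the role of __init__: take one extra reference to the dict. */
static ToyArray *toy_array_new(void)
{
    ToyArray *a = malloc(sizeof *a);
    assert(a != NULL);
    a->dict = toy_dict_new();
    a->shadow_dict = a->dict;
    a->shadow_dict->refcount++;
    return a;
}

/* Plays the role of tp_dealloc. */
static void toy_array_dealloc(ToyArray *a)
{
    /* Generic teardown drops the ordinary reference first. */
    toy_dict_decref(a->dict);
    a->dict = NULL;

    /* The shadow cleanup would go here; the dict is still alive
       because of the extra reference. */
    dict_alive_during_cleanup = (a->shadow_dict->refcount > 0);

    /* Last step: drop the extra reference, which frees the dict. */
    toy_dict_decref(a->shadow_dict);
    free(a);
}
```

The only point of the toy is the ordering: the extra reference is released after everything else in the deallocator, so anything that runs before it can still see the dict.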

> In summary:
> 
> Version              Time    Rel Speedup   Abs Speedup
> Stock                0.398   ----          ----
> _ndarray_item mod    0.192   107%          107%
> del __del__          0.117   64%           240%
> 
> There were a couple of other things I tried that resulted in additional 
> small speedups, but the tactics I used were too horrible to reproduce 
> here. The main one of interest is that all of the calls to 
> NA_updateDataPtr seem to burn some time. However, I don't have any idea 
> what one could do about that.

Francesc Alted had the same comment about NA_updateDataPtr a while ago.
I tried to optimize it then but didn't get anywhere.  NA_updateDataPtr()
needs to be called at least once per extension function, because the
buffer protocol doesn't give locked pointers; calling it more than once
is unnecessary but not harmful.
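
To illustrate the "once per function" rule, here is a toy model in plain C.  It is a sketch only: ToyBuffer, ToyArray, toy_update_data_ptr and friends are hypothetical names, not the numarray API, and the "buffer" is just a heap block that may be moved between calls:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer object whose storage may move between calls. */
typedef struct {
    double *storage;
    size_t n;
} ToyBuffer;

/* Hypothetical array that caches a raw pointer into the buffer. */
typedef struct {
    ToyBuffer *buffer;
    double *data;   /* cached pointer; may be stale on entry to a function */
} ToyArray;

/* Refresh the cached pointer from the buffer.  Returns 0 on failure. */
static int toy_update_data_ptr(ToyArray *a)
{
    if (a->buffer == NULL || a->buffer->storage == NULL)
        return 0;
    a->data = a->buffer->storage;
    return 1;
}

/* Resize by allocating a new block, so the storage address always changes. */
static int toy_buffer_resize(ToyBuffer *b, size_t n)
{
    double *fresh = calloc(n, sizeof *fresh);
    if (fresh == NULL)
        return 0;
    memcpy(fresh, b->storage, (n < b->n ? n : b->n) * sizeof *fresh);
    free(b->storage);
    b->storage = fresh;
    b->n = n;
    return 1;
}

/* An "extension function": refresh the pointer once at entry, then use it. */
static int toy_sum(ToyArray *a, double *out)
{
    double total = 0.0;
    size_t i;

    if (!toy_update_data_ptr(a))
        return 0;
    for (i = 0; i < a->buffer->n; i++)
        total += a->data[i];
    *out = total;
    return 1;
}
```

Between two calls to toy_sum() the buffer can be resized and moved; the refresh at entry is what keeps the second call from reading through a stale pointer.  A second refresh inside the loop would be redundant but harmless, which is the same trade-off as above.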

> That's all for now.
> 
> -tim

Well, be picking out your beer.

Todd

> 
> 
> 
> static PyObject*
> _ndarray_subscript(PyArrayObject* self, PyObject* key)
> {
>     PyObject *result;
> #ifdef TAH
>     /* Fast path: handle an exact Python int key here instead of
>        going through _universalIndexing. */
>     if (PyInt_CheckExact(key)) {
>         long ikey = PyInt_AsLong(key);
>         long offset;
>         if (NA_getByteOffset(self, 1, &ikey, &offset) < 0)
>             return NULL;
>         if (!NA_updateDataPtr(self))
>             return NULL;
>         return _simpleIndexingCore(self, offset, 1, Py_None);
>     }
> #endif
> #if _PYTHON_CALLBACKS
>     result = PyObject_CallMethod(
>         (PyObject *) self, "_universalIndexing", "(OO)", key, Py_None);
> #else
>     result = _universalIndexing(self, key, Py_None);
> #endif
>     return result;
> }
> 
> 
> 
> static PyObject *
> _ndarray_item(PyArrayObject *self, int i)
> {
> #ifdef TAH
>     long offset;
>     if (NA_getByteOffset(self, 1, &i, &offset) < 0)
>         return NULL;
>     if (!NA_updateDataPtr(self))
>         return NULL;
>     return _simpleIndexingCore(self, offset, 1, Py_None);
> #else
>     PyObject *result;
>     PyObject *key = PyInt_FromLong(i);
>     if (!key) return NULL;
>     result = _universalIndexing(self, key, Py_None);
>     Py_DECREF(key);
>     return result;
> #endif
> }