[Numpy-discussion] Speeding up wxPython/numarray
tim.hochberg at cox.net
Wed Jun 30 16:02:02 CDT 2004
Todd Miller wrote:
>On Wed, 2004-06-30 at 15:57, Tim Hochberg wrote:
>>After futzing around some more I figured out a way to trick Python into
>>using _ndarray_item. I added "type->tp_as_sequence->sq_item =
>>_ndarray_item;" to _ndarray_new.
>I'm puzzled why you had to do this. You're using Python-2.3.x, right?
>There's conditionally compiled code which should be doing this
>statically. (At least I thought so.)
By this do you mean the "#if PY_VERSION_HEX >= 0x02030000" that is
wrapped around _ndarray_item? If so, I believe that it *is* getting
compiled; it's just never getting called.
What I think is happening is that the class NumArray inherits its
sq_item from PyClassObject. In particular, I think it picks up
instance_item from Objects/classobject.c. This appears to be fairly
expensive and, I think, ends up calling tp_as_mapping->mp_subscript.
Thus, _ndarray's sq_item slot never gets called. All of this is pretty
iffy since I don't know this stuff very well and I didn't trace it all
the way through. However, it explains what I've seen thus far.
This is why I ended up using the horrible hack. I'm resetting NumArray's
sq_item to point to _ndarray_item instead of instance_item. I believe
that access at the Python level goes through mp_subscript, so it
shouldn't be affected; only C-level callers should notice, and they
should just get the faster sq_item. You will notice that there are
an awful lot of "I think"s in the above paragraphs, though...
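A toy C model of the dispatch chain described above may make it easier to follow. It is only a sketch under stated assumptions, not CPython's code: ToyType, toy_instance_item, toy_mp_subscript, toy_ndarray_item and c_level_getitem are hypothetical stand-ins for the sq_item/mp_subscript slots, instance_item and _ndarray_item, and the two counters exist only to show which path served a C-level lookup.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for a type's slot table.
   This is NOT the real PyTypeObject layout. */
typedef struct ToyObject ToyObject;
typedef double (*sq_item_fn)(ToyObject *self, long i);
typedef double (*mp_subscript_fn)(ToyObject *self, long key);

typedef struct {
    sq_item_fn sq_item;           /* sequence slot used by C-level callers */
    mp_subscript_fn mp_subscript; /* mapping slot (generic subscript path) */
} ToyType;

struct ToyObject {
    ToyType *type;
    double data[4];
};

/* Counters recording which path served each lookup. */
static int slow_calls = 0;
static int fast_calls = 0;

/* Stand-in for the generic, more expensive mapping subscript. */
static double toy_mp_subscript(ToyObject *self, long key) {
    assert(key >= 0 && key < 4);
    slow_calls++;
    return self->data[key];
}

/* Stand-in for the inherited instance_item: it does not index itself,
   it forwards to the mapping slot. */
static double toy_instance_item(ToyObject *self, long i) {
    return self->type->mp_subscript(self, i);
}

/* Stand-in for the specialized _ndarray_item: indexes the buffer directly. */
static double toy_ndarray_item(ToyObject *self, long i) {
    assert(i >= 0 && i < 4);
    fast_calls++;
    return self->data[i];
}

/* A C-level caller always goes through whatever sq_item currently holds. */
static double c_level_getitem(ToyObject *obj, long i) {
    return obj->type->sq_item(obj, i);
}
```

With sq_item left at toy_instance_item, a C caller is bounced to the slower mp_subscript path and toy_ndarray_item is never reached. Assigning toy_ndarray_item to the sq_item slot (the analogue of the hack) routes C-level lookups straight to it, while code that calls mp_subscript directly, which is how Python-level indexing is believed to work here, is unaffected.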
>>I then optimized _ndarray_item (code
>>at end). This halved the execution time of my arbitrary benchmark. This
>>trick may have horrible, unforeseen consequences, so use at your own risk.
>Right now the sq_item hack strikes me as somewhere between completely
>unnecessary and too scary for me! Maybe if python-dev blessed it.
Yes, very scary. And it occurs to me that it will break subclasses of
NumArray that override __getitem__. When these subclasses are
accessed from C they will see _ndarray's sq_item instead of the
overridden __getitem__. However, I think I also know how to fix it. But
it does point out that it is very dangerous and there are probably dark
corners of which I'm unaware. Asking on python-list or python-dev would
probably be a good idea.
The nonscary, but painful, fix would be to rewrite NumArray in C.
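The subclass problem can be sketched with a similar toy model. Everything here is hypothetical: overrides_getitem, user_getitem, patch_naive and patch_guarded are invented for illustration, nothing is the real CPython or numarray API, and the guard shown is just one possible mitigation, not necessarily the fix alluded to above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not CPython: each type records whether a
   subclass overrode __getitem__ and which function C callers use. */
typedef struct ToyObj ToyObj;
typedef int (*item_fn)(ToyObj *self, int i);

typedef struct {
    bool overrides_getitem; /* true if Python code defined __getitem__ */
    item_fn user_getitem;   /* the overriding __getitem__, or NULL */
    item_fn sq_item;        /* slot that C-level callers go through */
} ToyType;

struct ToyObj {
    ToyType *type;
    int data[3];
};

/* Fast base-class accessor: stands in for _ndarray_item. */
static int fast_item(ToyObj *self, int i) {
    assert(i >= 0 && i < 3);
    return self->data[i];
}

/* Generic slot that honours a __getitem__ override when one exists. */
static int generic_item(ToyObj *self, int i) {
    if (self->type->overrides_getitem)
        return self->type->user_getitem(self, i);
    return fast_item(self, i);
}

/* Example override a subclass might define: returns twice the value. */
static int doubled_getitem(ToyObj *self, int i) {
    return 2 * fast_item(self, i);
}

/* Naive patch: always install the fast slot, bypassing any override. */
static void patch_naive(ToyType *t) {
    t->sq_item = fast_item;
}

/* Guarded patch (one possible mitigation): use the fast slot only when
   the type does not override __getitem__. */
static void patch_guarded(ToyType *t) {
    t->sq_item = t->overrides_getitem ? generic_item : fast_item;
}

/* A C-level caller always uses the sq_item slot. */
static int c_getitem(ToyObj *o, int i) {
    return o->type->sq_item(o, i);
}
```

For a subclass whose __getitem__ doubles every value, a C-level lookup of a 6 returns 12 through the generic slot but plain 6 once patch_naive has installed the fast accessor, which is the breakage described; patch_guarded keeps the override for the subclass while still giving the plain base type the fast path.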
>This optimization looks good to me.
Unfortunately, I don't think the optimization to sq_item will affect
much since NumArray appears to override it with
>>Finally I commented out the __del__ method in numarraycore. This resulted
>>in an additional speedup of 64% for a total speedup of 240%. Still not
>>close to 10x, but a large improvement. This is obviously not
>>viable for real use, but it's enough of a speedup that I'll try to see
>>if there's any way to move the shadow stuff back to tp_dealloc.
>FYI, the issue with tp_dealloc may have to do with which mode Python is
>compiled in, --with-pydebug, or not. One approach which seems like it
>ought to work (just thought of this!) is to add an extra reference in C
>to the NumArray instance __dict__ (from NumArray.__init__ and stashed
>via a new attribute in the PyArrayObject struct) and then DECREF it as
>the last part of the tp_dealloc.
That sounds promising.
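Todd's suggestion can be sketched with a toy reference-counting model. All names here (ToyDict, ToyArray, dict_extra, toy_init, toy_dealloc) are hypothetical and none of this is the real CPython or numarray API; the model assumes the ordering the suggestion implies, namely that the instance's own reference to its __dict__ goes away before the base deallocator's cleanup runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy refcounting model, not the CPython API. */
typedef struct {
    int refcnt;
    int freed;  /* set when the count reaches zero */
} ToyDict;

static void toy_incref(ToyDict *d) { d->refcnt++; }

static void toy_decref(ToyDict *d) {
    assert(d->refcnt > 0);
    if (--d->refcnt == 0)
        d->freed = 1;  /* a real implementation would free memory here */
}

typedef struct {
    ToyDict *dict;        /* the instance's own reference to __dict__ */
    ToyDict *dict_extra;  /* hypothetical extra reference stashed at init, or NULL */
    int cleanup_saw_live_dict;
} ToyArray;

/* Stand-in for NumArray.__init__ optionally stashing an extra reference. */
static void toy_init(ToyArray *a, ToyDict *d, int stash_extra) {
    a->dict = d;
    toy_incref(d);
    a->dict_extra = NULL;
    if (stash_extra) {
        a->dict_extra = d;
        toy_incref(d);
    }
    a->cleanup_saw_live_dict = 0;
}

/* Assumed order: the instance-level reference is dropped first, then the
   base-level cleanup runs, and the extra reference is released last. */
static void toy_dealloc(ToyArray *a) {
    ToyDict *d = a->dict;
    toy_decref(d);
    a->dict = NULL;
    /* Base cleanup ("shadow" handling). In this toy the freed flag stays
       readable, which a really freed object would not be. */
    a->cleanup_saw_live_dict = !d->freed;
    if (a->dict_extra != NULL) {
        toy_decref(a->dict_extra);  /* last step of dealloc */
        a->dict_extra = NULL;
    }
}
```

With the extra reference stashed at init time, the dictionary is still alive when the cleanup step looks at it and is released only by the final decref; without it, the first decref already drops the count to zero.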
>Well, be picking out your beer.
I was only about half right, so I'm not sure I qualify...