[Numpy-discussion] Recarray sort() slowness
Charles R Harris
Sun Mar 4 09:37:45 CST 2007
On 3/4/07, Francesc Altet <firstname.lastname@example.org> wrote:
> I've finally implemented Chuck's suggestion of sorting of a recarray of
> two fields (the first being the actual array to be sorted and the other
> being the array to be reordered following the resulting order for the
> first one). Indeed, this approach saves me an amount of memory
> equivalent to the size of the second column, which is really nice.
> However, I'm afraid that I wouldn't be able to use this approach as it
> is 25x slower (see the attached benchmark; beware! only runs on a Linux
> kernel 2.6!) than regular argsorting of the first field and then doing a
> fancy indexing over the second. If the slowdown would be 2x I still can
> have a chance to use it, but 25x is a no go.
> I'm curious why the recarray.sort(order='fieldN') is so slow, and I'm
> wondering if this can be speed-up in some way or another.
I suspect there are several reasons.
1) It defines a new type with the comparison done on all fields
2) exchanges are done by copying the specified number of bytes
I think Travis was clever to define a new type, it made things easy and very
general, but it wasn't aimed at speed. There might be some optimizations
possible in there that Travis could speak to.
It would be pretty easy to modify argsort itself to do what you want in a
type specific way using a key vector and a vector to be sorted by the keys.
I expect it would be about 1/2 as fast as the normal argsort. Hmmm,
something like keysort(...).
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Numpy-discussion