[Numpy-discussion] String sort
Francesc Altet
faltet@carabos....
Sat Feb 9 16:19:42 CST 2008
A Saturday 09 February 2008, Charles R Harris escrigué:
> I'm also thinking that at some point it becomes more efficient to do
> a indirect sort followed by take than to move all those big strings
> around. But I guess we won't know where that point is until we have
> both versions available.
I've done some experiments in that matter too. They are saying that,
with the current mergesort in NumPy, an indirect sort followed by take
performs similarly to direct sort for small string lengths (<=16), but
indirect sort starts to win afterwards.
The version with quicksort and optimized sSWAP should be between 2x and
3x times faster than current mergesort implementation, so the advantage
for direct sort could grow up to somewhere between 50 and 100. A nice
idea could be doing some more toughful experiments in order to find the
point where an indirect sort followed by a take would be more efficient
and automatically select this method beyond that point.
However, this has the drawback that you have to use additional memory
for keeping the indices in the indirect method. Of course, when
strings are large, those indices should take a rather negligible space
compared with strings itself. In any case, in some situations where
space is critical, this can still be important. I don't know, but my
opinion is that we shouldn't take too agressive optimizations for that
matter. My vote is to document this possibility in the docstrings, so
that the user wanting for extreme performance can use this approach if
he wants to. Still, for string sizes greater than, say, 1000, well, an
automatic selection of the indirect method is very tempting indeed.
Cheers,
--
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"
More information about the Numpy-discussion
mailing list