[Numpy-discussion] String sort

Charles R Harris charlesr.harris@gmail....
Sun Feb 10 22:44:11 CST 2008

On Feb 8, 2008 5:29 AM, Francesc Altet <faltet@carabos.com> wrote:

> Hi,
> I'm a bit confused that the sort method of a string character doesn't
> allow a mergesort:
> >>> s = numpy.empty(10, "S10")
> >>> s.sort(kind="merge")
> TypeError: desired sort not supported for this type
> However, by looking at the numpy sources, it seems that the only
> implemented method for sorting array strings is "merge" (I presume
> because it is stable).  So, perhaps the message above should be fixed.

The only available method is the default quicksort. The mergesort is for
argsort and was put in for lexsort to use.

> Also, in the context of my work in indexing, and because of the slowness
> of the current implementation in NumPy, I've ended with an
> implementation of the quicksort method for 1-D array strings.  For
> moderately large arrays, it is about 2.5x-3x faster than the
> (supposedly) mergesort version in NumPy, not only due to the quicksort,
> but also because I've implemented a couple of macros for efficient
> string swapping and copy.  If this is of interest for NumPy developers,
> tell me and I will provide the code.

I've now got a string/ucs4 specific argsort(kind='q'), the string version of
which is about 40% faster than the old default and about 10% faster than the
mergesort version, but the string/ucs4 specific versions of sort aren't yet
fairing as well. I'm timing things with

In [1]: import timeit

In [2]: t = timeit.Timer("np.fromstring(np.empty(10000).tostring(),dtype='|S8').sort(kind='q')","import
numpy as np")

In [3]: t.repeat(3,100)
Out[3]: [0.22127485275268555, 0.21282196044921875, 0.21273088455200195]

That's with the current sort(kind='q') in svn, which uses the new string
compare function but is otherwise the old default quicksort. The new string
specific version of quicksort I'm testing is actually a bit slower than
that. Both versions correctly sort the array. So I'm going to continue to
experiment a bit until I see what is going on.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://projects.scipy.org/pipermail/numpy-discussion/attachments/20080210/8c38c93e/attachment.html 

More information about the Numpy-discussion mailing list