[Numpy-discussion] String sort
Charles R Harris
Mon Feb 11 08:47:13 CST 2008
On Feb 11, 2008 2:58 AM, Francesc Altet <firstname.lastname@example.org> wrote:
> A Monday 11 February 2008, Charles R Harris escrigué:
> > On Feb 8, 2008 5:29 AM, Francesc Altet <email@example.com> wrote:
> > > Hi,
> > >
> > > I'm a bit confused that the sort method of a string character
> > > doesn't
> > >
> > > allow a mergesort:
> > > >>> s = numpy.empty(10, "S10")
> > > >>> s.sort(kind="merge")
> > >
> > > TypeError: desired sort not supported for this type
> > >
> > > However, by looking at the numpy sources, it seems that the only
> > > implemented method for sorting array strings is "merge" (I presume
> > > because it is stable). So, perhaps the message above should be
> > > fixed.
> > The only available method is the default quicksort. The mergesort is
> > for argsort and was put in for lexsort to use.
> That's good to know. However, I'm curious about where it is the
> specific quicksort implementation for strings/unicode. I've had a look
> at _sortmodule.c.src, but I can only find a quicksort implementation
The default is the C qsort, it is called from PyArray_Sort in
multiarraymodule.c. The type specific sorts, when they exist, are also
called from there through _new_sort. To see what type specific sorts are
registered, look at the end of _sortmodule.c.src. You can write a new sort,
and if it isn't registered it won't be used. So commenting out the
registration is a good way to get back to the default.
> The version you are testing is your own or the one that I provided?
> Here are the timings for my laptop:
They turned out to be essentially identical, except I used len instead of ss
;) I also used inlined functions for the copy and swap as they are safer
with the arguments. Comparison with the macro versions showed no difference
> In : a = np.random.rand(10000).astype('S8')
> In : %timeit a.copy().sort()
> 100 loops, best of 3: 3.82 ms per loop
> In : %timeit newqsort(a.copy())
> 100 loops, best of 3: 3.29 ms per loop
> Here, the difference in performance has been reduced to a mere 15%
> (still favouring newqsort). So, with this, it seems like the
> performance of the original sorting in NumPy only suffers a lot when
> running in old processors (eg. Pentium 4), while the performance is
> reasonable with newer ones (Opteron).
It could also depend on the C library that comes with the compiler. I'm
running on a E6600, but numpy compiles everything for the i386, which might
make a difference also.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Numpy-discussion