[Numpy-discussion] String sort
Charles R Harris
Thu Feb 14 13:29:48 CST 2008
On Thu, Feb 14, 2008 at 11:44 AM, Francesc Altet <firstname.lastname@example.org> wrote:
> A Thursday 14 February 2008, Charles R Harris escrigué:
> > On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet <email@example.com>
> > > Looking forward to see the new qsort for strings in NumPy (the
> > > specific version for merge sort is very welcome too!).
> > I could never figure out what the merge sort was good for. I did the
> > indirect version in numarray because I needed a stable sort to
> > implement lexsort, which was my original aim. I just added the direct
> > version for completeness. If you have a use for it, I would love to
> > hear it.
> Well, I must to confess that I've not used merge sorts yet, but I'd like
> to test them in the context of my PSI (Partially Sorted Indexes, see
>  for a white paper on a concrete implementation) work. My hope is
> that, as a merge sort keeps the order of indices of elements that are
> equal (this is what 'stable' means), this would allow better
> compression rates for indices (and hence, less I/O effort to bring the
> indices from disk into memory and ultimately allowing for faster lookup
> speed). This will probably be only important when one have data
> distributions with rather low cardinality, but these scenarios can be
> more frequent/important than one may think.
Well, I take that back a bit. I think mergesort might work best for large
memory mapped arrays because it does sequential accesses, which might be
more disk efficient than random accesses. Then again, a divide and conquer
approach like quicksort eventually becomes localized too. I've never
experimented with really large sorts, they might perform differently than
the sorts that fit in memory.
Insertion sort is supposed to work well for almost sorted sequences, but
that application has always seemed a bit specialized to me. Although I'll
admit to being occasionally tempted to pull the insertion sorts out of
quicksort and mergesort and make them their own (type specific) routines.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Numpy-discussion