[Numpy-discussion] String sort
Charles R Harris
Thu Feb 14 10:58:03 CST 2008
On Thu, Feb 14, 2008 at 9:11 AM, Francesc Altet <email@example.com> wrote:
> A Wednesday 13 February 2008, Charles R Harris escrigué:
> > On Feb 13, 2008 10:56 AM, Francesc Altet <firstname.lastname@example.org> wrote:
> > > Be warned, I'd like to stress out that these are my figures for my
> > > _own laptop_. It would be nice if you can verify all of this with
> > > other achitectures (your Core2 machine seems different enough). I
> > > can run the benchmarks on Windows (installed in the same laptop)
> > > too. Tell me if you are interested on me doing this.
> > Its easy enough to test if you compile from svn, just add your new
> > copy function and change the name in this line:
> > #copy=copy_string, copy_ucs4#
> > to use your function instead of copy_string.
> I've spoken too fast. I've never compiled NumPy on Windows before, and
> had problems when trying to compile it using MSVC 7.1 and a recent copy
> of the repository. Well, in any case, I've exercised the Opteron
> platform, using gcc 4.1.3 (i.e. the one that can optimize newqsort
> correctly), and this brings new light to our study.
> From the plot (attached), it can be drawn the next conclusions:
> 1) copy_string2 (the combination of manual copy and memcpy) is not
> better than memcpy for *any* value of the string length in our Opteron
> platform. Also, the improvements with Pentium4 was not that big
> (<20%). In consequence, I'd choose to always use memcpy and discard
Your copy_string2 is now the version in numpy. I'm hesitant to make memcpy
the default for all string lengths because I've read that memcpy was much
improved in later gcc (>= 4.1 ?), but known slow in older versions. So
perhaps in a year or two when the newer compilers are more widespread would
be a better time to make the change. The switch at the 16 char length
shouldn't make that much difference in practice. I'll put a comment in the
source so that the thought won't get lost. BTW, using copy_string2 much
improved the performance of the string mergesort where a lot of data needs
to be copied to the work array. It's now half as fast as quicksort instead
of 1/3 ;) Heap sort continues in it's traditional slot as the slowest of
all. Slow but safe.
> 2) Curiously enough, the indirect sort in Opterons is *always* faster
> than newqsort+memcpy. For large values of string lengths (> 256), the
> speed-up can be up to 3x, which is a lot. And I'd say that this keeps
> true also for most modern processors (read Core2, Barcelona). For older
> processors (Pentium4), the indirect method can be slower than direct
> plot for small lengths, but by a very few extent (<10%).
The new indirect quicksort for strings is faster than the old qsort based
default, so perhaps that is also making a difference.
> Conclusion 2 makes me wonder if it wouldn't be useful the introduction
> of a new flag in sort, like:
> `indirect` - Use the indirect method for sorting. This requires more
> memory (4/8 bytes per array element), but for sorting arrays of strings
> it is almost always faster than the direct approach (default). Beware:
> this is not the case when using numerical values, where the use of this
> method for sorting is not recommendable.
I'm more inclined to leave this to the user. I have a todo to add a function
to numpy that makes it easier to use the argsort output to sort
multidimensional arrays, I'll name it argtake or some such and it will use
the argsort output along with an axis argument. It won't be quite as memory
efficient for multidimensional arrays, but it should work about the same in
the 1D case.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Numpy-discussion