[Numpy-discussion] Need help for implementing a fast clip in numpy (was slow clip)
David Cournapeau
david at ar.media.kyoto-u.ac.jp
Thu Jan 11 09:49:27 CST 2007
Francesc Altet wrote:
> On Wednesday 10 January 2007 22:49, Stefan van der Walt wrote:
>> On Wed, Jan 10, 2007 at 08:28:14PM +0100, Francesc Altet wrote:
>>> On Tue, 09 Jan 2007 at 23:19 +0900, David Cournapeau wrote:
>>> time (putmask)--> 1.38
>>> time (where)--> 2.713
>>> time (numexpr where)--> 1.291
>>> time (fancy+assign)--> 0.967
>>> time (numexpr clip)--> 0.596
>>>
>>> It is interesting to see how fancy indexing + assignment is considerably
>>> more efficient than putmask.
>> Not on my machine:
>>
>> time (putmask)--> 0.181
>> time (where)--> 0.783
>> time (numexpr where)--> 0.26
>> time (fancy+assign)--> 0.202
>
> Yeah, a big difference indeed. Just for reference, my results above were
> obtained on a Duron (an Athlon with only 128 KB of secondary cache) at 0.9
> GHz. Now, using my laptop (Intel P4 @ 2 GHz, 512 KB of secondary cache), I
> get:
>
> time (putmask)--> 0.244
> time (where)--> 2.111
> time (numexpr where)--> 0.427
> time (fancy+assign)--> 0.316
> time (numexpr clip)--> 0.184
>
> So, on my laptop fancy+assign is noticeably slower than putmask. It should
> also be noted that the implementation of clip in numexpr (i.e. in pure C) is
> not that much faster than putmask (only about 30%); so perhaps it is not
> really necessary to come up with a pure C implementation of clip (at least
> not for Intel P4 machines!).
>
> In any case, it is really striking to see how differently the various CPU
> architectures perform on this apparently simple problem.
I am not sure it is such a simple problem: it involves massive branching.
I have never looked at numexpr, but the idea seems really interesting;
I will take a look at it when I have some time.
Anyway, I've just finished and tested a pure C implementation of the
clip function. As it stands, it should be possible to replace PyArray_Clip
calls with PyArray_FastClip in numpy/core/multiarray.c. The idea is that for
'easy' cases, it uses a trivial but fast implementation; for all other
cases, it falls back on the old implementation for now. By easy cases, I
mean scalar min and max, with non-complex input in native byte order (from
npy_bool to npy_longdouble), which should cover most usages.
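A minimal sketch of the 'easy case' dispatch described above, assuming hypothetical type tags and flags in place of the real dtype and byte-order checks (this is not the code in blop.c): real, native-order input with scalar bounds goes through a plain comparison loop, and anything else returns -1 so the caller can fall back on PyArray_Clip.

```c
#include <stddef.h>

/* Hypothetical type tags; a real implementation would inspect the
 * array's dtype instead. */
typedef enum { TAG_FLOAT64, TAG_COMPLEX128 } type_tag;

/* Plain comparison loop for float64 with scalar bounds. */
static void clip_float64(const double *in, double *out, size_t n,
                         double lo, double hi)
{
    for (size_t i = 0; i < n; i++) {
        double v = in[i];
        if (v < lo) {
            v = lo;
        } else if (v > hi) {
            v = hi;
        }
        out[i] = v;
    }
}

/* Returns 0 if the fast path handled the call, or -1 to tell the
 * caller to fall back on the generic (slow) implementation. */
int fast_clip_dispatch(type_tag tag, int scalar_bounds, int native_order,
                       const void *in, void *out, size_t n,
                       double lo, double hi)
{
    /* Array-valued bounds or byte-swapped data: not an easy case. */
    if (!scalar_bounds || !native_order) {
        return -1;
    }
    switch (tag) {
    case TAG_FLOAT64:
        clip_float64((const double *)in, (double *)out, n, lo, hi);
        return 0;
    default:
        /* Complex input: semantics unclear, leave it to the old code. */
        return -1;
    }
}
```

Returning a status code rather than calling the fallback directly keeps the fast path self-contained, so it can be dropped in front of the existing code without touching it.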
There are still some things I am unsure about:
- the original clip is supposed to work with complex numbers, but I
am not sure about the semantics in this case.
- If you have float32 input but float64 min/max values, the
original clip does not upcast the input. If you have integer input but
floating point min/max, the original clip fails. Is this the intended
behaviour? My implementation upcasts whenever possible, but then I am
not sure how to handle the case where no copy is requested (which I do
not handle yet).
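To make the integer/floating point question concrete, here is a small sketch of two hypothetical strategies for int input with double bounds: narrowing the bounds to int (the C conversion truncates 0.5 to 0) versus upcasting the data to double. The function names are illustrative only. Note that the upcast result needs a new double buffer, which is why it does not combine well with a no-copy option.

```c
#include <stddef.h>

/* Option A (hypothetical): narrow the bounds to the input type.
 * Converting a double to int truncates toward zero, so 0.5 becomes 0
 * and 2.5 becomes 2; the result can reuse an int buffer. */
static void clip_int_narrow(const int *in, int *out, size_t n,
                            double lo, double hi)
{
    int ilo = (int)lo;
    int ihi = (int)hi;
    for (size_t i = 0; i < n; i++) {
        int v = in[i];
        out[i] = v < ilo ? ilo : (v > ihi ? ihi : v);
    }
}

/* Option B (hypothetical): upcast each element to double, so the
 * output must be a separate double array, never the int input. */
static void clip_int_upcast(const int *in, double *out, size_t n,
                            double lo, double hi)
{
    for (size_t i = 0; i < n; i++) {
        double v = (double)in[i];
        out[i] = v < lo ? lo : (v > hi ? hi : v);
    }
}
```

With input {0, 1, 2, 3} and bounds [0.5, 2.5], option A gives {0, 1, 2, 2} while option B gives {0.5, 1.0, 2.0, 2.5}, so the two choices are visibly different behaviours, not just different performance.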
For now, when PyArray_FastClip uses the fast implementation, it is
roughly 5x faster than PyArray_Clip for float32 and float64 input. I expect
roughly another factor of two once the no-copy option is implemented
(again, for easy cases).
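A sketch of what a no-copy variant could look like for float64, assuming a hypothetical function name: the array is clipped in place, so no output array is allocated and elements already within [lo, hi] are never written back. Whether this accounts for the speedup expected above is an assumption, not something measured here.

```c
#include <stddef.h>

/* Hypothetical in-place ("no copy") clip: modifies data directly, so
 * no second array is needed, and in-range elements are left untouched. */
static void clip_float64_inplace(double *data, size_t n,
                                 double lo, double hi)
{
    for (size_t i = 0; i < n; i++) {
        if (data[i] < lo) {
            data[i] = lo;
        } else if (data[i] > hi) {
            data[i] = hi;
        }
    }
}
```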
I have attached blop.c, which implements the fast clip in a module;
clip_imp.c, which implements the clipping for all native types (it is
generated by autogen because I wanted to avoid depending on
numpy.distutils during development); a Makefile; and a test file, which
also profiles the clip function with float32 inputs.
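As a rough illustration of the per-type generation idea (clip_imp.c itself comes from autogen), a C preprocessor macro can stamp out one clip loop per native type without any external tool. The macro and function names are hypothetical, and only a subset of the types from npy_bool to npy_longdouble is shown.

```c
#include <stddef.h>

/* Hypothetical template: expands to one clip function per type,
 * named clip_<name>, with scalar bounds of the same type. */
#define DEFINE_CLIP(name, type)                                   \
    static void clip_##name(const type *in, type *out, size_t n,  \
                            type lo, type hi)                     \
    {                                                             \
        for (size_t i = 0; i < n; i++) {                          \
            type v = in[i];                                       \
            out[i] = v < lo ? lo : (v > hi ? hi : v);             \
        }                                                         \
    }

DEFINE_CLIP(byte, signed char)
DEFINE_CLIP(short, short)
DEFINE_CLIP(int, int)
DEFINE_CLIP(long, long)
DEFINE_CLIP(float, float)
DEFINE_CLIP(double, double)
DEFINE_CLIP(longdouble, long double)
```

The trade-off versus a generator like autogen is mainly readability and debuggability: macro-generated code is harder to step through, but it needs no extra build step.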
Does it look OK to others, so that it can be committed to numpy (once the
two problems above are solved, of course, to keep the same behaviour as
PyArray_Clip)?
cheers,
David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blop.c
Type: text/x-csrc
Size: 9544 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/numpy-discussion/attachments/20070112/0529cb14/attachment-0002.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clip_imp.c
Type: text/x-csrc
Size: 11219 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/numpy-discussion/attachments/20070112/0529cb14/attachment-0003.bin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Makefile
Url: http://projects.scipy.org/pipermail/numpy-discussion/attachments/20070112/0529cb14/attachment-0001.pl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_take.py
Type: text/x-python
Size: 9896 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/numpy-discussion/attachments/20070112/0529cb14/attachment-0001.py