[Numpy-discussion] Need help for implementing a fast clip in numpy (was slow clip)
david at ar.media.kyoto-u.ac.jp
Thu Jan 11 09:58:28 CST 2007
David Cournapeau wrote:
> Francesc Altet wrote:
>> On Wednesday 10 January 2007 22:49, Stefan van der Walt wrote:
>>> On Wed, Jan 10, 2007 at 08:28:14PM +0100, Francesc Altet wrote:
>>>> On Tue, 09 Jan 2007 at 23:19 +0900, David Cournapeau wrote:
>>>> time (putmask)--> 1.38
>>>> time (where)--> 2.713
>>>> time (numexpr where)--> 1.291
>>>> time (fancy+assign)--> 0.967
>>>> time (numexpr clip)--> 0.596
>>>> It is interesting to see here that fancy indexing + assignment is
>>>> more efficient than putmask.
>>> Not on my machine:
>>> time (putmask)--> 0.181
>>> time (where)--> 0.783
>>> time (numexpr where)--> 0.26
>>> time (fancy+assign)--> 0.202
>> Yes, quite a difference indeed. For reference, my results above
>> were obtained on a Duron (an Athlon with only 128 KB of secondary
>> cache) at 0.9 GHz. Now, using my laptop (Intel Pentium 4 @ 2 GHz,
>> 512 KB of secondary cache), I get:
>> time (putmask)--> 0.244
>> time (where)--> 2.111
>> time (numexpr where)--> 0.427
>> time (fancy+assign)--> 0.316
>> time (numexpr clip)--> 0.184
>> So, on my laptop fancy+assign is clearly slower than putmask. It
>> should also be noted that the implementation of clip in numexpr
>> (i.e. in pure C) is not that much faster than putmask (only about
>> 30%); so perhaps it is not so necessary to come up with a pure C
>> implementation for clip (or at least, not on Intel P4 machines!).
>> In any case, it is really striking to see how differently the
>> various CPU architectures perform on this apparently simple problem.
> I am not sure it is such a simple problem: it involves massive branching.
To be more precise, you can do clipping without branching, but then the
clipping is highly type- and machine-dependent (using bit masks and other
tricks). It may be worth the trouble for double, float and int; I don't know.
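
As a rough illustration of the bit-mask idea for the integer case only, here is a minimal sketch. It is not code from numpy or numexpr, and the function names are made up for this example. It relies on the comparison result being 0 or 1, so that its negation is either 0 or -1 (all bits set). In Python this only shows the arithmetic, since the interpreter itself still branches. In C the same expressions would apply to fixed-width two's-complement integers, and floats and doubles would need different tricks, which is the type dependence mentioned above.

```python
def branchless_min(x, y):
    # mask is -1 (all bits set) when x < y, otherwise 0
    mask = -(x < y)
    # mask == -1: y ^ (x ^ y) == x ; mask == 0: y ^ 0 == y
    return y ^ ((x ^ y) & mask)


def branchless_max(x, y):
    mask = -(x < y)
    # mask == -1: x ^ (x ^ y) == y ; mask == 0: x ^ 0 == x
    return x ^ ((x ^ y) & mask)


def branchless_clip(x, lo, hi):
    # Clip the integer x to the closed interval [lo, hi], assuming lo <= hi.
    return branchless_min(branchless_max(x, lo), hi)


if __name__ == "__main__":
    for value in (-3, 5, 42):
        print(value, "->", branchless_clip(value, 0, 10))
```

Whether such a version actually beats a plain compare-and-assign loop would have to be measured per type and per CPU, as the timings quoted above suggest.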