[Numpy-discussion] slow numpy.clip ?

David Cournapeau david at ar.media.kyoto-u.ac.jp
Mon Dec 18 23:10:29 CST 2006

Eric Firing wrote:
> David,
> I think my earlier post got lost in the exchange between you and Stefan, 
> so I will reiterate the central point: numpy.clip *is* slow, in that an 
> implementation using putmask is substantially faster:
> def fastclip(a, vmin, vmax):
> 	a = a.copy()
> 	putmask(a, a<=vmin, vmin)
> 	putmask(a, a>=vmax, vmax)
> 	return a
> Using the equivalent of this in a modification of your benchmark, the 
> time using either the native clip *or* your alternative on my machine 
> was about 2.3 s, versus 1.5 s for the putmask-based equivalent.  It 
> seems that putmask is quite a bit faster than boolean indexing.
> Obviously, the function above could be implemented as a method, and a 
> copy kwarg could be used to make the copy optional--often one does not 
> need a copy.
> It is also clear that it should be possible to make a much faster native 
> clip function that does everything in one pass with no intermediate 
> arrays at all.  Whether this is something numpy devels would want to do, 
> and how much effort it would take, are entirely different questions.  I 
> looked at the present code in clip (and part of the way through the 
> chain of functions it invokes) and was quite baffled.
Well, this is something I would be willing to try *if* this is the main 
bottleneck of imshow/show. I am still unsure about the problem, because 
when I replace numpy.clip with my own function, copy included, I do see 
a big difference myself:

val = ma.array(nx.clip(val.filled(vmax), vmin, vmax),


def myclip(b, m, M):
    a       = b.copy()
    a[a<m]  = m
    a[a>M]  = M
    return a
val = ma.array(myclip(val.filled(vmax), vmin, vmax), mask=mask)
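
For reference, here is a minimal sketch of the two alternatives discussed in this thread (boolean indexing and putmask), each with the optional copy that Eric suggests. The names index_clip and putmask_clip and the copy keyword are hypothetical and are not part of numpy or matplotlib; the check at the end only confirms that both give the same result as numpy.clip on float data.

```python
import numpy as np


def index_clip(a, vmin, vmax, copy=True):
    """Clip with boolean indexing (hypothetical helper, same idea as myclip)."""
    out = a.copy() if copy else a   # copy=False clips the caller's array in place
    out[out < vmin] = vmin
    out[out > vmax] = vmax
    return out


def putmask_clip(a, vmin, vmax, copy=True):
    """Clip with np.putmask (hypothetical helper, same idea as fastclip)."""
    out = a.copy() if copy else a
    np.putmask(out, out < vmin, vmin)
    np.putmask(out, out > vmax, vmax)
    return out


# Sanity check against numpy.clip on random float data.
rng = np.random.default_rng(0)
data = rng.standard_normal((256, 256))
expected = np.clip(data, -1.0, 1.0)

assert np.array_equal(index_clip(data, -1.0, 1.0), expected)
assert np.array_equal(putmask_clip(data, -1.0, 1.0), expected)

# With copy=False the input buffer itself is modified and returned.
buf = data.copy()
result = putmask_clip(buf, -1.0, 1.0, copy=False)
assert result is buf
```

With copy=False the caller's array is overwritten, so this variant is only appropriate when the unclipped values are no longer needed.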

Taking the best result, I get 0.888 ms vs 0.784 for a show() call, 
which is already a 10 % improvement, and I get almost 15 % if I remove 
the copy. I am updating numpy/scipy/mpl on my laptop to see whether this 
is specific to the CPU of my workstation (big cache, high clock 
frequency, dual CPU with HT enabled).
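
Since the numbers clearly depend on the machine, a small standalone timing script makes it easier to compare across CPUs. The following is only a sketch under assumed conditions: the array size, the clip limits, and the helper name bench are arbitrary choices for illustration, not the benchmark used in this thread, and it times the clip step alone rather than a full show() call. It reports the best of several repeats, as above.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
img = rng.standard_normal((512, 512))   # assumed test image, float64
vmin, vmax = -1.0, 1.0                  # assumed clip limits


def native():
    return np.clip(img, vmin, vmax)


def boolean_index():
    out = img.copy()
    out[out < vmin] = vmin
    out[out > vmax] = vmax
    return out


def putmask():
    out = img.copy()
    np.putmask(out, out < vmin, vmin)
    np.putmask(out, out > vmax, vmax)
    return out


def bench(func, repeat=5, number=10):
    """Return the best time per call in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


timings = {f.__name__: bench(f) for f in (native, boolean_index, putmask)}
for name, t in timings.items():
    print(f"{name:<14} {t * 1e3:8.3f} ms per call")
```

The ranking may well differ between numpy versions and between machines, so the output should be read as a local measurement only.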

I would really like to see the imshow/show calls go into the range of a 
few hundred ms; for interactive plotting, this really changes a lot in my 



