[Numpy-discussion] slow numpy.clip ?
efiring at hawaii.edu
Tue Dec 19 01:19:21 CST 2006
David Cournapeau wrote:
> Eric Firing wrote:
>> I think my earlier post got lost in the exchange between you and Stefan,
>> so I will reiterate the central point: numpy.clip *is* slow, in that an
>> implementation using putmask is substantially faster:
>> def fastclip(a, vmin, vmax):
>>     a = a.copy()
>>     putmask(a, a<=vmin, vmin)
>>     putmask(a, a>=vmax, vmax)
>>     return a
>> Using the equivalent of this in a modification of your benchmark, the
>> time using the native clip *or* your alternative on my machine was
>> about 2.3 s, versus 1.5 s for the putmask-based equivalent. It seems
>> that putmask is quite a bit faster than boolean indexing.
>> Obviously, the function above could be implemented as a method, and a
>> copy kwarg could be used to make the copy optional--often one does not
>> need a copy.
>> It is also clear that it should be possible to make a much faster native
>> clip function that does everything in one pass with no intermediate
>> arrays at all. Whether this is something numpy devels would want to do,
>> and how much effort it would take, are entirely different questions. I
>> looked at the present code in clip (and part of the way through the
>> chain of functions it invokes) and was quite baffled.
> Well, this is something I would be willing to try *if* this is the main
> bottleneck of imshow/show. I am still unsure about the problem, because
> if I change numpy.clip to my function, including a copy, I really get a
> big difference myself:
> val = ma.array(nx.clip(val.filled(vmax), vmin, vmax),
> def myclip(b, m, M):
>     a = b.copy()
>     a[a<m] = m
>     a[a>M] = M
>     return a
> val = ma.array(myclip(val.filled(vmax), vmin, vmax), mask=mask)
> Taking the best result, I get 0.888 ms vs 0.784 for a show() call,
> which is already a 10% improvement, and I get almost 15% if I remove
> the copy. I am updating numpy/scipy/mpl on my laptop to see if this is
> specific to the CPU of my workstation (big cache, high clock frequency,
> dual CPU with HT enabled).
Please try the putmask version without the copy on your machines; I
expect it will be quite a bit faster on both machines. The relative
speeds of the versions may differ widely depending on how many values
actually get changed, though.
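For anyone who wants to try this, here is a minimal sketch of the putmask version with the optional copy discussed above, next to the boolean-indexing version, plus a small timing helper. The `copy` keyword, the name `indexclip`, the helper `time_clips`, the array size and the clip limits are all assumptions made for illustration; none of this is the code from the benchmark in this thread. The limits are varied because the fraction of values that actually get changed may affect the relative speeds.

```python
import timeit

import numpy as np


def fastclip(a, vmin, vmax, copy=True):
    """Clip a to [vmin, vmax] using two putmask passes."""
    if copy:
        a = a.copy()  # leave the caller's array untouched
    np.putmask(a, a <= vmin, vmin)
    np.putmask(a, a >= vmax, vmax)
    return a


def indexclip(a, vmin, vmax, copy=True):
    """Same result as fastclip, using boolean-index assignment."""
    if copy:
        a = a.copy()
    a[a < vmin] = vmin
    a[a > vmax] = vmax
    return a


def time_clips(data, lim, number=5):
    """Return total seconds for `number` calls of each variant (copying)."""
    return {
        "np.clip": timeit.timeit(lambda: np.clip(data, -lim, lim), number=number),
        "putmask": timeit.timeit(lambda: fastclip(data, -lim, lim), number=number),
        "indexing": timeit.timeit(lambda: indexclip(data, -lim, lim), number=number),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((500, 500))
    # Smaller limits mean a larger fraction of the values gets changed.
    for lim in (0.1, 1.0, 3.0):
        print(lim, time_clips(data, lim))
```

The timings here use copy=True on purpose: with copy=False the first call clips the array in place, so every later call in the timing loop would find nothing left to change. To time the in-place version fairly, each run needs a fresh array.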