[Numpy-discussion] Simple problem. Is it possible without a loop?
"V. Armando Solé"
sole@esrf...
Thu Jun 10 02:00:38 CDT 2010
Hi Bruce,
In the context of the actual problem, I have a long series of
irregularly spaced float numbers, and I have to take the values that lie
between given limits while keeping a minimal separation between them.
Option 2 just misses the first value of the input array if it is within
the limits, but for my purposes (performing a fit with a given function)
that is acceptable. I said "this seems to be quite close to what I need"
because I do not like missing the first point: it gives equivalent but
not exactly the same solutions.
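To make the requirement concrete, here is a minimal sketch of the selection
described above. It is not the code from this thread: the function names and
the `lower`/`upper` parameters are hypothetical, and it assumes the input is
sorted in ascending order. The first function is the plain loop with limits
added; the second is a loop-free variant that keeps the first in-range point
by taking the first value in each delta-wide bin. The two agree on the sample
below but are not equivalent in general, since values in adjacent bins can be
closer than delta.

```python
import numpy as np


def select_with_min_separation(x, lower, upper, delta):
    """Greedy loop: keep the first in-range value, then every value that
    lies more than `delta` above the last kept one (x sorted ascending)."""
    x = np.asarray(x, dtype=float)
    inside = x[(x >= lower) & (x <= upper)]
    if inside.size == 0:
        return inside
    kept = [inside[0]]
    for value in inside[1:]:
        if value - kept[-1] > delta:
            kept.append(value)
    return np.array(kept)


def select_by_bins(x, lower, upper, delta):
    """Loop-free approximation: keep the first value that falls in each
    delta-wide bin, with bins measured from the first in-range value."""
    x = np.asarray(x, dtype=float)
    inside = x[(x >= lower) & (x <= upper)]
    if inside.size == 0:
        return inside
    bins = np.floor((inside - inside[0]) / delta)
    # The first in-range value is always kept; later values are kept only
    # when they open a new bin.
    first_in_bin = np.concatenate(([True], bins[1:] > bins[:-1]))
    return inside[first_in_bin]


sample = np.array([0.0, 0.05, 0.1, 0.3, 0.35, 0.9, 1.5, 2.0])
print(select_with_min_separation(sample, 0.0, 1.5, 0.25))
print(select_by_bins(sample, 0.0, 1.5, 0.25))
```

Both calls keep 0.0, 0.3, 0.9 and 1.5 for this sample, so the first point is
retained in either case.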
By the way, thanks for the % hint. That should make the .astype(int)
disappear and make the expression look nicer.
Armando
Bruce Southey wrote:
> On 06/09/2010 10:24 AM, Vicente Sole wrote:
>>>> Well, a loop or list comparison seems like a good choice to me. It is
>>>> much more obvious at the expense of two LOCs. Did you profile the two
>>>> possibilities and are they actually performance-critical?
>>>>
>>>> cheers
>>>>
>>>>
>> The second is between 8 and 10 times faster on my machine.
>>
>> import numpy
>> import time
>> x0 = numpy.arange(10000.)
>> niter = 2000 # I expect between 10000 and 100000
>>
>>
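>> def option1(x, delta=0.2):
>>     # Explicit loop: keep a value only if it lies more than delta
>>     # above the last value kept.
>>     y = [x[0]]
>>     for value in x:
>>         if (value - y[-1]) > delta:
>>             y.append(value)
>>     return numpy.array(y)
>>
>> def option2(x, delta=0.2):
>>     # Vectorized: integer part of the cumulative distance in units of delta.
>>     y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(int)
>>     i1 = numpy.nonzero(y[1:] > y[:-1])
>>     return numpy.take(x, i1)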
>>
>>
>> t0 = time.time()
>> for i in range(niter):
>>     t = option1(x0)
>> print("Elapsed = ", time.time() - t0)
>> t0 = time.time()
>> for i in range(niter):
>>     t = option2(x0)
>> print("Elapsed = ", time.time() - t0)
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
> For integer arguments for delta, I don't see any difference between
> using option1 and using the '%' operator.
> >>> (x0[(x0*10)%2==0]-option1(x0)).sum()
> 0.0
>
> Also, option2 gives a different result than option1, so these are not
> equivalent functions. You can see that from the shapes:
> >>> option2(x0).shape
> (1, 9998)
> >>> option1(x0).shape
> (10000,)
> >>> ((option1(x0)[:9998])-option2(x0)).sum()
> 0.0
>
> So, allowing for the shape difference, option2 matches the first 9998
> values of option1's output, but it still returns fewer values than
> option1.
>
> Probably the main reason for the speed difference is that option2 is
> virtually pure numpy (and hence done in C), whereas option1 does a lot
> of array lookups, which are always slow. So keep it in numpy as much as
> possible.
>
>
> Bruce