[Numpy-discussion] Simple problem. Is it possible without a loop?

"V. Armando Solé" sole@esrf...
Thu Jun 10 02:00:38 CDT 2010


Hi Bruce,

In the context of the actual problem, I have a long series of 
irregularly spaced float numbers, and I have to take the 
values between given limits with the constraint of keeping a minimal 
separation. Option 2 just misses the first value of the input array if 
it is within the limits, but for my purposes (performing a fit with a given 
function) that is acceptable. I said "this seems to be quite close to what I 
need" because I do not like missing the first point: that gives 
equivalent but not exactly the same solutions.
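
To make the requirement concrete, here is a minimal sketch of the selection described above, written as a plain loop in the spirit of option1. The function name, the lower/upper arguments and the sample values are illustrative assumptions only, and the input is assumed to be sorted in ascending order.

```python
import numpy as np

def select_min_separation(x, lower, upper, delta=0.2):
    """Greedy reference: keep in-limit values more than `delta` apart.

    Hypothetical helper for illustration; assumes x is sorted ascending.
    """
    x = np.asarray(x, dtype=float)
    # Keep only the values inside the closed interval [lower, upper].
    inside = x[(x >= lower) & (x <= upper)]
    if inside.size == 0:
        return inside
    kept = [inside[0]]  # the first in-limit value is always retained
    for value in inside[1:]:
        # Same test as option1: strictly more than delta from the last kept value.
        if value - kept[-1] > delta:
            kept.append(value)
    return np.array(kept)

x = np.array([0.0, 0.1, 0.25, 0.35, 0.6, 0.65, 1.0])
selected = select_min_separation(x, lower=0.05, upper=0.9, delta=0.2)
print(selected)
```

Unlike option2, this version always keeps the first value inside the limits, which is the point I would prefer not to lose.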

By the way, thanks for the % hint. That should make the .astype(int) 
disappear and make the expression look nicer.
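
One possible reading of the % hint (this is an assumption, the exact expression is not spelled out in this message) is to take the integer part of the cumulative sum by subtracting its remainder instead of casting. The sample array below is hypothetical, and the sketch assumes sorted input so that the cumulative sum is non-negative.

```python
import numpy as np

x = np.array([0.0, 0.05, 0.3, 0.35, 0.9, 1.6])  # hypothetical sorted sample
delta = 0.2

# Cumulative spacing measured in units of delta (non-negative for sorted x).
c = np.cumsum(np.diff(x) / delta)

bins_cast = c.astype(int)  # integer part via a cast, as in option2
bins_mod = c - c % 1       # integer part via the remainder, dtype stays float

# Positions where the bin number increases, computed both ways.
idx_cast = np.nonzero(bins_cast[1:] > bins_cast[:-1])[0]
idx_mod = np.nonzero(bins_mod[1:] > bins_mod[:-1])[0]
print(idx_cast, idx_mod)
```

For non-negative values both forms give the same bin numbers, so the choice is mostly about how the expression reads.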

Armando

Bruce Southey wrote:
> On 06/09/2010 10:24 AM, Vicente Sole wrote:
>>>> ? Well a loop or list comparison seems like a good choice to me. It is
>>>> much more obvious at the expense of two LOCs. Did you profile the two
>>>> possibilities and are they actually performance-critical?
>>>>
>>>> cheers
>>>>
>>>>       
>> The second is between 8 and 10 times faster on my machine.
>>
>> import numpy
>> import time
>> x0 = numpy.arange(10000.)
>> niter = 2000   # I expect between 10000 and 100000
>>
>>
>> def option1(x, delta=0.2):
>>      y = [x[0]]
>>      for value in x:
>>          if (value - y[-1]) > delta:
>>              y.append(value)
>>      return numpy.array(y)
>>
>> def option2(x, delta=0.2):
>>      y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(int)
>>      i1 = numpy.nonzero(y[1:] > y[:-1])
>>      return numpy.take(x, i1)
>>
>>
>> t0 = time.time()
>> for i in range(niter):
>>      t = option1(x0)
>> print("Elapsed = ", time.time() - t0)
>> t0 = time.time()
>> for i in range(niter):
>>      t = option2(x0)
>> print("Elapsed = ", time.time() - t0)
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>   
> For integer arguments for delta, I don't see any difference between 
> using option1 and using the '%' operator.
> >>> (x0[(x0*10)%2==0]-option1(x0)).sum()
> 0.0
>
> Also, option2 gives a different result than option1, so these are not 
> equivalent functions. You can see that from the shapes:
> >>> option2(x0).shape
> (1, 9998)
> >>> option1(x0).shape
> (10000,)
> >>> ((option1(x0)[:9998])-option2(x0)).sum()
> 0.0
>
> So, allowing for the shape difference, option2 matches most of the 
> output of option1, but it returns fewer elements than option1.
>
> Probably the main reason for the speed difference is that option2 is 
> virtually pure numpy (and hence done in C), whereas option1 does a lot 
> of per-element array lookups in a Python loop, which are always slow. 
> So keep it in numpy as much as possible.
>
>
> Bruce
