[Numpy-discussion] Fast function application on list of 2D points?

Eric LEBIGOT Eric.Lebigot@normalesup....
Tue Jan 13 03:23:20 CST 2009


Thank you so much for the suggestion, Paulo!  Selecting 2D points in a list
by creating a boolean array 'mask' and then using arr[mask, :] is indeed
much faster than using numpy.apply_along_axis() in my case (simple
comparison tests on individual coordinates): 0.3 s versus 107.2 s in the
tests below.

I had not realized that you could do "arr[mask, :]": this works great!
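
As a minimal sketch of what boolean-mask indexing does (the small array below is made up for illustration and is not part of the timing script further down): a comparison on a column produces a boolean array, "&" combines two such arrays element-wise, and indexing with the result keeps only the rows where the mask is True.

```python
import numpy

# Hypothetical sample data: four 2D points, one per row.
arr = numpy.array([[0.9, 0.1],
                   [0.2, 0.3],
                   [0.7, 0.8],
                   [0.6, 0.4]])

# Element-wise tests on each column give boolean arrays; "&" combines
# them element-wise (the keyword "and" raises an error on such arrays).
mask = (arr[:, 0] > 0.5) & (arr[:, 1] < 0.5)

# Indexing with the mask keeps only the rows where it is True.
selected = arr[mask, :]
print(mask)
print(selected)
```

Here only the first and last points pass both tests, so selected has two rows.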

EOL

PS: here are the speed tests I ran on the selection of 2D points from a
list of 1,000,000 random points (total time for 3 calls of each filter):

filter0: 107.2 s
filter1: 0.3 s
filter2: 9.7 s
filter3: 0.6 s

obtained with:

#!/usr/bin/env python

import numpy

def filter0(points):
   """
   Returns only those points that match the filter, testing each point
   through numpy.apply_along_axis().
   """
   def filter(p):
     return (p[0] > 0.5) and (p[1] < 0.5)
   return points[numpy.apply_along_axis(filter, axis = 1, arr = points)]

def filter1(points):
   """
   Returns only those points that match the filter, using a single
   combined boolean mask.
   """

   mask = (points[:, 0] > 0.5) & (points[:, 1] < 0.5)
   return points[mask, :]

def filter2(points):
   """
   Returns only those points that match the filter, using a Python list
   comprehension.
   """
   return numpy.array([p for p in points if (p[0] > 0.5) and (p[1] < 0.5)])

def filter3(points):
   """
   Returns only those points that match the filter, applying one boolean
   mask per coordinate, one after the other.
   """

   mask = (points[:, 0] > 0.5)
   points = points[mask, :]
   mask = points[:, 1] < 0.5
   return points[mask, :]

if __name__ == '__main__':

   import timeit


   # We generate many random points:
   NUM_PTS = 1000000
   points = numpy.random.random((NUM_PTS, 2))

   # We make sure that all the filters give the same result:
   #print "Initial points:"
   #print points
   #print "Filtered points:"
   #print filter0(points)
   #print filter1(points)
   #print filter2(points)
   #print filter3(points)


   for filter_num in range(4):
     func_name = "filter%d" % filter_num
     t = timeit.Timer("%s(points)" % func_name,
                      "from __main__ import %s, points" % func_name)
     print("%s: %.1f s" % (func_name, t.timeit(number = 3)))
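
The commented-out print statements above check by eye that the filters agree. A more systematic check could look like the sketch below. It is only an illustration: the function names, the seed and the sample size of 1000 are arbitrary choices, and the filters are redefined in compact form so that the block runs on its own.

```python
import numpy

def filter_mask(points):
    # Same logic as filter1: one combined boolean mask.
    mask = (points[:, 0] > 0.5) & (points[:, 1] < 0.5)
    return points[mask, :]

def filter_two_steps(points):
    # Same logic as filter3: one mask per coordinate, in sequence.
    points = points[points[:, 0] > 0.5, :]
    return points[points[:, 1] < 0.5, :]

def filter_loop(points):
    # Same logic as filter2: pure-Python test on each point.
    return numpy.array([p for p in points if p[0] > 0.5 and p[1] < 0.5])

# Seeded so that the check is reproducible (the seed is arbitrary).
numpy.random.seed(0)
points = numpy.random.random((1000, 2))

# All variants keep the surviving points in their original order,
# so the results can be compared element by element.
reference = filter_mask(points)
all_equal = (numpy.array_equal(reference, filter_two_steps(points))
             and numpy.array_equal(reference, filter_loop(points)))
print(all_equal)
```

If the variants really are equivalent, this should print True; the same comparison could be applied to filter0 through filter3 directly.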

> Date: Mon, 12 Jan 2009 11:33:08 -0300
> From: "Paulo J. S. Silva" <pjssilva@ime.usp.br>
> Subject: Re: [Numpy-discussion] Fast function application on list of
> 	2D points?
> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> Message-ID: <1231770788.6170.3.camel@trinity>
> Content-Type: text/plain; charset="UTF-8"
>
> Why don't you create a mask to select only the points in the array that
> satisfy the condition on the x and y coordinates? For example, the code
> below applies filter only to the points that have an x coordinate greater
> than 0.7 and a y coordinate smaller than 0.3:
>
>    mask = numpy.logical_and(points[:,0] > 0.7, points[:,1] < 0.3)
>    points = numpy.apply_along_axis(filter, axis = 1, arr = points[mask,:])
>
> best,
>
> Paulo
>
> On Mon, 2009-01-12 at 15:21 +0100, Eric LEBIGOT wrote:
>> Hello,
>>
>> What is the fastest way of applying a function on a list of 2D points?  More
>> specifically, I have a list of 2D points, and some do not meet some criteria
>> and must be rejected.  Even more specifically, the filter only lets through
>> points whose x coordinate satisfies some condition, _and_ whose y coordinate
>> satisfies another condition (maybe there is room for optimization here?).
>>
>> Currently, I use
>>
>>    points = numpy.apply_along_axis(filter, axis = 1, arr = points)
>>
>> but this creates a bottleneck in my program (array arr may contain 1 million
>> points, for instance).
>>
>> Is there anything that could be faster?
>>
>> Any suggestion would be much appreciated!
>>
>> EOL

