[Numpy-discussion] Strange numpy behaviour (bug?)
Sturla Molden
sturla@molden...
Tue Jan 17 23:26:10 CST 2012
While "playing" with a point-in-polygon test, I have discovered some a
failure mode that I cannot make sence of.
The algorithm is vectorized for NumPy from a C and Python implementation
I found on the net (see links below). It is written to process a large
dataset in chunks. I'm rather happy with it, it can test 100,000 x,y
points against a non-convex pentagon in just 50 ms.
Anyway, here is something very strange (or at least I think so):
If I use a small chunk size, it sometimes fails. I know I shouldn't
blame it on NumPy, beacuse it is by all likelood my mistake. But it does
not make any sence, as the parameter should not affect the computation.
Observed behavior:
1. Processing the whole dataset in one big chunk always works.
2. Processing the dataset in big chunks (e.g. 8192 points) always works.
3. Processing the dataset in small chunks (e.g. 32 points) sometimes fail.
4. Processing the dataset element-wise always work.
5. The scalar version behaves like the numpy version: fine for large
chunks, sometimes it fails for small. That is, when list comprehensions
is used for chunks. Big list comprehensions always work, small ones
might fail.
It looks like the numerical robstness of the alorithm depends on a
parameter that has nothing to do with the algorithm at all. For example
in (5), we might think that calling a function from a nested loop makes
it fail, depending on the length of the inner loop. But calling it from
a single loop works just fine.
???
So I wonder:
Could there be a bug in numpy that only shows up only when taking a huge
number of short slices?
I don't know... But try it if you care.
In the function "inpolygon", change the call that says __chunk(n,8192)
to e.g. __chunk(n,32) to see it fail (or at least it does on my
computer, running Enthought 7.2-1 on Win64).
Regards,
Sturla Molden
def __inpolygon_scalar(x,y,poly):
# Source code taken from:
# http://paulbourke.net/geometry/insidepoly
# http://www.ariel.com.au/a/python-point-int-poly.html
n = len(poly)
inside = False
p1x,p1y = poly[0]
xinters = 0
for i in range(n+1):
p2x,p2y = poly[i % n]
if y > min(p1y,p2y):
if y <= max(p1y,p2y):
if x <= max(p1x,p2x):
if p1y != p2y:
xinters = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
if p1x == p2x or x <= xinters:
inside = not inside
p1x,p1y = p2x,p2y
return inside
# the rest is (C) Sturla Molden, 2012
# University of Oslo
def __inpolygon_numpy(x,y,poly):
""" numpy vectorized version """
n = len(poly)
inside = np.zeros(x.shape[0], dtype=bool)
xinters = np.zeros(x.shape[0], dtype=float)
p1x,p1y = poly[0]
for i in range(n+1):
p2x,p2y = poly[i % n]
mask = (y > min(p1y,p2y)) & (y <= max(p1y,p2y)) & (x <=
max(p1x,p2x))
if p1y != p2y:
xinters[mask] = (y[mask]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
if p1x == p2x:
inside[mask] = ~inside[mask]
else:
mask2 = x[mask] <= xinters[mask]
idx, = np.where(mask)
idx2, = np.where(mask2)
idx = idx[idx2]
inside[idx] = ~inside[idx]
p1x,p1y = p2x,p2y
return inside
def __chunk(n,size):
x = range(0,n,size)
if (n%size):
x.append(n)
return zip(x[:-1],x[1:])
def inpolygon(x, y, poly):
"""
point-in-polygon test
x and y are numpy arrays
polygon is a list of (x,y) vertex tuples
"""
if np.isscalar(x) and np.isscalar(y):
return __inpolygon_scalar(x, y, poly)
else:
x = np.asarray(x)
y = np.asarray(y)
n = x.shape[0]
z = np.zeros(n, dtype=bool)
for i,j in __chunk(n,8192): # COMPARE WITH __chunk(n,32) ???
if j-i > 1:
z[i:j] = __inpolygon_numpy(x[i:j], y[i:j], poly)
else:
z[i] = __inpolygon_scalar(x[i], y[i], poly)
return z
if __name__ == "__main__":
import matplotlib
import matplotlib.pyplot as plt
from time import clock
n = 100000
polygon = [(0.,.1), (1.,.1), (.5,1.), (0.,.75), (.5,.5), (0.,.1)]
xp = [x for x,y in polygon]
yp = [y for x,y in polygon]
x = np.random.rand(n)
y = np.random.rand(n)
t0 = clock()
inside = inpolygon(x,y,polygon)
t1 = clock()
print 'elapsed time %.3g ms' % ((t0-t1)*1E3,)
plt.figure()
plt.plot(x[~inside],y[~inside],'ob', xp, yp, '-g')
plt.axis([0,1,0,1])
plt.show()
More information about the NumPy-Discussion
mailing list