[Numpy-discussion] Strange numpy behaviour (bug?)

Sturla Molden sturla@molden...
Tue Jan 17 23:57:38 CST 2012


Never mind this, it was my own mistake as I expected :-)

def __chunk(n,size):
     x = range(0,n,size)
     x.append(n)
     return zip(x[:-1],x[1:])

makes it a lot better :)

Sturla



Den 18.01.2012 06:26, skrev Sturla Molden:
> While "playing" with a point-in-polygon test, I have discovered some a
> failure mode that I cannot make sence of.
>
> The algorithm is vectorized for NumPy from a C and Python implementation
> I found on the net (see links below). It is written to process a large
> dataset in chunks. I'm rather happy with it, it can test 100,000 x,y
> points against a non-convex pentagon in just 50 ms.
>
> Anyway, here is something very strange (or at least I think so):
>
> If I use a small chunk size, it sometimes fails. I know I shouldn't
> blame it on NumPy, beacuse it is by all likelood my mistake. But it does
> not make any sence, as the parameter should not affect the computation.
>
> Observed behavior:
>
> 1. Processing the whole dataset in one big chunk always works.
>
> 2. Processing the dataset in big chunks (e.g. 8192 points) always works.
>
> 3. Processing the dataset in small chunks (e.g. 32 points) sometimes fail.
>
> 4. Processing the dataset element-wise always work.
>
> 5. The scalar version behaves like the numpy version: fine for large
> chunks, sometimes it fails for small. That is, when list comprehensions
> is used for chunks. Big list comprehensions always work, small ones
> might fail.
>
> It looks like the numerical robstness of the alorithm depends on a
> parameter that has nothing to do with the algorithm at all. For example
> in (5), we might think that calling a function from a nested loop makes
> it fail, depending on the length of the inner loop. But calling it from
> a single loop works just fine.
>
> ???
>
> So I wonder:
>
> Could there be a bug in numpy that only shows up only when taking a huge
> number of short slices?
>
> I don't know... But try it if you care.
>
> In the function "inpolygon", change the call that says __chunk(n,8192)
> to e.g. __chunk(n,32) to see it fail (or at least it does on my
> computer, running Enthought 7.2-1 on Win64).
>
>
> Regards,
> Sturla Molden
>
>
>
>
>
> def __inpolygon_scalar(x,y,poly):
>
>       # Source code taken from:
>       # http://paulbourke.net/geometry/insidepoly
>       # http://www.ariel.com.au/a/python-point-int-poly.html
>
>       n = len(poly)
>       inside = False
>       p1x,p1y = poly[0]
>       xinters = 0
>       for i in range(n+1):
>           p2x,p2y = poly[i % n]
>           if y>  min(p1y,p2y):
>               if y<= max(p1y,p2y):
>                   if x<= max(p1x,p2x):
>                       if p1y != p2y:
>                           xinters = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
>                       if p1x == p2x or x<= xinters:
>                           inside = not inside
>           p1x,p1y = p2x,p2y
>       return inside
>
>
> # the rest is (C) Sturla Molden, 2012
> # University of Oslo
>
> def __inpolygon_numpy(x,y,poly):
>       """ numpy vectorized version """
>       n = len(poly)
>       inside = np.zeros(x.shape[0], dtype=bool)
>       xinters = np.zeros(x.shape[0], dtype=float)
>       p1x,p1y = poly[0]
>       for i in range(n+1):
>           p2x,p2y = poly[i % n]
>           mask = (y>  min(p1y,p2y))&  (y<= max(p1y,p2y))&  (x<=
> max(p1x,p2x))
>           if p1y != p2y:
>               xinters[mask] = (y[mask]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
>           if p1x == p2x:
>               inside[mask] = ~inside[mask]
>           else:
>               mask2 = x[mask]<= xinters[mask]
>               idx, = np.where(mask)
>               idx2, = np.where(mask2)
>               idx = idx[idx2]
>               inside[idx] = ~inside[idx]
>           p1x,p1y = p2x,p2y
>       return inside
>
> def __chunk(n,size):
>       x = range(0,n,size)
>       if (n%size):
>           x.append(n)
>       return zip(x[:-1],x[1:])
>
> def inpolygon(x, y, poly):
>       """
>       point-in-polygon test
>       x and y are numpy arrays
>       polygon is a list of (x,y) vertex tuples
>       """
>       if np.isscalar(x) and np.isscalar(y):
>           return __inpolygon_scalar(x, y, poly)
>       else:
>           x = np.asarray(x)
>           y = np.asarray(y)
>           n = x.shape[0]
>           z = np.zeros(n, dtype=bool)
>           for i,j in __chunk(n,8192): # COMPARE WITH __chunk(n,32) ???
>               if j-i>  1:
>                   z[i:j] = __inpolygon_numpy(x[i:j], y[i:j], poly)
>               else:
>                   z[i] = __inpolygon_scalar(x[i], y[i], poly)
>           return z
>
>
>
> if __name__ == "__main__":
>
>       import matplotlib
>       import matplotlib.pyplot as plt
>       from time import clock
>
>       n = 100000
>       polygon = [(0.,.1), (1.,.1), (.5,1.), (0.,.75), (.5,.5), (0.,.1)]
>       xp = [x for x,y in polygon]
>       yp = [y for x,y in polygon]
>       x = np.random.rand(n)
>       y = np.random.rand(n)
>       t0 = clock()
>       inside = inpolygon(x,y,polygon)
>       t1 = clock()
>       print 'elapsed time %.3g ms' % ((t0-t1)*1E3,)
>       plt.figure()
>       plt.plot(x[~inside],y[~inside],'ob', xp, yp, '-g')
>       plt.axis([0,1,0,1])
>       plt.show()
>
>
>
>
>
>
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion



More information about the NumPy-Discussion mailing list