[Numpy-discussion] scan array to extract min-max values (with if condition)

Brett Olsen brett.olsen@gmail....
Sat Sep 11 09:19:49 CDT 2010


On Sat, Sep 11, 2010 at 7:45 AM, Massimo Di Stefano
<massimodisasha@gmail.com> wrote:
> Hello All,
>
> I need to extract the data from an array that lie inside a
> rectangular area defined as:
>
> N, S, E, W = 234560.94503118, 234482.56929822, 921336.53116178, 921185.3779625
>
> The data are in a CSV (comma-delimited text file with 3 columns: X, Y, Z)
>
> #X,Y,Z
> 3020081.5500,769999.3100,0.0300
> 3020086.2000,769991.6500,0.4600
> 3020099.6600,769996.2700,0.9000
> ...
> ...
>
> I read it using "numpy.loadtxt"
>
> data :
>
> http://www.geofemengineering.it/data/csv.txt     5.3 MB (158735 rows)
>
> To extract the data that are inside the bounding-box area (N, S, E, W), I'm using a loop
> inside a function like:
>
> import numpy as np
>
> def getMinMaxBB(data, N, S, E, W):
>        mydata = data * 0.3048006096012
>        for i in range(len(mydata)):
>                if mydata[i,0] < E or mydata[i,0] > W or mydata[i,1] < N or mydata[i,1] > S :
>                        if i == 0:
>                                newdata = np.array((mydata[i,0],mydata[i,1],mydata[i,2]), float)
>                        else :
>                                newdata = np.vstack((newdata,(mydata[i,0], mydata[i,1], mydata[i,2])))
>        results = {}
>        results['Max_Z'] = newdata.max(0)[2]
>        results['Min_Z'] = newdata.min(0)[2]
>        results['Num_P'] = len(newdata)
>        return results
>
>
> N, S, E, W = 234560.94503118, 234482.56929822, 921336.53116178, 921185.3779625
> data = '/Users/sasha/csv.txt'
> mydata = np.loadtxt(data, comments='#', delimiter=',')
> out = getMinMaxBB(mydata, N, S, E, W)
>
> print out

Use boolean arrays to index the parts of your array that you want to look at:

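import numpy as np

def newGetMinMax(data, N, S, E, W):
    # Apply the same scale factor as the original function
    mydata = data * 0.3048006096012
    # Start with an all-False mask (one flag per row), then OR in
    # each of the four tests used in the loop above
    mask = np.zeros(mydata.shape[0], dtype=bool)
    mask |= mydata[:,0] < E
    mask |= mydata[:,0] > W
    mask |= mydata[:,1] < N
    mask |= mydata[:,1] > S
    results = {}
    # Index the Z column with the mask; no Python-level loop or vstack needed
    results['Max_Z'] = mydata[mask,2].max()
    results['Min_Z'] = mydata[mask,2].min()
    # True counts as 1, so the sum is the number of selected rows
    results['Num_P'] = mask.sum()
    return results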

This runs about 5000 times faster on my machine.
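
One caveat about the condition itself: with the bounds given above (W < E and S < N), the test "x < E or x > W or y < N or y > S" is true for every row, so both the loop and the mask version select the whole array. If the goal is to keep only the points inside the rectangle, the comparisons probably need to be reversed and combined with AND. A minimal sketch, assuming column 0 is the easting and column 1 the northing after scaling; the name getMinMaxInside, the scale parameter, and the handling of an empty selection are illustrative additions, not part of the original code:

```python
import numpy as np

def getMinMaxInside(data, N, S, E, W, scale=0.3048006096012):
    """Min/max Z of the points inside the N/S/E/W rectangle (hypothetical helper)."""
    # Same scaling as the original function (applied to all three columns)
    mydata = np.asarray(data, dtype=float) * scale
    x, y, z = mydata[:, 0], mydata[:, 1], mydata[:, 2]
    # A point is inside when W <= x <= E and S <= y <= N
    mask = (x >= W) & (x <= E) & (y >= S) & (y <= N)
    if not mask.any():
        # Assumption: report an empty selection rather than let max()/min() raise
        return {'Max_Z': None, 'Min_Z': None, 'Num_P': 0}
    inside = z[mask]
    return {'Max_Z': inside.max(), 'Min_Z': inside.min(), 'Num_P': int(mask.sum())}

# Small synthetic check (scale=1.0 so the numbers are easy to read)
pts = np.array([
    [1.0, 1.0, 5.0],   # inside
    [2.0, 3.0, -1.0],  # inside
    [9.0, 1.0, 7.0],   # east of the box
    [1.0, 9.0, 8.0],   # north of the box
])
res = getMinMaxInside(pts, N=4.0, S=0.0, E=4.0, W=0.0, scale=1.0)
print(res)  # two points are inside: Max_Z is 5.0, Min_Z is -1.0
```

Note that & binds more tightly than the comparison operators, so each comparison needs its own parentheses.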

Brett

