[SciPy-user] Array selection help
Jose Luis Gomez Dans
Tue Feb 10 15:36:00 CST 2009
Let's say I have two 2D arrays of the same shape, arr1 and arr2. The elements
of arr1 contain different numbers (such as labels, for example), and the
elements of arr2 contain some floating point data (say, height above sea level
or something like that). For each unique value in arr1, I want to work out the
mean (or sum, std dev, etc.) of arr2 over the region where arr1 has that value.
So far, I have used the following code:
import numpy

# Get all the unique values in arr1
U = numpy.unique(arr1)
# Create a dictionary with the unique values as keys and, as values,
# the locations of the elements of arr1 that have that value.
# U.shape is a tuple, so the loop length is U.shape[0].
R = dict(zip([U[i] for i in range(U.shape[0])],
             [numpy.nonzero(arr1 == U[i]) for i in range(U.shape[0])]))
# Now calculate e.g. the mean of arr2 per arr1 "label"
M = dict(zip(R.keys(), [numpy.mean(arr2[R[i]]) for i in R.keys()]))
# So I now have a dictionary with the unique values of arr1 as keys, and
# the mean value of arr2 over those pixels as values.
The code is fast and I was feeling rather smug and pleased with myself about
it :) However, as the number of unique values in arr1 grows, the
numpy.nonzero(arr1 == U[i]) step starts taking a long time (understandable,
since every pass through that loop compares the whole of arr1 against one
value). At present, I can easily have numpy.unique(arr1).size > 10000,
so it does take a long time.
Apart from looping through the different values of arr1, can anyone think of
an efficient way of achieving something similar to this? The output doesn't
have to be a dictionary; an array or something else would do nicely.
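For concreteness, here is a minimal sketch of the kind of loop-free result being asked for. It is one possible approach rather than an established answer: it assumes arr1 and arr2 have the same shape, and it relies on numpy.unique with return_inverse=True together with numpy.bincount. The function name label_stats and the small test arrays are hypothetical, made up purely for illustration.

```python
import numpy as np

def label_stats(labels, values):
    """Per-label count, sum, mean and std of `values` (hypothetical helper)."""
    labels = np.asarray(labels).ravel()
    values = np.asarray(values, dtype=float).ravel()
    # uniq holds the sorted unique labels; inv maps every element of
    # `labels` to the position of its label in uniq.
    uniq, inv = np.unique(labels, return_inverse=True)
    inv = inv.ravel()
    # bincount accumulates per-label totals in a single pass over the data.
    counts = np.bincount(inv, minlength=uniq.size)
    sums = np.bincount(inv, weights=values, minlength=uniq.size)
    means = sums / counts
    # Population std dev (same convention as numpy.std with ddof=0),
    # accumulated from squared deviations about each label's mean.
    sq_dev = np.bincount(inv, weights=(values - means[inv]) ** 2,
                         minlength=uniq.size)
    stds = np.sqrt(sq_dev / counts)
    return uniq, counts, sums, means, stds

# Made-up example data: three labels, two pixels each.
arr1 = np.array([[1, 1, 2], [2, 3, 3]])
arr2 = np.array([[10.0, 20.0, 30.0], [50.0, 5.0, 7.0]])

uniq, counts, sums, means, stds = label_stats(arr1, arr2)
# Same shape of result as the dictionary M above, if a dict is preferred.
M = dict(zip(uniq.tolist(), means.tolist()))
```

The results come back as arrays aligned with uniq, so means[k] is the mean of arr2 over the pixels where arr1 == uniq[k]. The cost is a sort inside numpy.unique plus a few passes over the data, rather than one full-array comparison per unique value.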
Remote Sensing Unit | Env. Monitoring and Modelling Group
Dept. of Geography | Dept. of Geography
University College London | King's College London
Gower St, London WC1E 6BT UK | Strand Campus, Strand, London WC2R 2LS UK