# [SciPy-user] Array selection help

josef.pktd@gmai... josef.pktd@gmai...
Wed Feb 11 10:27:44 CST 2009

```What's your average number of observations per label?

If you have only a few observations per label, then looping once
through your array in Python is faster than the way you were building
your dict in the initial message.

Below are some timing comparisons (in seconds): the first line is your
NumPy approach, the second line is a single Python loop.
You can see that the Python loop scales much better as the number of
labels grows.

Josef

(length of observation array is 2000, labels are random integers)

mean observation per label 40.0
0.404751721231
0.361348718907

mean observation per label 200.0
0.149529060262
0.349892234903

mean observation per label 4.0
2.87190969802
0.380998981716

mean observation per label 2.0
4.87971013076
0.405277207021

mean observation per label 400.0
0.117748205434
0.432144029481

for len(arr1) = 100000 and 10000 labels:

mean observation per label 10.0
22.9237349998
0.292642780018

Note: the return types differ; version two returns plain lists as dict
values, while version one returns tuples of index arrays.
------------------------- file-----------------

import numpy as np
from scipy import ndimage
from numpy.testing import assert_array_equal

n = 10000
size = 100000
print 'mean observation per label', size/float(n)
rvs= np.random.randint(n,size=size)
arr1 = rvs
arr2 = float(n)-rvs

def usendimage(arr1, arr2):
    for i in np.unique(arr1):
        print i, ndimage.mean(arr2, labels=arr1, index=i)

    labelsunique = np.unique(arr1)
    print labelsunique
    print ndimage.mean(arr2, labels=arr1, index=labelsunique)

def labelcoord1(arr1, arr2):
    # Get all the unique values in arr1
    U = np.unique(arr1)
    # Create a dictionary with the unique values as keys, and the
    # locations of elements that have that value in arr1 as values
    R = dict(zip([U[i] for i in xrange(U.shape[0])],
                 [np.nonzero(arr1 == U[i]) for i in xrange(U.shape[0])]))
    return R  # value of dict is tuple

def labelcoord2(arr1, arr2):
    # Get all the unique values in arr1
    U = np.unique(arr1)
    # Create a dictionary with the unique values as keys, and the
    # locations of elements that have that value in arr1 as values,
    # built in a single pass over the data
    R = {}
    for index, row in enumerate(zip(arr1, arr2)):
        R.setdefault(row[0], []).append(index)
    return R  # value of dict is list

# So I now have a dictionary with the unique values of arr1, and the mean
# value of arr2 for those pixels.

import timeit
t=timeit.Timer("labelcoord1(arr1, arr2)", "from __main__ import *")
print t.timeit(1)
t=timeit.Timer("labelcoord2(arr1, arr2)", "from __main__ import *")
print t.timeit(1)

R1 = labelcoord1(arr1, arr2)
R2 = labelcoord2(arr1, arr2)
for k in sorted(R1):
    assert_array_equal(R1[k][0], np.array(R2[k]))

On 2/11/09, Jose Luis Gomez Dans <josegomez@gmx.net> wrote:
> Hi,
>
>> > In essence, I want to have an array where each element is the mean value
>> > for its corresponding class.
>>
>> Thanks, now I understand!  In that case your for-loop should be fine
>> (I guess you won't have too many unique indices?).
>
> Well, there can be quite a lot of them (~10000 at least), so it does take a
> long while. I was just wondering whether some numpy/scipy array Jedi trick
> might speed it up :)
>
> jose
```
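
For readers looking for the "array Jedi trick" asked about in the quoted message, here is a minimal sketch of a fully vectorized alternative. It is not from the thread and has not been timed against the two versions above. The function names `group_indices` and `label_means` are hypothetical, the labels are assumed to be a 1-D array, and the code targets Python 3 with a NumPy recent enough to support `kind="stable"` in `np.argsort`. The first function builds the same label-to-positions mapping as `labelcoord1`/`labelcoord2` with one sort; the second produces an array in which each element is the mean of `arr2` over its label.

```python
import numpy as np


def group_indices(labels):
    """Map each unique label to the array of positions where it occurs."""
    labels = np.asarray(labels)
    # A stable sort keeps positions with equal labels in ascending order.
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    # starts[j] is the offset in the sorted array where the j-th label begins.
    uniq, starts = np.unique(sorted_labels, return_index=True)
    # Cut the sorted positions at every label boundary.
    chunks = np.split(order, starts[1:])
    return dict(zip(uniq.tolist(), chunks))


def label_means(labels, values):
    """Return, for each element, the mean of `values` over that element's label."""
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    # inverse[i] is the index of labels[i] among the sorted unique labels.
    uniq, inverse = np.unique(labels, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    # Broadcast the per-label means back to the shape of the input.
    return (sums / counts)[inverse]
```

With the arrays from the script above, `group_indices(arr1)` would return index arrays rather than tuples or lists as dict values, so a comparison against `labelcoord2` needs `np.array(R2[k])` as in the original check.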