[SciPy-user] Array selection help

josef.pktd@gmai...
Wed Feb 11 10:47:52 CST 2009


A list comprehension is still a bit faster than the plain loop, and about
90 times faster than your version for building the dict of indices in
this case.

Josef

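def labelcoord3(arr1, arr2):
    # Same single pass as labelcoord2, but written as a list comprehension;
    # the list it builds is thrown away, only the dict R is kept.
    R = {}
    [R.setdefault(row[0], []).append(index)
     for index, row in enumerate(zip(arr1, arr2))]
    return R  # value of dict is a list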

mean observation per label 10.0
labelcoord2 0.374560733278
labelcoord3 0.254217505297
>>> len(R3)   # number of different labels
10000
>>> len(arr1)  # number of observations
100000
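Two other ways to build the same dict of indices, as a sketch that was not part of the thread and has not been timed here. `labelcoord_defaultdict` is an ordinary loop over `collections.defaultdict(list)`, which avoids the throwaway list that `labelcoord3` creates. `labelcoord_sorted` swaps in a different technique: it sorts the labels once with a stable `np.argsort` and splits the sorted index array at the label boundaries, so there is no Python-level loop over the observations and no full scan per label. Both function names are hypothetical, and the second returns index arrays rather than lists.

```python
from collections import defaultdict

import numpy as np


def labelcoord_defaultdict(arr1):
    """Hypothetical helper: map each label in arr1 to a list of its positions."""
    R = defaultdict(list)
    for index, label in enumerate(arr1):
        R[label].append(index)
    return dict(R)


def labelcoord_sorted(arr1):
    """Hypothetical helper: map each label in arr1 to an array of its
    positions, using one stable sort instead of a loop.

    The stable sort keeps each label's indices in ascending order, the
    same order the loop-based versions produce.
    """
    arr1 = np.asarray(arr1)
    order = np.argsort(arr1, kind="stable")
    # Unique labels, and where each one starts in the sorted array.
    labels, starts = np.unique(arr1[order], return_index=True)
    # Cut the sorted index array into one chunk per label.
    groups = np.split(order, starts[1:])
    return dict(zip(labels.tolist(), groups))
```

For example, `labelcoord_sorted(np.array([3, 1, 3, 2, 1, 3]))` maps label 3 to the index array `[0, 2, 5]`, the same indices `labelcoord_defaultdict` puts in a list.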

On 2/11/09, josef.pktd@gmail.com <josef.pktd@gmail.com> wrote:
> What's your average number of observations per label?
>
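> If you have only a small number of observations per label, then looping
> once through your array in Python is faster than the way you were
> building your dict in the initial message.
>
> Below are some timing comparisons. In each case the first timing is your
> NumPy version (labelcoord1) and the second is a single Python loop
> (labelcoord2). The Python loop's time stays nearly constant, while the
> NumPy version slows down as the number of distinct labels grows, because
> it scans the whole array once per label; it only wins when there are
> many observations per label (200 or more here).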
>
> Josef
>
>
> (length of observation array is 2000, labels are random integers)
>
> mean obs. per label   numpy (labelcoord1)   Python loop (labelcoord2)
> 2.0                   4.87971013076         0.405277207021
> 4.0                   2.87190969802         0.380998981716
> 40.0                  0.404751721231        0.361348718907
> 200.0                 0.149529060262        0.349892234903
> 400.0                 0.117748205434        0.432144029481
>
> for len(arr1) = 100000 and 10000 labels:
>
> mean obs. per label   numpy (labelcoord1)   Python loop (labelcoord2)
> 10.0                  22.9237349998         0.292642780018
>
> Note: the return types differ: version one returns the tuple from
> np.nonzero as each dict value, version two returns plain lists.
> ------------------------- file-----------------
>
>
> import timeit
>
> import numpy as np
> from scipy import ndimage
> from numpy.testing import assert_array_equal
>
> n = 10000      # number of distinct labels
> size = 100000  # number of observations
> print('mean observation per label', size / float(n))
> rvs = np.random.randint(n, size=size)
> arr1 = rvs
> arr2 = float(n) - rvs
>
> def usendimage(arr1, arr2):
>     # Mean of arr2 for one label at a time
>     for i in np.unique(arr1):
>         print(i, ndimage.mean(arr2, labels=arr1, index=i))
>
>     # The same means for all labels in a single call
>     labelsunique = np.unique(arr1)
>     print(labelsunique)
>     print(ndimage.mean(arr2, labels=arr1, index=labelsunique))
>
>
> def labelcoord1(arr1, arr2):
>     # Get all the unique values in arr1
>     U = np.unique(arr1)
>     # Create a dictionary with the unique values as keys and, as values,
>     # the locations of the elements that have that value in arr1.
>     # Note that arr1 == u scans the whole array once per unique value.
>     R = dict(zip(U, [np.nonzero(arr1 == u) for u in U]))
>     return R  # value of dict is a tuple of index arrays
>
> def labelcoord2(arr1, arr2):
>     # np.unique(arr1) is computed but not needed in this version
>     U = np.unique(arr1)
>     # Single pass: append each index to the list for its label, so R maps
>     # each unique value of arr1 to the positions (pixels) holding it.
>     R = {}
>     for index, row in enumerate(zip(arr1, arr2)):
>         R.setdefault(row[0], []).append(index)
>     return R  # value of dict is a list
>
>
> # Time one call of each version
> t = timeit.Timer("labelcoord1(arr1, arr2)", "from __main__ import *")
> print(t.timeit(1))
> t = timeit.Timer("labelcoord2(arr1, arr2)", "from __main__ import *")
> print(t.timeit(1))
>
> # Check that both versions find the same indices for every label
> R1 = labelcoord1(arr1, arr2)
> R2 = labelcoord2(arr1, arr2)
> for k in sorted(R1):
>     assert_array_equal(R1[k][0], np.array(R2[k]))
>
>
>
>
> On 2/11/09, Jose Luis Gomez Dans <josegomez@gmx.net> wrote:
>> Hi,
>>
>>> > In essence, I want to have an array where each element is the mean
>>> > value for its corresponding class.
>>>
>>> Thanks, now I understand!  In that case your for-loop should be fine
>>> (I guess you won't have too many unique indices?).
>>
>> Well, there can be quite a lot of them (~10000 at least), so it does take
>> a long while. I was just wondering whether some numpy/scipy array Jedi
>> trick might speed it up :)
>>
>> jose
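Since the stated goal is an array in which each element holds the mean value of its class, here is a sketch that skips the dict of indices entirely. It swaps in a NumPy technique not proposed in the thread: `np.unique(..., return_inverse=True)` renumbers the classes as 0..k-1, `np.bincount` with `weights` sums `arr2` per class, and indexing the per-class means with the inverse maps each mean back onto its elements. The name `class_means` is hypothetical. The per-class means before the final indexing step should match `ndimage.mean(arr2, labels=arr1, index=labelsunique)` from `usendimage` up to floating-point rounding.

```python
import numpy as np


def class_means(arr1, arr2):
    """Hypothetical helper: return an array shaped like arr1 whose elements
    are the mean of arr2 over all positions sharing the same label in arr1."""
    arr1 = np.asarray(arr1)
    arr2 = np.asarray(arr2, dtype=float)
    # labels: sorted unique classes; inverse: class number (0..k-1) per element
    labels, inverse = np.unique(arr1, return_inverse=True)
    # Reshape so the result has arr1's shape whatever shape unique returns
    inverse = inverse.reshape(arr1.shape)
    sums = np.bincount(inverse.ravel(), weights=arr2.ravel())
    counts = np.bincount(inverse.ravel())
    means = sums / counts  # one mean per class; every class occurs at least once
    return means[inverse]  # broadcast each class mean back to its elements
```

For an image, `arr1` would be the class map and `arr2` the pixel values; the result has the same shape as `arr1`, and no step loops over labels in Python.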

