[Numpy-discussion] help in improving data analysis code

Francesc Altet faltet at carabos.com
Fri Nov 25 07:28:04 CST 2005


On Friday 25 November 2005 15:24, gf wrote:
>
> from numarray import add, array, asarray, absolute, argsort, floor, take, \
>     size
>
> def mean(m,axis=0):
>     m = asarray(m)
>     return add.reduce(m,axis)/float(m.shape[axis])
>
> def eliminate_outliers(dat,frac):
>     num_to_eliminate = int(floor(size(dat,0)*frac))
>     for i in range(num_to_eliminate):
>         ind = argsort(absolute(dat-mean(dat)),0)
>         sdat = take(dat,ind,0)[:,0]
>         dat = sdat[:-1]
>     return dat
>
> #--------------------------------------------------------------------
>
> if __name__ == "__main__":
>     from MLab import rand
>     sz = 100
>     nn = rand(sz,1)
>     nn[:10] = 20*rand(10,1)
>     nn[sz-10:] = -20*rand(10,1)
>     print eliminate_outliers(nn,0.10)

For sz=100, the following one-liner is about 10x faster on my machine
(and the gap grows as sz gets bigger):

      print nn[argsort(abs(nn-nn.mean()),0)][:-int(sz*0.10),0]

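Note that it sorts only once, against the mean of the full data set,
whereas your loop recomputes the mean after every removal, so the two
can give different results. I haven't checked it very carefully, so you
should double-check it. BTW, you will need to import rand from the
numarray MLab interface: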

    from numarray.mlab import rand
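
For anyone wanting to compare the two approaches side by side, here is a
minimal sketch. It assumes present-day NumPy rather than numarray, and
the function names eliminate_outliers_loop and eliminate_outliers_once
are made up for illustration; they are not part of any library. The
loop version keeps the input order instead of the sorted order the
original returns, so treat it as an illustration of the idea only.

```python
import numpy as np

def eliminate_outliers_loop(dat, frac):
    """Drop one point at a time, recomputing the mean after each removal."""
    dat = np.asarray(dat, dtype=float).ravel()
    num_to_eliminate = int(np.floor(dat.size * frac))
    for _ in range(num_to_eliminate):
        # Index of the point currently farthest from the current mean.
        worst = np.argmax(np.abs(dat - dat.mean()))
        dat = np.delete(dat, worst)
    return dat

def eliminate_outliers_once(dat, frac):
    """Sort once by distance from the mean of the full data, drop the tail."""
    dat = np.asarray(dat, dtype=float).ravel()
    num_to_eliminate = int(np.floor(dat.size * frac))
    order = np.argsort(np.abs(dat - dat.mean()))
    # Slice by an explicit length so that num_to_eliminate == 0 keeps everything
    # (a slice of the form [:-0] would return an empty array instead).
    return dat[order[:dat.size - num_to_eliminate]]

if __name__ == "__main__":
    # Hypothetical data: one large outlier pulls the full-data mean upwards.
    data = [0, 0, 0, 0, 0, 0, 0, 6, -5, 100]
    print(sorted(eliminate_outliers_loop(data, 0.2)))
    print(sorted(eliminate_outliers_once(data, 0.2)))
```

With this data both versions drop 100, but the loop then drops 6 while
the single-sort version drops -5, because the mean used for the second
decision is different. Which behaviour is wanted depends on the data.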

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"




