[Numpy-discussion] Medians that ignore values

Peter Saffrey pzs@dcs.gla.ac...
Fri Sep 19 06:21:12 CDT 2008

```David Cournapeau <david <at> ar.media.kyoto-u.ac.jp> writes:

> It may be that nanmedian is slow. But I would sincerely be surprised if
> it were slower than python list, except for some pathological cases, or
> maybe a bug in nanmedian. What do your data look like ? (size, number of
> nan, etc...)
>

I've posted my test code below, which gives me the results:

$ ./arrayspeed3.py
list build time: 0.01
list median time: 0.01
array nanmedian time: 0.36

I must have done something wrong to hobble nanmedian in this way... I'm quite
new to numpy, so feel free to point out any obviously egregious errors.

Peter

===

from numpy import array, nan, inf
from pylab import rand
from time import clock
from scipy.stats.stats import nanmedian

import pdb
_pdb = pdb.Pdb()
breakpoint = _pdb.set_trace

def my_median(vallist):
    num_vals = len(vallist)
    vallist.sort()
    if num_vals % 2 == 1: # odd
        index = (num_vals - 1) / 2
        return vallist[index]
    else: # even
        index = num_vals / 2
        return (vallist[index] + vallist[index - 1]) / 2

numtests = 100
testsize = 100
pointlen = 3

t0 = clock()
natests = rand(numtests,testsize,pointlen)
natests[natests > 0.9] = inf
tests = natests.tolist()
natests[natests==inf] = nan
for test in tests:
    for point in test:
        if inf in point:
            point.remove(inf)
t1 = clock()
print "list build time:", t1-t0

t0 = clock()
allmedians = []
for test in tests:
    medians = [ my_median(x) for x in test ]
    allmedians.append(medians)
t1 = clock()
print "list median time:", t1-t0

t0 = clock()
namedians = []
for natest in natests:
    thismed = nanmedian(natest, axis=1)
    namedians.append(thismed)
t1 = clock()
print "array nanmedian time:", t1-t0

```
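
For anyone rerunning this comparison on a current setup, here is a minimal sketch of the same benchmark, assuming Python 3 and NumPy 1.17 or later. It is not the code from the post: `np.nanmedian` stands in for `scipy.stats.stats.nanmedian`, `time.perf_counter` stands in for `clock`, and `median_ignoring_nan` is a hypothetical helper name chosen for illustration. Two further assumptions differ from the original: points whose values are all missing yield NaN, and the array median is computed in one call over the last axis instead of once per test.

```python
import time
import warnings

import numpy as np

# Same shape as the benchmark above: 100 tests x 100 points x 3 values.
rng = np.random.default_rng(0)
natests = rng.random((100, 100, 3))
natests[natests > 0.9] = np.nan  # mark roughly 10% of the values as missing


def median_ignoring_nan(values):
    """Pure-Python median of the non-NaN entries; NaN if none remain."""
    kept = sorted(v for v in values if v == v)  # NaN is the only value != itself
    n = len(kept)
    if n == 0:
        return float("nan")
    mid = n // 2
    if n % 2 == 1:
        return kept[mid]
    return (kept[mid] + kept[mid - 1]) / 2


t0 = time.perf_counter()
list_medians = [[median_ignoring_nan(point) for point in test]
                for test in natests.tolist()]
t1 = time.perf_counter()

# One vectorised call over the last axis replaces the per-test loop.
with warnings.catch_warnings():
    # All-NaN points trigger a RuntimeWarning; the result for them is NaN.
    warnings.simplefilter("ignore", RuntimeWarning)
    array_medians = np.nanmedian(natests, axis=2)
t2 = time.perf_counter()

print("list median time:", t1 - t0)
print("array nanmedian time:", t2 - t1)
```

The timings depend on the machine and library versions, so none are quoted here. The point of the sketch is only that both paths are computing the same thing, which makes the comparison a fair one.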