# [Numpy-discussion] Medians that ignore values

Peter Saffrey pzs@dcs.gla.ac...
Mon Sep 22 05:23:59 CDT 2008

```
David Cournapeau <david <at> ar.media.kyoto-u.ac.jp> writes:

> Still, it is indeed really slow for your case; when I fixed nanmean and
> co, I did not know much about numpy, I just wanted them to give the
> (where the axis along which the median is computed is really small).
>

I've found that if I just cut nans from the list and use regular numpy median,
it is quicker - 10 times slower than list median, rather than 35 times slower.
Could you just wire nanmedian to do it this way? The only difference is that on
an empty list, nanmedian gives nan, but median throws an IndexError.

Below is my profiling code with this change. Sample output:

$ ./arrayspeed3.py
list build time: 0.16
list median time: 0.08
array nanmedian time: 0.98

Peter

===

from numpy import *
from pylab import rand
from time import clock
from scipy.stats.stats import nanmedian

def my_median(vallist):
    num_vals = len(vallist)
    if num_vals == 0:
        return nan
    vallist.sort()
    if num_vals % 2 == 1: # odd
        index = (num_vals - 1) / 2
        return vallist[index]
    else: # even
        index = num_vals / 2
        return (vallist[index] + vallist[index - 1]) / 2

numtests = 100
testsize = 1000
pointlen = 3

t0 = clock()
natests = rand(numtests,testsize,pointlen)
natests[natests > 0.9] = inf
tests = natests.tolist()
natests[natests==inf] = nan
for test in tests:
    for point in test:
        while inf in point:
            point.remove(inf)
t1 = clock()
print "list build time:", t1-t0

allmedians = []
t0 = clock()
for test in tests:
    medians = [ my_median(x) for x in test ]
    allmedians.append(medians)
t1 = clock()
print "list median time:", t1-t0

t0 = clock()
namedians = []
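for natest in natests:
    thismed = []
    for point in natest:
        # NOTE: the filtering lines are missing from the archived message.
        # The next three lines are a reconstruction from the description
        # above (cut the nans, then use regular numpy median); the name
        # "cleaned" is an assumption.
        cleaned = point[~isnan(point)]
        if len(cleaned) > 0:
            med = median(cleaned)
        else:
            med = nan
        thismed.append(med)
    namedians.append(thismed)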
t1 = clock()
print "array nanmedian time:", t1-t0

```
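
The approach described in the message (drop the NaNs, call the ordinary median, and return NaN when nothing is left) can be sketched as follows. This is a minimal illustration in current Python 3 and NumPy syntax, not the scipy implementation discussed in the thread; the function name `median_ignoring_nan` and the sample data are invented for the example.

```python
import numpy as np


def median_ignoring_nan(values):
    """Return the median of the non-NaN entries, or NaN if none remain.

    Hypothetical helper for illustration; not part of NumPy or SciPy.
    """
    arr = np.asarray(values, dtype=float)
    kept = arr[~np.isnan(arr)]  # boolean mask drops the NaN entries
    if kept.size == 0:
        # Handle the empty case explicitly instead of relying on what
        # the library's median does with an empty input.
        return float("nan")
    return float(np.median(kept))


# Made-up sample: three "points" of length 3, as in the profiling script.
points = np.array([
    [1.0, np.nan, 2.0],
    [np.nan, np.nan, np.nan],
    [0.5, 0.125, 1.0],
])
medians = [median_ignoring_nan(row) for row in points]
print(medians)
```

Checking for the empty case up front keeps the behaviour on all-NaN input (NaN rather than an exception) independent of the NumPy version in use.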