[Numpy-discussion] Medians that ignore values

Peter Saffrey pzs@dcs.gla.ac...
Mon Sep 22 05:23:59 CDT 2008


David Cournapeau <david <at> ar.media.kyoto-u.ac.jp> writes:

> Still, it is indeed really slow for your case; when I fixed nanmean and
> co, I did not know much about numpy, I just wanted them to give the
> right answer :) I think this can be made faster, especially for your case
> (where the axis along which the median is computed is really small).
> 

I've found that if I just cut the NaNs out of each point and use the regular
numpy median, it is quicker: about 10 times slower than the list median, rather
than 35 times slower. Could you just wire nanmedian to do it this way? The only
difference is that on an empty list, nanmedian gives nan, but median throws an
IndexError.
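The change proposed above can be sketched as a small wrapper: drop the NaNs with a boolean mask, return nan if nothing is left (matching nanmedian), and otherwise call the ordinary median. The name `median_ignoring_nan` is hypothetical and only illustrates the idea. For readers on current releases: NumPy itself has shipped `np.nanmedian` since version 1.9 (well after this thread); it returns nan for an all-NaN input and emits a RuntimeWarning.

```python
import warnings
import numpy as np

def median_ignoring_nan(values):
    """Median of the non-NaN entries of a 1-D sequence; nan if none remain.

    Hypothetical helper: strip NaNs, then defer to np.median.
    """
    values = np.asarray(values, dtype=float)
    kept = values[~np.isnan(values)]  # boolean mask drops every NaN
    if kept.size == 0:
        return np.nan  # empty after stripping: report nan instead of raising
    return float(np.median(kept))

print(median_ignoring_nan(np.array([1.0, np.nan, 3.0])))  # 2.0
print(median_ignoring_nan(np.array([np.nan, np.nan])))    # nan

# Modern NumPy's built-in equivalent; it warns about the all-NaN slice,
# so the warning is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    print(np.nanmedian(np.array([np.nan, np.nan])))       # nan
```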

Below is my profiling code with this change. Sample output:

$ ./arrayspeed3.py
list build time: 0.16
list median time: 0.08
array nanmedian time: 0.98

Peter

===

#!/usr/bin/env python3
import numpy as np
from time import perf_counter  # time.clock was removed in Python 3.8

def my_median(vallist):
    """Median of a plain Python list; returns nan for an empty list."""
    num_vals = len(vallist)
    if num_vals == 0:
        return np.nan
    vallist.sort()  # sorts the list in place
    if num_vals % 2 == 1:  # odd: take the middle element
        index = (num_vals - 1) // 2
        return vallist[index]
    else:  # even: average the two middle elements
        index = num_vals // 2
        return (vallist[index] + vallist[index - 1]) / 2

numtests = 100
testsize = 1000
pointlen = 3

t0 = perf_counter()
natests = np.random.rand(numtests, testsize, pointlen)
# have to start with inf because list.remove(nan) doesn't remove nan:
# the NaNs produced by tolist() are new float objects and nan != nan,
# so remove() never finds a match
natests[natests > 0.9] = np.inf
tests = natests.tolist()
natests[natests == np.inf] = np.nan
for test in tests:
    for point in test:
        while np.inf in point:
            point.remove(np.inf)
t1 = perf_counter()
print("list build time:", t1 - t0)

allmedians = []
t0 = perf_counter()
for test in tests:
    medians = [my_median(x) for x in test]
    allmedians.append(medians)
t1 = perf_counter()
print("list median time:", t1 - t0)

t0 = perf_counter()
namedians = []
for natest in natests:
    thismed = []
    for point in natest:
        # keep only the non-NaN entries (~ is elementwise logical not;
        # np.negative no longer accepts boolean arrays)
        maskpoint = point[~np.isnan(point)]
        if len(maskpoint) > 0:
            med = np.median(maskpoint)
        else:
            med = np.nan  # no valid values: report nan, as nanmedian does
        thismed.append(med)
    namedians.append(thismed)
t1 = perf_counter()
print("array nanmedian time:", t1 - t0)
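David's point that the median axis is very short (three values per point here) suggests removing the Python-level loop entirely. One possible loop-free approach, a swapped-in technique that is not what scipy's nanmedian did at the time, relies on `np.sort` placing NaNs at the end of each row: the valid values then fill the first n slots, and the median is the mean of the elements at indices (n-1)//2 and n//2. The function name `nanmedian_last_axis` is hypothetical; the sketch assumes a non-empty last axis and uses `np.take_along_axis` (NumPy 1.15+).

```python
import warnings
import numpy as np

def nanmedian_last_axis(a):
    """Median along the last axis, ignoring NaNs, without a Python loop.

    Hypothetical sketch; assumes a.shape[-1] >= 1. All-NaN rows give nan.
    """
    a = np.asarray(a, dtype=float)
    s = np.sort(a, axis=-1)                             # NaNs sorted to the end
    n = np.sum(~np.isnan(a), axis=-1, keepdims=True)    # valid count, shape (..., 1)
    lo = np.maximum((n - 1) // 2, 0)                    # lower middle index
    hi = n // 2                                         # upper middle index
    # For odd n, lo == hi, so this is just the middle element.
    med = (np.take_along_axis(s, lo, axis=-1) +
           np.take_along_axis(s, hi, axis=-1)) / 2.0
    med = np.where(n > 0, med, np.nan)                  # all-NaN rows -> nan
    return med[..., 0]

data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 2.0, np.nan],
                 [np.nan, np.nan, np.nan]])
print(nanmedian_last_axis(data))  # expected values: 2.0, 3.0, nan

# Cross-check against NumPy's own nanmedian on data shaped like the script's.
x = np.random.rand(100, 1000, 3)
x[x > 0.9] = np.nan
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # all-NaN rows warn
    ref = np.nanmedian(x, axis=-1)
print(np.allclose(nanmedian_last_axis(x), ref, equal_nan=True))
```

Sorting a length-3 axis is cheap, so the per-call overhead that dominates the looped version is paid once for the whole array rather than once per point.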