[SciPy-dev] PEP: Improving the basic statistical functions in Scipy
Bruce Southey
bsouthey@gmail....
Fri Feb 27 11:42:20 CST 2009
josef.pktd@gmail.com wrote:
[snip]
> What I would like to do, but didn't have the time yet is to run the
> tests for stats.stats
> on stats.mstats. This way even if we would have some duplicate
> functions, we would
> have some cross check that they are consistent, and it would be a reminder for
> bug fixing also the other version.
>
Okay, I do not know the proper way to get timeit to work with numpy/scipy,
and this is not how I would like it to be. But I managed to (unfairly)
compare the geometric mean function (gmean) using this code:
import timeit

# Shared setup: a 1-by-10 array of gamma variates; axis=None flattens the input.
setup = ('import numpy, scipy.stats.stats, scipy.stats.mstats; '
         'X = numpy.random.gamma(shape=2, scale=1, size=(1, 10)); xs = None')
stand_t = timeit.Timer('scipy.stats.stats.gmean(X, axis=xs)', setup).timeit(1000)
masked_t = timeit.Timer('scipy.stats.mstats.gmean(X, axis=xs)', setup).timeit(1000)
numpy_t = timeit.Timer('numpy.exp(numpy.log(X).mean())', setup).timeit(1000)
I use Linux and Python 2.5, but my system is very busy, so this is perhaps
not that fair as a benchmark.
numpy.__version__ '1.3.0.dev6338'
scipy.__version__ '0.8.0.dev5597'
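
A minimal sketch of one way to make such timings less sensitive to a busy
machine: pass a callable to timeit.repeat and keep the minimum over several
repeats. It is not the benchmark used in this post. It assumes a recent
Python 3 and NumPy, times only the plain numpy expression (not either gmean
function), and the helper names gmean_expr and best_time are made up for
illustration.

```python
import timeit

import numpy as np


def gmean_expr(x):
    # Geometric mean written as exp(mean(log(x))), the numpy expression from the post.
    return np.exp(np.log(x).mean())


def best_time(func, arg, number=200, repeat=3):
    # Take the minimum over several repeats; the minimum is the run that was
    # least disturbed by other processes on the machine.
    return min(timeit.repeat(lambda: func(arg), number=number, repeat=repeat))


rng = np.random.default_rng(0)
for n in (10, 10000):
    plain = rng.gamma(shape=2.0, scale=1.0, size=(1, n))
    masked = np.ma.masked_array(plain)  # same data, nothing actually masked
    t_plain = best_time(gmean_expr, plain)
    t_masked = best_time(gmean_expr, masked)
    print(n, t_plain, t_masked, t_masked / t_plain)
```

The printed ratio depends on the machine and on the NumPy version, so it
should not be expected to match the numbers reported in this post.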
There is a cost of using _chk_asarray in this case which decreases as
the array size increases. (I am not sure that _chk_asarray is really
needed anyhow.)
There is a huge cost for using masked arrays at small sizes, but it
decreases as the array size increases.
For a 1 by 10 array, the difference between the masked and non-masked
versions was 0.13 seconds over 1000 calls, with a ratio of masked to
non-masked of 7.94.
For a 1 by 10000 array, the difference between the masked and non-masked
versions was 0.07 seconds over 1000 calls, with a ratio of masked to
non-masked of 2.14.
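
The absolute timings can be back-calculated from each difference and ratio,
since masked - plain = difference and masked / plain = ratio. The sketch
below does that arithmetic. The function name split_times is made up for
illustration, and the inputs are the rounded figures quoted above, so the
results are only approximate.

```python
def split_times(difference, ratio):
    # Solve masked - plain = difference and masked / plain = ratio for both times.
    plain = difference / (ratio - 1.0)
    masked = ratio * plain
    return plain, masked


# (array shape, difference in seconds per 1000 calls, masked / non-masked ratio)
reported = (("1 x 10", 0.13, 7.94), ("1 x 10000", 0.07, 2.14))

for label, diff, ratio in reported:
    plain, masked = split_times(diff, ratio)
    print("%s: non-masked ~ %.3f s, masked ~ %.3f s per 1000 calls"
          % (label, plain, masked))
```

With those rounded inputs this gives roughly 0.019 s against 0.149 s for the
small array, and roughly 0.061 s against 0.131 s for the large one.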
However, from briefly looking at some of these functions, I think that
numpy/scipy would naturally handle the array type. For example, I know that
numpy.exp((numpy.log(X).mean())) works whether X is the usual array
or a masked array. If so, then there is no reason for different
functions unless we need to address masks.
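
A small check of that claim, assuming a recent NumPy (the helper name
gmean_expr and the sample values are made up for illustration): the same
expression is applied to a plain array and to a masked array whose masked
entry should be ignored.

```python
import numpy as np


def gmean_expr(x):
    # Same expression for both array types: exp of the mean of the logs.
    return np.exp(np.log(x).mean())


plain = np.array([1.0, 2.0, 4.0])
# The last value is masked, so it should not contribute to the mean.
masked = np.ma.masked_array([1.0, 2.0, 4.0, 100.0],
                            mask=[False, False, False, True])

g_plain = gmean_expr(plain)
g_masked = gmean_expr(masked)
print(g_plain, g_masked)
```

Both results are approximately 2.0, the geometric mean of 1, 2 and 4, up to
floating-point rounding.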
Bruce