[Scipy-tickets] [SciPy] #631: Bug(s) in scipy.stats
SciPy
scipy-tickets@scipy....
Sun Apr 6 14:24:35 CDT 2008
#631: Bug(s) in scipy.stats
--------------------+-------------------------------------------------------
Reporter: ctw | Owner: somebody
Type: defect | Status: new
Priority: high | Milestone: 0.7
Component: Other | Version:
Severity: normal | Keywords:
--------------------+-------------------------------------------------------
In contrast to numpy.var, for which the axis keyword defaults to None, the
scipy.stats.var function defaults to axis=0. However,
scipy.stats.ttest_1samp calls scipy.stats.var without specifying an axis
keyword, even though its calculation requires axis=None. As a result,
scipy.stats.ttest_1samp currently gives wrong results when a
multidimensional ndarray is passed in as an argument (it takes the grand
mean, but divides by the variance across the first axis).
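The mismatch can be reproduced with NumPy alone. The sketch below is only an illustration: it assumes that scipy.stats.var behaves like an unbiased variance taken along axis 0, which is emulated here with np.var(a, axis=0, ddof=1). The helper name stats_var_like is hypothetical and not part of either library.

```python
import numpy as np

def stats_var_like(a, axis=0):
    # Assumption: stand-in for scipy.stats.var (unbiased, axis=0 by default).
    return np.var(a, axis=axis, ddof=1)

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])

grand_mean = np.mean(a, axis=None)      # scalar: mean over all six elements
v_default = stats_var_like(a)           # shape (3,): one variance per column
v_flat = stats_var_like(a, axis=None)   # scalar: variance over all six elements

# The scalar grand mean and the per-column variances do not describe the
# same set of values, which is the inconsistency reported in this ticket.
print(np.shape(grand_mean), np.shape(v_default), np.shape(v_flat))
```

With the default axis, the variance is an array with one entry per column, while the mean is a single number over the whole array.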
An easy fix would be to change the line
{{{
v = var(a)
}}}
to
{{{
v = var(a,None)
}}}
However, I believe that changing the scipy.stats.var function itself
should be seriously considered instead. Before I move on to scipy.stats.var,
though, I also noticed the following lines in scipy.stats.ttest_1samp:
{{{
df = n-1
svar = ((n-1)*v) / float(df)
}}}
Since df equals n-1, the two factors cancel, so this is equivalent to the
following (v is a float):
{{{
df = n-1
svar = v
}}}
I haven't yet checked if the result of scipy.stats.ttest_1samp would be
correct if the variance were calculated properly (and unfortunately there
are no unit tests for this function), so I'm not sure if this is just an
awkward way to do the right thing, or if this calculation is wrong.
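As a reference point for such a check, here is a minimal sketch of the textbook one-sample t statistic, t = (mean - popmean) / sqrt(s^2 / n), computed over all elements of the input. It is not the scipy.stats.ttest_1samp implementation; the function name t_1samp_sketch and its arguments are hypothetical, and the unbiased variance is obtained from np.var with ddof=1.

```python
import math
import numpy as np

def t_1samp_sketch(a, popmean):
    # Hypothetical helper: flatten first, so that the mean and the variance
    # are computed over the same set of elements.
    x = np.ravel(np.asarray(a, dtype=float))
    n = x.size
    df = n - 1
    v = float(np.var(x, ddof=1))        # unbiased sample variance
    svar = ((n - 1) * v) / float(df)    # expression from the ticket
    assert math.isclose(svar, v)        # the scaling cancels: svar equals v
    t = (float(np.mean(x)) - popmean) / math.sqrt(v / n)
    return t, df

t, df = t_1samp_sketch([[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]], 3.0)
print(t, df)
```

A unit test for ttest_1samp could compare its output against a hand computation of this kind.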
The function scipy.stats.obrientransform might also (incorrectly) assume
that scipy.stats.var defaults to axis=None.
Here are some things to consider regarding scipy.stats.var:
1) Should there really be multiple functions by the same name that
calculate (slightly) different values? I think it would be much better to
make all var functions (in numpy and scipy) accept a boolean to determine
if they return the biased or the unbiased variance.
2) Regardless of the above point, I think the axis keyword should always
default to the same value (None).
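To make the two points concrete, the following sketch shows what such a unified signature might look like. It is a hypothetical wrapper, not an existing NumPy or SciPy API: the name var, the bias keyword, and its default are assumptions chosen for illustration. It relies only on np.var and its ddof parameter.

```python
import numpy as np

def var(a, axis=None, bias=False):
    # Hypothetical unified signature (not an existing library function).
    # bias=True divides by n; bias=False divides by n - 1.
    ddof = 0 if bias else 1
    return np.var(np.asarray(a, dtype=float), axis=axis, ddof=ddof)

a = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
unbiased_all = var(a)             # all six elements, divisor n - 1
biased_all = var(a, bias=True)    # all six elements, divisor n
per_column = var(a, axis=0)       # an explicit axis is still available
print(unbiased_all, biased_all, per_column)
```

With axis=None as the default, a caller that omits the axis gets a single number for the whole array, and a per-axis variance has to be requested explicitly.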
--
Ticket URL: <http://scipy.org/scipy/scipy/ticket/631>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.