[Scipy-tickets] [SciPy] #631: Bug(s) in scipy.stats

SciPy scipy-tickets@scipy....
Sun Apr 6 14:24:35 CDT 2008


#631: Bug(s) in scipy.stats
--------------------+-------------------------------------------------------
 Reporter:  ctw     |       Owner:  somebody
     Type:  defect  |      Status:  new     
 Priority:  high    |   Milestone:  0.7     
Component:  Other   |     Version:          
 Severity:  normal  |    Keywords:          
--------------------+-------------------------------------------------------
 Unlike numpy.var, whose axis keyword defaults to None, the
 scipy.stats.var function defaults to axis=0. However,
 scipy.stats.ttest_1samp calls scipy.stats.var without specifying an axis
 keyword, even though its calculation requires axis=None. As a result,
 scipy.stats.ttest_1samp currently gives wrong results when an ndarray
 with more than one dimension is passed in as an argument (it takes the
 grand mean over all elements, but divides by the variance computed along
 the first axis).
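
 To make the mismatch concrete, here is a minimal sketch that uses only
 the Python standard library, so it does not depend on any particular
 NumPy or SciPy version. The nested list `a` is a made-up example, and the
 flattened and column-wise computations are stand-ins for axis=None and
 axis=0 respectively; this is an illustration, not the scipy.stats code.

```python
from itertools import chain
from statistics import mean, variance

# Hypothetical 3x2 sample: three rows, two columns.
a = [[1.0, 10.0],
     [2.0, 20.0],
     [3.0, 30.0]]

# Stand-in for axis=None: flatten first, then take one variance.
flat = list(chain.from_iterable(a))
grand_mean = mean(flat)
var_all = variance(flat)          # a single float, approximately 137.6

# Stand-in for axis=0: one variance per column.
var_cols = [variance(col) for col in zip(*a)]   # [1.0, 100.0]

# A grand mean (scalar) paired with var_cols (one value per column)
# mixes two different reductions, which is the inconsistency described.
print(grand_mean, var_all, var_cols)
```

 The grand mean only pairs sensibly with `var_all`; combining it with
 `var_cols` yields one statistic per column, none of which is based on the
 spread around the mean that was actually used.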

 An easy fix would be to change the line

 {{{
     v = var(a)
 }}}

 to

 {{{
     v = var(a, axis=None)
 }}}

 However, I believe that changing the scipy.stats.var function itself
 deserves serious consideration. Before I turn to scipy.stats.var,
 though, I also noticed the following lines in scipy.stats.ttest_1samp:
 {{{
     df = n-1
     svar = ((n-1)*v) / float(df)
 }}}
 Since df equals n-1, the two factors cancel, so this is equivalent to (v
 is a float):
 {{{
     df = n-1
     svar = v
 }}}
 I haven't yet checked whether the result of scipy.stats.ttest_1samp would
 be correct if the variance were calculated properly (unfortunately, there
 are no unit tests for this function), so I'm not sure whether this is just
 an awkward way to do the right thing or whether the calculation is wrong.
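
 For reference, the textbook one-sample t statistic divides the mean
 difference by sqrt(s2 / n), where s2 is the unbiased (n-1) variance. The
 sketch below uses only the standard library; `t_1samp` and `sample` are
 hypothetical names for illustration, not the SciPy implementation. It
 also shows that a rescaling is only needed when v is the biased variance,
 and that the factor would then be n/(n-1) rather than (n-1)/df.

```python
from math import isclose, sqrt
from statistics import mean, pvariance, variance


def t_1samp(sample, popmean):
    """Textbook one-sample t statistic and degrees of freedom."""
    n = len(sample)
    df = n - 1
    s2 = variance(sample)                 # unbiased: divides by n - 1
    t = (mean(sample) - popmean) / sqrt(s2 / n)
    return t, df


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up data
t, df = t_1samp(sample, 3.0)

n = len(sample)
v_unbiased = variance(sample)
v_biased = pvariance(sample)              # divides by n

# The rescaling quoted in the ticket leaves v unchanged up to rounding.
svar = ((n - 1) * v_unbiased) / float(df)
assert isclose(svar, v_unbiased)

# Converting a biased variance to an unbiased one uses n / (n - 1).
assert isclose(n * v_biased / (n - 1), v_unbiased)
```

 So if v is already the unbiased variance, svar = v is the right quantity
 and the quoted lines are merely redundant; if v were the biased variance,
 they would not correct it.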

 The function scipy.stats.obrientransform might also (incorrectly) assume
 that scipy.stats.var defaults to axis=None.

 Here are some things to consider regarding scipy.stats.var:

 1) Should there really be multiple functions by the same name that
 calculate (slightly) different values? I think it would be much better to
 make all var functions (in numpy and scipy) accept a boolean argument
 that selects whether they return the biased or the unbiased variance.

 2) Regardless of the above point, I think the axis keyword should always
 default to the same value (None).
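
 The two suggestions above could be combined roughly as follows. This is
 only a sketch of the proposed interface over plain nested lists: the
 function name `var`, the `unbiased` flag, and its default are assumptions
 made for illustration, not an existing numpy or scipy signature.

```python
from itertools import chain
from statistics import pvariance, variance


def var(rows, axis=None, unbiased=True):
    """Hypothetical unified variance for a 2-D list of rows.

    axis=None reduces over all elements (the proposed common default);
    axis=0 reduces down the columns; axis=1 reduces along each row.
    """
    estimator = variance if unbiased else pvariance
    if axis is None:
        return estimator(list(chain.from_iterable(rows)))
    if axis == 0:
        return [estimator(col) for col in zip(*rows)]
    if axis == 1:
        return [estimator(row) for row in rows]
    raise ValueError("axis must be None, 0, or 1")


rows = [[1.0, 2.0],
        [3.0, 4.0]]
print(var(rows))                   # unbiased variance of all four values
print(var(rows, unbiased=False))   # biased variance of all four values
print(var(rows, axis=0))           # one unbiased variance per column
```

 With a single function and an explicit flag, a caller such as
 ttest_1samp would have to state both the axis and the estimator it
 relies on, instead of inheriting whichever default a same-named function
 happens to use.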

-- 
Ticket URL: <http://scipy.org/scipy/scipy/ticket/631>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.

