[Scipy-tickets] [SciPy] #845: check consistency in implementation between stats and mstats

SciPy Trac scipy-tickets@scipy....
Mon Aug 6 21:31:23 CDT 2012

#845: check consistency in implementation between stats and mstats
 Reporter:  josefpktd    |       Owner:  somebody   
     Type:  enhancement  |      Status:  new        
 Priority:  normal       |   Milestone:  Unscheduled
Component:  scipy.stats  |     Version:  devel      
 Keywords:               |  

Comment(by yarikoptic):

 moreover some of them diverged in API. e.g. mstats_basic.ttest_1samp does
 not support "axis=" argument, although borrowed __doc__ lists it:

 def ttest_onesamp(a, popmean):
     a = ma.asarray(a)
 ttest_onesamp.__doc__ = stats.ttest_1samp.__doc__
 ttest_1samp = ttest_onesamp

 altogether I still wonder -- wouldn't it be feasible to join the two.
 obviously some functions would need some special handling for masked
 arrays, but many would work "out of the box" with minor changes due to
 fancy python's broadcasting.  E.g. here would be a starting point for
 ttest_1samp to support masked arrays:

 commit c9f7c3e9202680967521445dd56c794252e8a2d6
 Author: Yaroslav Halchenko <debian@onerussian.com>
 Date:   Mon Aug 6 21:35:42 2012 -0400

     ENH: make ttest_1samp handle numpy's masked arrays

     have done that before discovering mstat submodule ;-)

 diff --git a/scipy/stats/_support.py b/scipy/stats/_support.py
 index 122cd90..374a442 100644
 --- a/scipy/stats/_support.py
 +++ b/scipy/stats/_support.py
 @@ -218,6 +218,15 @@ def _chk_asarray(a, axis):
          outaxis = axis
      return a, outaxis

 +def _chk_asanyarray(a, axis):
 +    a = np.asanyarray(a)
 +    if axis is None:
 +        a = a.ravel()
 +        outaxis = 0
 +    else:
 +        outaxis = axis
 +    return a, outaxis
  def _chk2_asarray(a, b, axis):
      if axis is None:
          a = np.ravel(a)
 @@ -229,6 +238,12 @@ def _chk2_asarray(a, b, axis):
          outaxis = axis
      return a, b, outaxis

 +def _get_n(a, axis):
 +    if isinstance(a, numpy.ma.core.MaskedArray):
 +        return a.count(axis=axis)
 +    else:
 +        return a.shape[axis]
  def compute_stderr(a, axis=0, ddof=1):
      a, axis = _chk_asarray(a, axis)
      return np.std(a,axis,ddof=1) / float(np.sqrt(a.shape[axis]))
 diff --git a/scipy/stats/stats.py b/scipy/stats/stats.py
 index 2ba76f2..cc56580 100644
 --- a/scipy/stats/stats.py
 +++ b/scipy/stats/stats.py
 @@ -199,7 +199,7 @@ import distributions

  # Local imports.
  import _support
 -from _support import _chk_asarray, _chk2_asarray
 +from _support import _chk_asarray, _chk2_asarray, _chk_asanyarray, _get_n
  from _rank import rankdata, tiecorrect

  __all__ = ['find_repeats', 'gmean', 'hmean', 'cmedian', 'mode',
 @@ -2931,13 +2931,13 @@ def ttest_1samp(a, popmean, axis=0):
             [  7.89094663e-03,   1.49986458e-04]]))

 -    a, axis = _chk_asarray(a, axis)
 -    n = a.shape[axis]
 +    a, axis = _chk_asanyarray(a, axis)
 +    n = _get_n(a, axis)
      df= n - 1

      d = np.mean(a, axis) - popmean
      v = np.var(a, axis, ddof=1)
 -    denom = np.sqrt(v / float(n))
 +    denom = np.sqrt(v / n)

      t = np.divide(d, denom)
      t, prob = _ttest_finish(df, t)

 another somewhat negative IMHO side-effect now is that masked arrays are
 silently demasked by functions in stats without any warning... I hope that
 anyone (like me) who blindly passes a masked array would first check
 obtained results for validity without a silly hope that awesome
 numpy/scipy folks just figured it all out and it should work with masked
 arrays correctly ;)
 may be it is worth adding checking the class in _chk_asarray for obvious
 victims such as masked arrays and issuing a warning?

Ticket URL: <http://projects.scipy.org/scipy/ticket/845#comment:2>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.

More information about the Scipy-tickets mailing list