[SciPy-dev] Homogenizing stats & mstats

Pierre GM pgmdevlist@gmail....
Fri Jul 24 01:15:39 CDT 2009


All,
I was browsing some recent tickets for scipy.stats, and couldn't help
but notice that a significant number of them (#845, #822, #901...) are
related to a lack of consistency between stats and mstats.

I'd like to eventually get rid of mstats altogether, provided the
same functionality is supported in stats.
* A first step would be to use np.asanyarray instead of np.asarray.  
That should be sufficient for functions like gmean and hmean for  
example.
* A second step would be to use numpy.ma under the hood, returning
either a MaskedArray if the input is a MaskedArray itself, or just a
standard ndarray otherwise. That should take care of the functions
related to ranking and tie handling (I'm pretty confident in the
mstats routines, and we can always double-check the results w/ R). If
needed, we could also add a usemask flag, like we do in
np.genfromtxt.
* A third step would be to port the remaining routines of mstats.extras
to stats or morestats (Harrell-Davis quantiles could be implemented
more efficiently in Cython, for example).
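
To make the first two steps concrete, here is a rough sketch using only
numpy. The names gmean_sketch and center_sketch are hypothetical and are
not existing scipy.stats functions; center_sketch handles 1-D input only,
and its usemask flag merely mimics the idea of the genfromtxt one.

```python
import numpy as np
import numpy.ma as ma


def gmean_sketch(a, axis=0):
    """Geometric mean that keeps the input's array subclass (step one)."""
    # np.asanyarray passes ndarray subclasses such as MaskedArray through
    # unchanged, whereas np.asarray would drop the mask.
    a = np.asanyarray(a)
    return np.exp(np.log(a).mean(axis=axis))


def center_sketch(a, usemask=False):
    """Subtract the mean of 1-D data, using numpy.ma internally (step two)."""
    masked_input = isinstance(a, ma.MaskedArray)
    work = ma.asanyarray(a)        # always a MaskedArray internally
    result = work - work.mean()    # masked entries are ignored by mean()
    if masked_input or usemask:
        return result              # MaskedArray out
    # Plain input cannot contain masked values, so the data is complete.
    return np.asarray(result)      # standard ndarray out


x = ma.array([1.0, 2.0, 4.0, 100.0], mask=[0, 0, 0, 1])
masked_gmean = gmean_sketch(x)            # the masked 100.0 is skipped
plain_centered = center_sketch([1.0, 2.0, 3.0])
masked_centered = center_sketch(x)
```

With np.asarray in gmean_sketch, the masked 100.0 would silently enter
the result, which is the kind of inconsistency the tickets describe.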

At each step, we could add a DeprecationWarning to each reviewed mstats
function and have it call the corresponding stats function instead.
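
The deprecation path could look roughly like the sketch below. The
helper deprecated_alias and the name mstats_hmean are hypothetical, and
the hmean here is only a stand-in, not the SciPy implementation.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Return a wrapper that warns, then forwards to new_func."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return new_func(*args, **kwargs)
    return wrapper


def hmean(values):
    # Stand-in for a reviewed stats function (harmonic mean of a sequence).
    values = list(values)
    return len(values) / sum(1.0 / v for v in values)


# The old mstats name keeps working but now emits a DeprecationWarning.
mstats_hmean = deprecated_alias(hmean, "mstats.hmean")
```

Existing callers of the old name keep getting the same results, with a
warning, until the alias is removed in a later release.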

What would be a good timeline? 0.8.0, or is it too late? 0.9.0?

Comments expected.
Thx in advance
P.



