[SciPy-dev] Statistics Review progress

Robert Kern robert.kern at gmail.com
Wed Apr 12 12:04:20 CDT 2006

Ed Schofield wrote:
> Robert Kern wrote:

>>* Some of the functions like mean() and std() are replications of functionality
>>in numpy and even the methods of array objects themselves. I would like to
>>remove them, but I imagine they are being used in various places. There's a
>>certain amount of code breakage I'm willing to accept in order to clean up
>>stats.py (e.g. all of my other bullet items), but this seems just gratuitous.
>
> I think we should remove the duplicated functions mean, std, and var
> from stats.  The corresponding functions are currently imported from
> numpy into the stats namespace anyway.

Well, not for long.


But we could keep std(), var(), mean(), and median() in mind and import them
specifically. However, numpy.median() will have to grow an axis argument.
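
To make concrete what an axis argument would mean for a median, here is a rough
plain-Python sketch for 2-D nested lists. The names median_1d() and
median_along_axis() are hypothetical and exist only for illustration; this is
not numpy code.

```python
def median_1d(values):
    """Median of a flat sequence of numbers."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2:
        # Odd length: the middle element is the median.
        return s[mid]
    # Even length: average the two middle elements.
    return (s[mid - 1] + s[mid]) / 2.0


def median_along_axis(rows, axis=0):
    """Median of a 2-D list of lists, reduced along the given axis.

    axis=0 reduces over rows (one result per column);
    axis=1 reduces over columns (one result per row).
    """
    if axis == 0:
        return [median_1d(col) for col in zip(*rows)]
    if axis == 1:
        return [median_1d(row) for row in rows]
    raise ValueError("axis must be 0 or 1 for a 2-D input")


data = [[1, 5, 3],
        [2, 4, 9],
        [7, 6, 8]]
per_column = median_along_axis(data, axis=0)
per_row = median_along_axis(data, axis=1)
```

With an axis argument, the same function serves both the per-column and the
per-row case instead of forcing callers to transpose first.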

>>* We really need to sort out the issue of biased and unbiased estimators. At
>>least, a number of scipy.stats functions compute values that could be computed
>>in two different ways, conventionally given labels "biased" and "unbiased". Now
>>while there is some disagreement as to which is better (you get to guess which I
>>prefer), I think we should offer both.
>>Normally, I try to follow the design principle that if the value of a keyword
>>argument is almost always given as a constant (e.g. bias=True rather than
>>bias=flag_set_somewhere_else_in_my_code), then the functionality should be
>>exposed as two separate functions. However, there are a lot of these functions
>>in scipy.stats, and I don't think we would be doing anyone any favors by
>>doubling the number of these functions. IMO, "practicality beats purity" in this
>
> I'd argue strongly that var and std should be identical to the functions
> in numpy.  If we want this we'd need separate functions like varbiased.
> I don't really see the benefit of a 'bias' flag. 

Well, you snipped the use-case I gave.

> If we do encounter
> some real problems in handling the biased estimators consistently
> without it, we might as well argue for modifying the corresponding
> functions in numpy. 

Yes. I do in fact argue for that.

> But it'd be trivial to write
> def my_var_function_with_bias_flag(a, bias=True):
>     if bias:
>         return varbiased(a)
>     else:
>         return var(a)
> if this were ever necessary.

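This is a bit backwards. I would implement varbiased() and var() as thin
wrappers around a single flagged function:

def varbiased(a):
    # Biased estimator: delegate to the one flagged implementation.
    return var_with_flag(a, bias=True)

def var(a):
    # Unbiased estimator: same implementation, flag turned off.
    return var_with_flag(a, bias=False)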

I *don't* want three versions of each of these functions.
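
For concreteness, here is a minimal sketch of what the single flagged function
might look like. var_with_flag() is only the placeholder name used above, and
the plain-Python body is an assumption for illustration, not the scipy.stats
implementation.

```python
def var_with_flag(a, bias=False):
    """Variance of a flat sequence.

    bias=True  divides the sum of squared deviations by n     (biased).
    bias=False divides the sum of squared deviations by n - 1 (unbiased).
    """
    values = [float(x) for x in a]
    n = len(values)
    min_n = 1 if bias else 2
    if n < min_n:
        raise ValueError("need at least %d value(s)" % min_n)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    denom = n if bias else n - 1
    return sum_sq / denom


def varbiased(a):
    # Thin wrapper: biased estimator.
    return var_with_flag(a, bias=True)


def var(a):
    # Thin wrapper: unbiased estimator.
    return var_with_flag(a, bias=False)


sample = [1, 2, 3, 4]
biased = varbiased(sample)
unbiased = var(sample)
```

The arithmetic lives in one place, and the two public names differ only in the
flag they pass.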

Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
