[SciPy-dev] Statistics Review progress
Robert Kern
robert.kern at gmail.com
Wed Apr 12 12:04:20 CDT 2006
Ed Schofield wrote:
> Robert Kern wrote:
>>* Some of the functions like mean() and std() are replications of functionality
>>in numpy and even the methods of array objects themselves. I would like to
>>remove them, but I imagine they are being used in various places. There's a
>>certain amount of code breakage I'm willing to accept in order to clean up
>>stats.py (e.g. all of my other bullet items), but this seems just gratuitous.
>
> I think we should remove the duplicated functions mean, std, and var
> from stats. The corresponding functions are currently imported from
> numpy into the stats namespace anyway.
Well, not for long.
http://projects.scipy.org/scipy/scipy/ticket/192
But we could keep std(), var(), mean(), and median() in mind and import them
specifically. However, numpy.median() will have to grow an axis argument.
>>* We really need to sort out the issue of biased and unbiased estimators. At
>>least, a number of scipy.stats functions compute values that could be computed
>>in two different ways, conventionally given labels "biased" and "unbiased". Now
>>while there is some disagreement as to which is better (you get to guess which I
>>prefer), I think we should offer both.
>>
>>Normally, I try to follow the design principle that if the value of a keyword
>>argument is almost always given as a constant (e.g. bias=True rather than
>>bias=flag_set_somewhere_else_in_my_code), then the functionality should be
>>exposed as two separate functions. However, there are a lot of these functions
>>in scipy.stats, and I don't think we would be doing anyone any favors by
>>doubling the number of these functions. IMO, "practicality beats purity" in this
>>case.
>
> I'd argue strongly that var and std should be identical to the functions
> in numpy. If we want this we'd need separate functions like varbiased.
>
> I don't really see the benefit of a 'bias' flag.
Well, you snipped the use-case I gave.
> If we do encounter
> some real problems in handling the biased estimators consistently
> without it, we might as well argue for modifying the corresponding
> functions in numpy.
Yes. I do in fact argue for that.
> But it'd be trivial to write
>
> def my_var_function_with_bias_flag(a, bias=True):
>     if bias:
>         return varbiased(a)
>     else:
>         return var(a)
>
> if this were ever necessary.
This is a bit backwards. I would implement varbiased() and var() like this:
def varbiased(a):
    return var_with_flag(a, bias=True)
def var(a):
    return var_with_flag(a, bias=False)
I *don't* want three versions of each of these functions.
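
For illustration only, here is a minimal sketch of how that arrangement could look. The names var_with_flag(), varbiased(), and var() come from the discussion above, but the body of var_with_flag() is an assumption, not code from scipy.stats. It is written in plain Python for 1-D input so that it stands alone; a real version would presumably operate on arrays and take an axis argument.

```python
def var_with_flag(a, bias=False):
    """Variance of a 1-D sequence (hypothetical body, for illustration).

    bias=True divides by n (the "biased" estimator);
    bias=False divides by n - 1 (the "unbiased" estimator).
    """
    values = [float(x) for x in a]
    n = len(values)
    divisor = n if bias else n - 1
    if divisor <= 0:
        raise ValueError("not enough data points for this estimator")
    mean = sum(values) / n
    # Sum of squared deviations from the mean, shared by both estimators.
    ss = sum((x - mean) ** 2 for x in values)
    return ss / divisor


def varbiased(a):
    # Thin wrapper: the flag is fixed here, so callers never pass it.
    return var_with_flag(a, bias=True)


def var(a):
    # Thin wrapper for the unbiased estimator.
    return var_with_flag(a, bias=False)
```

With the data [1, 2, 3, 4], varbiased() returns 1.25 and var() returns 5/3, since the sum of squared deviations is 5 and the divisors are 4 and 3. The arithmetic lives in one place, and the two wrappers exist only to fix the flag.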
--
Robert Kern
robert.kern at gmail.com
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco