[SciPy-dev] PEP: Improving the basic statistical functions in Scipy

josef.pktd@gmai... josef.pktd@gmai...
Fri Feb 27 14:14:55 CST 2009


> In your log example, I wouldn't want to get a nice number back. I want
> the function to complain. Silently changing the definition of
> mathematical operations creates a huge potential for errors (that's why
> I also don't like the silent conversions when casting to int).
> For example, if this is maximum likelihood estimation, the log
> likelihood is -inf and not some nice number.
> >>> x=np.array([0,1,2])
> >>> np.log(x).mean()
> I think if users want nice numbers, then they should mask them in the
> first place.

The more I think about
>>> np.ma.log([0,1,2]).sum()
0.69314718055994529
>>> np.log([0,1,2]).sum()
-inf

the more worried I get about using ma functions.

One example:
In the fit method of the distributions with bounded support, if there
are observations outside the bounds, then the negative log-likelihood
is set to inf:

        cond0 = (x <= self.a) | (x >= self.b)
        if (any(cond0)):
            return inf
        else:
            N = len(x)
            return self._nnlf(x, *args) + N*log(scale)

In this case, it still produces the correct result, since the check
runs before the aggregation. However, this is implementation-specific.
If I had assigned the inf before the summation of the log-likelihood
contributions, ma.log would have removed them and killed the boundary
check.
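
A minimal sketch of that failure mode, not the actual scipy.stats code:
pdf, nnlf_plain and nnlf_masked are hypothetical names, and the density
f(x) = 2x on (0, 1) is just an assumed example of bounded support. Here
the infinite contribution comes from taking the log of a zero density
for the out-of-support observation, rather than from an explicit check.

```python
import numpy as np

def pdf(x):
    # Hypothetical density with bounded support: 2*x on (0, 1), 0 elsewhere.
    x = np.asarray(x, dtype=float)
    return np.where((x > 0) & (x < 1), 2 * x, 0.0)

def nnlf_plain(x):
    # Plain log: an observation outside the support contributes log(0) = -inf,
    # so the negative log-likelihood becomes inf.
    with np.errstate(divide="ignore"):
        return -np.log(pdf(x)).sum()

def nnlf_masked(x):
    # Masked log: log(0) is masked and silently dropped from the sum.
    return -np.ma.log(pdf(x)).sum()

x = np.array([0.25, 0.5, 1.5])   # 1.5 lies outside the support
print(nnlf_plain(x))    # inf
print(nnlf_masked(x))   # finite: only the two in-support points are summed
```

With the masked version an optimizer would never see that the sample is
incompatible with the support.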

So when working with masked array functions, it is necessary to always
keep in mind that the math is defined differently, which promises many
happy hours of bug hunting.
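
One possible way to keep the masking visible, as a sketch only and not
something the current code does: count the masked entries, or fill
them back in explicitly before aggregating. The variable names are
just for illustration.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
logs = np.ma.log(x)

# Number of entries that ma.log masked because they are outside the domain of log.
n_masked = np.ma.count_masked(logs)

# Filling the masked entries with -inf restores the usual definition of the sum.
total = logs.filled(-np.inf).sum()

print(n_masked)   # 1
print(total)      # -inf
```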

Josef

