[Numpy-discussion] the mean, var, std of empty arrays

Sebastian Berg sebastian@sipsolutions....
Thu Nov 22 06:14:34 CST 2012


On Wed, 2012-11-21 at 22:58 -0500, josef.pktd@gmail.com wrote:
> On Wed, Nov 21, 2012 at 10:35 PM, Charles R Harris
> <charlesr.harris@gmail.com> wrote:
> >
> >
> > On Wed, Nov 21, 2012 at 7:45 PM, <josef.pktd@gmail.com> wrote:
> >>
> >> On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau <shish@keba.be> wrote:
> >> > Current behavior looks sensible to me. I would personally prefer
> >> > no warning, but I think it makes sense to have one, as it can help
> >> > to detect issues faster.
> >>
> >> I agree that nan should be the correct answer.
> >> (I gave up trying to define a default for 0/0 in scipy.stats t-tests.)
> >>
> >> Some funnier cases:
> >>
> >> >>> np.var([1], ddof=1)
> >> 0.0
> >
> >
> > This one is a nan in development.
> >
> >>
> >> >>> np.var([1], ddof=5)
> >> -0
> >> >>> np.var([1,2], ddof=5)
> >> -0.16666666666666666
> >> >>> np.std([1,2], ddof=5)
> >> nan
> >>
> >
> > These still do this. Also
> >
> > In [10]: var([], ddof=1)
> > Out[10]: -0
> >
> > This suggests that the nan is pretty much an accidental byproduct of
> > division by zero. I think it might make sense to have a definite
> > policy for these corner cases.
> 
> It would also be consistent with the usual pattern to raise a
> ValueError here: ddof too large, size too small. As long as we don't
> allow for missing values, it can't happen that some columns or rows
> give valid answers while others don't.
> 

It seems to me that nan is the reasonable result for these operations
(reduce-like operations that do not have an identity). Reduce
operations without an identity actually raise a ValueError (e.g.
`np.minimum.reduce([])`), but mean/std/var seem special enough to
differ from other reduce operations (for example, their result is
always floating point). As for usability: when plotting error bars
using std, for example, it would be rather annoying to get a
ValueError, so if anything the reduce machinery could give more special
results for empty floating-point reductions.
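
To make the arithmetic behind the results above explicit, here is a
minimal sketch of the ddof computation in plain Python (not the actual
numpy implementation, just the same formula spelled out):

    import numpy as np

    def var_ddof(x, ddof=0):
        # variance with ddof correction: divide the sum of squared
        # deviations by (n - ddof)
        x = np.asarray(x, dtype=float)
        n = x.size
        return ((x - x.mean()) ** 2).sum() / (n - ddof)

    var_ddof([1, 2], ddof=5)           # 0.5 / (2 - 5) = -0.1666...
    np.sqrt(var_ddof([1, 2], ddof=5))  # sqrt of a negative -> nan
    var_ddof([], ddof=1)               # 0.0 / (0 - 1) = -0.0

So the -0 and the nan fall straight out of dividing by (n - ddof) and
taking the square root, with no special-casing anywhere.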

In any case, the warning should be clearer, and for too-large ddof I
would say it should return nan plus a warning as well.
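
Something along these lines would implement that policy; a hypothetical
wrapper, purely to illustrate the idea, not a proposed numpy API:

    import warnings
    import numpy as np

    def var_with_policy(x, ddof=0):
        x = np.asarray(x, dtype=float)
        if x.size <= ddof:
            # empty input or ddof too large: warn clearly and return
            # nan instead of -0 or a negative variance
            warnings.warn("Degrees of freedom <= 0 for slice",
                          RuntimeWarning, stacklevel=2)
            return np.float64(np.nan)
        return np.var(x, ddof=ddof)

    var_with_policy([], ddof=1)      # nan + RuntimeWarning, not -0
    var_with_policy([1, 2], ddof=5)  # nan + RuntimeWarning, not -1/6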

Sebastian

> 
> Quick check with np.ma:
> 
> It looks correct, except when delegating to numpy?
> 
> >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=5, axis=0)
> >>> s
> masked_array(data = [-- --],
>              mask = [ True  True],
>        fill_value = 1e+20)
> 
> >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=1, axis=0)
> >>> s
> masked_array(data = [0.0 --],
>              mask = [False  True],
>        fill_value = 1e+20)
> 
> >>> s = np.ma.std([1,2], ddof=5)
> >>> s
> masked
> >>> type(s)
> <class 'numpy.ma.core.MaskedConstant'>
> 
> >>> np.ma.var([1,2], ddof=5)
> -0.16666666666666666
> 
> 
> Josef
> 
> >
> > <snip>
> >
> > Chuck
> >
> >