[SciPy-dev] Standard deviations
Paul Barrett
pebarrett at gmail.com
Tue Nov 29 15:48:14 CST 2005
I'd like to see more explicit method names. At first sight, 'a.var' and
'a.std' don't mean much to me, whereas 'a.variance' and 'a.standard_dev' do.
-- Paul
On 11/29/05, Travis Oliphant <oliphant at ee.byu.edu> wrote:
>
> Ed Schofield wrote:
>
> >Hi all,
> >
> >I have three questions related to standard deviations and variances in
> >scipy.
> >
> >First, can someone explain the behaviour of array.std() without any
> >arguments?
> >
> > >>> a = arange(30).reshape(3,10)
> > >>> a
> >array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
> >       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
> >       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
> > >>> a.std()
> >array([ 2.99856287,  2.85723522,  2.74647109,  2.67007684,  2.63104804,
> >        2.63104804,  2.67007684,  2.74647109,  2.85723522,  2.99856287])
> >
> >I don't understand what these numbers represent. The correct standard
> >deviations of the column vectors are given by:
> >
> > >>> a.std(0)
> >array([ 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.])
> >
> >and the standard deviations of the row vectors are:
> >
> > >>> a.std(1)
> >array([ 3.02765035, 3.02765035, 3.02765035])
> >
> >I would have expected a.std() to give the same output as
> > >>> a.ravel().std()
> >8.8034084308295046
> >
> >which is what a.mean() does.
> >
> >
>
> This is a bug. Thanks for finding it. I'll look into it.
>
> >
> >
> >Second, I'd like to point out that some of the functions in Lib/stats/
> >follow a different convention from scipy core about whether operations
> >are performed row-wise or column-wise, and to ask whether anyone would
> >object to my changing the stats functions to operate column-wise. At the
> >moment we get this:
> >
> > >>> average(a)
> >array([ 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.])
> >
> >which is column-wise, but
> >
> > >>> std(a)
> >array([ 3.02765035, 3.02765035, 3.02765035])
> >
> >which is row-wise. I presume the default behaviour of std() and friends
> >is just a historical relic. If so we'd be wise to get this straight
> >well before a 1.0 release.
> >
> >
> Good catch. It would be nice to have things as consistent as possible.
> Feel free to make consistency changes --- especially in stats.py which
> is still messy.
>
> >Third, I'd like to request that we add an array.var() method to scipy
> >core to compute an array's sample variance.
> >
> >At the moment it seems that there is no way to compute the sample
> >variance of an array of numbers without installing the full scipy.
> >Users needing to do this will either have to roll their own function in
> >Python, like this:
> >
> >def var(A):
> >    # Sample variance of a 1-d array A, using the N-1 normalisation.
> >    m = len(A)
> >    return average((A - average(A))**2) * (m/(m-1.))
> >
> >or square the output of std(). Both are less efficient than a native
> >array.var() would be, requiring extra memory copying and, in the second
> >case, squaring the result of a square root operation, which also
> >introduces numerical imprecision.
> >
> >The extra code required is minimal. There's an example patch below,
> >which works fine except that it inherits the weirdness of std().
> >
> >
> I'm O.K. with this. Anybody else see a problem?
>
> -Travis
>

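For anyone reading this thread later, here is a minimal sketch of the behaviour discussed above. It assumes present-day NumPy rather than the 2005 scipy core, so it is an illustration, not the code under discussion. The ddof=1 argument is an assumption inferred from the quoted numbers (a standard deviation of 10 for the column [0, 10, 20] implies the N-1 normalisation). The helper name sample_var is hypothetical and does not come from the thread.

```python
import numpy as np

a = np.arange(30).reshape(3, 10)

# Sample (N-1) standard deviations; ddof=1 is assumed so that the
# results match the numbers quoted in the thread.
col_std = a.std(axis=0, ddof=1)   # one value per column
row_std = a.std(axis=1, ddof=1)   # one value per row
flat_std = a.std(ddof=1)          # axis=None: over all 30 elements


def sample_var(x, axis=None):
    """Sample variance with N-1 normalisation (hypothetical helper).

    Computes the mean first, then the sum of squared deviations,
    so no square root is taken and then squared again.
    """
    x = np.asarray(x, dtype=float)
    n = x.size if axis is None else x.shape[axis]
    mean = x.mean(axis=axis, keepdims=True)
    return ((x - mean) ** 2).sum(axis=axis) / (n - 1)


print(col_std)
print(row_std)
print(flat_std)
print(sample_var(a), sample_var(a, axis=0))
```

With no axis given, both a.std(ddof=1) and sample_var(a) reduce over the flattened array, which is the behaviour Ed expected from a.std(). In current NumPy, np.var(a, ddof=1) gives the same result as the helper without a hand-written function.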