[SciPy-User] stats, classes instead of functions for results MovStats
Bruce Southey
bsouthey@gmail....
Mon Nov 30 15:05:03 CST 2009
On 11/22/2009 11:43 PM, josef.pktd@gmail.com wrote:
> Following up on a question by Keith on the numpy list and his reminder
> that covariance can be calculated by the cross-product minus the
> product of the means, I redid and
> enhanced my moving stats functions.
>
> Suppose x and y are two time series, then the moving correlation
> requires the calculation of the mean, variance and covariance for each
> window. Currently in scipy stats intermediate results are usually
> thrown away on return (while rpy/R returns all intermediate results
> used for the calculation.
>
> Using a decorator/descriptor of Fernando written for nitime, I tried
> out to write the function as a class instead, so that any desired (
> intermediate) calculations are only made on demand, but once they are
> calculated they are attached to the class as attributes or properties.
> This seems to be a useful "pattern".
>
> Are there any opinion for using the pattern in scipy.stats ? MovStats
> will currently go into statsmodels
>
> Below is the class (with cutting part of init), a full script is the
> attachment, including examples that test the class.
>
> about MovStats:
> y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with
> axis=1, should (but may not yet) work for nd arrays along any axis
> (signal.correlate docstring)
> nans are handled by dropping the corresponding observations from the
> window, not adding any additional observations,
> not tested if a window is empty because it contains only nans, nor if
> variance is zero
> (kern is intended for weighted statistics in the window but not tested
> yet, I still need to decide on normalization requirements)
> requires scipy.signal, all calculations done with signal.correlate, no loops
> as often, functions are one-liners
> all results are returned for valid observations only, initial
> observations with incomplete window are cut
> bonus: slope of moving regression of y on x, since it was trivial to add
> still some cleaning and documentation to do
>
> usage:
> ms = MovStats(x, y, axis=1)
> ms.yvar
> ms.xmean
> ms.yxcorr
> ms.yxcov
> ...
>
>
> Josef
>
> class MovStats(object):
> def __init__(self, y, x=None, kern=5, axis=0):
> self.y = y
> self.x = x
> if np.isscalar(kern):
> ws = kern
> <... snip>
>
> @OneTimeProperty
> def ymean(self):
> ys = signal.correlate(self.y, self.kern, mode='same')[self.sslice]
> ym = ys/self.n
> return ym
>
> @OneTimeProperty
> def yvar(self):
> ys2 = signal.correlate(self.y*self.y, self.kern,
> mode='same')[self.sslice]
> yvar = ys2/self.n - self.ymean**2
> return yvar
>
> @OneTimeProperty
> def xmean(self):
> if self.x is None:
> return None
> else:
> xs = signal.correlate(self.x, self.kern, mode='same')[self.sslice]
> xm = xs/self.n
> return xm
>
> @OneTimeProperty
> def xvar(self):
> if self.x is None:
> return None
> else:
> xs2 = signal.correlate(self.x*self.x, self.kern,
> mode='same')[self.sslice]
> xvar = xs2/self.n - self.xmean**2
> return xvar
> @OneTimeProperty
> def yxcov(self):
> xys = signal.correlate(self.x*self.y, self.kern,
> mode='same')[self.sslice]
> return xys/self.n - self.ymean*self.xmean
>
> @OneTimeProperty
> def yxcorr(self):
> return self.yxcov/np.sqrt(self.yvar*self.xvar)
>
> @OneTimeProperty
> def yxslope(self):
> return self.yxcov/self.xvar
>
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
I think your handling of NaN's is incorrect because you do not drop the
corresponding observations. That is for two arrays
y=np.array([[ 1.229563, -0.339428, 0.83891 , 4.026574, 3.069378,
5.95668 ]])
x=np.array([[-1.236469, 1.941089, -0.346566, -0.268529, np.nan,
0.191336]])
For a windows size of 5, in the first window, the first mean and
variance of y should use all 5 elements of y, the mean and variance of X
should use the first 4 elements of x and, the regression and correlation
coefficients should use the first 4 elements of x and y.
Some other points:
1) Your calculation of variance is susceptible to errors, see
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
Provided that you are using sufficient precision (like numpy defaults)
it is probably not that big a problem.
2) You only use the 'full' windows so when the window width is 5, you
miss the first two windows and the last 2 windows. At least the mean
exists in these windows and the variance in most of these partial
windows. This may provided unexpected results to a user if they do not
release which windows are not returned.
3) I think the user needs to define the kern argument for your MovStat
class as there is probably no meaningful default value (except 42).
4) I do not know how you should handle positive and negative infinity.
5) Your code expects at least 2 dimensions so 1-d arrays fail because
you can not do this assignment 'kdim[axis] = ws' with 1-d arrays.
Bruce
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20091130/64ff314a/attachment.html
More information about the SciPy-User
mailing list