[SciPy-User] stats, classes instead of functions for results MovStats
Mon Nov 23 00:39:16 CST 2009
On Nov 23, 2009, at 12:43 AM, email@example.com wrote:
> Following up on a question by Keith on the numpy list and his reminder
> that covariance can be calculated by the cross-product minus the
> product of the means, I redid and
> enhanced my moving stats functions.
> Suppose x and y are two time series, then the moving correlation
> requires the calculation of the mean, variance and covariance for each
> window. Currently in scipy stats intermediate results are usually
> thrown away on return (while rpy/R returns all intermediate results
> used for the calculation.
> Using a decorator/descriptor of Fernando written for nitime, I tried
> out to write the function as a class instead, so that any desired (
> intermediate) calculations are only made on demand, but once they are
> calculated they are attached to the class as attributes or properties.
> This seems to be a useful "pattern".
> Are there any opinion for using the pattern in scipy.stats ? MovStats
> will currently go into statsmodels
> Below is the class (with cutting part of init), a full script is the
> attachment, including examples that test the class.
> about MovStats:
> y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with
> axis=1, should (but may not yet) work for nd arrays along any axis
> (signal.correlate docstring)
> nans are handled by dropping the corresponding observations from the
> window, not adding any additional observations,
> not tested if a window is empty because it contains only nans, nor if
> variance is zero
> (kern is intended for weighted statistics in the window but not tested
> yet, I still need to decide on normalization requirements)
> requires scipy.signal, all calculations done with signal.correlate, no loops
> as often, functions are one-liners
> all results are returned for valid observations only, initial
> observations with incomplete window are cut
> bonus: slope of moving regression of y on x, since it was trivial to add
> still some cleaning and documentation to do
Can you add support for MaskedArrays ?
The easiest would be to check whether your inputs are masked arrays. If yes, make sure they're float (transform them if needed) and fill them w/ nans as needed.
You can also check what Matt did w/ scikits.timeseries.
About your suggestion: I'd leave it in statsmodels for now...
More information about the SciPy-User