[SciPy-User] stats, classes instead of functions for results MovStats

josef.pktd@gmai... josef.pktd@gmai...
Mon Nov 23 01:13:28 CST 2009

On Mon, Nov 23, 2009 at 1:39 AM, Pierre GM <pgmdevlist@gmail.com> wrote:
> On Nov 23, 2009, at 12:43 AM, josef.pktd@gmail.com wrote:
>> Following up on a question by Keith on the numpy list and his reminder
>> that covariance can be calculated by the cross-product minus the
>> product of the means, I redid and
>> enhanced my moving stats functions.
>> Suppose x and y are two time series, then the moving correlation
>> requires the calculation of the mean, variance and covariance for each
>> window. Currently in scipy.stats intermediate results are usually
>> thrown away on return (while rpy/R returns all intermediate results
>> used for the calculation).
>> Using a decorator/descriptor that Fernando wrote for nitime, I tried
>> writing the function as a class instead, so that any desired
>> (intermediate) calculations are only made on demand, but once they are
>> calculated they are attached to the class as attributes or properties.
>> This seems to be a useful "pattern".
>> Are there any opinions on using the pattern in scipy.stats? MovStats
>> will currently go into statsmodels.
>> Below is the class (with part of __init__ cut); the full script is
>> attached, including examples that test the class.
>> about MovStats:
>> y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with
>> axis=1, should (but may not yet) work for nd arrays along any axis
>> (signal.correlate docstring)
>> nans are handled by dropping the corresponding observations from the
>> window, not adding any additional observations,
>> not tested if a window is empty because it contains only nans, nor if
>> variance is zero
>> (kern is intended for weighted statistics in the window but not tested
>> yet, I still need to decide on normalization requirements)
>> requires scipy.signal, all calculations done with signal.correlate, no loops
>> as often, functions are one-liners
>> all results are returned for valid observations only, initial
>> observations with incomplete window are cut
>> bonus: slope of moving regression of y on x, since it was trivial to add
>> still some cleaning and documentation to do
> Can you add support for MaskedArrays ?
> The easiest would be to check whether your inputs are masked arrays. If yes, make sure they're float (transform them if needed) and fill them w/ nans as needed.

Since only __init__ is affected, this should be quite easy. I only need
the mask to calculate the number of non-nan elements in a window and to
fill the data array with zeros. I haven't thought about different
numeric types; I guess I should make sure that the calculations are
done with floats for the non-ma arrays as well.
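
The mask handling described above could be sketched roughly as follows.
This is only an illustration for the 1d case, not the MovStats code:
the function name nan_moving_mean is hypothetical, and a flat window
is assumed.

```python
import numpy as np
from scipy import signal


def nan_moving_mean(x, window):
    """Moving mean over complete windows, dropping nans (1d sketch)."""
    # Masked arrays: convert to float and replace masked entries by nan.
    if np.ma.isMaskedArray(x):
        x = x.astype(float).filled(np.nan)
    x = np.asarray(x, dtype=float)        # do all calculations with floats
    valid = ~np.isnan(x)                  # True where an observation exists
    filled = np.where(valid, x, 0.0)      # zeros add nothing to a window sum
    kern = np.ones(window)                # flat window (assumption)
    sums = signal.correlate(filled, kern, mode="valid", method="direct")
    counts = signal.correlate(valid.astype(float), kern,
                              mode="valid", method="direct")
    # A window that contains only nans gives counts == 0; that case is
    # not handled here.
    return sums / counts
```

Dividing the window sum by the count of non-nan elements, rather than
by the window length, is what drops the missing observations without
pulling additional ones into the window.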

> You can also check what Matt did w/ scikits.timeseries.
I initially got this way of calculating from the scikits.timeseries
autocovariance. Your moving_funcs are mostly in C, and cmov_window
uses np.convolve, which is only for 1d and needs to loop.
The advantage of scipy.signal over numpy is that it handles nd arrays.
I will look at the mask handling in time series again.
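
To illustrate the 1d versus nd point, here is a small sketch of a
moving mean along one axis. The function names are made up for this
example; the only library calls are np.convolve and
scipy.signal.correlate.

```python
import numpy as np
from scipy import signal


def moving_mean_nd(x, window, axis=0):
    """Moving mean along `axis` in one call, no Python loop."""
    x = np.asarray(x, dtype=float)
    # Kernel has length `window` along `axis` and length 1 elsewhere,
    # so the other axes are left untouched.
    shape = [1] * x.ndim
    shape[axis] = window
    kern = np.ones(shape) / window
    return signal.correlate(x, kern, mode="valid")


def moving_mean_loop(x, window):
    """Same result for a (T, N) array, looping over columns with np.convolve."""
    x = np.asarray(x, dtype=float)
    kern = np.ones(window) / window
    cols = [np.convolve(x[:, j], kern, mode="valid")
            for j in range(x.shape[1])]
    return np.column_stack(cols)
```

For a (T, N) array with axis=0, both return an array of shape
(T - window + 1, N), since only complete windows are kept.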

I always get mixed up with convolve versus correlate. Is there a
standard sorting for time series, up to down or left to right by
increasing time or reversed? I have to check this for non-flat windows.
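
A quick way to see the difference with a non-flat window is a kernel
that puts all weight on one end. The arrays below are made-up example
data; with a flat window the two calls would give the same result.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
kern = np.array([1.0, 0.0])    # all weight on the first kernel element

# correlate slides the kernel as given: kern[0] meets the lowest index
# in each window.
corr = np.correlate(x, kern, mode="valid")

# convolve flips the kernel first: kern[0] meets the highest index
# in each window.
conv = np.convolve(x, kern, mode="valid")

# Convolving is the same as correlating with the reversed kernel.
conv_as_corr = np.correlate(x, kern[::-1], mode="valid")
```

Which of the two is wanted then depends on whether the series is
stored with time increasing or decreasing along the axis.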

> About your suggestion: I'd leave it in statsmodels for now...
movstat goes into statsmodels.sandbox.tsa which is my playground for
time series analysis

for scipy.stats I was thinking more of existing or other functions,
e.g. my version of groupstats, (mean, variance, demean, ... by groups)
would follow the same pattern of partly expensive calculations on

