[SciPy-User] stats, classes instead of functions for results MovStats

josef.pktd@gmai... josef.pktd@gmai...
Mon Nov 23 01:13:28 CST 2009


On Mon, Nov 23, 2009 at 1:39 AM, Pierre GM <pgmdevlist@gmail.com> wrote:
> On Nov 23, 2009, at 12:43 AM, josef.pktd@gmail.com wrote:
>> Following up on a question by Keith on the numpy list and his reminder
>> that covariance can be calculated by the cross-product minus the
>> product of the means, I redid and
>> enhanced my moving stats functions.
>>
>> Suppose x and y are two time series, then the moving correlation
>> requires the calculation of the mean, variance and covariance for each
>> window. Currently in scipy stats intermediate results are usually
>> thrown away on return (while rpy/R returns all intermediate results
>> used for the calculation).
>>
>> Using a decorator/descriptor that Fernando wrote for nitime, I tried
>> writing the function as a class instead, so that any desired
>> (intermediate) calculations are only made on demand, but once they are
>> calculated they are attached to the class as attributes or properties.
>> This seems to be a useful "pattern".
>>
>> Are there any opinions on using the pattern in scipy.stats? MovStats
>> will currently go into statsmodels.
>>
>> Below is the class (with part of __init__ cut); the full script is in
>> the attachment, including examples that test the class.
>>
>> about MovStats:
>> - y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with
>>   axis=1; should (but may not yet) work for nd arrays along any axis
>>   (see the signal.correlate docstring)
>> - nans are handled by dropping the corresponding observations from the
>>   window, without adding any additional observations
>> - not tested for a window that is empty because it contains only nans,
>>   nor for zero variance
>> - kern is intended for weighted statistics in the window but is not
>>   tested yet; I still need to decide on normalization requirements
>> - requires scipy.signal; all calculations are done with
>>   signal.correlate, no loops
>> - as often, functions are one-liners
>> - all results are returned for valid observations only; initial
>>   observations with an incomplete window are cut
>> - bonus: slope of moving regression of y on x, since it was trivial to add
>> - still some cleaning and documentation to do
>
>
> Can you add support for MaskedArrays ?
> The easiest would be to check whether your inputs are masked arrays. If yes, make sure they're float (transform them if needed) and fill them w/ nans as needed.

Since only __init__ is affected this should be quite easy. I only need
the mask to count the number of non-nan elements in a window and to
fill the data array with zeros. I haven't thought about different
numeric types; I guess I should make sure that the calculations are
done with floats for the non-ma arrays as well.
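
A minimal sketch of that mask handling, not the actual MovStats code: the
function name nan_moving_mean and the (T,N), axis=0 layout are assumptions
made for illustration. Masked arrays are converted to float and filled with
nans, nans are replaced by zeros for the window sums, and the mask itself is
correlated with the same flat kernel to count the valid observations per
window.

```python
import numpy as np
from scipy import signal


def nan_moving_mean(x, window):
    """Moving mean along axis 0 of a (T, N) array, ignoring nans.

    Hypothetical helper for illustration only.
    """
    # Masked arrays: force float and turn masked entries into nans.
    if isinstance(x, np.ma.MaskedArray):
        x = x.astype(float).filled(np.nan)
    x = np.asarray(x, dtype=float)

    valid = ~np.isnan(x)              # True where an observation exists
    filled = np.where(valid, x, 0.0)  # zeros do not contribute to the sums

    kern = np.ones((window, 1))       # flat window along the time axis
    sums = signal.correlate(filled, kern, mode="valid", method="direct")
    counts = signal.correlate(valid.astype(float), kern,
                              mode="valid", method="direct")

    # A window containing only nans has count 0 and yields nan.
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


x = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [3.0, 30.0],
              [np.nan, 40.0]])
m = nan_moving_mean(x, window=2)
xm = np.ma.array([[1.0], [2.0], [3.0]], mask=[[0], [1], [0]])
mm = nan_moving_mean(xm, window=2)
```

Only the first T - window + 1 complete windows are returned, which matches
the "valid observations only" behaviour described in the quoted notes.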

> You can also check what Matt did w/ scikits.timeseries.
I initially got this way of calculating from the scikits.timeseries
autocovariance. Your moving_funcs are mostly in C, and cmov_window
uses np.convolve, which is only for 1d and needs to loop.
The advantage of scipy.signal over numpy is that it does nd
convolution.
I will look at the mask handling in time series again.
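
To illustrate the nd point, here is a rough sketch (hypothetical helper
names, not the MovStats implementation) of moving covariance and regression
slope for all columns of a (T,N) array at once. It uses the identity
cov(x, y) = mean(x*y) - mean(x)*mean(y) within each window; dividing by the
window length (rather than window - 1) is an assumption of this sketch.

```python
import numpy as np
from scipy import signal


def moving_mean(a, window):
    """Moving mean along axis 0 of a 2d array, all columns at once."""
    kern = np.ones((window, 1)) / window
    return signal.correlate(a, kern, mode="valid", method="direct")


def moving_cov(x, y, window):
    """Moving covariance: mean of cross-product minus product of means."""
    return (moving_mean(x * y, window)
            - moving_mean(x, window) * moving_mean(y, window))


def moving_slope(x, y, window):
    """Slope of the moving regression of y on x: cov(x, y) / var(x)."""
    return moving_cov(x, y, window) / moving_cov(x, x, window)


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
y = 0.5 * x + rng.normal(size=(50, 3))

cov = moving_cov(x, y, window=10)      # shape (41, 3), no Python loop
slope = moving_slope(x, y, window=10)
```

The cross-product form can lose precision when the means are large relative
to the variance, so demeaning the data first may be worth considering.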

I always get mixed up with convolve versus correlate. Is there a
standard ordering for time series: top to bottom or left to right by
increasing time, or reversed? I have to check this for non-flat window
weights/kernels.
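
A small 1d check of the difference, using np.correlate and np.convolve and
assuming time increases with the index: correlate applies the weights in the
order given (w[0] to the oldest observation in the window), while convolve
flips the kernel (w[0] goes to the newest observation). For a flat window
the two agree.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # time increases with the index
w = np.array([0.5, 0.3, 0.2])             # non-flat window weights

corr = np.correlate(x, w, mode="valid")   # w[0] weights the oldest value
conv = np.convolve(x, w, mode="valid")    # kernel flipped: w[0] weights the newest

# First window is [1, 2, 3]:
#   correlate: 1*0.5 + 2*0.3 + 3*0.2 = 1.7
#   convolve:  1*0.2 + 2*0.3 + 3*0.5 = 2.3
flipped = np.convolve(x, w[::-1], mode="valid")  # same as correlate

flat = np.ones(3) / 3
same_for_flat = np.allclose(np.correlate(x, flat, mode="valid"),
                            np.convolve(x, flat, mode="valid"))
```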


> About your suggestion: I'd leave it in statsmodels for now...
movstat goes into statsmodels.sandbox.tsa, which is my playground for
time series analysis.

For scipy.stats I was thinking more of existing or other functions,
e.g. my version of groupstats (mean, variance, demean, ... by groups),
which would follow the same pattern of doing partly expensive
calculations on demand.
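
A rough sketch of what such a class could look like. This is not the actual
groupstats code: the class name GroupStats, its attribute names and the
integer group labels 0..G-1 are assumptions, and functools.cached_property
from the standard library (Python 3.8+) stands in for Fernando's nitime
decorator. Each result is computed on first access and then stored on the
instance.

```python
from functools import cached_property

import numpy as np


class GroupStats:
    """Statistics by group, computed on demand (hypothetical sketch)."""

    def __init__(self, x, groups):
        self.x = np.asarray(x, dtype=float)
        # Assumed: integer labels 0..G-1, every group non-empty.
        self.groups = np.asarray(groups)

    @cached_property
    def counts(self):
        return np.bincount(self.groups)

    @cached_property
    def means(self):
        return np.bincount(self.groups, weights=self.x) / self.counts

    @cached_property
    def demeaned(self):
        # Subtract each observation's own group mean.
        return self.x - self.means[self.groups]

    @cached_property
    def variances(self):
        # Population variance (divide by the group count).
        return np.bincount(self.groups, weights=self.demeaned ** 2) / self.counts


gs = GroupStats([1.0, 2.0, 3.0, 10.0, 20.0], [0, 0, 0, 1, 1])
computed_before = sorted(k for k in vars(gs) if k not in ("x", "groups"))
v = gs.variances   # triggers counts, means and demeaned as well
computed_after = sorted(k for k in vars(gs) if k not in ("x", "groups"))
```

Asking for variances pulls in the intermediate results it needs, and they
remain available as attributes afterwards without being recomputed.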

Josef

