Pierre GM pgmdevlist@gmail....
Wed Jun 23 13:49:22 CDT 2010

```On Jun 23, 2010, at 2:35 PM, Pierre GM wrote:

>
> On Jun 23, 2010, at 11:45 AM, Andreas wrote:
>
>> Thanks a lot for your input!
>>
>>> You could try to fill your missing values beforehand, w/ functions like
>>> backward_fill and forward_fill, then passing your series to mov_average.
>>
>> Well, that's not really what I want. By doing what you suggest, I make the
>> assumption that the value actually changed on the day for which I have the
>> measurement. But each measurement is only one single point in time, so I
>> do not want to make this assumption.
>
> Ah OK. Makes sense.
>
>> Basically, I'm looking for a simple and efficient way to do something like
>> this::
>>
>>  w = 11 # the window size
>>  s = (w-1)*.5
>>  for d in data.dates:
>>    newdata[d] = data[d-s:d+s+1].mean()
>
> Ah OK. Note that you should use cmov_mean, then...
> Well, several possibilities:
>
> * Make sure you don't have missing dates (use fill_missing_dates), then construct a list of slices and apply .mean() on the .series (so that you don't use __getitem__ on the whole series, only on the masked data part, saves some time).
>
> * Use some tricks:
> - the moving_funcs functions don't need timeseries as inputs, masked arrays are just fine
> - compute cmov_mean on the data part (filled w/ 0)
> - compute cmov_mean on the opposite of the mask (viz, np.logical_not(x.mask))
> - divide the first by the second.

Here, let's have an example:
"""
import numpy as np
import scikits.timeseries as ts
import scikits.timeseries.lib.moving_funcs as mov

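size = 50
# 50 values, one every third day
x = ts.time_series(np.arange(size, dtype=float),
                   dates=ts.date_array(ts.Date('D', "2001-01-01"), length=size*3)[::3])
# Insert the missing days as masked entries
xx = x.fill_missing_dates()

# Moving sum (span 20) of the data, masked entries filled w/ 0
zdata = mov.mov_sum(xx.filled(0), 20).data

# Mean of the unmasked values in the slice xx[1:22]
print(xx[1:22].mean())

# Centered moving mean (span 20) of the data, masked entries filled w/ 0
zdata = mov.cmov_mean(xx.filled(0), 20).data

# Mean of the unmasked values in the slice xx[11:32]
print(xx[11:32].mean())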
"""

When dealing w/ masked arrays, or series with missing dates, it's important to understand how things actually work. ".mean" on a masked array divides the ".sum" of the unmasked ".data" by the ".count" of unmasked entries (taken from the ".mask"). When dealing w/ a time series, it's usually more efficient to process the .data, the .mask and the .dates separately.

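So, in your problem, we compute the centered mean on the data first (filled w/ 0; viz, the sum divided by the span), then on the opposite of the mask (viz, the count of valid points divided by the span), and divide the first by the second: the spans cancel out, leaving the mean of the valid points in each window.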

Note that cmov_ actually calls scipy.convolve, not our own C code like the mov_ functions...

```
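
The divide-two-moving-means trick described in the message can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the scikits.timeseries implementation: the helper name `centered_masked_mean` is hypothetical, `np.convolve` stands in for `cmov_mean`, the window is assumed odd, and windows near the edges are simply truncated instead of being masked.

```python
import numpy as np

def centered_masked_mean(values, mask, window):
    """Centered moving mean over the valid (unmasked) points only.

    Hypothetical helper for illustration; `window` must be odd and no
    longer than the series.
    """
    values = np.asarray(values, dtype=float)
    valid = ~np.asarray(mask, dtype=bool)
    if window % 2 != 1 or window > values.size:
        raise ValueError("window must be odd and no longer than the series")
    kernel = np.ones(window)
    # Windowed sum of the data, with masked points contributing 0.
    sums = np.convolve(np.where(valid, values, 0.0), kernel, mode="same")
    # Windowed count of valid points (the "opposite of the mask").
    counts = np.convolve(valid.astype(float), kernel, mode="same")
    # Divide the first by the second; windows with no valid point stay masked.
    empty = counts < 0.5
    return np.ma.masked_array(sums / np.where(empty, 1.0, counts), mask=empty)

# Same layout as the example in the message: 50 values, one every third day,
# with the days in between masked.
n = 148
values = np.zeros(n)
mask = np.ones(n, dtype=bool)
values[::3] = np.arange(50.0)
mask[::3] = False

result = centered_masked_mean(values, mask, 11)
reference = np.ma.masked_array(values, mask=mask)[16:27].mean()
print(result[21], reference)
```

Dividing the windowed sum by the windowed count gives the same number as dividing the two moving means, since the span cancels. The final line prints the windowed result at position 21 next to the masked-array mean of the corresponding slice, so the two can be compared directly.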