[SciPy-User] Sum duplicate dates in a series
Fri Jan 29 12:42:42 CST 2010
On Jan 29, 2010, at 1:16 PM, Robert Ferrell wrote:
> How can I sum data for duplicate dates in a time series? I can do it
> with a loop, but I wonder if there is some tricky magic I might use.
> For instance, I've got a series:
>> In : s
>> timeseries([ 10. 11. 1. 2. 3.],
>> dates = [12-Jan-2010 12-Jan-2010 22-Jan-2010 22-Jan-2010 22-
>> freq = D)
> and I'd like to sum the Jan 12 data together, and the Jan 22 data
> together, and return a new series with just two dates.
>> timeseries([ 21. 6.],
>> dates = [12-Jan-2010 22-Jan-2010],
>> freq = D)
> Is there an easy way?
Unfortunately, not that easy.
You can use ts.find_duplicated_dates to get a dictionary that maps each duplicated date to its indices in the series.
From there, you can easily build a dictionary that maps each of those dates to the sum of the series over those indices:
>>> import scikits.timeseries as ts
>>> s = ts.time_series([1,2,3,4,5],dates=ts.date_array(["2001-01","2001-01","2001-02","2001-03","2001-03"],freq="M"))
>>> summed = dict((k,s._series[v].sum()) for (k,v) in ts.find_duplicated_dates(s).items())
You can then write the sums back into a copy of the series that has the duplicates removed:
>>> dropped = ts.remove_duplicated_dates(s)
>>> for (k, v) in summed.items():
...     dropped[k] = v
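For readers without the time-series package at hand, the same group-and-sum idea can be sketched with plain NumPy. This is not the ts API: it swaps in np.unique and np.bincount, and it assumes the fifth date in the question is also 22-Jan-2010, as the expected sum of 6 suggests.

```python
import numpy as np

# Data mirroring the question (fifth date assumed to be 22 Jan 2010).
dates = np.array(
    ["2010-01-12", "2010-01-12", "2010-01-22", "2010-01-22", "2010-01-22"],
    dtype="datetime64[D]",
)
values = np.array([10.0, 11.0, 1.0, 2.0, 3.0])

# unique_dates holds each distinct date once (sorted); inverse gives, for
# every original entry, the position of its date within unique_dates.
unique_dates, inverse = np.unique(dates, return_inverse=True)

# bincount with weights adds up the values that share the same group index.
sums = np.bincount(inverse, weights=values)

print(unique_dates)
print(sums)
```

Unlike the dictionary approach, this handles duplicated and non-duplicated dates in one pass, since every date gets a group index.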
Thinking about it, we could probably overload ts.remove_duplicated_dates to accept a func argument that tells it how to deal with those duplicated dates... Do you mind opening a ticket?
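A rough idea of what such a func argument could do is sketched below. collapse_duplicates is a hypothetical helper working on plain NumPy arrays; it is not an existing ts function, and its name and signature are assumptions made only for illustration.

```python
import numpy as np

def collapse_duplicates(dates, values, func=np.sum):
    """Hypothetical helper: reduce the values that share a date with func."""
    dates = np.asarray(dates)
    values = np.asarray(values)
    # inverse maps each original entry to the position of its date
    # within the sorted array of distinct dates.
    unique_dates, inverse = np.unique(dates, return_inverse=True)
    # Apply func to each group of values sharing one date.
    reduced = np.array(
        [func(values[inverse == i]) for i in range(len(unique_dates))]
    )
    return unique_dates, reduced

dates = np.array(
    ["2010-01-12", "2010-01-12", "2010-01-22", "2010-01-22", "2010-01-22"],
    dtype="datetime64[D]",
)
values = np.array([10.0, 11.0, 1.0, 2.0, 3.0])

# Same data, but averaging the duplicates instead of summing them.
mean_dates, means = collapse_duplicates(dates, values, func=np.mean)
print(mean_dates, means)
```

Passing the reduction as a callable keeps the grouping logic in one place, so sum, mean, max, or "take the first" differ only in the func supplied.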