[SciPy-User] Sum duplicate dates in a series

Pierre GM pgmdevlist@gmail....
Fri Jan 29 12:42:42 CST 2010


On Jan 29, 2010, at 1:16 PM, Robert Ferrell wrote:
> How can I sum data for duplicate dates in a time series?  I can do it  
> with a loop, but I wonder if there is some tricky magic I might use.
> 
> For instance, I've got a series:
> 
>> In [1597]: s
>> Out[1597]:
>> timeseries([ 10.  11.   1.   2.   3.],
>>   dates = [12-Jan-2010 12-Jan-2010 22-Jan-2010 22-Jan-2010 22-Jan-2010],
>>   freq  = D)
>> 
> 
> and I'd like to sum the Jan 12 data together, and the Jan 22 data  
> together, and return a new series with just two dates.
> 
>> timeseries([ 21.   6.],
>>   dates = [12-Jan-2010 22-Jan-2010],
>>   freq  = D)
>> 
> 
> Is there an easy way?


Unfortunately, not that easy.
You can use ts.find_duplicated_dates to get a dictionary that maps each duplicated date to the indices where it occurs in the series.
From there, it is straightforward to build a dictionary that maps each of those dates to the sum of the corresponding values:

>>> import scikits.timeseries as ts
>>> s = ts.time_series([1, 2, 3, 4, 5], dates=ts.date_array(["2001-01", "2001-01", "2001-02", "2001-03", "2001-03"], freq="M"))
>>> summed = dict((k, s._series[v].sum()) for (k, v) in ts.find_duplicated_dates(s).items())

You can then reinject the sums into a copy of the series from which the duplicated dates have been removed:
>>> dropped = ts.remove_duplicated_dates(s)
>>> for (k, v) in summed.items():
...     dropped[k] = v
...
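If the loop-free "tricky magic" is what you are after and the dates and values can be pulled out as plain NumPy arrays, a minimal sketch (plain NumPy, not part of scikits.timeseries; the arrays below are made-up data mirroring the question) is to map each date to its position among the unique dates and let `np.bincount` add up the values sharing a position:

```python
import numpy as np

# Made-up data mirroring the question: two values on 12 Jan, three on 22 Jan.
dates = np.array(["2010-01-12", "2010-01-12", "2010-01-22", "2010-01-22", "2010-01-22"],
                 dtype="datetime64[D]")
values = np.array([10.0, 11.0, 1.0, 2.0, 3.0])

# unique_dates is sorted; inverse[i] is the position of dates[i] in unique_dates.
unique_dates, inverse = np.unique(dates, return_inverse=True)

# With weights, bincount adds up all values that share the same inverse index.
sums = np.bincount(inverse, weights=values)

print(unique_dates)  # ['2010-01-12' '2010-01-22']
print(sums)          # [21.  6.]
```

Note that `np.unique` returns the dates sorted, which is what you want for a time series anyway, and that `bincount` always returns floats.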

Thinking about it, we could probably overload ts.remove_duplicated_dates to accept a func argument that tells how to deal with the duplicated dates... Would you mind opening a ticket?
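To make the idea of a func argument concrete, here is a rough sketch in plain Python. `collapse_duplicates` is a hypothetical helper, not an existing scikits.timeseries function: it groups values by date (keeping the order in which dates first appear) and reduces each group with whatever function is passed in.

```python
def collapse_duplicates(dates, values, func=sum):
    """Hypothetical helper: reduce the values of each duplicated date with func.

    Returns (unique_dates, reduced_values), with dates in first-seen order.
    """
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for d, v in zip(dates, values):
        groups.setdefault(d, []).append(v)
    return list(groups), [func(vs) for vs in groups.values()]


# Same months and values as the example above.
months = ["2001-01", "2001-01", "2001-02", "2001-03", "2001-03"]
data = [1, 2, 3, 4, 5]
print(collapse_duplicates(months, data))        # (['2001-01', '2001-02', '2001-03'], [3, 3, 9])
print(collapse_duplicates(months, data, max))   # (['2001-01', '2001-02', '2001-03'], [2, 3, 5])
```

Passing `sum`, `max`, or a function returning the first element would cover the usual ways of resolving duplicates with a single code path.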



