[SciPy-User] Sum duplicate dates in a series

josef.pktd@gmai...
Fri Jan 29 13:36:42 CST 2010


On Fri, Jan 29, 2010 at 2:13 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
> On Jan 29, 2010, at 2:00 PM, John Hunter wrote:
>> On Fri, Jan 29, 2010 at 12:42 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
>>> On Jan 29, 2010, at 1:16 PM, Robert Ferrell wrote:
>>>> How can I sum data for duplicate dates in a time series?  I can do it
>>>> with a loop, but I wonder if there is some tricky magic I might use.
>>
>> If you can put your data in a record array, you can use
>> matplotlib.mlab.rec_groupby
>>
>> http://matplotlib.sourceforge.net/api/mlab_api.html#matplotlib.mlab.rec_groupby
>>
>> http://matplotlib.sourceforge.net/examples/misc/rec_groupby_demo.html
>
> John,
> Could you have a look into numpy.lib.recfunctions ? That's an attempt to homogenize what you did for matplotlib, and it'd be great if you could help.

I just wanted to show that there will be some advantages once it is
possible to move easily between packages:

>>> import scikits.timeseries as ts
>>> import la
>>> s = ts.time_series([1,2,3,4,5],dates=ts.date_array(["2001-01","2001-01","2001-02","2001-03","2001-03"],freq="M"))
>>> dta = la.larry(s.data, label=[range(len(s.data))])
>>> dat = la.larry(s.dates.tolist(), label=[range(len(s.data))])
>>> s2 = ts.time_series(dta.group_mean(dat).x,dates=ts.date_array(dat.x,freq="M"))
>>> s
timeseries([1 2 3 4 5],
   dates = [Jan-2001 Jan-2001 Feb-2001 Mar-2001 Mar-2001],
   freq  = M)

>>> s2
timeseries([ 1.5  1.5  3.   4.5  4.5],
   dates = [Jan-2001 Jan-2001 Feb-2001 Mar-2001 Mar-2001],
   freq  = M)

>>> s2u = ts.remove_duplicated_dates(s2)
>>> s2u
timeseries([ 1.5  3.   4.5],
   dates = [Jan-2001 ... Mar-2001],
   freq  = M)

>>> s2u.dates
DateArray([Jan-2001, Feb-2001, Mar-2001],
          freq='M')

It's not so easy yet, but it would be nice if we could use timeseries,
pandas and la for different things, depending on which representation
is more convenient.
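
For the sum asked about in the original question (the session above
computes a group mean instead), a NumPy-only sketch might look like the
following. The helper name sum_duplicates is hypothetical, and plain
strings stand in for the dates as an assumption; it does not use
scikits.timeseries or la.

```python
import numpy as np

def sum_duplicates(keys, values):
    """Hypothetical helper: sum the values that share the same key.

    Returns (unique_keys, sums, counts), with unique_keys sorted.
    """
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    # inverse[i] is the index into unique_keys of keys[i]
    unique_keys, inverse = np.unique(keys, return_inverse=True)
    n_groups = len(unique_keys)
    # bincount with weights adds up the values that fall in each group
    sums = np.bincount(inverse, weights=values, minlength=n_groups)
    # bincount without weights counts the members of each group
    counts = np.bincount(inverse, minlength=n_groups)
    return unique_keys, sums, counts

# Same data as the session above, with dates written as strings
dates = ["2001-01", "2001-01", "2001-02", "2001-03", "2001-03"]
data = [1, 2, 3, 4, 5]

unique_dates, totals, counts = sum_duplicates(dates, data)
means = totals / counts  # group means, comparable to s2u above
```

Here totals holds one sum per unique date, and means should agree with
the values in s2u. The result could then be passed back to
ts.time_series, as in the session above.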

Josef

