[SciPy-User] Averaging over unevenly-spaced records

Wes McKinney wesmckinn@gmail....
Sun Oct 16 18:23:19 CDT 2011


On Sun, Oct 16, 2011 at 6:24 PM, Camilo Polymeris <cpolymeris@gmail.com> wrote:
> On Sun, Oct 16, 2011 at 7:18 PM, nicky van foreest <vanforeest@gmail.com> wrote:
>> Hi,
>>
>> Have you perhaps considered using itertools.groupby? That way you can
>> group elements by datetime at one-second accuracy (use a key function
>> that strips the subsecond part from your datetime objects). Then just
>> sum over your columns A and B, and divide by the length of the group
>> to get the average.
>
> That seems like a better approach. Thanks!
>
> Camilo

You should take a look at my project, pandas
(http://pandas.sourceforge.net/groupby.html). It has much richer
built-in functionality for this kind of thing than anything else in
the scientific Python ecosystem.

Assuming your timestamps are unique, you need only do:

import numpy as np

def normalize(dt):
    # strip the subsecond part, so grouping happens at one-second resolution
    return dt.replace(microsecond=0)

data.groupby(normalize).agg({'A': np.mean, 'B': np.sum})

and that will give you exactly what you want.

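The normalize key just truncates each timestamp to its containing
second, e.g.:

from datetime import datetime

normalize(datetime(2011, 10, 16, 18, 23, 19, 500000))
# -> datetime.datetime(2011, 10, 16, 18, 23, 19)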
Here, data is a pandas DataFrame object. To get your record array into
the right format, do:

data = DataFrame.from_records(r, index='datetime')

That will turn the datetimes into the index (row labels) of the DataFrame.
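For example, a record array like this would work (hypothetical sample
data, just to show the expected fields):

import numpy as np
from datetime import datetime
from pandas import DataFrame

# two records inside the same second, differing only in microseconds
r = np.array([(datetime(2011, 10, 16, 18, 23, 19, 100), 1.0, 2.0),
              (datetime(2011, 10, 16, 18, 23, 19, 200), 3.0, 4.0)],
             dtype=[('datetime', object), ('A', float), ('B', float)])

data = DataFrame.from_records(r, index='datetime')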

However, if the datetimes are not unique, all is not lost. Don't set
the DataFrame index; instead do:

data = DataFrame(r)
# normalize each timestamp, then group on the resulting Series
grouper = data['datetime'].map(normalize)
data.groupby(grouper).agg({'A': np.mean, 'B': np.sum})

I think you'll find this a lot more palatable than a DIY approach
using itertools.
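For comparison, a rough sketch of that itertools version (assuming the
records are sorted by datetime, since itertools.groupby only merges
consecutive equal keys):

from itertools import groupby

def second(rec):
    # key: the timestamp truncated to one-second resolution
    return rec['datetime'].replace(microsecond=0)

results = []
for ts, group in groupby(sorted(r, key=second), key=second):
    rows = list(group)
    results.append((ts,
                    sum(row['A'] for row in rows) / len(rows),  # mean of A
                    sum(row['B'] for row in rows)))             # sum of B

You end up writing by hand the bookkeeping that pandas does for you.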

- Wes

