[SciPy-User] Averaging over unevenly-spaced records

Camilo Polymeris cpolymeris@gmail....
Fri Oct 14 12:24:30 CDT 2011


Hello all,

I am pretty new to numpy (and numerical software packages in general),
so this may be a basic question, but I would appreciate any help.

Say I have a recarray like the following:

    r = array([
...
       (datetime.datetime(2011, 3, 30, 16, 1, 15, 911000), 1.39, 18),
       (datetime.datetime(2011, 3, 30, 16, 1, 16, 181000), 1.34, 22),
       (datetime.datetime(2011, 3, 30, 16, 1, 16, 630000), 1.37, 19),
       (datetime.datetime(2011, 3, 30, 16, 1, 16, 922000), 1.34, 19),
       (datetime.datetime(2011, 3, 30, 16, 1, 17, 324000), 1.33, 19),
...
      dtype=[('datetime', '|O8'), ('A', '<f8'), ('B', '<i8')])

For every whole second, e.g. datetime(2011, 3, 30, 16, 1, 16), I would
like to get the average of column A and the sum of column B, like
this:

r1 = array([
...
       [1.35, 60],  # for second datetime(2011, 3, 30, 16, 1, 16)
...
       ])
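By "whole second" I mean each timestamp truncated to second
resolution, i.e. with the microseconds dropped; as a minimal
illustration, using only the standard library:

    import datetime

    ts = datetime.datetime(2011, 3, 30, 16, 1, 16, 630000)
    ts.replace(microsecond=0)  # -> datetime.datetime(2011, 3, 30, 16, 1, 16)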

As you can see, the datetimes are not evenly spaced. There can be any
number of data points in one second (even zero -- then I would just
keep the last value, or 0, or NaN, whichever is easiest). I have on
the order of 10^8 to 10^9 records.
I think it can be done with reduceat, but I would have to find the
group boundary indices manually, which I don't think is the most
numpythonic way to do this; a sketch of what I mean follows below.
Another option is to use griddata to interpolate the values onto an
even grid, e.g. every 1 ms, and then use evenly spaced indices -- more
elegant, but it seems inefficient. Any suggestions?
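For concreteness, here is a minimal sketch of the reduceat route I
have in mind, on a small stand-in for the real recarray (assuming the
records are sorted by time; note that seconds with no data at all
simply produce no row here, rather than a 0/NaN row):

    import datetime
    import numpy as np

    # Small stand-in for the real recarray shown above.
    r = np.array([
        (datetime.datetime(2011, 3, 30, 16, 1, 15, 911000), 1.39, 18),
        (datetime.datetime(2011, 3, 30, 16, 1, 16, 181000), 1.34, 22),
        (datetime.datetime(2011, 3, 30, 16, 1, 16, 630000), 1.37, 19),
        (datetime.datetime(2011, 3, 30, 16, 1, 16, 922000), 1.34, 19),
        (datetime.datetime(2011, 3, 30, 16, 1, 17, 324000), 1.33, 19),
    ], dtype=[('datetime', 'O'), ('A', '<f8'), ('B', '<i8')])

    # Whole-second bucket for each record.
    seconds = np.array([dt.replace(microsecond=0) for dt in r['datetime']])

    # Start index of each run of equal seconds -- the "manual" part.
    starts = np.concatenate(([0], np.nonzero(seconds[1:] != seconds[:-1])[0] + 1))

    counts = np.diff(np.append(starts, len(r)))        # records per second
    mean_A = np.add.reduceat(r['A'], starts) / counts  # per-second average of A
    sum_B = np.add.reduceat(r['B'], starts)            # per-second sum of B
    # mean_A -> [1.39, 1.35, 1.33]; sum_B -> [18, 60, 19]

This works on the toy data, but building the index list in Python and
looping over 10^8+ object timestamps feels clumsy, which is why I am
asking.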

Thanks & best regards,

Camilo

