[Numpy-discussion] Managing Rolling Data

Anne Archibald peridot.faceted@gmail....
Wed Feb 21 12:16:32 CST 2007

On 21/02/07, Alexander Michael <lxander.m@gmail.com> wrote:
> I'm new to numpy and looking for advice on setting up and managing
> array data for my particular problem. I'm collecting observations of P
> properties for N objects over a rolling horizon of H sample times. I
> could conceptually store the data in three-dimensional array with
> shape (N,P,H) that would allow me to easily (and efficiently with
> strided slices) compute the statistics over both N and H that I am
> interested in. This is great, but the rub is that H, an interval of T,
>  is a rolling horizon. T is too large to fit in memory, so I need to
> load up H, perform my calculations, pop the oldest N x P slice and
> push the newest N x P slice into the data cube. What's the best way to
> do this that will maintain fast computations along the one-dimensional
> slices over N and H? Is there a commonly accepted idiom?
> Fundamentally, I see two solutions. The first would be to essentially
> perform a memcpy to propagate the data. The second would be to manage
> the N x P slices as H discontiguous memory blocks and merely reorder
> the pointers with each new sample. Can I do either of these with
> numpy?

Yes, and several other possibilities as well.

To do a memcpy, all you need is
buffer[...,:-1] = buffer[...,1:]
buffer[...,-1] = new_data()
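As a concrete sketch of that shift-and-append idiom (the sizes N, P, H and the dummy data below are made up for illustration):

```python
import numpy as np

# Hypothetical sizes: N objects, P properties, H samples in the horizon.
N, P, H = 4, 3, 5

buffer = np.zeros((N, P, H))
for t in range(H):
    buffer[..., t] = t  # fill each time slot with a dummy "observation"

new_slice = np.full((N, P), 99.0)  # stand-in for the next N x P observation

# Shift everything one step toward the front, dropping the oldest slice...
buffer[..., :-1] = buffer[..., 1:]
# ...and write the newest slice into the freed last slot.
buffer[..., -1] = new_slice

assert buffer[0, 0, 0] == 1.0    # oldest surviving sample
assert buffer[0, 0, -1] == 99.0  # newest sample
```

The assignment copies roughly the whole cube on every update, so it is simple but O(N*P*H) per new sample.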

Discontiguous blocks are somewhat inconvenient; one of the key
assumptions of numpy is that memory is stored in contiguous,
homogeneous blocks. You can use python lists (which are lists of
pointers), though:
listofbuffers = listofbuffers[1:] + [new_data()]
Extracting slices is now more awkward, either
somecomputation([buffer[13,5] for buffer in listofbuffers])
or convert the list to an array (which involves copying all the elements):
buffer = array(listofbuffers)
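Putting the list-of-blocks approach together (again with made-up sizes; np.stack is just one way to get the contiguous copy):

```python
import numpy as np

N, P, H = 4, 3, 5

# H separate N x P blocks; the list order encodes sample time, oldest first.
listofbuffers = [np.full((N, P), float(t)) for t in range(H)]

# Roll the horizon: drop the oldest block, append the newest.
# Only list pointers move; no array data is copied here.
listofbuffers = listofbuffers[1:] + [np.full((N, P), 99.0)]

# A one-dimensional slice over H now needs a comprehension...
series = np.array([block[1, 2] for block in listofbuffers])

# ...or a full copy into a contiguous (N, P, H) cube.
cube = np.stack(listofbuffers, axis=-1)

assert series[-1] == 99.0
assert cube.shape == (N, P, H)
```

So the push is cheap, but any computation along the H axis pays the copy you saved on the update.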

Alternatively, if most of your statistics don't care about the order
of the data, you could maintain a rolling buffer:
buffer[...,oldest] = new_data()
oldest += 1
oldest %= H
(This copies only the single newest slice.) Fancy indexing can also
be used here to pull out the elements in chronological order, and if your
statistics can be rewritten to work on a wrapped array, this is
probably the most efficient option.
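A sketch of that circular-buffer scheme, including the fancy-indexing step to recover chronological order (the push helper and sizes are my own illustration, not from the original mail):

```python
import numpy as np

N, P, H = 4, 3, 5

buffer = np.zeros((N, P, H))
oldest = 0  # index of the slot holding the oldest sample

def push(buffer, oldest, new_slice):
    """Overwrite the oldest slot in place; only one N x P slice is copied."""
    buffer[..., oldest] = new_slice
    return (oldest + 1) % buffer.shape[-1]

for t in range(H + 2):  # wrap around the buffer a couple of times
    oldest = push(buffer, oldest, np.full((N, P), float(t)))

# Fancy indexing pulls the slices back out oldest-to-newest.
order = (np.arange(H) + oldest) % H
chronological = buffer[..., order]  # this makes a copy

assert chronological[0, 0, 0] == 2.0   # oldest surviving sample
assert chronological[0, 0, -1] == 6.0  # newest sample
```

Order-insensitive statistics (mean, variance, min/max over H) can run directly on `buffer` and never pay for the reordering copy.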

Incidentally, if the array wants to be inhomogeneous along one
dimension, you can use recarrays (apparently; I've never investigated
them).
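For what it's worth, a minimal sketch of that idea using a structured dtype (the field names here are invented; recarrays add attribute-style access on top of this):

```python
import numpy as np

# One record per object: the "P axis" becomes named fields of mixed type.
obs = np.zeros(4, dtype=[('price', 'f8'), ('volume', 'i4'), ('ok', '?')])
obs['price'] = [1.5, 2.5, 3.5, 4.5]
obs['volume'][0] = 100

assert obs['price'][1] == 2.5
assert obs.dtype.names == ('price', 'volume', 'ok')
```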

Good luck,
