[Numpy-discussion] numpy.correlate with phase offset 1D data series

Anne Archibald peridot.faceted@gmail....
Mon Mar 3 18:02:06 CST 2008

On 03/03/2008, Ray Schumacher <subscriber100@rjs.org> wrote:

>  Xie's 2D algorithm reduced to 1D works nicely for computing the
>  relative phase, but is it the fastest way? It might be, since some
>  correlation algorithms use FFTs as well. What does _correlateND use, in scipy?

Which way will be the fastest really depends what you want to do.
Algorithmically, the direct way numpy.correlate operates is O(NM), and
the way FFT-based algorithms operate is (roughly) O((N+M)log(N+M)) (or
for a more sophisticated algorithm O(N log M) where M is less than N).
In practice what this means is that when one or both of the things
you're correlating is short (tens of samples or so), you should use a
direct method; when one or both are long you should use an FFT-based
method. (There are other approaches too, but I don't know of any in
wide use.)

In your case it sounds like you have two signals of equal fairly large
length to compare. Some questions remain, though:

* What do you want to happen at the endpoints? Without padding, only a
small interval (the difference in lengths plus one) is valid.
Zero-padding works, but guarantees a fall-off at the ends. Circular
correlation is easy to implement but not appropriate most of the time.

* Do you care about sub-sample alignment? How much accuracy do you
really need? Direct methods really can't give you this information.
With Fourier methods, you can easily pad the spectrum with zeros and
inverse FFT, giving you a beautifully-interpolated signal. If you want
more accuracy, a quadratic fit to the three points around the peak of
the interpolated signal will get you very close. If you need more
accuracy, you can use numerical maximization, evaluating each point as
sum(a_k exp(2 pi i k x)).

The other common application is to have a template (that presumably
falls to zero at its endpoint) and to want to compute a running
correlation against a stream of data. This too can be done both ways,
depending on the size of the template; all that is needed is to think
carefully about overlaps.


More information about the Numpy-discussion mailing list