[Numpy-discussion] another discussion on numpy correlate (and convolution)
Fri Feb 22 11:01:39 CST 2013
We don't actually want remove sensitive data, but this tutorial should
still allow us to remove a file totally and completely from git history. It
doesn't look that hard:
It will require everyone to rebase, so if you want to do this it may be a
good idea to schedule it for a specific day maybe 2-4 weeks in the future
and warn people on the mailing list so nobody tries to commit anything
while the process is underway.
On Feb 22, 2013 5:40 PM, "Matthew Brett" <firstname.lastname@example.org> wrote:
> On Thu, Feb 21, 2013 at 10:00 AM, Pierre Haessig
> <email@example.com> wrote:
> > Hi everybody,
> > (just coming from a discussion on the performance of Matplotlib's
> > (x)corr function which uses np.correlate)
> > There have been already many discussions on how to compute
> > (cross-)correlations of time-series in Python (like
> > The discussion is spread between the various stakeholders (just to name
> > some I've in mind : scipy, statsmodel, matplotlib, ...).
> > There are basically 2 implementations : time-domain and frequency-domain
> > (using fft + multiplication). My discussion is only on time-domain
> > implementation. The key usecase which I feel is not well adressed today
> > is when computing the (cross)correlation of two long timeseries on only
> > a few lagpoints.
> > For time-domain, one can either write its own implementation or rely on
> > numpy.correlate. The latter rely on the fast C implementation
> > _pyarray_correlate() in multiarraymodule.c
> > (
> > Now, the signature of this function is meant to accept one of three
> > *computation modes* ('valid', 'same', 'full'). Thoses modes make a lot
> > of sense when using this function in the context of convoluting two
> > signals (say an "input array" and a "impulse response array"). In the
> > context of computing the (cross)correlation of two long timeseries on
> > only a few lagpoints, those computation modes are unadapted and
> > potentially leads to huge computational and memory wastes
> > (for example :
> > For some time, I was thinking the solution was to write a dedicated
> > function in one of the stakeholder modules (but which ? statsmodel ?)
> > but now I came to think numpy is the best place to put a change and this
> > would quickly benefit to all stackeholders downstream.
> > Indeed, I looked more carefully at the C _pyarray_correlate() function,
> > and I've come to the conclusion that these three computation modes are
> > an unnecessary restriction because the actual computation relies on a
> > triple for loop
> > (
> > which boundaries are governed by two integers `n_left` and `n_right`
> > instead of the three modes.
> > Maybe I've misunderstood the inner-working of this function, but I've
> > the feeling that the ability to set these two integers directly instead
> > of just the mode would open up the possibility to compute the
> > correlation on only a few lagpoints.
> > I'm fully aware that changing the signature of such a core numpy is
> > out-of-question but I've got the feeling that a reasonably small some
> > code refactor might lift the longgoing problem of (cross-)correlation
> > computation. The Python np.correlate would require two additional
> > keywords -`n_left` and `n_right`)which would override the `mode`
> > keyword. Only the C function would need some more care to keep good
> > backward compatibility in case it's used externally.
> >From complete ignorance, do you think it is an option to allow a
> (n_left, n_right) tuple as a value for 'mode'?
> NumPy-Discussion mailing list
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the NumPy-Discussion