[Numpy-discussion] another discussion on numpy correlate (and convolution)
Todd
toddrjen@gmail....
Fri Feb 22 11:01:39 CST 2013
We don't actually want remove sensitive data, but this tutorial should
still allow us to remove a file totally and completely from git history. It
doesn't look that hard:
https://help.github.com/articles/remove-sensitive-data
It will require everyone to rebase, so if you want to do this it may be a
good idea to schedule it for a specific day maybe 2-4 weeks in the future
and warn people on the mailing list so nobody tries to commit anything
while the process is underway.
On Feb 22, 2013 5:40 PM, "Matthew Brett" <matthew.brett@gmail.com> wrote:
> Hi,
>
> On Thu, Feb 21, 2013 at 10:00 AM, Pierre Haessig
> <pierre.haessig@crans.org> wrote:
> > Hi everybody,
> >
> > (just coming from a discussion on the performance of Matplotlib's
> > (x)corr function which uses np.correlate)
> >
> > There have been already many discussions on how to compute
> > (cross-)correlations of time-series in Python (like
> >
> http://stackoverflow.com/questions/6991471/computing-cross-correlation-function
> ).
> > The discussion is spread between the various stakeholders (just to name
> > some I've in mind : scipy, statsmodel, matplotlib, ...).
> >
> > There are basically 2 implementations : time-domain and frequency-domain
> > (using fft + multiplication). My discussion is only on time-domain
> > implementation. The key usecase which I feel is not well adressed today
> > is when computing the (cross)correlation of two long timeseries on only
> > a few lagpoints.
> >
> > For time-domain, one can either write its own implementation or rely on
> > numpy.correlate. The latter rely on the fast C implementation
> > _pyarray_correlate() in multiarraymodule.c
> > (
> https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/multiarraymodule.c#L1177
> )
> >
> > Now, the signature of this function is meant to accept one of three
> > *computation modes* ('valid', 'same', 'full'). Thoses modes make a lot
> > of sense when using this function in the context of convoluting two
> > signals (say an "input array" and a "impulse response array"). In the
> > context of computing the (cross)correlation of two long timeseries on
> > only a few lagpoints, those computation modes are unadapted and
> > potentially leads to huge computational and memory wastes
> > (for example :
> >
> https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes.py#L4332
> ).
> >
> >
> >
> > For some time, I was thinking the solution was to write a dedicated
> > function in one of the stakeholder modules (but which ? statsmodel ?)
> > but now I came to think numpy is the best place to put a change and this
> > would quickly benefit to all stackeholders downstream.
> >
> > Indeed, I looked more carefully at the C _pyarray_correlate() function,
> > and I've come to the conclusion that these three computation modes are
> > an unnecessary restriction because the actual computation relies on a
> > triple for loop
> > (
> https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/multiarraymodule.c#L1177
> )
> > which boundaries are governed by two integers `n_left` and `n_right`
> > instead of the three modes.
> >
> > Maybe I've misunderstood the inner-working of this function, but I've
> > the feeling that the ability to set these two integers directly instead
> > of just the mode would open up the possibility to compute the
> > correlation on only a few lagpoints.
> >
> > I'm fully aware that changing the signature of such a core numpy is
> > out-of-question but I've got the feeling that a reasonably small some
> > code refactor might lift the longgoing problem of (cross-)correlation
> > computation. The Python np.correlate would require two additional
> > keywords -`n_left` and `n_right`)which would override the `mode`
> > keyword. Only the C function would need some more care to keep good
> > backward compatibility in case it's used externally.
>
> >From complete ignorance, do you think it is an option to allow a
> (n_left, n_right) tuple as a value for 'mode'?
>
> Cheers,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20130222/d9b9ce90/attachment.html
More information about the NumPy-Discussion
mailing list