[SciPy-dev] Fixing correlate: handling api breakage ?

David Cournapeau david@ar.media.kyoto-u.ac...
Sun May 24 06:38:11 CDT 2009

josef.pktd@gmail.com wrote:
> On Sun, May 24, 2009 at 6:16 AM, David Cournapeau
> <david@ar.media.kyoto-u.ac.jp> wrote:
>> Hi,
>>    I have taken a look at the correlate function in scipy.signal. There
>> are several problems with it. First, it is wrong on several accounts:
>>       - It assumes that the correlation of complex numbers corresponds
>> to complex multiplication, but this is not the definition followed by
>> most textbooks, at least as far as signal processing is concerned.
>>       - More significantly, it is wrong with respect to the ordering:
>> it assumes that correlate(a, b) == correlate(b, a), which is not true in
>> general.
> I don't see this in the results. There was recently a report on the
> mailing list that np.correlate and signal.correlate switch arrays
> if the second array is longer.
>>>> signal.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0])
> array([0, 0, 1, 2, 0, 0, 0, 0, 0])
>>>> signal.correlate([0, 0, 1, 0, 0],[1, 2, 0, 0, 0] )
> array([0, 0, 0, 0, 0, 2, 1, 0, 0])

Well, you just happened to have very peculiar entries :)

signal.correlate([-1, -2, -3], [1, 2, 3])
-> array([ -3,  -8, -14,  -8,  -3])

signal.correlate([1, 2, 3], [-1, -2, -3])
-> array([ -3,  -8, -14,  -8,  -3])
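To make the two complaints concrete, here is a minimal sketch of the textbook definition (cross-correlation as convolution with the conjugated, time-reversed second argument). The helper name `xcorr_full` is mine, not scipy's. With a non-palindromic pair, swapping the arguments reverses the result, so correlate(a, b) != correlate(b, a) in general:

```python
import numpy as np

def xcorr_full(a, b):
    """Textbook full cross-correlation: c[k] = sum_n a[n + k] * conj(b[n]).

    Implemented as convolution of a with the conjugated, time-reversed b.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    return np.convolve(a, np.conj(b)[::-1])

# Ordering matters: swapping the arguments reverses the (real) result.
print(xcorr_full([1, 2, 3], [0, 1, 0.5]))   # [0.5  2.   3.5  3.   0. ]
print(xcorr_full([0, 1, 0.5], [1, 2, 3]))   # [0.   3.   3.5  2.   0.5]
```

Note that the second argument is conjugated, which is the signal-processing convention David refers to; a version built on plain complex multiplication would drop the `np.conj` and disagree on complex inputs.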

> I found it a bit confusing that every package needs its own correlate
> function: numpy, signal, ndimage, stsci.

Yes, it is confusing. ndimage and stsci are geared toward particular
usages; signal and numpy's correlate are the general functions (or maybe
I am too biased toward signal processing - let's say signal.correlate is
the matlab correlate).

numpy.correlate only handles 1-d arrays (like matlab's xcorr), whereas
signal's handles arbitrary dimensions (like the matlab image toolbox's
imfilter). Handling arbitrary dimensions is already quite hard, and
doing so quickly is very difficult, so this justifies keeping both
implementations (i.e. replacing numpy.correlate with the scipy.signal
one is not feasible IMHO, at least not with the current implementation).
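A quick sketch of that split, against current numpy/scipy (the exact error message from numpy may vary between versions):

```python
import numpy as np
from scipy import signal

a = np.arange(9.0).reshape(3, 3)
b = np.ones((2, 2))

# signal.correlate accepts arbitrary dimensions; 'full' output is 4x4 here.
out = signal.correlate(a, b)
print(out.shape)

# np.correlate is 1-d only; higher-dimensional input is rejected.
try:
    np.correlate(a, b)
except ValueError:
    print("np.correlate only accepts 1-d arrays")
```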

ndimage's is more complete still, handling many boundary conditions
(zero and constant padding, mirroring, and repeating). This makes things
even slower without specialized code paths.
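For comparison, a small sketch of that boundary handling using scipy.ndimage.correlate's `mode` parameter (values assume the default centered origin):

```python
import numpy as np
from scipy import ndimage

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 1.0, 1.0])  # 3-point moving sum

# Same input, different boundary conditions -> different edge values.
for mode in ("constant", "nearest", "mirror", "wrap"):
    print(mode, ndimage.correlate(x, w, mode=mode))
```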

I find it annoying that numpy.correlate and scipy.signal.correlate do
not use the same defaults. Ideally, for every supported input x/y,
np.correlate(x, y) would equal scipy.signal.correlate(x, y) (precision
problems aside), but I guess this won't change.
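The default mismatch in question, sketched against current versions (numpy defaults to mode='valid', signal to mode='full'):

```python
import numpy as np
from scipy import signal

a = [1.0, 2.0, 3.0]
b = [0.0, 1.0, 0.5]

print(np.correlate(a, b))                # default mode='valid': one overlap value
print(signal.correlate(a, b))            # default mode='full': length 5
print(np.correlate(a, b, mode='full'))   # the two agree once the modes do
```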

I should also note that my replacement for scipy's correlate (not
committed yet) is based on a new "neighborhood iterator" I created for
that purpose, so I believe the code is much easier to follow (without
being much slower than the current code). This should make it much
easier to implement fast code paths for common cases in
scipy.signal.correlate, which is often unusable for very large arrays
because it is either too memory hungry or too slow. (We should still
care about the cases where fft-based convolution is not usable, though,
as fft is almost always the way to go for large arrays of comparable
size.)
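On the fft point: for real inputs, correlation is just convolution with the reversed second array, so fftconvolve can stand in for the direct computation on large arrays (a sketch; for complex inputs the reversed array would also need conjugating):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
a = rng.standard_normal(4096)
b = rng.standard_normal(512)

# Direct correlation vs the FFT route (real inputs, full overlap).
direct = signal.correlate(a, b, mode='full')
via_fft = signal.fftconvolve(a, b[::-1], mode='full')

print(np.allclose(direct, via_fft))  # agree within floating-point tolerance
```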
