[SciPy-dev] Fixing correlate: handling api breakage ?

josef.pktd@gmai...
Sun May 24 07:24:12 CDT 2009


On Sun, May 24, 2009 at 7:38 AM, David Cournapeau
<david@ar.media.kyoto-u.ac.jp> wrote:
> josef.pktd@gmail.com wrote:
>> On Sun, May 24, 2009 at 6:16 AM, David Cournapeau
>> <david@ar.media.kyoto-u.ac.jp> wrote:
>>
>>> Hi,
>>>
>>>    I have taken a look at the correlate function in scipy.signal. There
>>> are several problems with it. First, it is wrong on several counts:
>>>       - It assumes that the correlation of complex numbers uses plain
>>> complex multiplication (no conjugation), but this is not the definition
>>> followed by most textbooks, at least as far as signal processing is
>>> concerned.
>>>       - More significantly, it is wrong with respect to the ordering:
>>> it assumes that correlate(a, b) == correlate(b, a), which is not true in
>>> general.
>>>
>>
>> I don't see this in the results. There was recently a report on the
>> mailing list that np.correlate and signal.correlate swap the arrays
>> if the second array is longer.
>>
>>
>> >>> signal.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0])
>> array([0, 0, 1, 2, 0, 0, 0, 0, 0])
>>
>> >>> signal.correlate([0, 0, 1, 0, 0], [1, 2, 0, 0, 0])
>> array([0, 0, 0, 0, 0, 2, 1, 0, 0])
>>
>
> Well, you just happened to have very peculiar entries :)
>
> signal.correlate([-1, -2, -3], [1, 2, 3])
> -> array([ -3,  -8, -14,  -8,  -3])
>
> signal.correlate([1, 2, 3], [-1, -2, -3])
> -> array([ -3,  -8, -14,  -8,  -3])

One of your arrays is just the negative of the other, so correlate gives
the same result for both orderings in this case. For other inputs, the
results differ:

>>> signal.correlate([-1, -2, -4], [1, 2, 3])
array([ -3,  -8, -17, -10,  -4])
>>> signal.correlate([1, 2, 3],[-1, -2, -4])
array([ -4, -10, -17,  -8,  -3])

>>> signal.correlate([-1, -2, 3], [1, 2, 3])
array([-3, -8,  4,  4,  3])
>>> signal.correlate([1, 2, 3], [-1, -2, 3])
array([ 3,  4,  4, -8, -3])
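
For what it's worth, the pattern in these examples is just a reversal:
for these equal-length real inputs, swapping the arguments flips the
output end to end. A quick sketch to check that (assuming numpy and
scipy.signal are available):

import numpy as np
from scipy import signal

a = np.array([-1, -2, -4])
b = np.array([1, 2, 3])

fwd = signal.correlate(a, b)      # array([ -3,  -8, -17, -10,  -4])
rev = signal.correlate(b, a)      # array([ -4, -10, -17,  -8,  -3])

# Swapping the arguments reverses the full correlation for these real 1-D inputs.
print(np.array_equal(fwd, rev[::-1]))   # True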

>
>> I found it a bit confusing that every package needs its own correlate
>> function: numpy, signal, ndimage, stsci.
>>
>
> Yes, it is. Ndimage and stsci are geared toward particular usages;
> signal and numpy correlate are the general functions (or maybe I am too
> biased toward signal processing - let's say signal.correlate is the
> matlab correlate).
>
> numpy.correlate only handles 1d arrays (what matlab xcorr does), whereas
> signal's handles arbitrary dimensions (what matlab image toolbox
> imfilter does). It is already quite hard to handle arbitrary dimensions,
> and doing so in a fast way is very difficult, so this justifies at least
> keeping both implementations (i.e. replacing numpy.correlate with
> scipy.signal's is not feasible IMHO, at least not with the current
> implementation).
>
> ndimage's is much more complete, handling many boundary conditions
> (zero and constant padding, mirroring and repeating). This makes things
> even slower without specialized codepaths.
>
> I find it annoying that numpy.correlate and scipy.signal.correlate do
> not use the same defaults. Ideally, I would like that for every
> supported input x/y, np.correlate(x, y) == scipy.signal.correlate(x, y)
> (precision problems aside), but I guess this won't change.
>
> I should also note that my replacement for scipy correlate (not
> committed yet) is based on a new "neighborhood iterator" I have created
> for that purpose, so the code is much easier to follow, I believe
> (without being much slower than the current code). This should make it
> much easier to implement faster codepaths for common cases in
> scipy.signal.correlate (scipy.signal.correlate is often unusable for
> very large arrays because it is either too memory hungry or too slow -
> we should care about the cases where fft-based convolution is not
> usable, though, as fft is almost always the way to go for large arrays
> of comparable size).
>

I have looked at this only for examples that calculate auto-correlation
and cross-correlation in time series, and had to experiment to see which
version works best.
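
For instance, a plain sample autocorrelation for the time-series case
can be pieced together from np.correlate along these lines (just a
sketch; "autocorr" is a made-up helper name and the normalization is
only one possible convention):

import numpy as np

def autocorr(x, maxlag):
    # Sample autocorrelation up to maxlag (biased normalization).
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    full = np.correlate(x, x, mode='full')   # zero lag sits in the middle
    mid = len(x) - 1
    return full[mid:mid + maxlag + 1] / full[mid]

# quick check: white noise should give roughly [1, 0, 0, ...]
rng = np.random.RandomState(0)
print(autocorr(rng.randn(1000), 5))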

Having a comparison like the one you gave is very helpful for choosing
between the versions, at least when not working directly with signals or
ndimages, i.e. the data each package is designed for.
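
For example, the mismatch in defaults you mention shows up immediately
on a small 1-D case; a quick sketch of the comparison (it assumes
np.correlate defaults to mode='valid' and signal.correlate to
mode='full'):

import numpy as np
from scipy import signal

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 0.5])

print(np.correlate(x, y))                # default mode='valid': one overlap value
print(signal.correlate(x, y))            # default mode='full': length 2*3 - 1
print(np.correlate(x, y, mode='full'))   # should agree with signal.correlate for real input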

Are the convolve functions in all cases consistent with, or even
identical (through delegation) to, correlate?
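
What I would expect for real 1-D arrays is that full convolution equals
full correlation with the second argument reversed; a quick sketch to
check that, without any claim about how the packages delegate
internally:

import numpy as np
from scipy import signal

a = np.array([1, 2, 3])
b = np.array([-1, -2, 3])

# For real 1-D arrays, full convolution should match full correlation
# with the second argument time-reversed.
print(np.array_equal(signal.convolve(a, b), signal.correlate(a, b[::-1])))
print(np.array_equal(np.convolve(a, b), np.correlate(a, b[::-1], mode='full')))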

Thanks for the info
Josef

