[SciPy-dev] Fixing correlate: handling api breakage ?
David Cournapeau
david@ar.media.kyoto-u.ac...
Sun May 24 07:26:34 CDT 2009
josef.pktd@gmail.com wrote:
> On Sun, May 24, 2009 at 7:38 AM, David Cournapeau
> <david@ar.media.kyoto-u.ac.jp> wrote:
>
>> josef.pktd@gmail.com wrote:
>>
>>> On Sun, May 24, 2009 at 6:16 AM, David Cournapeau
>>> <david@ar.media.kyoto-u.ac.jp> wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I have taken a look at the correlate function in scipy.signal. There
>>>> are several problems with it. First, it is wrong on several accounts:
>>>> - It assumes that the correlation of complex numbers corresponds
>>>> to complex multiplication, but this is not the definition followed by
>>>> most textbooks, at least as far as signal processing is concerned.
>>>> - More significantly, it is wrong with respect to the ordering:
>>>> it assumes that correlate(a, b) == correlate(b, a), which is not true in
>>>> general.
>>>>
>>>>
>>> I don't see this in the results. There was a recent report on the
>>> mailing list that np.correlate
>>> and signal.correlate switch arrays if the second array is longer.
>>>
>>> >>> signal.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0])
>>> array([0, 0, 1, 2, 0, 0, 0, 0, 0])
>>>
>>> >>> signal.correlate([0, 0, 1, 0, 0], [1, 2, 0, 0, 0])
>>> array([0, 0, 0, 0, 0, 2, 1, 0, 0])
>>>
>>>
>> Well, you just happened to have very peculiar entries :)
>>
>> signal.correlate([-1, -2, -3], [1, 2, 3])
>> -> array([ -3, -8, -14, -8, -3])
>>
>> signal.correlate([1, 2, 3], [-1, -2, -3])
>> -> array([ -3, -8, -14, -8, -3])
>>
>
> One of your arrays is just the negative of the other, and correlate is
> the same in this case. For other cases, the results differ.
>
Gr, you're right, of course :) But it still fails for arrays where any
dimension of the second argument is larger than the corresponding
dimension of the first. I don't know whether the C implementation relies
on this assumption (the C code swaps the arrays in that case, even
though they already seem to be swapped in Python).
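To make the ordering issue concrete, here is a minimal pure-NumPy sketch of the textbook definition r[k] = sum_n x[n + k] * conj(y[n]), returning every lag in the same layout as a "full" mode result. The helper name `xcorr_full` is hypothetical; it is not the scipy.signal code. It shows that swapping the arguments reverses (and, for complex data, conjugates) the output rather than leaving it unchanged, so an implementation that swaps its inputs internally would also have to undo that on the result.

```python
import numpy as np

def xcorr_full(x, y):
    """Textbook 1-D cross-correlation r[k] = sum_n x[n + k] * conj(y[n]).

    Covers every lag k = -(len(y) - 1) .. len(x) - 1; output index 0 is the
    most negative lag.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    nx, ny = len(x), len(y)
    out = np.zeros(nx + ny - 1, dtype=np.result_type(x, y))
    for i, k in enumerate(range(-(ny - 1), nx)):
        # Only the n for which both x[n + k] and y[n] exist contribute.
        for n in range(max(0, -k), min(ny, nx - k)):
            out[i] += x[n + k] * np.conj(y[n])
    return out

a = np.array([1, 2, 0, 0, 0])
b = np.array([0, 0, 1, 0, 0])
print(xcorr_full(a, b))  # [0 0 1 2 0 0 0 0 0]
print(xcorr_full(b, a))  # [0 0 0 0 0 2 1 0 0]

# In general, swapping the arguments gives the conjugated, reversed sequence.
x = np.array([1 + 1j, 2, 3 - 1j])
y = np.array([0, 1, 0.5j])
print(np.allclose(xcorr_full(y, x), np.conj(xcorr_full(x, y))[::-1]))  # True
```

The two integer examples reproduce the outputs quoted earlier in the thread, which are reverses of each other and therefore not equal.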
> I looked at it only for examples of calculating auto-correlation and
> cross-correlation of time series, and had to try them out to see which
> version works best.
>
Yes, it depends. I know that correlate is far too slow for my own use in
speech processing, for example for linear predictive coding. Since only
a few lags are needed, a direct implementation is often faster than an
FFT-based one; I have my own straightforward autocorrelation in
scikits.talkbox. I believe MATLAB's xcorr (1-D correlation) always uses
the FFT.
I know other people have problems with scipy.signal's correlate for
large arrays as well (for example, the C code copies the inputs if they
are not contiguous, whereas my own code using iterators should not need
any copy of the inputs).
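A rough sketch of the trade-off for the linear-prediction case, where only lags 0..p are needed. Both function names are hypothetical and are not the scikits.talkbox API. The direct version costs O(N * p); the FFT version costs O(N log N) no matter how few lags are kept, so for a small order p the direct loop can come out ahead.

```python
import numpy as np

def autocorr_direct(x, maxlag):
    """r[k] = sum_n x[n + k] * conj(x[n]) for k = 0 .. maxlag (maxlag < len(x)).

    One dot product per lag: O(len(x) * maxlag) work, no FFT.
    """
    x = np.asarray(x)
    n = len(x)
    r = np.empty(maxlag + 1, dtype=np.result_type(x, 1.0))
    for k in range(maxlag + 1):
        r[k] = np.dot(x[k:], np.conj(x[:n - k]))
    return r

def autocorr_fft(x, maxlag):
    """Same quantity via the FFT, O(N log N) for any number of lags.

    Zero-padding to at least 2N - 1 points keeps the circular correlation
    computed by the FFT from wrapping around onto the lags we keep.
    """
    x = np.asarray(x)
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()  # a power of two >= 2N - 1
    spec = np.fft.fft(x, nfft)
    r = np.fft.ifft(spec * np.conj(spec))[:maxlag + 1]
    return r if np.iscomplexobj(x) else r.real

rng = np.random.default_rng(0)
frame = rng.standard_normal(400)  # random stand-in for one speech frame
order = 12                        # LPC order: only lags 0..12 are needed
print(np.allclose(autocorr_direct(frame, order), autocorr_fft(frame, order)))  # True
```

Which one is faster in practice depends on N, p and the FFT implementation; the sketch only shows that both compute the same lags.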
> Is convolve in all cases compatible with, or identical (through
> delegation) to, correlate?
>
I don't think so: I don't think convolution uses the conjugate for
complex values. I don't know of any use of complex convolution, although
I am sure there are some. As far as I know, correlation is always defined
with the complex conjugate of the second argument. For real inputs,
convolution should always be implementable as correlation with the
kernel reversed, at least when using zero padding at the boundaries.
There may be problems with huge arrays, though: implementing convolution
via correlation without copies while staying fast may not always be
easy, but I have never tried to do so.
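The relationship between the two operations can be sketched as follows, assuming zero padding at the boundaries and the conjugating definition of correlation. `xcorr_full` and `conv_via_xcorr` are hypothetical helpers, with `np.convolve` used as the reference. Reversing the kernel is enough for real data; for complex data the kernel must also be conjugated, so that the conjugate taken inside the correlation cancels out.

```python
import numpy as np

def xcorr_full(x, y):
    """r[k] = sum_n x[n + k] * conj(y[n]) over all lags (zero padding)."""
    x = np.asarray(x)
    y = np.asarray(y)
    ny = len(y)
    pad = np.zeros(ny - 1, dtype=x.dtype)
    xp = np.concatenate([pad, x, pad])  # zero padding on both sides
    yc = np.conj(y)
    # Output i is the dot product of one window of the padded x with conj(y).
    return np.array([np.dot(xp[i:i + ny], yc) for i in range(len(x) + ny - 1)])

def conv_via_xcorr(x, h):
    """Full convolution computed as a correlation with the flipped kernel.

    The kernel is conjugated as well, so the conjugate applied inside
    xcorr_full cancels; for real h this is just h[::-1].
    """
    h = np.asarray(h)
    return xcorr_full(x, np.conj(h[::-1]))

x_real = [1.0, -2.0, 0.5, 3.0]
h_real = [2.0, 1.0, -1.0]
print(np.allclose(conv_via_xcorr(x_real, h_real), np.convolve(x_real, h_real)))  # True

x_cplx = np.array([1 + 2j, 0, -1j, 3])
h_cplx = np.array([2, 1j, 1 - 1j])
print(np.allclose(conv_via_xcorr(x_cplx, h_cplx), np.convolve(x_cplx, h_cplx)))  # True
# Flipping without conjugating gives convolution with conj(h) instead:
print(np.allclose(xcorr_full(x_cplx, h_cplx[::-1]), np.convolve(x_cplx, h_cplx)))  # False
```

Note that this sketch materializes a padded copy of x, which is exactly the kind of copy that becomes costly for huge arrays.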
cheers,
David