[SciPy-dev] Fixing correlate: handling api breakage ?

josef.pktd@gmai...
Sun May 24 08:46:08 CDT 2009


On Sun, May 24, 2009 at 8:26 AM, David Cournapeau
<david@ar.media.kyoto-u.ac.jp> wrote:
> josef.pktd@gmail.com wrote:
>> On Sun, May 24, 2009 at 7:38 AM, David Cournapeau
>> <david@ar.media.kyoto-u.ac.jp> wrote:
>>
>>> josef.pktd@gmail.com wrote:
>>>
>>>> On Sun, May 24, 2009 at 6:16 AM, David Cournapeau
>>>> <david@ar.media.kyoto-u.ac.jp> wrote:
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>>    I have taken a look at the correlate function in scipy.signal. There
>>>>> are several problems with it. First, it is wrong on several accounts:
>>>>>       - It assumes that the correlation of complex numbers corresponds
>>>>> to complex multiplication, but this is not the definition followed by
>>>>> most textbooks, at least as far as signal processing is concerned.
>>>>>       - More significantly, it is wrong with respect to the ordering:
>>>>> it assumes that correlate(a, b) == correlate(b, a), which is not true in
>>>>> general.
>>>>>
>>>>>
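[As a small illustration of the conjugate point above (a generic NumPy sketch, not the scipy.signal internals): the textbook cross-correlation of complex signals conjugates the second argument, which is not the same as plain complex multiplication.]

```python
import numpy as np

a = np.array([1 + 2j, 3 - 1j])
b = np.array([2 - 1j, 1 + 1j])

# Textbook zero-lag cross-correlation: sum of a[n] * conj(b[n]).
textbook = np.sum(a * np.conj(b))

# Plain complex multiplication, without the conjugate.
plain = np.sum(a * b)

# np.vdot conjugates its first argument, so vdot(b, a) matches the textbook form.
assert textbook == np.vdot(b, a)
assert textbook != plain
```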
>>>> I don't see this in the results. There was recently the report on the
>>>> mailing list that np.correlate
>>>> and signal.correlate switch arrays if the second array is longer.
>>>>
>>>>
>>>>
>>>> >>> signal.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0])
>>>> array([0, 0, 1, 2, 0, 0, 0, 0, 0])
>>>>
>>>> >>> signal.correlate([0, 0, 1, 0, 0], [1, 2, 0, 0, 0])
>>>> array([0, 0, 0, 0, 0, 2, 1, 0, 0])
>>>>
>>>>
>>> Well, you just happened to have very peculiar entries :)
>>>
>>> signal.correlate([-1, -2, -3], [1, 2, 3])
>>> -> array([ -3,  -8, -14,  -8,  -3])
>>>
>>> signal.correlate([1, 2, 3], [-1, -2, -3])
>>> -> array([ -3,  -8, -14,  -8,  -3])
>>>
>>
>> One of your arrays is just the negative of the other, and correlate is
>> the same in this case. For other cases, the results differ
>>
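[To make the ordering point concrete (a NumPy sketch using np.correlate, since the swap behavior of scipy.signal.correlate is the bug under discussion): for real inputs, swapping the arguments reverses the full correlation sequence, so the two calls agree only for special inputs like a and -a above.]

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5])

ab = np.correlate(a, b, mode='full')
ba = np.correlate(b, a, mode='full')

# For real arrays, correlate(a, b) is the reverse of correlate(b, a) ...
assert np.allclose(ab, ba[::-1])
# ... which is not the same as the two results being equal.
assert not np.allclose(ab, ba)
```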
>
> Gr, you're right of course :) But it still fails for arrays where the
> second argument has any dimension which is larger than the first one. I
> don't know if the assumption is relied on in the C implementation (the
> arrays are inverted in the C code in that case - even though they seem
> to be already inverted in python).

I read in the docstring of one of the correlate functions that the
code is faster when the first array is longer than the second. I
thought that maybe the Python code forgets to switch the arrays back
after they are swapped for the C code.


>
>> I looked at it only for examples to calculate auto-correlation and
>> cross-correlation in time series, and had to try out to see which
>> version works best.
>>
>
> Yes, it depends. I know that correlate is way too slow for my own usage
> in speech processing, for example for linear prediction coding. As only
> a few lags are necessary, direct implementation is often faster than FFT
> one - I have my own straightforward autocorrelation in scikits.talkbox; I
> believe matlab xcorr (1d correlation) always uses the FFT.
>
> I know other people have problems with the scipy.signal correlate as
> well for large arrays (the C code does a copy if the inputs are not
> contiguous, for example - my own code using iterators should not need
> any copy of inputs).
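[A direct few-lags autocorrelation of the kind David mentions can be written in a few lines (a hedged sketch, not the scikits.talkbox implementation): when only a handful of lags are needed, this O(n * nlags) loop avoids both the FFT and a full-length correlation.]

```python
import numpy as np

def autocorr_lags(x, nlags):
    """Direct (non-FFT) autocorrelation for lags 0 .. nlags-1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.array([np.dot(x[:n - k], x[k:]) for k in range(nlags)])

x = np.array([1.0, 2.0, 3.0, 4.0])
r = autocorr_lags(x, 3)

# Matches the non-negative lags of the full autocorrelation sequence.
full = np.correlate(x, x, mode='full')
assert np.allclose(r, full[x.size - 1 : x.size - 1 + 3])
```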

I'm new to this, but from the discussion on the Cython list I had the
impression that fast C code always requires copying to a contiguous
array.

Are there performance differences between calling it once with a huge
array versus calling it very often (e.g. for sequential "online"
computation) with medium-sized arrays?

What counts as huge in your talkbox applications?

>
>> Are the convolve in all cases compatible, identical (through
>> delegation) to correlate?
>>
>
> I don't think so - I don't think convolution uses the conjugate for
> complex values. But I don't know any use of complex convolution,
> although I am sure there is. Correlation is always defined with the
> complex conjugate of the second argument AFAIK. For real cases,
> convolutions should always be implementable as correlation, at least
> when considering 0-padding for boundaries. There may be problems for
> huge arrays, though - doing convolution from correlation without using
> copies while staying fast may not always be easy, but I have never tried
> to do so.
>

I was only looking at implementations where convolve is just
correlate with the second array reversed, as in the description of
numpy.correlate. So I thought that convolve and correlate should have
the same interface, whether they are implemented only once or
separately.
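[The relationship between the two functions can be checked directly (a NumPy sketch; np.correlate conjugates its second argument, so the reversal identity needs a conjugate in the complex case):]

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5])

# Real case: correlation is convolution with the second array reversed.
assert np.allclose(np.correlate(a, b, mode='full'),
                   np.convolve(a, b[::-1], mode='full'))

# Complex case: the second array must also be conjugated.
ca = np.array([1 + 1j, 2 - 1j])
cb = np.array([0.5j, 1 + 0j])
assert np.allclose(np.correlate(ca, cb, mode='full'),
                   np.convolve(ca, np.conj(cb)[::-1], mode='full'))
```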

If we have different implementations based on performance for
different use cases, it would be very helpful to add your notes to the
docs.

Josef


More information about the Scipy-dev mailing list