# [Numpy-discussion] Correlation filter

josef.pktd@gmai... josef.pktd@gmai...
Mon Nov 23 00:32:03 CST 2009

```On Fri, Nov 20, 2009 at 4:27 PM,  <josef.pktd@gmail.com> wrote:
> On Fri, Nov 20, 2009 at 2:28 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
>> On Fri, Nov 20, 2009 at 11:17 AM,  <josef.pktd@gmail.com> wrote:
>>> On Fri, Nov 20, 2009 at 1:51 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
>>>> def corr3(x, y):
>>>>    x = x - x.mean()
>>>>    x /= x.std()
>>>>    nx = x.size
>>>>    one = np.ones(nx)
>>>>    xy = lfilter(x, 1, y)
>>>>    sy = lfilter(one, 1, y)
>>>>    sy2 = lfilter(one, 1, y*y)
>>>>    d = xy / np.sqrt(nx * sy2 - sy * sy)
>>>>    return d
>>>
>>> Is this correct? xy uses the original y and not a demeaned y.
>>
>> I wouldn't be surprised if I made mistakes. But I don't think I need
>> to demean y. One way to write the numerator of the correlation
>> coefficent is
>>
>> N*sum(xy) - sum(x)*sum(y)
>>
>> but sum(x) is zero, so it becomes
>>
>> N*sum(xy)
>>
>> So I think I only need to demean x. But I'd better test the code. Plus
>> I have no idea how lfilter will handle my missing values (NaN).
>
> I think nans fully propagate, trick might be to fill with zero and use
> another convolution/correlation with the non-nan mask to get the
> number of observations included (I think I saw it as a trick by Pierre
> for autocorrelation with missing observations).

Just an update, I used this for moving correlation between x and y
of equal shape, just posted to the scipy-user mailing list.
In the test case it works correctly.

Josef

>
> Here is a moving slope function which estimate the slope coefficient
> for a linear trend if x is arange()
> I compared the ranking with your moving correlation and it is quite different.
>
>
> def moving_slope(x,y)
>    '''estimate moving slope coefficient of regression of y on x
>    filters along axis=1, returns valid observations
>    Todo: axis and lag options
>    '''
>    xx = np.column_stack((np.ones(x.shape), x))
>    pinvxx = np.linalg.pinv(xx)[1:,:]
>    windsize = len(x)
>    lead = windsize//2 - 1
>    return signal.correlate(y, pinvxx, 'full'
>
>
> Josef
>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
```