[Numpy-discussion] Correlation filter
Fri Nov 20 13:17:09 CST 2009
On Fri, Nov 20, 2009 at 1:51 PM, Keith Goodman <firstname.lastname@example.org> wrote:
> On Fri, Nov 20, 2009 at 8:53 AM, <email@example.com> wrote:
>> scipy.signal.correlate would be fast, but it will not be easy to
>> subtract the correct moving mean. Subtracting a standard moving mean
>> would subtract different values for each observation in the window.
>> One possibility would be to look at a moving regression and just take
>> the estimated slope parameter. John D'Errico
>> (BSD licensed)
>> uses a very nice trick with pinv to get the filter to calculate a
>> moving slope coefficient. I read the source but didn't try it out and
>> didn't try to figure out how the pinv trick exactly works.
>> If this can be adapted to your case, then this would be the fastest I
>> can think of (pinv and scipy.signal.correlate would do everything in
>> C, or maybe one (500) loop might still be necessary)
>> For just getting a ranking on a linear relationship, there might be
>> other tricks possible, local linear regression, ... (?), but I never
>> tried. Also, I think with one time loop, you can do all cross section
>> regressions at the same time, without tricks.
> Those sound like good ideas.
> I was able to get rid of the for loop (and cut the time in half) by
> using lfilter from scipy:
> def corr3(x, y):
> x = x - x.mean()
> x /= x.std()
> nx = x.size
> one = np.ones(nx)
> xy = lfilter(x, 1, y)
> sy = lfilter(one, 1, y)
> sy2 = lfilter(one, 1, y*y)
> d = xy / np.sqrt(nx * sy2 - sy * sy)
> return d
Is this correct? xy uses the original y and not a demeaned y.
I think your use of lfilter only uses features that are also available
with correlate, and signal correlate can handle ndim arrays.
I use correlate in my moving moment calculations
but I think I only tested 1d.
> (Somehow I managed to flip the sign of the correlation in corr3.)
> I can trim a little more time by doing some of the operations in place
> but then the code becomes hard to read. This solution will work well
> for me since I can move much of the calculation outside of the
> function and reuse it across many calls (y does not change often
> compared to x).
> NumPy-Discussion mailing list
More information about the NumPy-Discussion