[SciPy-User] synchronizing timestamps from different systems; unpaired linear regression

Nathaniel Smith njs@pobox....
Wed Apr 11 11:37:39 CDT 2012


On Wed, Apr 11, 2012 at 5:16 AM, Chris Rodgers <xrodgers@gmail.com> wrote:
> Re Nathaniel's suggestion:
> I think this is pretty similar to the algorithm I'm currently using. Pseudocode:
>
> current_guess = estimate_from_correlation(x, y)
> for timescale in decreasing_order:
>     xm, ym = find_matches(x, y, current_guess, within=timescale)
>     current_guess = linfit(xm, ym)
>
> The problem is the local minima caused by mismatch errors. If the
> clockspeed estimate is off, then late events are incorrectly matched
> with a delay of one event. Then the updated guess moves closer to this
> incorrect solution. So by killing off the points that disagree, we
> reinforce the current orthodoxy!

Yes, that's why I was suggesting doing something similar to your
current algorithm, but different in ways that might avoid this problem
:-).
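
For concreteness, here's a rough, runnable version of that match-and-refit
loop. The matching and refitting helpers below are my own stand-ins for your
find_matches/linfit (not anything from scipy), and they assume x and y are
sorted arrays of event times on the two clocks:

import numpy as np

def find_matches(x, y, slope, offset, within):
    # Map y's clock onto x's clock with the current guess, then keep only
    # the pairs whose nearest x event lies within `within` seconds.
    y_mapped = slope * y + offset
    idx = np.clip(np.searchsorted(x, y_mapped), 1, len(x) - 1)
    nearest = np.where(y_mapped - x[idx - 1] < x[idx] - y_mapped, idx - 1, idx)
    keep = np.abs(y_mapped - x[nearest]) < within
    return x[nearest[keep]], y[keep]

def refine(x, y, slope, offset, timescales=(1.0, 0.3, 0.1)):
    # Iteratively tighten the matching window and refit x ~ slope*y + offset.
    for within in timescales:
        xm, ym = find_matches(x, y, slope, offset, within)
        slope, offset = np.polyfit(ym, xm, 1)
    return slope, offset

The failure mode you describe is easy to see in this form: once the slope
guess is off by enough that some y events pair with the wrong neighbour, the
refit step happily locks that pairing in.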

> Actually the truest objective function would be the number of matches
> within some specified error.
> ERR = 0.1
> def objective(offset, clockspeed):
>     # adjust parametrization to suit
>     adj_y_times = y_times * clockspeed + offset
>     # x_midpoints are the midpoints between consecutive sorted x_times,
>     # so searchsorted gives the index of the nearest x event
>     closest_x_times = np.searchsorted(x_midpoints, adj_y_times)
>     pred_err = np.abs(adj_y_times - x_times[closest_x_times])
>     closest_good = closest_x_times[pred_err < ERR]
>     return len(np.unique(closest_good))
>
>
> That function has some ugly non-smoothness due to the
> len(unique(...)). Would something like optimize.brent work for this or
> am I on the wrong track?

That also has the problem that the objective function is totally flat
whenever the points are all within ERR, so you are guaranteed to get
an offset inaccuracy on the same order of magnitude as ERR. It sounds
like your clocks have a lot of jitter, though, if you can't do better
than 10 ms agreement, so maybe you can't get more accurate than that
anyway.
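
Note that optimize.brent only handles a single variable, and like most of the
smooth optimizers it will be confused by a piecewise-constant objective. If
you want to attack that objective directly, a dumb grid search over both
parameters is probably safer, e.g. with scipy.optimize.brute. A rough sketch,
where the toy data and the parameter ranges are made up, and the match count
is negated because brute() minimizes:

import numpy as np
from scipy import optimize

def neg_matches(params, x_times, y_times, err=0.1):
    offset, clockspeed = params
    adj_y = y_times * clockspeed + offset
    # index of the nearest x event for each adjusted y event (x_times sorted)
    idx = np.clip(np.searchsorted(x_times, adj_y), 1, len(x_times) - 1)
    idx = np.where(adj_y - x_times[idx - 1] < x_times[idx] - adj_y, idx - 1, idx)
    good = idx[np.abs(adj_y - x_times[idx]) < err]
    return -len(np.unique(good))              # brute() minimizes, so negate

# toy data standing in for the real event trains
rng = np.random.default_rng(0)
x_times = np.sort(rng.uniform(0.0, 100.0, 300))
seen = x_times[rng.random(300) < 0.9]         # some events missing on the y side
y_times = (seen - 2.0) / 1.0003 + rng.normal(0.0, 0.01, len(seen))

# coarse grid over plausible offsets and clock-speed ratios (made-up ranges)
best = optimize.brute(neg_matches, ranges=((-5.0, 5.0), (0.999, 1.001)),
                      args=(x_times, y_times), Ns=101, finish=None)

With finish=None the answer is only as fine as the grid, which is really the
same flatness problem again: within ERR the count doesn't change, so there is
nothing left to refine against.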

The "perfect" solution to this would involve writing down a
probabilistic model that had some distribution of per-event jitter,
some likelihood of events being lost, etc., and then maximize the
likelihood of the data. But calculating this likelihood would probably
require considering all possible pairings of X and Y values, which
would be extremely expensive to compute. But the approximation where
you assume you have the correct matching is not so good, as you
notice. So if I were you I'd just fiddle around with these different
suggestions until you got something that worked :-).
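
If you ever do want to go partway down that road, one cheap stand-in (very
much a sketch, and it still leans on the nearest-match approximation) is to
model each y event as either Gaussian jitter around its nearest x event or a
lost/spurious event spread uniformly over the recording, and then hand that
likelihood to a generic optimizer, starting from the grid estimate above
(this reuses numpy, scipy.optimize, the toy x_times/y_times, and `best` from
the previous sketch):

from scipy.stats import norm

def neg_log_likelihood(params, x_times, y_times,
                       jitter=0.01, p_lost=0.1, span=100.0):
    # Each adjusted y event is either jittered around its nearest x event
    # (prob 1 - p_lost) or unmatched, spread uniformly over the recording.
    offset, clockspeed = params
    adj_y = y_times * clockspeed + offset
    idx = np.clip(np.searchsorted(x_times, adj_y), 1, len(x_times) - 1)
    idx = np.where(adj_y - x_times[idx - 1] < x_times[idx] - adj_y, idx - 1, idx)
    resid = adj_y - x_times[idx]
    lik = (1 - p_lost) * norm.pdf(resid, scale=jitter) + p_lost / span
    return -np.sum(np.log(lik))

# polish the coarse grid estimate from the previous sketch
res = optimize.minimize(neg_log_likelihood, x0=best, args=(x_times, y_times),
                        method='Nelder-Mead',
                        options={'xatol': 1e-6, 'fatol': 1e-6})
offset_hat, clockspeed_hat = res.x

The jitter scale, loss probability, and recording span are made-up knobs
here; the point is just the shape of the thing, not the numbers.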

-- Nathaniel

