[SciPy-User] synchronizing timestamps from different systems; unpaired linear regression
Tue Apr 10 16:18:02 CDT 2012
On Tue, Apr 10, 2012 at 10:27 AM, Chris Rodgers <firstname.lastname@example.org> wrote:
> I have what seems like a straightforward problem but it is becoming
> more difficult than I thought. I have two different computers
> recording timestamps from the same stream of events. I get lists X and
> Y from each computer and the question is how to figure out which entry
> in X corresponds to which entry in Y.
> 1) There are an unknown number of missing or spurious events in each
> list. I do not know which events in X match up to which in Y.
> 2) The temporal offset between the two lists is unknown, because each
> timer begins at a different time.
> 3) The clocks seem to run at slightly different speeds (~0.3%
> difference adds up to about 10 seconds over my 1hr recording time).
Tricky! Others have given you plenty of advice for how to change the
setup in ways that might help, but my guess is you can solve it with
the data at hand.
There's a 2-parameter space of (offset, relative clockspeed) that
you're trying to search. You need a smooth and quick-to-evaluate
function that gets larger when these numbers are closer to accurate,
that you can hand to your optimizer. (Smooth to avoid the local minima
problem you mention; quick-to-evaluate for the obvious reason.) How
about, given a candidate offset + clockspeed, remap your Y events into
the purported X clock domain, and score each event by its squared
distance from the nearest X event. This ignores the issue of matching,
on the assumption that mismatches between the lists are rare enough
that they won't matter. Given that you're trying to extract 2 numbers
worth of information from 36000 samples, you should be able to get
away with a fair amount of sloppiness.
And it's very concise to write down and fast to compute (untested code):
import numpy as np

# Event times from each computer, as sorted 1-d float arrays:
x_times = np.array([x_time1, x_time2, x_time3, ...], dtype=float)
y_times = np.array([y_time1, y_time2, y_time3, ...], dtype=float)

# Midpoints between consecutive x events; searchsorted against these
# gives the index of the nearest x event for each query time.
x_midpoints = x_times[:-1] + np.diff(x_times) / 2.

def objective(offset, clockspeed):
    # adjust parametrization to suit
    adj_y_times = y_times * clockspeed + offset
    closest_x_idx = np.searchsorted(x_midpoints, adj_y_times)
    return np.sum((adj_y_times - x_times[closest_x_idx]) ** 2)
Each evaluation is O(n log n). Worth a try, anyway...
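If it helps, here's roughly how you might hand that objective to an optimizer. This is a sketch on synthetic stand-in data (the 0.8 s offset, 1.0005 clock ratio, and event counts below are invented just to exercise it), not something I've run against your recordings:

```python
import numpy as np
from scipy.optimize import fmin

# Synthetic stand-in data: 200 events over ten minutes, where y's clock
# starts 0.8 s late and runs 0.05% slow relative to x's.
rng = np.random.RandomState(0)
x_times = np.sort(rng.uniform(0.0, 600.0, 200))
true_offset, true_speed = 0.8, 1.0005
y_times = (x_times - true_offset) / true_speed

# Midpoints between consecutive x events; searchsorted against these
# gives the index of the nearest x event for each query time.
x_midpoints = x_times[:-1] + np.diff(x_times) / 2.

def objective(params):
    offset, clockspeed = params
    adj_y_times = y_times * clockspeed + offset
    closest = np.searchsorted(x_midpoints, adj_y_times)
    return np.sum((adj_y_times - x_times[closest]) ** 2)

# Start near "no offset, equal clock rates" and let Nelder-Mead refine;
# the basin around the true parameters is smooth as long as most events
# already match their correct partner at the starting point.
best = fmin(objective, [0.5, 1.0], disp=False)
```

If the initial guess is so far off that most events match the wrong partner, you may want a coarse grid search over offsets first and only then polish with the simplex.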
> I know this problem is solvable because once you find the temporal
> offset and clock-speed ratio, the matching timestamps agree to within
> 10ms. That is, there is a strong linear relationship between some
> unknown X->Y mapping.
> Basically, the problem is: given list X and list Y, and specifying a
> certain minimum R**2 value, what is the largest set of matched points
> from X and Y that satisfy this R**2 value? I have tried googling
> "unmatched linear regression" but this must not be the right search term.
> One approach that I've tried is to create an analog trace for X and Y
> with a Gaussian centered at each timestamp, then finding the lag that
> optimizes the cross-correlation between the two. This is good for
> finding the temporal offset but can't handle the clock-speed
> difference. (Also it takes a really long time because the series are
> 1hr of data sampled at 10Hz.) Then I can choose the closest matches
> between X and Y and fit them with a line, which gives me the
> clock-difference parameter. The problem is that there are a ton of
> local minima created by how I choose to match up the points in X and
> Y, so it gets stuck on the wrong answer.
> Any tips?
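On the speed complaint in passing: the cross-correlation itself needn't be slow if you do it with FFTs rather than directly, even at 1 hr sampled at 10 Hz. A sketch of what I mean (the bin width and smoothing width are invented parameters, and I haven't run this on your data):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import fftconvolve

def best_lag(x_times, y_times, dt=0.1, sigma=0.5):
    """Estimate the offset to ADD to y_times to align it with x_times,
    by cross-correlating Gaussian-smoothed event histograms on a
    dt-spaced grid. O(n log n) via FFT instead of O(n**2) directly."""
    t_max = max(x_times.max(), y_times.max())
    bins = np.arange(0.0, t_max + dt, dt)
    x_trace, _ = np.histogram(x_times, bins)
    y_trace, _ = np.histogram(y_times, bins)
    # Smooth each spike train with a Gaussian of width sigma seconds.
    x_trace = gaussian_filter1d(x_trace.astype(float), sigma / dt)
    y_trace = gaussian_filter1d(y_trace.astype(float), sigma / dt)
    # Full cross-correlation via FFT: element k peaks where shifting
    # y_trace right by (k - len(y_trace) + 1) bins best aligns the traces.
    corr = fftconvolve(x_trace, y_trace[::-1], mode='full')
    lag_bins = np.argmax(corr) - (len(y_trace) - 1)
    return lag_bins * dt

# Quick check on synthetic data: y's clock starts 3 s behind x's.
rng = np.random.RandomState(1)
x_demo = np.sort(rng.uniform(10.0, 600.0, 300))
estimated = best_lag(x_demo, x_demo - 3.0)  # should come out near +3.0
```

It still can't absorb the clock-speed difference on its own, but it's a cheap way to get a starting offset for the 2-parameter search above.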
> PS: my current code and test data is here:
> Chris Rodgers
> Graduate Student
> Helen Wills Neuroscience Institute
> University of California - Berkeley