
They should change the name to “Clockroach” considering they lose correctness guarantees when the clock drifts beyond a preconfigured uncertainty window.



If the clock is drifting substantially then your high-precision application will already be in serious trouble.

> note that software clock glitching was done a few years back

This seems to be a somewhat common type of problem. I wonder if companies should routinely test on machines with the clock set one year into the future to catch such bugs before they hit customers.
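
For what it's worth, a low-tech way to do that kind of testing without touching the system clock is to inject the clock as a dependency and offset it in tests. A rough Go sketch (the Clock interface and offsetClock type are made-up names, not from any particular codebase):

    package main

    import (
        "fmt"
        "time"
    )

    // Clock lets production code take its notion of "now" from outside,
    // so tests can substitute a shifted clock.
    type Clock interface {
        Now() time.Time
    }

    type realClock struct{}

    func (realClock) Now() time.Time { return time.Now() }

    // offsetClock shifts every reading by a fixed offset, e.g. +1 year.
    type offsetClock struct {
        base   Clock
        offset time.Duration
    }

    func (c offsetClock) Now() time.Time { return c.base.Now().Add(c.offset) }

    func main() {
        future := offsetClock{base: realClock{}, offset: 365 * 24 * time.Hour}
        fmt.Println("real:  ", realClock{}.Now())
        fmt.Println("future:", future.Now())
    }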

Cockroach trusts the MaxOffset, and if your clocks don't live up to that promise, you might get some stale reads. By the way, Spanner breaks in the same way if its clock offset bound (via the TrueTime API) fails it. But Spanner has to wait out the MaxOffset on every commit, and we don't - so we can afford to set it high enough for off-the-shelf clock synchronization and save you the atomic clocks, at similar guarantees. That's a very good deal. If you do happen to have atomic clocks around and strong guarantees on your uncertainty, like Spanner does, you get linearizability at the same price.

For a more in-depth explanation of the above, see https://gist.github.com/tschottdorf/57bcccc379b151456044.
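
To make the trade-off concrete, here is a rough Go sketch of the two strategies under a configured maxOffset. It is illustrative pseudologic only, not CockroachDB's or Spanner's actual implementation:

    package main

    import (
        "fmt"
        "time"
    )

    const maxOffset = 250 * time.Millisecond // assumed bound on clock offset between nodes

    // commitWait blocks until the commit timestamp is definitely in the past on
    // every node, so any later read observes it (the Spanner-style cost paid on
    // every commit).
    func commitWait(commitTS time.Time) {
        if d := time.Until(commitTS.Add(maxOffset)); d > 0 {
            time.Sleep(d)
        }
    }

    // uncertainRead reports whether a value's timestamp falls inside the reader's
    // uncertainty window; if so, the read cannot tell whether the write happened
    // first and must be retried at a higher timestamp (the approach that avoids
    // commit-wait at the cost of occasional retries).
    func uncertainRead(readTS, valueTS time.Time) bool {
        return valueTS.After(readTS) && !valueTS.After(readTS.Add(maxOffset))
    }

    func main() {
        now := time.Now()
        commitWait(now) // sleeps roughly maxOffset
        fmt.Println("uncertain:", uncertainRead(now, now.Add(100*time.Millisecond)))
    }

The point of the comparison: commit-wait charges maxOffset as latency on every write, whereas the retry approach only pays when a read actually lands inside the uncertainty window.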


Cockroach uses hybrid logical clocks and should generally tolerate reasonable amounts of clock skew. Atomic clocks can improve performance in some cases by putting a tighter bound on clock skew, but they're not necessary for correctness.
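
For anyone curious what "hybrid logical clock" means in practice, here is a minimal Go sketch (names are illustrative, not CockroachDB's actual hlc package). Timestamps order by (wall, logical), so causality between communicating nodes is preserved even when their physical clocks are slightly skewed:

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    type Timestamp struct {
        Wall    int64 // physical component, nanoseconds
        Logical int32 // tie-breaker within the same wall reading
    }

    type HLC struct {
        mu   sync.Mutex
        last Timestamp
    }

    // Now returns a timestamp for a local or send event.
    func (c *HLC) Now() Timestamp {
        c.mu.Lock()
        defer c.mu.Unlock()
        phys := time.Now().UnixNano()
        if phys > c.last.Wall {
            c.last = Timestamp{Wall: phys}
        } else {
            c.last.Logical++ // physical clock hasn't advanced (or went backwards)
        }
        return c.last
    }

    // Update folds in a timestamp received from another node, so the local clock
    // never falls behind anything it has observed.
    func (c *HLC) Update(remote Timestamp) Timestamp {
        c.mu.Lock()
        defer c.mu.Unlock()
        phys := time.Now().UnixNano()
        switch {
        case phys > c.last.Wall && phys > remote.Wall:
            c.last = Timestamp{Wall: phys}
        case remote.Wall > c.last.Wall:
            c.last = Timestamp{Wall: remote.Wall, Logical: remote.Logical + 1}
        default:
            if remote.Wall == c.last.Wall && remote.Logical > c.last.Logical {
                c.last.Logical = remote.Logical
            }
            c.last.Logical++
        }
        return c.last
    }

    func main() {
        var c HLC
        local := c.Now()
        merged := c.Update(Timestamp{Wall: local.Wall + 1_000_000}) // message from a node 1ms "ahead"
        fmt.Println(local, merged)
    }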

> It is completely irrelevant that their one tenth of a second was not exactly one tenth of a second.

It's very relevant when the module that is off is trying to make telemetry calculations based on target Doppler velocity, which is measured in real, ISO-standard seconds. There is no clock involved in that. Diverging module clocks amplify the problem.

Also, the ultimate reference is the true definition of a second. All modules are expected to use it, since it is what synchronizes them. It is the clock, and at some level a clock built on a faulty definition will drift relative to any other clock. Your distinction is irrelevant as far as real-time systems are concerned.


Correcting drift is exactly what a disciplined clock does. It's the solution to the problem you mentioned.

It wasn’t clock drift; it was a calculation error that caused separate parts of the system, all calibrated to the same common clock, to drift out of synchronization. Using a different clock, such as GPS, wouldn’t have helped with this.

But the rest of your point boils down to "if you know your system has a flaw, why not mitigate it?" Of course, at design time they didn't know it had this flaw.


> I wish that the ULID spec checked for microsecond collisions instead of millisecond

Most computers don't have wall clocks with even millisecond precision, much less microsecond. The wall clock is generally OS-tick precision (~5-10 ms ballpark), with even poorer accuracy.

Seems like a niche use case. Maybe an extension?
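
If you want to see what your own machine gives you, a quick probe of the smallest observable wall-clock step looks something like the Go sketch below (results vary a lot by OS and hardware, which is rather the point):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        minStep := time.Duration(1 << 62)
        // Round(0) strips Go's monotonic reading so Sub compares wall-clock values.
        prev := time.Now().Round(0)
        for i := 0; i < 1_000_000; i++ {
            now := time.Now().Round(0)
            if d := now.Sub(prev); d > 0 && d < minStep {
                minStep = d
            }
            prev = now
        }
        fmt.Println("smallest observed wall-clock step:", minStep)
    }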


> This is not slowing down a clock for a while because it runs fast, but slowing it down to make it run too slow, and then skipping a leap second.

Ok, I see the difference. The issue here is not preserving monotonicity but keeping the semantics of the "wall clock".
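
In Go terms the distinction looks roughly like this: time.Now carries both a wall and a monotonic reading, elapsed-time measurement uses the monotonic one, and the wall reading is the part whose semantics leap-second handling changes:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        start := time.Now() // carries both a wall reading and a monotonic reading

        // Elapsed-time measurement uses the monotonic reading, so a step or a
        // leap-second slew during the sleep would not distort it.
        time.Sleep(100 * time.Millisecond)
        fmt.Println("elapsed (monotonic):", time.Since(start))

        // Calendar arithmetic uses the wall reading; this is the value whose
        // semantics a slewed or smeared leap second alters.
        fmt.Println("wall clock:", start.Round(0)) // Round(0) strips the monotonic reading
    }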


> I agree it is unlikely to be the RTC clock at issue unless it runs a watchdog of some form.

Fairly standard technique to run the watchdog off the RTC clock, because that might still work if the main clock is wonky.


You run the risk of the computational equivalent of a speedometer not being 100% accurate, i.e. clock inaccuracy.

They are using it only to generate UUIDs. They could have avoided collisions by using the clock, or even a counter, as part of the UUID. But for some reason systemd always creates a lot of drama before they accept they were wrong and fix the bug (which doesn't always happen).
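
A sketch of what "clock or counter as part of the ID" could look like in Go; the layout and field sizes here are arbitrary, not any standard UUID version:

    package main

    import (
        crand "crypto/rand"
        "encoding/binary"
        "fmt"
        "sync/atomic"
        "time"
    )

    var counter uint64 // monotonically increasing within the process

    // newID packs a nanosecond timestamp and a per-process counter into the ID,
    // so two IDs generated back-to-back cannot collide even if the random source
    // misbehaves; the tail stays random.
    func newID() [16]byte {
        var id [16]byte
        binary.BigEndian.PutUint64(id[0:8], uint64(time.Now().UnixNano()))
        binary.BigEndian.PutUint32(id[8:12], uint32(atomic.AddUint64(&counter, 1)))
        if _, err := crand.Read(id[12:16]); err != nil {
            panic(err)
        }
        return id
    }

    func main() {
        fmt.Printf("%x\n%x\n", newID(), newID())
    }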

> Also, having "clock slew" be a matter of perspective—with processes that can handle leap seconds seeing them happen instantaneously; and processes that can't handle leap-seconds, seeing slewed time—would be nice.

I imagine there might be some really interesting (for meanings of interesting that include "shoot me now") and hard-to-track-down bugs as you deal with inconsistent clocks not just across systems within a network, but across processes within a single system.


This makes me worry about clock skew and asynchronous code in general. For example, a timestamp might lead you to believe things happened in a different order than they did because the timestamping process isn't atomic.
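
A toy Go example of that hazard: the timestamp is taken at one instant and the event recorded at another, so recorded order and timestamp order can disagree. A per-process atomic sequence number (shown alongside, as one possible mitigation) gives a total order that doesn't depend on the clock:

    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
        "time"
    )

    type event struct {
        ts  time.Time
        seq uint64
    }

    func main() {
        var seq uint64
        var mu sync.Mutex
        var log []event
        var wg sync.WaitGroup

        for i := 0; i < 4; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                ts := time.Now()               // timestamp taken here...
                n := atomic.AddUint64(&seq, 1) // ...sequence number taken here...
                mu.Lock()
                log = append(log, event{ts, n}) // ...event recorded here, possibly much later
                mu.Unlock()
            }()
        }
        wg.Wait()

        // The append order, the timestamp order, and the sequence order can all differ.
        for _, e := range log {
            fmt.Println(e.seq, e.ts.UnixNano())
        }
    }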


I find it interesting/satisfying that clock tick is one big bold configuration parameter that can be tweaked to basically change the interpretation of the results.

I would understand if the complaint was that Spanner is too slow without expensively accurate clocks and synchronization. But the complaint is that Spanner fails to guarantee consistency, which doesn't make sense to me. The requirements clearly include giving a valid clock bound, so if you give an invalid clock bound, it's clearly your fault for getting incorrect results, not Spanner's!

If they fixed the clock, it would have to be lower, which would lead to lower performance.
