
They should change the name to “Clockroach” considering they lose correctness guarantees when the clock drifts beyond a preconfigured uncertainty window.



If the clock is drifting substantially then your high-precision application will already be in serious trouble.

> note that software clock glitching was done a few years back

This seems to be a somewhat common type of problem. I wonder if companies should routinely test on machines with the clock set one year into the future to catch such bugs before they hit customers.
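
For what it's worth, a low-tech way to do that kind of testing without touching the system clock is to inject the clock as a dependency and offset it in tests. A rough Go sketch (the Clock interface and offsetClock type are made-up names, not from any particular codebase):

    package main

    import (
        "fmt"
        "time"
    )

    // Clock lets production code take its notion of "now" from outside,
    // so tests can substitute a shifted clock.
    type Clock interface {
        Now() time.Time
    }

    type realClock struct{}

    func (realClock) Now() time.Time { return time.Now() }

    // offsetClock shifts every reading by a fixed offset, e.g. +1 year.
    type offsetClock struct {
        base   Clock
        offset time.Duration
    }

    func (c offsetClock) Now() time.Time { return c.base.Now().Add(c.offset) }

    func main() {
        future := offsetClock{base: realClock{}, offset: 365 * 24 * time.Hour}
        fmt.Println("real:  ", realClock{}.Now())
        fmt.Println("future:", future.Now())
    }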

Cockroach trusts the MaxOffset, and if your clocks don't live up to that promise, you might get some stale reads. By the way, Spanner breaks in the same way if its clock offset bound (via the TrueTime API) fails it. But Spanner has to wait out the MaxOffset on every commit, and we don't - so we can afford to set it high enough for off-the-shelf clock synchronization and save you the atomic clocks, at similar guarantees. That's a very good deal. If you do happen to have atomic clocks around and strong guarantees on your uncertainty, like Spanner does, you get linearizability at the same price.

For a more in-depth explanation of the above, see https://gist.github.com/tschottdorf/57bcccc379b151456044.
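
To make the trade-off concrete, here is a rough Go sketch of the two strategies under a configured maxOffset. It is illustrative pseudologic only, not CockroachDB's or Spanner's actual implementation:

    package main

    import (
        "fmt"
        "time"
    )

    const maxOffset = 250 * time.Millisecond // assumed bound on clock offset between nodes

    // commitWait blocks until the commit timestamp is definitely in the past on
    // every node, so any later read observes it (the Spanner-style cost paid on
    // every commit).
    func commitWait(commitTS time.Time) {
        if d := time.Until(commitTS.Add(maxOffset)); d > 0 {
            time.Sleep(d)
        }
    }

    // uncertainRead reports whether a value's timestamp falls inside the reader's
    // uncertainty window; if so, the read cannot tell whether the write happened
    // first and must be retried at a higher timestamp (the approach that avoids
    // commit-wait at the cost of occasional retries).
    func uncertainRead(readTS, valueTS time.Time) bool {
        return valueTS.After(readTS) && !valueTS.After(readTS.Add(maxOffset))
    }

    func main() {
        now := time.Now()
        commitWait(now) // sleeps roughly maxOffset
        fmt.Println("uncertain:", uncertainRead(now, now.Add(100*time.Millisecond)))
    }

The point of the comparison: commit-wait charges maxOffset as latency on every write, whereas the retry approach only pays when a read actually lands inside the uncertainty window.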


Cockroach uses hybrid logical clocks and should generally tolerate reasonable amounts of clock skew. Atomic clocks can improve performance in some cases by putting a tighter bound on clock skew, but they're not necessary for correctness.
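
For anyone curious what "hybrid logical clock" means in practice, here is a minimal Go sketch (names are illustrative, not CockroachDB's actual hlc package). Timestamps order by (wall, logical), so causality between communicating nodes is preserved even when their physical clocks are slightly skewed:

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    type Timestamp struct {
        Wall    int64 // physical component, nanoseconds
        Logical int32 // tie-breaker within the same wall reading
    }

    type HLC struct {
        mu   sync.Mutex
        last Timestamp
    }

    // Now returns a timestamp for a local or send event.
    func (c *HLC) Now() Timestamp {
        c.mu.Lock()
        defer c.mu.Unlock()
        phys := time.Now().UnixNano()
        if phys > c.last.Wall {
            c.last = Timestamp{Wall: phys}
        } else {
            c.last.Logical++ // physical clock hasn't advanced (or went backwards)
        }
        return c.last
    }

    // Update folds in a timestamp received from another node, so the local clock
    // never falls behind anything it has observed.
    func (c *HLC) Update(remote Timestamp) Timestamp {
        c.mu.Lock()
        defer c.mu.Unlock()
        phys := time.Now().UnixNano()
        switch {
        case phys > c.last.Wall && phys > remote.Wall:
            c.last = Timestamp{Wall: phys}
        case remote.Wall > c.last.Wall:
            c.last = Timestamp{Wall: remote.Wall, Logical: remote.Logical + 1}
        default:
            if remote.Wall == c.last.Wall && remote.Logical > c.last.Logical {
                c.last.Logical = remote.Logical
            }
            c.last.Logical++
        }
        return c.last
    }

    func main() {
        var c HLC
        local := c.Now()
        merged := c.Update(Timestamp{Wall: local.Wall + 1_000_000}) // message from a node 1ms "ahead"
        fmt.Println(local, merged)
    }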

> It is completely irrelevant that their one tenth of a second was not exactly one tenth of a second.

It's very relevant when the module that is off is trying to make telemetry calculations based on target Doppler velocity, which is measured in real, ISO-standard seconds. There is no clock involved in that. Diverging module clocks amplify the problem.

Also, the ultimate reference is the true definition of a second. All modules are expected to use it, since it is what synchronizes them. It is the clock, and at some level a clock built on a faulty definition will drift relative to any other clock. Your distinction is irrelevant as far as real-time systems are concerned.


Correcting drift is exactly what a disciplined clock does. It's the solution to the problem you mentioned.

It wasn’t clock drift; it was a calculation error that caused separate parts of the system, all calibrated to the same common clock, to drift out of synchronization. Using a different clock, such as GPS, wouldn’t have helped with this.

But the rest of your point boils down to "if you know your system has a flaw, why not mitigate it?" Of course, at design time they didn't know it had this flaw.


> I wish that the ULID spec checked for microsecond collisions instead of millisecond

Most computers don't have wall clocks with even millisecond precision, much less microsecond. The wall clock is generally OS-tick precision (~5-10 ms ballpark), with even poorer accuracy.

Seems like a niche use case. Maybe an extension?
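
If you want to see what your own machine gives you, a quick probe of the smallest observable wall-clock step looks something like the Go sketch below (results vary a lot by OS and hardware, which is rather the point):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        minStep := time.Duration(1 << 62)
        // Round(0) strips Go's monotonic reading so Sub compares wall-clock values.
        prev := time.Now().Round(0)
        for i := 0; i < 1_000_000; i++ {
            now := time.Now().Round(0)
            if d := now.Sub(prev); d > 0 && d < minStep {
                minStep = d
            }
            prev = now
        }
        fmt.Println("smallest observed wall-clock step:", minStep)
    }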


> This is not slowing down a clock for a while because it runs fast, but slowing it down to make it run too slow, and then skipping a leap second.

Ok, I see the difference. The issue here is not preserving monotonicity but keeping the semantics of the "wall clock".
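
In Go terms the distinction looks roughly like this: time.Now carries both a wall and a monotonic reading, elapsed-time measurement uses the monotonic one, and the wall reading is the part whose semantics leap-second handling changes:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        start := time.Now() // carries both a wall reading and a monotonic reading

        // Elapsed-time measurement uses the monotonic reading, so a step or a
        // leap-second slew during the sleep would not distort it.
        time.Sleep(100 * time.Millisecond)
        fmt.Println("elapsed (monotonic):", time.Since(start))

        // Calendar arithmetic uses the wall reading; this is the value whose
        // semantics a slewed or smeared leap second alters.
        fmt.Println("wall clock:", start.Round(0)) // Round(0) strips the monotonic reading
    }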


> I agree it is unlikely to be the RTC clock at issue unless it runs a watchdog of some form.

Fairly standard technique to run the watchdog off the RTC clock, because that might still work if the main clock is wonky.


You run the risk of the computational equivalent of a speedometer not being 100% accurate, i.e. clock inaccuracy.

They are using it only to generate UUIDs. They could have avoided collisions by using the clock, or even a counter, as part of the UUID. But for some reason systemd always creates a lot of drama before they accept they were wrong and fix the bug (which doesn't always happen).
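
A sketch of what "clock or counter as part of the ID" could look like in Go; the layout and field sizes here are arbitrary, not any standard UUID version:

    package main

    import (
        crand "crypto/rand"
        "encoding/binary"
        "fmt"
        "sync/atomic"
        "time"
    )

    var counter uint64 // monotonically increasing within the process

    // newID packs a nanosecond timestamp and a per-process counter into the ID,
    // so two IDs generated back-to-back cannot collide even if the random source
    // misbehaves; the tail stays random.
    func newID() [16]byte {
        var id [16]byte
        binary.BigEndian.PutUint64(id[0:8], uint64(time.Now().UnixNano()))
        binary.BigEndian.PutUint32(id[8:12], uint32(atomic.AddUint64(&counter, 1)))
        if _, err := crand.Read(id[12:16]); err != nil {
            panic(err)
        }
        return id
    }

    func main() {
        fmt.Printf("%x\n%x\n", newID(), newID())
    }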

> Also, having "clock slew" be a matter of perspective—with processes that can handle leap seconds seeing them happen instantaneously; and processes that can't handle leap-seconds, seeing slewed time—would be nice.

I imagine there might be some really interesting (for meanings of interesting that include "shoot me now") and hard-to-track-down bugs as you deal with inconsistent clocks not just across systems within a network, but across processes within a single system.


This makes me worry about clock skew and asynchronous code in general. For example, a timestamp might lead you to believe things happened in a different order than they did because the timestamping process isn't atomic.
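
A toy Go example of that hazard: the timestamp is taken at one instant and the event recorded at another, so recorded order and timestamp order can disagree. A per-process atomic sequence number (shown alongside, as one possible mitigation) gives a total order that doesn't depend on the clock:

    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
        "time"
    )

    type event struct {
        ts  time.Time
        seq uint64
    }

    func main() {
        var seq uint64
        var mu sync.Mutex
        var log []event
        var wg sync.WaitGroup

        for i := 0; i < 4; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                ts := time.Now()               // timestamp taken here...
                n := atomic.AddUint64(&seq, 1) // ...sequence number taken here...
                mu.Lock()
                log = append(log, event{ts, n}) // ...event recorded here, possibly much later
                mu.Unlock()
            }()
        }
        wg.Wait()

        // The append order, the timestamp order, and the sequence order can all differ.
        for _, e := range log {
            fmt.Println(e.seq, e.ts.UnixNano())
        }
    }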


I find it interesting/satisfying that clock tick is one big bold configuration parameter that can be tweaked to basically change the interpretation of the results.

I would understand if the complaint was that Spanner is too slow without expensively accurate clocks and synchronization. But the complaint is that Spanner fails to guarantee consistency, which doesn't make sense to me. The requirements clearly include giving a valid clock bound, so if you give an invalid clock bound, it's clearly your fault for getting incorrect results, not Spanner's!

If they fixed the clock, it would have to be lower, which would lead to lower performance.
