
I'm trying to run jobs on a timer when the jobs are on a cluster and I don't know which server the job is assigned to.

It's essentially a problem of distributed timers and distributed transactions.

(If anyone has any resources on how similar problems have been solved in the past, I'd appreciate it.)




like the random server assignment for job execution

Usually Hive or Dremel, with rows ingested from frontend instance logs. I get that it would actually work on a smaller system with a single server assigning times (or a quorum, if leader election ensures a monotonic clock).

I have a requirement that I couldn't get rid of yet, which requires me to know which scheduler it is going to run on before I let go of it. Once I manage to find a way around it I'll move to a global queue model.

Been looking for something like this forever. Thanks!

I’m running batch jobs of long-running, single-threaded tasks that depend on each other in a somewhat complicated manner.


That sounds like a job for a scheduler, in the cluster sense. I'm actually looking around for a job framework in .NET that supports custom schedulers, but have yet to find something that supports resource-constrained scheduling. It's all either about enqueueing something to be done NOW, or at a future date. I haven't seen anything that supports custom scheduler implementations on a per-job-type basis. They don't really distinguish between logging work to be done and deciding whether or not it can be executed NOW.
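To make the distinction concrete, here's a minimal Python sketch (not any real .NET framework; all names are made up) of a scheduler that separates "the work is logged" from "the work may execute now" by giving each job a per-job policy that is consulted against current resource usage:

```python
import heapq

class Job:
    def __init__(self, name, run_at, policy, fn):
        self.name = name
        self.run_at = run_at   # earliest time the job may start
        self.policy = policy   # callable deciding if it may start NOW
        self.fn = fn

class ResourceAwareScheduler:
    """Separates 'work is enqueued' from 'work may execute now'."""
    def __init__(self):
        self.queue = []        # min-heap ordered by run_at

    def enqueue(self, job):
        heapq.heappush(self.queue, (job.run_at, id(job), job))

    def tick(self, now, resources):
        """Run every due job whose policy admits it; requeue the rest."""
        deferred = []
        while self.queue and self.queue[0][0] <= now:
            _, _, job = heapq.heappop(self.queue)
            if job.policy(resources):
                job.fn()
            else:
                deferred.append(job)   # due, but resources say not yet
        for job in deferred:
            self.enqueue(job)
```

A policy can then be something like `lambda r: r["cpu"] < 0.8`, so a job that is already due still waits until the box has headroom.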

This is an interesting space.

I think it is interesting that job scheduling, dependency graphs, dirty-refresh logic, and mutually exclusive execution are all relevant in the same space.

I was recently working on some Java code to schedule mutually exclusively across threads. I wanted to schedule A if B is not running, and B if A is not running, then alternately run the two. I think it's traditionally solved with a lock in distributed systems.
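The single-process version of that pattern can be sketched with a condition variable guarding a "whose turn" flag (Python here for illustration, though the original was Java; the names are made up). The shared lock gives mutual exclusion, the turn flag gives alternation:

```python
import threading

turn = "A"
cond = threading.Condition()
log = []

def worker(name, nxt, rounds):
    global turn
    for _ in range(rounds):
        with cond:
            # wait until it's our turn; holding the condition's lock
            # guarantees A and B never run their critical sections
            # at the same time
            cond.wait_for(lambda: turn == name)
            log.append(name)   # the mutually exclusive work
            turn = nxt         # hand over to the other task
            cond.notify_all()

a = threading.Thread(target=worker, args=("A", "B", 3))
b = threading.Thread(target=worker, args=("B", "A", 3))
a.start(); b.start(); a.join(); b.join()
# log alternates: A, B, A, B, A, B
```

In a distributed setting the condition variable is replaced by a shared lock service (a database row, Redis, ZooKeeper, etc.), which is the traditional solution mentioned above.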

I think there is inspiration from GUI update logic too and potential for cache invalidation or cache refresh logic in background job processing systems.

How does JobRunr persist jobs? If I schedule a background job from an inflight request handler, does the job get persisted if there is a crash?


> Say I start two identical workers that have the same schedules set up.

Workers don't run the scheduled tasks, only a scheduler does. You should start only one scheduler but as many workers as you want. wakaq-scheduler and wakaq-worker are the command line interfaces.

> Is there any communication between workers?

Someone I know is working on a solution for this as an open source project. I don't have the link off hand but I'll send him your comment.


Well, a multi-machine scheduler.

Ha, I had a similar thought and started creating exactly that: clusterd, a way to cluster and schedule things on systemd instances. I haven't gotten very far, but if you're interested in helping me please reach out (email in profile).

I would be really interested in a distributed job running system with a scheduler. I want a UI where I can set jobs via a schedule that run on one of many workers. In that same UI I want to be able to look at failed jobs and see all of the output from that job. I don't want to have to ssh over to the box where the job ran to retrieve a log file. I once had a job where we used Tidal Enterprise Scheduler which poorly implemented this. What solutions exist for this workflow in the open source world?

Very interesting. I would like to see something in the future with support for approximate times, like launching something in about five minutes.

Why? Because sometimes you don't want all your jobs to start at once. Just shifting one job by a few milliseconds can help the whole server handle load better.
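The usual way to get this is jitter (sometimes called splay): every job adds a random offset to its nominal start time so identical schedules spread out instead of stampeding. A tiny sketch, with made-up function names:

```python
import random

def jittered_delay(base_seconds, splay_seconds):
    """Spread identical schedules out so jobs don't all fire at once.

    Each job waits its base delay plus a random offset in
    [0, splay_seconds), so N jobs scheduled for the same instant
    land spread across the splay window instead of stampeding.
    """
    return base_seconds + random.uniform(0, splay_seconds)
```

E.g. `time.sleep(jittered_delay(300, 30))` runs "in about five minutes", somewhere in the 5:00 to 5:30 window.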


That's a very similar pattern to what I've been using at TW.

At the core of our implementation is this library to schedule the jobs to different workers: https://github.com/kagkarlsson/db-scheduler

Would highly recommend that library!
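db-scheduler itself is a Java library; what follows is only a hedged Python/SQLite sketch of the general technique such database-backed schedulers use (a table of due tasks, claimed atomically so two workers never pick the same one), not db-scheduler's actual API or schema:

```python
import sqlite3

# Hypothetical schema: one row per scheduled task execution.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE scheduled_tasks (
    id INTEGER PRIMARY KEY,
    name TEXT,
    execution_time REAL,
    picked_by TEXT
)""")
conn.execute("INSERT INTO scheduled_tasks (name, execution_time, picked_by) "
             "VALUES ('send-report', 0, NULL)")
conn.commit()

def claim_due_task(conn, worker_id, now):
    """Atomically claim one due, unclaimed task.

    The UPDATE's WHERE clause re-checks picked_by IS NULL, so even if
    two workers race, only one of them modifies the row."""
    cur = conn.execute(
        """UPDATE scheduled_tasks
           SET picked_by = ?
           WHERE id = (SELECT id FROM scheduled_tasks
                       WHERE picked_by IS NULL AND execution_time <= ?
                       LIMIT 1)
             AND picked_by IS NULL""",
        (worker_id, now))
    conn.commit()
    return cur.rowcount == 1
```

On a real multi-node database you'd typically use `SELECT ... FOR UPDATE SKIP LOCKED` (or optimistic versioning) for the same effect.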


Most of what he's writing about, and much more, is made substantially easier with systemd timers.

E.g. want errors to cause e-mails, but everything else to just go to logs? Use a timer to activate a service, and make systemd activate another service on failure.

Want to avoid double execution? That's the default (timers are usually used to activate another unit; as long as that unit doesn't start something that double-forks, it won't get activated twice).

(Some) protection against thundering herd is built in: You specify the level of accuracy (default 1m), and each machine on boot will randomly select a number of seconds to offset all timers on that host with. You can set this per timer or for the entire host.
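Putting those pieces together, a minimal timer/service pair might look like this (unit names and the command are placeholders; `AccuracySec` gives the coalescing window described above, `OnFailure` activates another unit when the service fails):

```ini
# backup.timer
[Timer]
# daily, coalesced anywhere within a 5-minute accuracy window
OnCalendar=daily
AccuracySec=5min

[Install]
WantedBy=timers.target
```

```ini
# backup.service
[Unit]
# on failure, systemd activates this other unit, which could
# e.g. send a notification e-mail
OnFailure=notify-failure@%n.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/run-backup
```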

And if you're using fleet, you can use fleet to automatically re-schedule cluster-wide jobs if a machine fails.

And the journal will capture all the output and timestamp it.

systemctl list-timers will show you which timers are scheduled, when they're next due to run, how long until then, when they last ran, and how long ago that was:

     $ systemctl list-timers
    NEXT                         LEFT     LAST                         PASSED       UNIT                      
    Sat 2015-10-17 01:30:15 UTC  51s left Sat 2015-10-17 01:29:15 UTC  8s ago       motdgen.timer             
    Sat 2015-10-17 12:00:34 UTC  10h left Sat 2015-10-17 00:00:33 UTC  1h 28min ago rkt-gc.timer              
    Sun 2015-10-18 00:00:00 UTC  22h left Sat 2015-10-17 00:00:00 UTC  1h 29min ago logrotate.timer           
    Sun 2015-10-18 00:15:26 UTC  22h left Sat 2015-10-17 00:15:26 UTC  1h 13min ago systemd-tmpfiles-clean.timer
And the timer specification itself is extremely flexible. E.g. you can schedule a timer to run x seconds after a specific unit was activated, or x seconds after boot, or x seconds after the timer itself fired, or x seconds after another unit was deactivated. Or combinations.
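For example, the monotonic directives can be combined in one (placeholder) timer unit:

```ini
# example.timer
[Timer]
# fire 2 minutes after boot...
OnBootSec=2min
# ...then every 15 minutes, measured from the last time the
# unit this timer activates was started
OnUnitActiveSec=15min
```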

Not OP, but if you're building a small scheduler for batching changes, it should be pretty straightforward:

- `jobs` is an array of functions

- enqueue adds a job to `jobs` and starts running jobs if not running

- job running can be triggered by animation frames or setTimeout
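Those bullets describe a JS-style coalescing pattern (animation frames / setTimeout); here is the same enqueue-and-flush idea sketched in Python, with `threading.Timer` standing in for setTimeout (all names made up):

```python
import threading

class BatchScheduler:
    """`jobs` is a list of callables. enqueue() schedules a flush on a
    short timer (the setTimeout analogue) only if one isn't already
    pending, so many enqueues coalesce into a single run."""
    def __init__(self, delay=0.01):
        self.jobs = []
        self.delay = delay
        self._pending = None
        self._lock = threading.Lock()

    def enqueue(self, fn):
        with self._lock:
            self.jobs.append(fn)
            if self._pending is None:
                self._pending = threading.Timer(self.delay, self._run)
                self._pending.start()

    def _run(self):
        with self._lock:
            batch, self.jobs = self.jobs, []
            self._pending = None
        for fn in batch:   # run the whole batch in enqueue order
            fn()
```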


Oh wow, that's cool. Do you know if servers currently support this? Would this mostly be useful on a network level, or do you think it would also be useful for trying to be more intelligent about scheduling?

I think you can create ephemeral timers with Systemd if you're on Linux.
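For instance, `systemd-run` can create a transient timer with no unit files on disk (the command and delay below are just examples):

```shell
# run a one-off command 5 minutes from now, as a transient timer unit
systemd-run --on-active=5m /usr/bin/touch /tmp/five-minutes-later
```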

> 2. Handover the task to a job scheduler and return the jobid.

That's the approach described in the article. Where does your suggestion differ from it?


This thread is one of those cases where you read something and realize you've been completely missing something. I don't monitor these as much as I should.

Servers/services? Definitely - take your pick. Timers/jobs, particularly those on my system? Nothing!

With the right directives laid out ('Wants/Requires/Before/After'), they can be pretty robust/easily forgotten.

I've been lucky in this regard; I check 'systemctl list-timers' just to be sure, but they always run.


I saw this and thought "how does it know who I am?":

io scheduler cfq registered (default)

