
I'm trying to run jobs on a timer when the jobs are on a cluster and I don't know which server the job is assigned to.

It's essentially a problem of distributed timers and distributed transactions.

(If anyone has any resources on how similar problems have been solved in the past, I'd appreciate it.)




like the random server assignment for job execution

Usually Hive or Dremel, with rows ingested from frontend instance logs. I get that it would actually work on a smaller system with a single server assigning times (or a quorum, if leader election ensures a monotonic clock).

I have a requirement that I couldn't get rid of yet, which requires me to know which scheduler it is going to run on before I let go of it. Once I manage to find a way around it I'll move to a global queue model.

Been looking for something like this forever. Thanks!

I’m running batch jobs of long-running, single-threaded tasks that depend on each other in a somewhat complicated manner.


That sounds like a job for a scheduler, in the cluster sense. I'm actually looking around for a job framework in .NET that supports custom schedulers, but have yet to find something that supports resource-constrained scheduling. It's all either about enqueueing something to be done NOW, or at a future date. I haven't seen anything that supports custom scheduler implementations on a per-job-type basis. They don't really distinguish between logging work to be done and deciding whether or not it can be executed NOW.
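To make the distinction concrete, here's a minimal Python sketch (not any real .NET framework; all names are made up) of a scheduler that separates "the work is logged" from "the work may execute now" by giving each job a per-job policy that is consulted against current resource usage:

```python
import heapq

class Job:
    def __init__(self, name, run_at, policy, fn):
        self.name = name
        self.run_at = run_at   # earliest time the job may start
        self.policy = policy   # callable deciding if it may start NOW
        self.fn = fn

class ResourceAwareScheduler:
    """Separates 'work is enqueued' from 'work may execute now'."""
    def __init__(self):
        self.queue = []        # min-heap ordered by run_at

    def enqueue(self, job):
        heapq.heappush(self.queue, (job.run_at, id(job), job))

    def tick(self, now, resources):
        """Run every due job whose policy admits it; requeue the rest."""
        deferred = []
        while self.queue and self.queue[0][0] <= now:
            _, _, job = heapq.heappop(self.queue)
            if job.policy(resources):
                job.fn()
            else:
                deferred.append(job)   # due, but resources say not yet
        for job in deferred:
            self.enqueue(job)
```

A policy can then be something like `lambda r: r["cpu"] < 0.8`, so a job that is already due still waits until the box has headroom.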

This is an interesting space.

I think it is interesting that job scheduling, dependency graphs, dirty-refresh logic, and mutually exclusive execution are all relevant in the same space.

I was recently working on some Java code to schedule mutually exclusively across threads. I wanted to schedule A if B is not running, and B if A is not running, then alternately run the two. I think it's traditionally solved with a lock in distributed systems.
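The single-process version of that pattern can be sketched with a condition variable guarding a "whose turn" flag (Python here for illustration, though the original was Java; the names are made up). The shared lock gives mutual exclusion, the turn flag gives alternation:

```python
import threading

turn = "A"
cond = threading.Condition()
log = []

def worker(name, nxt, rounds):
    global turn
    for _ in range(rounds):
        with cond:
            # wait until it's our turn; holding the condition's lock
            # guarantees A and B never run their critical sections
            # at the same time
            cond.wait_for(lambda: turn == name)
            log.append(name)   # the mutually exclusive work
            turn = nxt         # hand over to the other task
            cond.notify_all()

a = threading.Thread(target=worker, args=("A", "B", 3))
b = threading.Thread(target=worker, args=("B", "A", 3))
a.start(); b.start(); a.join(); b.join()
# log alternates: A, B, A, B, A, B
```

In a distributed setting the condition variable is replaced by a shared lock service (a database row, Redis, ZooKeeper, etc.), which is the traditional solution mentioned above.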

I think there is inspiration from GUI update logic too and potential for cache invalidation or cache refresh logic in background job processing systems.

How does JobRunr persist jobs? If I schedule a background job from an inflight request handler, does the job get persisted if there is a crash?


> Say I start two identical workers that have the same schedules set up.

Workers don't run the scheduled tasks, only a scheduler does. You should start only one scheduler but as many workers as you want. wakaq-scheduler and wakaq-worker are the command line interfaces.

> Is there any communication between workers?

Someone I know is working on a solution for this as an open source project. I don't have the link off hand but I'll send him your comment.


Well, a multi-machine scheduler.

Ha, I had a similar thought and started creating exactly that: clusterd, a way to cluster and schedule things on systemd instances. I haven't gotten very far, but if you're interested in helping me please reach out (email in profile).

I would be really interested in a distributed job running system with a scheduler. I want a UI where I can set jobs via a schedule that run on one of many workers. In that same UI I want to be able to look at failed jobs and see all of the output from that job. I don't want to have to ssh over to the box where the job ran to retrieve a log file. I once had a job where we used Tidal Enterprise Scheduler which poorly implemented this. What solutions exist for this workflow in the open source world?

Very interesting. I would like to see something in the future with support for approximate times, like launching something in about five minutes.

Why? Because sometimes you don't want all your jobs to start at once. Just shifting one job by a few milliseconds can help the whole server handle load better.
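The usual way to get this is jitter (sometimes called splay): every job adds a random offset to its nominal start time so identical schedules spread out instead of stampeding. A tiny sketch, with made-up function names:

```python
import random

def jittered_delay(base_seconds, splay_seconds):
    """Spread identical schedules out so jobs don't all fire at once.

    Each job waits its base delay plus a random offset in
    [0, splay_seconds), so N jobs scheduled for the same instant
    land spread across the splay window instead of stampeding.
    """
    return base_seconds + random.uniform(0, splay_seconds)
```

E.g. `time.sleep(jittered_delay(300, 30))` runs "in about five minutes", somewhere in the 5:00 to 5:30 window.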


That's a very similar pattern to what I've been using at TW.

At the core of our implementation is this library to schedule the jobs to different workers: https://github.com/kagkarlsson/db-scheduler

Would highly recommend that library!
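db-scheduler itself is a Java library; what follows is only a hedged Python/SQLite sketch of the general technique such database-backed schedulers use (a table of due tasks, claimed atomically so two workers never pick the same one), not db-scheduler's actual API or schema:

```python
import sqlite3

# Hypothetical schema: one row per scheduled task execution.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE scheduled_tasks (
    id INTEGER PRIMARY KEY,
    name TEXT,
    execution_time REAL,
    picked_by TEXT
)""")
conn.execute("INSERT INTO scheduled_tasks (name, execution_time, picked_by) "
             "VALUES ('send-report', 0, NULL)")
conn.commit()

def claim_due_task(conn, worker_id, now):
    """Atomically claim one due, unclaimed task.

    The UPDATE's WHERE clause re-checks picked_by IS NULL, so even if
    two workers race, only one of them modifies the row."""
    cur = conn.execute(
        """UPDATE scheduled_tasks
           SET picked_by = ?
           WHERE id = (SELECT id FROM scheduled_tasks
                       WHERE picked_by IS NULL AND execution_time <= ?
                       LIMIT 1)
             AND picked_by IS NULL""",
        (worker_id, now))
    conn.commit()
    return cur.rowcount == 1
```

On a real multi-node database you'd typically use `SELECT ... FOR UPDATE SKIP LOCKED` (or optimistic versioning) for the same effect.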


Most of what he's writing about, and much more, is made substantially easier with systemd timers.

E.g. want errors to cause e-mails, but everything else to just go to logs? Use a timer to activate a service, and make systemd activate another service on failure.

Want to avoid double execution? That's the default (timers are usually used to activate another unit; as long as that unit doesn't start something that double-forks, it won't get activated twice).

(Some) protection against thundering herd is built in: You specify the level of accuracy (default 1m), and each machine on boot will randomly select a number of seconds to offset all timers on that host with. You can set this per timer or for the entire host.
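Putting those pieces together, a minimal timer/service pair might look like this (unit names and the command are placeholders; `AccuracySec` gives the coalescing window described above, `OnFailure` activates another unit when the service fails):

```ini
# backup.timer
[Timer]
# daily, coalesced anywhere within a 5-minute accuracy window
OnCalendar=daily
AccuracySec=5min

[Install]
WantedBy=timers.target
```

```ini
# backup.service
[Unit]
# on failure, systemd activates this other unit, which could
# e.g. send a notification e-mail
OnFailure=notify-failure@%n.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/run-backup
```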

And if you're using fleet, you can use fleet to automatically re-schedule cluster-wide jobs if a machine fails.

And the journal will capture all the output and timestamp it.

systemctl list-timers will show you which timers are scheduled, when they're next due to run, how long until then, when they last ran, and how long ago that was:

     $ systemctl list-timers
    NEXT                         LEFT     LAST                         PASSED       UNIT                      
    Sat 2015-10-17 01:30:15 UTC  51s left Sat 2015-10-17 01:29:15 UTC  8s ago       motdgen.timer             
    Sat 2015-10-17 12:00:34 UTC  10h left Sat 2015-10-17 00:00:33 UTC  1h 28min ago rkt-gc.timer              
    Sun 2015-10-18 00:00:00 UTC  22h left Sat 2015-10-17 00:00:00 UTC  1h 29min ago logrotate.timer           
    Sun 2015-10-18 00:15:26 UTC  22h left Sat 2015-10-17 00:15:26 UTC  1h 13min ago systemd-tmpfiles-clean.timer
And the timer specification itself is extremely flexible. E.g. you can schedule a timer to run x seconds after a specific unit was activated, or x seconds after boot, or x seconds after the timer itself fired, or x seconds after another unit was deactivated. Or combinations.
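For example, the monotonic directives can be combined in one (placeholder) timer unit:

```ini
# example.timer
[Timer]
# fire 2 minutes after boot...
OnBootSec=2min
# ...then every 15 minutes, measured from the last time the
# unit this timer activates was started
OnUnitActiveSec=15min
```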

Not OP, but if you're building a small scheduler for batching changes, it should be pretty straightforward:

- `jobs` is an array of functions

- enqueue adds a job to `jobs` and starts running jobs if not running

- job running can be triggered by animation frames or setTimeout
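Those bullets describe a JS-style coalescing pattern (animation frames / setTimeout); here is the same enqueue-and-flush idea sketched in Python, with `threading.Timer` standing in for setTimeout (all names made up):

```python
import threading

class BatchScheduler:
    """`jobs` is a list of callables. enqueue() schedules a flush on a
    short timer (the setTimeout analogue) only if one isn't already
    pending, so many enqueues coalesce into a single run."""
    def __init__(self, delay=0.01):
        self.jobs = []
        self.delay = delay
        self._pending = None
        self._lock = threading.Lock()

    def enqueue(self, fn):
        with self._lock:
            self.jobs.append(fn)
            if self._pending is None:
                self._pending = threading.Timer(self.delay, self._run)
                self._pending.start()

    def _run(self):
        with self._lock:
            batch, self.jobs = self.jobs, []
            self._pending = None
        for fn in batch:   # run the whole batch in enqueue order
            fn()
```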


Oh wow, that's cool. Do you know if servers currently support this? Would this mostly be useful on a network level, or do you think it would also be useful for trying to be more intelligent about scheduling?

I think you can create ephemeral timers with Systemd if you're on Linux.
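For instance, `systemd-run` can create a transient timer with no unit files on disk (the command and delay below are just examples):

```shell
# run a one-off command 5 minutes from now, as a transient timer unit
systemd-run --on-active=5m /usr/bin/touch /tmp/five-minutes-later
```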

> 2. Handover the task to a job scheduler and return the jobid.

That's the approach described in the article. Where does your suggestion differ from it?


This thread is one of those cases where you read something and realize you've been completely missing something. I don't monitor these as much as I should.

Servers/services? Definitely - take your pick. Timers/jobs, particularly those on my system? Nothing!

With the right directives laid out ('Wants/Requires/Before/After'), they can be pretty robust/easily forgotten.

I've been lucky in this regard; I check 'systemctl list-timers' just to be sure, but they always run.


I saw this and thought "how does it know who I am?":

io scheduler cfq registered (default)

