Usually Hive or Dremel, with rows ingested from frontend instance logs. I get that it would actually work on a smaller system with a single server assigning times (or a quorum, if leader election ensures a monotonic clock).
I have a requirement that I haven't been able to get rid of yet: I need to know which scheduler a job is going to run on before I let go of it. Once I find a way around that, I'll move to a global queue model.
That sounds like a job for a scheduler, in the cluster sense. I'm actually looking around for a job framework in .NET that supports custom schedulers, but have yet to find something that supports resource-constrained scheduling. It's all either about enqueueing something to be done NOW, or at a future date. I haven't seen anything that supports custom scheduler implementations on a per-job-type basis. They don't really distinguish between logging work to be done and deciding whether or not it can be executed NOW.
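To illustrate the split I mean, here is a minimal sketch (in Java for illustration, with made-up names and a toy CPU-units model, not any existing framework's API): recording work is one concern, deciding whether a job may execute NOW under resource constraints is another.

  import java.util.ArrayDeque;
  import java.util.Queue;

  record Job(String type, int cpuUnits) {}

  interface AdmissionPolicy {
      // per-job-type decision: may this job run right now?
      boolean mayRunNow(Job job, int freeCpuUnits);
  }

  class ResourceConstrainedQueue {
      private final Queue<Job> pending = new ArrayDeque<>();
      private final AdmissionPolicy policy;
      private int freeCpuUnits;

      ResourceConstrainedQueue(AdmissionPolicy policy, int freeCpuUnits) {
          this.policy = policy;
          this.freeCpuUnits = freeCpuUnits;
      }

      void enqueue(Job job) {            // logging the work to be done
          pending.add(job);
      }

      Job tryDispatch() {                // deciding whether it can be executed NOW
          Job next = pending.peek();
          if (next != null && policy.mayRunNow(next, freeCpuUnits)) {
              freeCpuUnits -= next.cpuUnits();   // (returning capacity when a job finishes is omitted)
              return pending.poll();
          }
          return null;
      }
  }

The frameworks I've seen only give you the enqueue half; the admission half is hard-coded to "run it as soon as a worker is free".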
I think it is interesting that job scheduling, dependency graphs, dirty refresh logic, and mutually exclusive execution are all relevant in the same space.
I was recently working on some Java code to schedule work mutually exclusively across threads. I wanted to run A only if B is not running, and B only if A is not running, and then alternate between the two. In distributed systems I think it's traditionally solved with a lock.
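Roughly, the single-process shape looks something like this (a minimal sketch, assuming one thread runs A, one runs B, and strict alternation is the goal):

  import java.util.concurrent.locks.Condition;
  import java.util.concurrent.locks.ReentrantLock;

  // Alternate two tasks so neither runs while the other is running.
  class AlternatingRunner {
      private final ReentrantLock lock = new ReentrantLock();
      private final Condition turnChanged = lock.newCondition();
      private boolean aTurn = true;   // whose turn it is next

      // asA = true for the A thread, false for the B thread
      void runExclusively(boolean asA, Runnable task) throws InterruptedException {
          lock.lock();
          try {
              while (aTurn != asA) {  // wait until the peer has finished its turn
                  turnChanged.await();
              }
              task.run();             // the peer is blocked above, so no overlap
              aTurn = !asA;           // hand the turn to the other task
              turnChanged.signalAll();
          } finally {
              lock.unlock();
          }
      }
  }

Across processes or machines the same shape needs a distributed lock or lease instead of a ReentrantLock.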
I think there is inspiration from GUI update logic too and potential for cache invalidation or cache refresh logic in background job processing systems.
How does JobRunr persist jobs? If I schedule a background job from an inflight request handler, does the job get persisted if there is a crash?
> Say I start two identical workers that have the same schedules set up.
Workers don't run the scheduled tasks, only a scheduler does. You should start only one scheduler but as many workers as you want. wakaq-scheduler and wakaq-worker are the command line interfaces.
> Is there any communication between workers?
Someone I know is working on a solution for this as an open source project. I don't have the link off hand but I'll send him your comment.
Ha, I had a similar thought and started creating exactly that: clusterd, a way to cluster and schedule things on systemd instances. I haven't gotten very far, but if you're interested in helping me please reach out (email in profile).
I would be really interested in a distributed job running system with a scheduler. I want a UI where I can set jobs via a schedule that run on one of many workers. In that same UI I want to be able to look at failed jobs and see all of the output from that job. I don't want to have to ssh over to the box where the job ran to retrieve a log file. I once had a job where we used Tidal Enterprise Scheduler which poorly implemented this. What solutions exist for this workflow in the open source world?
Very interesting.
I would like to see something in the future with support for approximate times, like launching something in about five minutes.
Why? Because sometimes you don't want all your jobs to start at once. Just moving one job a few milliseconds ahead of time can help a whole server handle load better.
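What I mean is something like client-side jitter; a minimal Java sketch (class and method names made up for illustration):

  import java.util.concurrent.Executors;
  import java.util.concurrent.ScheduledExecutorService;
  import java.util.concurrent.ThreadLocalRandom;
  import java.util.concurrent.TimeUnit;

  class JitteredScheduler {
      private final ScheduledExecutorService executor =
              Executors.newSingleThreadScheduledExecutor();

      // Run the task roughly delayMillis from now, +/- jitterMillis, so jobs
      // scheduled for the same nominal time don't all fire at once.
      void scheduleApproximately(Runnable task, long delayMillis, long jitterMillis) {
          long jitter = ThreadLocalRandom.current().nextLong(-jitterMillis, jitterMillis + 1);
          executor.schedule(task, Math.max(0, delayMillis + jitter), TimeUnit.MILLISECONDS);
      }
  }

For "about five minutes" you'd call scheduleApproximately(job, TimeUnit.MINUTES.toMillis(5), TimeUnit.SECONDS.toMillis(30)).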
Most of what he's writing about, and much more, is made substantially easier with systemd timers.
E.g. want errors to cause e-mails, but everything else to just go to logs? Use a timer to activate a service, and make systemd activate another service on failure.
Want to avoid double execution? That's the default (timers are usually used to activate another unit; as long as that unit doesn't start something that double-forks, it won't get activated twice).
(Some) protection against thundering herd is built in: you specify the level of accuracy (default 1m), and each machine on boot will randomly select a number of seconds to offset all timers on that host by. You can set this per timer or for the entire host.
And if you're using fleet, you can use fleet to automatically re-schedule cluster-wide jobs if a machine fails.
And the journal will capture all the output and timestamp it.
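As a concrete sketch of the e-mail-on-failure setup (unit names are hypothetical, and the status-email@.service template that actually sends the mail isn't shown):

  # nightly-report.service
  [Unit]
  Description=Generate the nightly report
  OnFailure=status-email@%n.service   # started only if this service fails

  [Service]
  Type=oneshot
  ExecStart=/usr/local/bin/nightly-report

  # nightly-report.timer (activates the service of the same name by default)
  [Timer]
  OnCalendar=daily
  AccuracySec=1m                      # coalescing window; this is the default

  [Install]
  WantedBy=timers.target

Everything the service writes to stdout/stderr lands in the journal, timestamped, with no extra setup.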
systemctl list-timers will show you which timers are scheduled, when they're scheduled to run next, how long is left until then, when they last ran, and how long ago that was:
  $ systemctl list-timers
  NEXT                         LEFT      LAST                         PASSED        UNIT
  Sat 2015-10-17 01:30:15 UTC  51s left  Sat 2015-10-17 01:29:15 UTC  8s ago        motdgen.timer
  Sat 2015-10-17 12:00:34 UTC  10h left  Sat 2015-10-17 00:00:33 UTC  1h 28min ago  rkt-gc.timer
  Sun 2015-10-18 00:00:00 UTC  22h left  Sat 2015-10-17 00:00:00 UTC  1h 29min ago  logrotate.timer
  Sun 2015-10-18 00:15:26 UTC  22h left  Sat 2015-10-17 00:15:26 UTC  1h 13min ago  systemd-tmpfiles-clean.timer
And the timer specification itself is extremely flexible. E.g. you can schedule a timer to run x seconds after a specific unit was activated, or x seconds after boot, or x seconds after the timer itself fired, or x seconds after another unit was deactivated. Or combinations.
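Concretely, those relative triggers are separate [Timer] directives that can be freely combined in a single timer unit, for example:

  [Timer]
  OnActiveSec=30s            # 30 seconds after this timer itself was activated
  OnBootSec=10min            # 10 minutes after boot
  OnUnitActiveSec=1h         # 1 hour after the unit it activates was last started
  OnUnitInactiveSec=2h       # 2 hours after that unit was last deactivated
  OnCalendar=*-*-* 03:00:00  # plus an absolute daily wall-clock time

The timer elapses whenever any of the listed triggers is reached.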
Oh wow, that's cool. Do you know if servers currently support this? Would this mostly be useful on a network level or do you think it would also be useful for like trying to be more intelligent about scheduling?
This thread is one of those cases where you read something and realize you've been completely missing something. I don't monitor these as much as I should.
Servers/services? Definitely - take your pick. Timers/jobs, particularly those on my system? Nothing!
With the right directives laid out ('Wants/Requires/Before/After'), they can be pretty robust and easy to set and forget.
I've been lucky in this regard; I check 'systemctl list-timers' just to be sure - but they always run
It's essentially a problem of distributed timers and distributed transactions.
(If anyone has any resources on how similar problems have been solved in the past, I'd appreciate it.)