
We came to the exact same conclusion. An EventBridge schedule triggers a Fargate task. The job terminates its own process after execution, so the container shuts down and all is good.
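For reference, the wiring is roughly this in boto3; all of the ARNs, the subnet id, and the cron expression below are placeholders, not a definitive setup:

    import boto3

    events = boto3.client("events")

    # Placeholder identifiers; substitute your own cluster, task definition,
    # IAM role, and subnet.
    CLUSTER_ARN = "arn:aws:ecs:us-east-1:123456789012:cluster/jobs"
    TASK_DEF_ARN = "arn:aws:ecs:us-east-1:123456789012:task-definition/nightly-job"
    ROLE_ARN = "arn:aws:iam::123456789012:role/eventbridge-ecs-run-task"

    # A scheduled rule fires on a fixed cadence...
    events.put_rule(Name="nightly-job", ScheduleExpression="cron(0 3 * * ? *)")

    # ...and its target launches a one-off Fargate task. The task's process
    # exits when the job finishes, so the container stops on its own.
    events.put_targets(
        Rule="nightly-job",
        Targets=[{
            "Id": "nightly-job-task",
            "Arn": CLUSTER_ARN,
            "RoleArn": ROLE_ARN,
            "EcsParameters": {
                "TaskDefinitionArn": TASK_DEF_ARN,
                "LaunchType": "FARGATE",
                "NetworkConfiguration": {
                    "awsvpcConfiguration": {"Subnets": ["subnet-0abc1234"]},
                },
            },
        }],
    )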



Seems like they were not seeing the race, and instead only saw that open connections would stay open if not explicitly closed during the graceful shutdown.

Once the graceful shutdown was properly executed, it closed any open connections to that pod and stopped the 502s they were seeing. Sounds like either the race wasn't happening or they didn't see/care about it.


> You tell your applications to drain connections and gracefully exit on SIGTERM.

The problem is that k8s will send requests to your application after SIGTERM. So you have to wait some amount of time before shutting down to allow for that.

This was at least the case the last time I used k8s, and it seemed to stem from the distributed architecture, so it was more than a mere bugfix away.
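A rough sketch of the workaround; the plain http.server and the 10-second delay are purely illustrative, the point is just to keep serving for a while after SIGTERM, then drain:

    import signal
    import threading
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    DRAIN_DELAY_SECONDS = 10  # illustrative; should exceed endpoint-propagation latency

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")

    server = HTTPServer(("0.0.0.0", 8080), Handler)

    def on_sigterm(signum, frame):
        def drain():
            # Keep accepting traffic while the pod is removed from the
            # Service's endpoints, since requests can still arrive after SIGTERM.
            time.sleep(DRAIN_DELAY_SECONDS)
            server.shutdown()  # stops serve_forever(); the in-flight request finishes first
        threading.Thread(target=drain, daemon=True).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    server.serve_forever()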


> Basically when a window is created, we receive an event. After getting that event, we lock the X server, then ask it about the new window. And sometimes, the window is just not there

Relying on this sounds like a race condition even if the lock is working. In the time between processing the event and acquiring the lock, the window could already have been destroyed.


Great post, and this is something we've faced as well. Luckily our jobs are mainly idempotent, and the non-idempotent ones aren't that critical. This is a pretty nice solution! Ethan, the errors you still see from jobs that take more than PRE_TERM_TIMEOUT seconds... I'm assuming that's a separate, job-specific issue, like external services timing out, etc.?

I noticed the "wait 5 seconds, and then a KILL signal if it has not quit" comment in the code above the new_kill_child method. Without jumping into the code, is the normal flow to send a TERM, then force a KILL after 5 seconds? Just curious.
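For what it's worth, the generic shape of that pattern is something like the following; this is just an illustrative sketch, not Resque's actual implementation:

    import signal
    import subprocess

    def stop_child(proc: subprocess.Popen, grace_seconds: float = 5.0) -> None:
        """Ask the child to exit with TERM; escalate to KILL if it hasn't quit in time."""
        proc.send_signal(signal.SIGTERM)  # polite: the child can flush, release locks, etc.
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL: the child gets no chance to clean up
            proc.wait()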


I hate that this specification and most of the other ones use spans that have a beginning and an end, rather than events that start and end the span. What if the process crashes before it sends out the span? What if the operation is taking a very long time to complete?
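A sketch of the event-based alternative, with a made-up emit() transport: the start record goes out immediately, so a crash or a still-running operation at least leaves a partial trace.

    import time
    import uuid

    def emit(event: dict) -> None:
        # Hypothetical transport; a real system would ship this to a collector.
        print(event)

    def start_span(name: str) -> str:
        span_id = uuid.uuid4().hex
        # Emitted right away, unlike a span object that is only reported on completion.
        emit({"type": "span_start", "id": span_id, "name": name, "ts": time.time_ns()})
        return span_id

    def end_span(span_id: str, status: str = "ok") -> None:
        emit({"type": "span_end", "id": span_id, "status": status, "ts": time.time_ns()})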

> You _can_ block the whole process with a long lived handler.

Or you could handle the event out-of-process.


Task.await tries to exit the calling process when the timeout hits, but IEx traps exits in that process, so it doesn't terminate, and thus the linked task process doesn't either, I think? If I wrap all of this in another task, rather than running it directly in IEx, then I observe the innermost process get terminated via the process link, since the intervening one doesn't trap exits.

Relevant from https://hexdocs.pm/elixir/1.4.5/Task.html, which you've probably already seen:

> If the timeout is exceeded, await will exit; however, the task will continue to run. When the calling process exits, its exit signal will terminate the task if it is not trapping exits.


No, it's saying: if my goroutines crash it's the same as if my main thread crashed, which means: game over, application down.

Which gets handled by the container scheduler.


What if a process takes an hour or two to finish? Can App Runner handle that?

Often I find myself panicking because I can't finish a task within the 15-minute limit, so I end up spinning up a Lightsail server to process long-running tasks, which means I need to create an SQS queue to manage pending jobs, and it's a wheel I seem to reinvent constantly.


Not quite.

In my experience you have several critical issues:

1. What happens when a job silently fails?

2. What happens when a job takes a lot longer than expected to succeed?

If you solve the first with a timeout, the second leads to a job rerun. The best (only?) solution I have found is to have some awareness in the job queue of the fact that the job is currently being processed. In my previous work we used advisory locks for that.
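For the curious, the Postgres version of that looks roughly like this; the connection string and the work callable are placeholders:

    import psycopg2  # assumes a Postgres-backed queue; connection string is a placeholder

    conn = psycopg2.connect("dbname=jobs")
    conn.autocommit = True

    def try_run(job_id: int, work) -> bool:
        """Run work(job_id) only if no other worker holds the advisory lock for this job."""
        with conn.cursor() as cur:
            # Session-level advisory lock: held by this connection, and released
            # automatically by Postgres if the worker process dies.
            cur.execute("SELECT pg_try_advisory_lock(%s)", (job_id,))
            (got_lock,) = cur.fetchone()
            if not got_lock:
                return False  # another worker is already processing this job
            try:
                work(job_id)
                return True
            finally:
                cur.execute("SELECT pg_advisory_unlock(%s)", (job_id,))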


> it will process events while waiting for syscall

How does that work?

According to the source code quoted in the article, there is a separate "coroutine-safe version of time.sleep", which seems like it shouldn't be needed if V has a general solution for unblocking blocking syscalls.
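Python's asyncio has the same split, which is a decent analogy for why a dedicated coroutine-safe sleep exists when there is no general mechanism for unblocking syscalls: a plain blocking sleep stalls every coroutine on the loop, while the coroutine-aware one only suspends the caller.

    import asyncio
    import time

    async def ticker():
        # Keeps printing while other coroutines sleep cooperatively.
        for _ in range(5):
            print("tick")
            await asyncio.sleep(0.2)

    async def blocking_sleep():
        time.sleep(1)           # blocks the whole event loop: the ticker stalls for a full second

    async def cooperative_sleep():
        await asyncio.sleep(1)  # suspends only this coroutine; the ticker keeps going

    async def main():
        # Swap cooperative_sleep() for blocking_sleep() to watch the ticker freeze.
        await asyncio.gather(ticker(), cooperative_sleep())

    asyncio.run(main())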


close can take arbitrarily long; it's a blocking operation.

Don't ever call close on the hot path.
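One way around it, sketched with a made-up close_later helper (not a standard API): push the close onto a background thread so the hot path never waits on it.

    from concurrent.futures import ThreadPoolExecutor

    # A single background thread dedicated to closing resources, so a slow close
    # (e.g. a flush over a network filesystem) never stalls the latency-sensitive path.
    _closer = ThreadPoolExecutor(max_workers=1, thread_name_prefix="deferred-close")

    def close_later(resource) -> None:
        _closer.submit(resource.close)

    # Usage on the request path: close_later(sock) instead of sock.close().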


"To avoid this situation, there is a termination logic in the Executor processes whereby an Executor process terminates itself as soon as three consecutive heartbeat calls fail. Each heartbeat timeout is large enough to eclipse three consecutive heartbeat failures. This ensures that the Store Consumer cannot pull such tasks before the termination logic ends them—the second method that helps achieve this guarantee."

Neither this nor the first method guarantees a lack of concurrent execution. A long GC pause or VM migration after the second check could still allow the job to get rescheduled due to a timeout. The first worker could resume thinking it still had one heartbeat left before giving up on the job, while the job had already been handed out to another worker in the meantime.


The sleep sub-process of time is not part of the pipeline, however. It will exit after the requested time has passed.

Same for any queuing system. You need to set the expiry time long enough for the expected task duration.

In SQS, for example, you use a visibility timeout set high enough that you will have time to finish the job and delete the message before SQS hands it off to another reader.

You won’t always finish in time though, so ideally jobs are idempotent.
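Roughly, with boto3; the queue URL and handle_job are placeholders, and the 15-minute visibility timeout is just an example value:

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # placeholder

    def handle_job(body: str) -> None:
        ...  # the actual work; should be idempotent, since redelivery can happen

    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,     # long polling
            VisibilityTimeout=900,  # must comfortably exceed the expected job duration
        )
        for msg in resp.get("Messages", []):
            handle_job(msg["Body"])
            # Deleting before the visibility timeout expires is what stops SQS
            # from handing the same message to another reader.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])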


In this case just taking more than a millisecond can cause scheduler collapse. So it's a pretty easy mistake to make.

> On the IO completing, the OS will suspend your thread, and execute your callback.

On Windows, you have to put your thread into an alertable wait to receive any callbacks [1]. If the OS suspended your thread at a random point to execute a callback, that could lead to hard-to-detect/debug deadlocks and race conditions.

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...


> What happens when a tasklet takes too long?

The same thing that happens when 1+1 == 3, or when a task tries to write to memory that it doesn't have permissions for. The static analysis that your system relies on for correct behavior is no longer valid, so a hardware belt-and-suspender mechanism (a schedule overrun timer interrupt, a lockstep core check failure, or an MPU fault, respectively) resets or otherwise safe-states the failed ECU and safety is assured higher up in the system analysis.


I have a running-time limit on the containers.

The first snippet is a fork bomb and will cause the container to run out of memory before the timeout. It does terminate, since I have set a 256M memory limit. However, it is not sending the correct response in this case, and the message in the output tab is not updated properly.

Fork works just fine, https://sharepad.io/p/aBn5Oxu

