
> Or in other words: the target command is invoked in an isolated exec context, freshly forked off PID 1, ...

Of course. The solution to every Linux "problem" is, of fucking course, to have PID1 spread its tentacles to yet more parts of Linux.

Every single problem can be solved by giving yet more power to PID1... Except the problem of PID1 having too much power.




> If you want to prevent fork/exec, that's super easy to do in a conventional Linux application:

And mmap(), mprotect(), ptrace(), ..., $syscall_du_jour(). It's not about fork/exec, it's about limiting the APIs available to the unikernel by default and by design.

Yes, you can do this for conventional Linux applications. But I wouldn't call it super easy, at least not for the vast majority of developers out there.


> So why is there not a fork-exec combo?

posix_spawn

> Why would anyone ever want fork as a primitive?

With fork you can very easily write a server like mini_httpd:

https://acme.com/software/mini_httpd/

Or, in Unix shells:

  # function1 and function2 are shell functions

  $ function1 | grep foo | function2

Here, the shell must fork a process (without exec) to run one of these functions.

For instance function1 might run in a fork, the grep is a fork and exec of course, and function2 could be in the shell's primary process.

In the POSIX shell language, fork is so tightly integrated that you can access it just by parenthesizing commands:

  $ (cd /path/to/whatever; command) && other command

Everything in the parentheses is a sub-process; the effect of the cd, and any variable assignments, are lost (whether exported to the environment or not).
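
The same "changes stay in the child" behaviour can be seen outside the shell. Here is a minimal sketch in Python, assuming a POSIX system; `run_in_subprocess` and `marker` are names made up for illustration, not anything from the shell or from the comment above.

```python
import os

def run_in_subprocess(directory):
    """Fork, change directory and a variable in the child, report what the parent sees."""
    marker = "parent value"
    start_dir = os.getcwd()

    pid = os.fork()
    if pid == 0:
        # Child: these changes affect only the child's copy of the process.
        try:
            os.chdir(directory)
            marker = "child value"
        finally:
            os._exit(0)  # leave at once, without running the parent's cleanup code

    os.waitpid(pid, 0)  # wait for the child, much as the shell waits for ( ... )
    # Parent: its working directory and its variable are untouched.
    return os.getcwd() == start_dir, marker
```

Calling `run_in_subprocess("/")` should report that the parent's directory is unchanged and that `marker` still holds the parent's value.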

In Lisp terms, fork makes everything dynamically scoped and rebinds it in the child's context, except for inherited resources like signal handlers and file descriptors.

Imagine every memory location having *earmuffs* like a defvar, and being bound to its current value by a giant let, and imagine that being blindingly efficient to do thanks to VM hardware.


>So why is there not a fork-exec combo?

There is, posix_spawn.


> if a big one wants to spawn children

Should this be considered poor behavior or design?

It's been quite a while since I wrote explicit fork/exec code, but wouldn't a better approach be to have a small master process that spawns off the necessary children and then either links them up or mediates communication?

I mean, on a unix-like system, init is ultimately the spawner of everything else and it's not a particularly large process.


> the kernel has an easier time of single-handedly cleaning up all file descriptors, memory mappings, etc. in one fell swoop inside execve()

fork() / execve() isn't a good mechanism for launching new processes; it's one of the worst aspects of early Unix design.

A properly designed syscall should require the caller to specify the desired behavior explicitly:

1. Do I want to inherit file descriptors, etc or do I want a clean slate?
2. Do I want a separate address space or to share address space?
3. What happens to threads, mutexes, locks, etc in the new child process?
4. If a separate address space should I inherit my parent's VM mappings or should I just load and execute a new binary image?

That gives flexibility to the caller and maximum information to the kernel to optimize. Instead we have a bunch of ad-hoc patches like vfork(), clone(), and posix_spawn() (+ non-standard attributes like POSIX_SPAWN_CLOEXEC_DEFAULT, POSIX_SPAWN_SETEXEC, and POSIX_SPAWN_USEVFORK).


> Even Unix follows this pattern for process creation! To create a new process, you clone an existing one (fork) and then delete everything inside it (exec).

I always thought fork was one of the biggest design blunders in unix.


> which had a tight loop to detect when a process was forked.

This sounds so mad it needs more context. You don't need to detect when a process is forked; isn't it an action issued from inside the process?


> I think he's suggesting that the fork itself should fail if the subsequent overwrite pass to defeat CoW would cause problems.

That'd put you in the unusual position of forbidding any sufficiently large process from ever creating children. The kernel has no way of knowing whether a fork() will be followed by exec().


> why would anyone ever want fork as a primitive

Long ago in the far away land of UNIX, fork was a primitive because the primary use of fork was to do more work on the system. You likely were one of three or four other people at any given moment vying for CPU time, and it wasn't uncommon to see loads of 11 on a typical university UNIX system.

> so why is there not a fork-exec combo

You're looking for system(3). Turns out, most people waitpid(fork()). Windows explicitly handles this situation with CreateProcess[0], which does a way better job of it than POSIX does (which, IMO, is the standard for most of the win32 API, but that's a whole can of worms I won't get into).

> why would anyone ever use vfork?

Small shells, tools that need the scheduling weight of "another process" but not for long, etc. See also, waitpid(fork()).

When you have something with MASSIVE page tables, you don't want to spend the time copying the whole thing over. There's a huge overhead to that.

[0] https://docs.microsoft.com/en-us/windows/win32/api/processth...


> There is no sane way of forking a multithreaded program

The sane way of forking a multithreaded process is to exec immediately after.
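
For illustration, a minimal sketch of "fork, then exec immediately", written in Python and assuming a POSIX system. `fork_exec` is a hypothetical helper name. In Python the child still runs a little interpreter code between the two calls, which is one reason the standard `subprocess` module does this step in C; treat this as a sketch of the shape, not a recommendation for threaded programs.

```python
import os

def fork_exec(argv):
    """Fork, exec argv in the child straight away, and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: do nothing except replace the process image.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; 127 follows the shell's
            # convention for "command not found".
            os._exit(127)
    # Parent: wait for the child and decode how it ended.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

For example, `fork_exec(["/bin/sh", "-c", "exit 3"])` should return the child's exit status.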


> Why would anyone ever want fork as a primitive?

fork() without exec() can make sense in the context of a process-per-connection application server (like SSH). I've also used it quite effectively as a threading alternative in some scripting languages.
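
A rough sketch of the process-per-connection pattern, in Python and assuming a POSIX system. The helper names, the loopback address, and the upper-casing "work" are all made up for the demo; a real server would loop and would also reap finished children.

```python
import os
import socket

def make_listener():
    """Listen on an ephemeral loopback port (address chosen only for the demo)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    return listener

def serve_one_connection(listener):
    """Accept one connection and hand it to a forked child; return the child's pid."""
    conn, _addr = listener.accept()
    pid = os.fork()
    if pid == 0:
        # Child: owns this connection only; it has no use for the listening socket.
        try:
            listener.close()
            data = conn.recv(1024)
            conn.sendall(data.upper())  # stand-in for real per-connection work
            conn.close()
        finally:
            os._exit(0)
    # Parent: drop its copy of the connection and go back to accepting.
    conn.close()
    return pid
```

Each child starts with a full copy of the server's state at the moment of the accept, so the handler code can be written as if it were the only connection in the world.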

> So why is there not a fork-exec combo?

There is; it's called posix_spawn(). Like a lot of POSIX APIs, it's kind of overcomplicated, but it does solve a lot of the problems with fork/exec.

> And as long as I'm asking stupid questions, why would anyone ever use vfork?

For processes with a very large address space, fork() can be an expensive operation. vfork() avoids that, so long as you can guarantee that it'll immediately be followed by an exec().


> Even Unix follows this pattern for process creation! To create a new process, you clone an existing one (fork) and then delete everything inside it (exec).

I love this observation.


> that seems useful.

It's useful like an atom bomb is useful - a really heavy-handed approach. Usually this is a smell that indicates forking processes are not communicating well.

It's much better to have well-behaved processes. Send SIGTERM to the parent process, allow the parent to perform cleanup of its own children recursively.
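
One way the "parent cleans up its own children" idea might look, sketched in Python for a POSIX system. All three function names are hypothetical, and the sleeping child is only a placeholder for real work.

```python
import os
import signal
import time

def spawn_sleeper():
    """Fork a child that idles until it is signalled; return its pid."""
    pid = os.fork()
    if pid == 0:
        # Child: restore the default action in case the parent installed a handler.
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        while True:
            time.sleep(60)
    return pid

def terminate_children(pids):
    """Send SIGTERM to each child, then reap them; return the signal that ended each."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    ended_by = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        ended_by.append(os.WTERMSIG(status) if os.WIFSIGNALED(status) else None)
    return ended_by

def install_sigterm_handler(pids):
    """On SIGTERM, clean up our own children first, then exit."""
    def handler(signum, frame):
        terminate_children(pids)
        os._exit(0)
    signal.signal(signal.SIGTERM, handler)
```

If every process that has children installs a handler along these lines, a single SIGTERM to the top-level parent propagates down the tree one level at a time.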


> My understanding is that `fork()` is a nice wrapper around `posix_spawn` .

Not really, you can't implement fork on top of posix_spawn, but you can do the reverse. On Linux they are both implemented on top of a lower-level primitive (clone).


> Why would anyone ever want fork as a primitive?

> So why is there not a fork-exec combo?

There are so many variations to what you can do with fork+exec that designing a suitable "fork-exec combo" API is really difficult, so any attempts tend to yield a fairly limited API or a very difficult-to-use API, and that ends up being very limiting to its consumers.

On the flip side, fork()+exec() made early Unix development very easy by... avoiding the need to design and implement a complex spawn API in kernel-land.

Nowadays there are spawn APIs. On Unix that would be posix_spawn().
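
As a small illustration, Python exposes this call as `os.posix_spawn`. The sketch below assumes a POSIX system; `spawn_and_wait` is a made-up helper name.

```python
import os

def spawn_and_wait(argv):
    """Start the program at argv[0] with os.posix_spawn and return its exit status."""
    # Unlike with fork(), the child never runs any of the caller's code:
    # it starts directly in the new program. argv[0] must be a full path,
    # because this variant does not search PATH.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

For example, `spawn_and_wait(["/bin/sh", "-c", "exit 7"])` should return the child's exit status. Anything beyond this simple case (redirecting descriptors, changing signal masks) goes through extra arguments to the call, which is where the API gets harder to use.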

> And as long as I'm asking stupid questions, why would anyone ever use vfork? If the child shares the parent's address space and uses the same stack as the parent, and the parent has to block, how is that different from a function call (other than being more expensive)?

(Not a stupid question.)

You'd use vfork() only to finish setting up the child side before it execs, and the reason you'd use vfork() instead of fork() is that vfork()'s semantics permit a very high performance implementation while fork()'s semantics necessarily preclude a high performance implementation altogether.


> without using fork, exec or posix_spawn ?

Did you manage to forget what you'd read two paragraphs earlier when you reached this bit? Because the essay's first recommendation is literally:

> Only use fork to immediately call exec (or just use posix_spawn).

It seems difficult to infer "don't use exec or posix_spawn" from this.


> You pretty much need to do everything from a separate process, which is the main issue.

I’m confused. fork()/spawn() are the proper means to launch child processes on Linux. What language would do things differently if the syscall determines the means here?


> but it used to be that forking could be almost as efficient under Linux as starting a new thread in an existing process.

That's because internally it's nearly the same thing. On Linux, forking and starting a new thread are both variants of the clone() system call, the only difference being which things are shared between parent and child.


> I speculate [fork] isn't common, or at least it isn't required for ordinary desktop apps.

From a science perspective, the ability to fork is really convenient for parallel processing. Load some data and then fork a pool of worker processes, and they can all read almost free copies of that data. This is much easier than setting up shared memory. It's a bigger deal on an HPC machine with 128 cores than a laptop with 8 cores, but even on a laptop it's a significant point in favour of Linux.
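
A minimal sketch of that load-then-fork pattern, in Python and assuming a POSIX system. `parallel_sum` and its pipe-based result passing are illustrative choices only. Note that in CPython, reference counting writes to objects as they are read, so the "almost free" sharing is less complete than it would be in C.

```python
import os

def parallel_sum(data, workers=4):
    """Sum `data` using forked workers that read the parent's copy of it."""
    chunk = (len(data) + workers - 1) // workers  # ceiling division
    pipes = []
    for i in range(workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: `data` is already in memory via copy-on-write; nothing was sent.
            try:
                os.close(read_fd)
                part = sum(data[i * chunk:(i + 1) * chunk])
                os.write(write_fd, str(part).encode())
            finally:
                os._exit(0)
        # Parent: close the write end so the read end sees EOF when the child exits.
        os.close(write_fd)
        pipes.append((pid, read_fd))

    total = 0
    for pid, read_fd in pipes:
        with os.fdopen(read_fd) as reader:
            total += int(reader.read())
        os.waitpid(pid, 0)
    return total
```

Only the small per-worker results travel through the pipes; the input data itself is never serialized or copied explicitly.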

Getting rid of it in Linux would also be going against Linus' rule that the kernel never breaks userspace.
