As I see it from my own programming experience, the main value of the "everything is a file" philosophy is in the unified namespace. Except sockets are file descriptors but not in the filesystem namespace... except when they're Unix-domain sockets. Process identifiers have their own namespace, but they're sort of in the filesystem too via the Linux-specific /proc. And of course there's SysV shared memory, which comes with its own namespace (unless you mmap an actual filesystem node and just... ugh... open it from two different processes). Sigh.
> How you handle a file no longer existing vs. a socket disconnection are not likely to be very similar. I'm sure I'll get counter arguments to this,
That's my cue! I think the "everything is a file" is somewhat misunderstood. I might even rephrase it as "everything is a file descriptor" first, but then if you need to give a name to it, or ACL that thing, that's what the file-system is for: that all the objects share a common hierarchy, and different things that need names can be put in say, the same directory. I.e., that there is one hierarchy, for pipes, devices, normal files, etc.
I'd argue that the stuff that "is" a file (named or anonymous) or a file descriptor is actually rather small, and most operations are going to require knowing what kind of file you have.
E.g., on Linux, read()/write() behave differently depending on what you're operating on: with an eventfd, read() requires a buffer of at least 8 bytes.
Heck, "close" might really be the only thing that makes sense, generically. (I'd also argue that a (fallible¹) "link" call should be in that category, but alas, it isn't on Linux, and potentially unlink for similar reasons — i.e., give and remove names from a file. I think this is a design mistake personally, but Linux is what it is. What I think is a bigger mistake is having separate/isolated namespaces for separate types of objects. POSIX, and hence Linux, makes this error in a few spots.)
But if you're just writing a TCP server, yeah, that probably doesn't matter.
> and that you should write your applications to treat these the same.
But I wouldn't argue that. A socket likely is a socket for some very specific purpose (whatever the app does, most likely), and that specific purpose is going to mean it gets treated as exactly that.
In OOP terms, just b/c something might have a base class doesn't mean we always treat it as only an instance of the base class. Sometimes the stuff on the derived class is very important to the function of the program, and that's fine.
¹e.g., in the case of an anonymous normal file (an unlinked temp file, e.g., O_TMPFILE), attempting to link it on a different file-system would have to fail.
Linux's interpretation of "everything is a file" includes things like sockets (including netlink) being files, because you interact with them through file descriptors.
This is an absurd interpretation of "everything is a file." The kind that makes you go "arghwhat?!"
A file descriptor is exactly nothing like a file. You cannot write to a pidfd. You cannot waitid an eventfd. You cannot getsockopt on a regular file. The only operations that all file descriptors have in common are close, dup, poll and a few other basics.
So "file descriptor" basically just means "kernel interface object" and the available operations depend on the type of object.
A file is a container for arbitrary data that I can read from, write to, and reposition the read/write cursor in. If you call anything else a "file" then you haven't made everything a file, you've just redefined "file" to mean "thing."
What business does a socket have in a physical, on-disk filesystem? (Let alone a clunky hack to invoke the much simpler `signal` syscall in a roundabout way?) The socket "file" is completely meaningless unless the process that opened it is currently alive and still listening on it. So why the fuck should it get written to a persistent storage device?
How do I specify the socket type, which is a meaningless concept for an actual file, when I open a socket "file"? Oh that's right, I don't. Because I don't open a socket. I bind or connect. I don't use "file" APIs because they're not applicable. I use a dedicated socket API that's fit for the purpose.
Pidfd was not introduced to treat processes as "files," it was introduced so they could share the operations that they do meaningfully share with other kernel objects (e.g. poll).
The overloaded ioctl syscall is bad design. The proc "filesystem" is bad design. /dev/shm is ridiculous design. So I have to mock a fake filesystem in memory so I can create a fake file in that "filesystem" just so I can get the same memory pages mapped into my virtual address space as some other process, all of which has absolutely nothing to do with files or a filesystem (and is much lower level than that). lolwat?
That's the Plan 9 way. The "everything is a file" model mostly relates to exposing all objects as file descriptors that support read and write operations--specifically the read and write syscalls. IOW, everything is a file means the universal API for all objects[1] is read and write. How you acquire the descriptor is a fuzzy area. Using open on virtual namespaces makes a lot of sense, but not always, and in any event no incarnation of Unix, not even Linux, supports virtual namespaces as comprehensively and seamlessly as Plan 9.
[1] From a systems programming standpoint--kernel resources, IPC, etc. But there's an obvious relationship to, e.g., Smalltalk objects. (Just don't ask me what it is ;)
I don't think "everything is a file" was ever supposed to mean "everything works just like a disk file", or even "everything has a name in the filesystem". It really meant "every kernel-managed resource is accessed through a file descriptor". Of course, even this formulation isn't always true (network interfaces and processes are the classic exceptions), but it's true in a lot more cases. In this sense, "file descriptors" are really just the UNIX name for what Windows calls a "handle".
This is still quite a good paradigm to follow - the semantics for waiting on, duplicating, closing and sending file descriptors to other processes are generally well-defined and well understood. For example, Linux exposes resources like timers, signals and namespaces through file descriptors.
I liked the idea of file refs for everything precisely because I hate the socket() semantics. But most of this is about the awful pain of the ioctl() stuff you either have to call as setup magic on the fd, or pass as setsockopt(), because you can't coerce enough into the limited modality of file-like operations when opening the (pseudo) file.
Really it feels more like "what is the hierarchical structure of my namespace?", because once you nail that down, the file semantics become clearer. If it's async I/O under the class of I/O, it's open("/io/async/my-thing", ...).
So it's one of those "yes... but it's so hard to nail down" moments.
To elaborate on those points a bit further, past what I already said about pidfd being introduced exactly to treat processes as files:
1. ioctls can implement any "syscall" on a file.
2. a process does not have to be a singular file. All processes have most if not all of their attributes exposed as files under /proc/$PID/, and this can be arbitrarily extended. In Plan 9, passing a signal (technically a note) is done by writing to /proc/$PID/note, and there is no technical reason Linux couldn't allow the same.
3. the entire concept of memory mapping is based around files, with anonymous memory - i.e., non-disk-backed memory - just being a subset of this. POSIX shared memory (shm_open) is provided through /dev/shm, which is a tmpfs directory and is indeed just files.
4. sockets are file descriptors, and a file descriptor is what makes a file, and as such you can get a peer's address from a file descriptor when one is present. Ways to expose socket creation in the filesystem also exist, and not just in Plan 9. The special socket bits could easily be made less special; the only justification for the current BSD socket API is that it became dominant and so everyone copied it.
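Point 3 is easy to see from userspace. A sketch (Linux-only, assumes /dev/shm is mounted; the name is made up): what shm_open() creates is reachable with ordinary file calls:

```python
import mmap
import os

path = "/dev/shm/example-shm-demo"   # shm_open("example-shm-demo", ...) would make this
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
os.ftruncate(fd, 4096)

m = mmap.mmap(fd, 4096)              # the same pages another process could map
m[:5] = b"hello"
snapshot = bytes(m[:5])
assert snapshot == b"hello"

m.close()
os.close(fd)
os.unlink(path)                      # equivalent of shm_unlink("example-shm-demo")
```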
Both pipes and sockets are files, in the sense that they can appear on the filesystem. Some people refer to pipe(2) pipes as anonymous pipes — they're anonymous because they're not in the filesystem. (One that is in the filesystem is a "named" pipe, i.e., a FIFO.)
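A sketch of the two flavors (my own example, POSIX): pipe(2) yields fds with no name anywhere, while mkfifo() parks the same kind of object in the filesystem:

```python
import os
import tempfile

r, w = os.pipe()                     # anonymous: exists only as two fds
os.write(w, b"anon")
assert os.read(r, 4) == b"anon"
os.close(r); os.close(w)

path = os.path.join(tempfile.mkdtemp(), "demo-fifo")
os.mkfifo(path)                      # named: any process can open() it by path
rf = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
wf = os.open(path, os.O_WRONLY)
os.write(wf, b"named")
got = os.read(rf, 5)
assert got == b"named"
os.close(rf); os.close(wf)
os.unlink(path)
```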
(I do sort of wish most object creation in the FS was a pair of,
int fd = create_a_file_descriptor_of_some_type();
link_fd(fd, "/the/name/for/it");
I think that'd be a cleaner abstraction, and then introduce syscalls on top of that for doing them together in case you think that's too much user/kernel traffic.)
I don't think using the term "file" for any kernel-managed resource is common outside of Unix terminology. It is certainly not the terminology of Windows. It is not even really true on modern Linux either, nor was it ever fully true in Unix land - Plan9 is probably the only OS that really went for "everything is a file" to the utmost degree.
For example, in either Unix or Linux, a socket is not really a file, even though an open socket has an associated fd. But you can't open a new connection by using open()/read()/write() on any Unix or Linux system (you can in Plan9). So these are not really files. Similarly, most devices expose special ioctl() codes to perform operations on them and barely interact with read() or write().
He said it creates two pipes, and later that each pipe creates two descriptors.
He never said there was no reason that sockets behaved differently than other descriptors, only that the reasons didn't warrant the broken abstraction.
And finally, I think you missed the place where he claims the abstraction breaks down. He's not complaining that you use "open" to get a file's descriptor and "pipe" to get a pipe's two descriptors, any more than it's weird to get a socket descriptor for "socket". He's saying that once you have the descriptor, the single file descriptor for sockets forces the OS to implement a socket-specific call simply to get a FIN sent properly.
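That shutdown() point is worth seeing concretely. A sketch (my own example, using a Unix socketpair, where shutdown() has the same half-close semantics as a TCP FIN): close() can only say "done with everything", while shutdown(SHUT_WR) says "done writing" without giving up the read side:

```python
import socket

a, b = socket.socketpair()
a.sendall(b"request")
a.shutdown(socket.SHUT_WR)        # half-close: the peer's reads will hit EOF

assert b.recv(100) == b"request"
assert b.recv(100) == b""         # EOF: the write side is gone...
b.sendall(b"response")            # ...but b -> a still flows
resp = a.recv(100)
assert resp == b"response"

a.close(); b.close()
```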
The reality is that sockets could be represented not with integer descriptors but with character arrays or structs and the Unix interface wouldn't be any less incoherent than it is now with accepting descriptors, a select call that works for sockets and not files, setsockopt, shutdown, recv, connected Unix domain sockets, ioctls, and so on and so forth.
> A file descriptor is a handle to a file, and anything you have an fd to is a file. This file carries a vfs implementation, such as that of pidfd, a device driver, or disk storage. The kernel does not distinguish between these.
Which is what I said. It's a "file" in name only.
> If not being able to write makes it not a file, then files stop existing when a disk is full, and means that /dev/zero and /dev/null are not files - despite being at the heart of the whole "everything is a file" paradigm.
Great example that showcases the idiocy of Everything Is A File. /dev/null and /dev/zero are basic parts of the Unix API, so the basic OS API is broken-by-default at boot until one mounts a file system that has these dummy "device" nodes at a specific path that's hardcoded everywhere.
Instead of providing a sensible API like memfd_create or timerfd_create, for example.
> A streaming socket is exactly like a normal file. You read, write and poll.
That doesn't make it a file, that makes it an object that shares common traits with file objects. Unconnected datagram sockets can't use plain read()/write() at all, because they're not bound to a fixed remote address. And that's perfectly fine. They're not files.
> The only thing that is special is how to create it, but that is a design decision, not a technical limitation
And it's a good design decision. The socket API is pretty decent except for the dumb "file" nodes it creates when listening on standard unix sockets.
> see the plan9 file based API for making TCP sockets, which is trivially implementable in Linux.
But thank God it's not implemented in Linux.
> ioctls are not themselves bad design. In fact, scoping kernel functionality onto file handles is a great design
Agreed, the only bad part is that too much was shoehorned into the same syscall. It's certainly better than magic "files" that pretend to be "files" by having you read and write structs from, but you're only allowed to read and write whole structs per syscall, which has no resemblance whatsoever to how reading from and writing to a file work. (Maybe Plan9 doesn't have this limitation and tries harder to keep up the charade, I wouldn't know.)
There are named files (which have a file path) and anonymous files (which do not). You can see these in /proc/$PID/fd/$FD if you're curious - when the link doesn't start with '/', it's anonymous. Even process memory is just an anonymous file on Linux, and arguably a cleaner one as it operates on proper fds, instead of plan9 where a string "class name" (not a path) is used to access the magical '#g' filesystem.
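You can poke at that distinction directly. A sketch (Linux-only, my own example) reading the /proc/self/fd links for a named and an anonymous fd:

```python
import os
import socket

f = open(os.devnull, "rb")           # a named file
s = socket.socket()                  # an anonymous one

named = os.readlink(f"/proc/self/fd/{f.fileno()}")
anon = os.readlink(f"/proc/self/fd/{s.fileno()}")

assert named.startswith("/")         # a real path, e.g. "/dev/null"
assert not anon.startswith("/")      # e.g. "socket:[12345]"

f.close(); s.close()
```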
The difference to plan9 is not the files, but the way plan9 uses text protocols with read/write to ctl files. To open a TCP connection - if memory serves me right - you first have to write to a ctl file, which creates a new file for the connection. Then, you write the dial command to the ctl file of that connection, and after which you can open the connection file. On Linux, a syscall creates an anonymous file, and then everything after is operations on this anonymous file.
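The Linux side of that comparison in miniature (a sketch, my own example): one syscall yields the anonymous fd, and the dial, the I/O and the teardown are all just operations on it:

```python
import socket

srv = socket.socket()                # anonymous fd; nothing appears in any namespace
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket()
cli.connect(srv.getsockname())       # the whole "dial" is one call on the fd
conn, _ = srv.accept()

cli.sendall(b"ping")
echoed = conn.recv(4)
assert echoed == b"ping"

for s in (cli, conn, srv):
    s.close()
```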
There are some ideological benefits, but Plan 9 creates a mess of implicit text protocols, ugly string handling, syscall storms and memory inefficiencies. Their design is pretty much solely a limitation caused by the idea that all filesystems should exist through the 9P protocol, which as a networked protocol cannot share data (fds, structs), only copy it (payloads to/from read, write). With the idea that all functionality had to be replaceable and mountable from remote machines, the only possible API became read/write for everything.
I'd argue that fd-using syscalls and ioctls - basically per-file syscalls - is a superior approach to implement everything-as-a-file.
Really though, it's the file descriptor that matters. This is the abstraction that lets you do things like pass a socket between different processes in the same way that you'd pass an opened directory between different processes.
The few places where this abstraction is missing are painful. A good example is ptrace. You can't pass a ptrace between processes. Handling it nicely in poll or select isn't easy.
The use of file descriptors might not seem so amazing today, now that Windows has the HANDLE and Mac OS X has the Mach port, but it was revolutionary when it was introduced. For about 15 years, it was just a UNIX thing. MS-DOS had separate ways to deal with everything: files, directories, each different vendor's network stack, etc. Every other OS was like that, more or less.
The OS kernel needs to provide user space processes with access to kernel-managed resources such as files, devices, network sockets, processes, threads, etc. In order to do so, there needs to be some way of identifying these individual resources to the kernel. And then the question is, should every type of resource have a distinct type of identifier? Or should we have a single type of identifier which could refer to an instance of any one of those types of resources?
The pure "everything is a file descriptor" ideology (or its Windows NT equivalent, "everything is a handle") says we should have a single type of identifier, the file descriptor (or handle), which can represent resources of any type for which the process can invoke kernel services – processes, threads, files, network sockets, etc.
Standardised POSIX does a poor job of living up to this ideology, since APIs for managing processes (kill, waitpid, fork, etc) take and return PIDs, not file descriptors. Since a process is not a file descriptor, I can't select()/poll() on it. Using PIDs is also prone to race conditions, whereas file descriptors are less prone to this problem (although not completely immune from it.)
The process descriptor functions such as pdfork provided by FreeBSD do a much better job of living up to "everything is a file descriptor" ideology than pure standardised POSIX does.
> I don't believe everything can be abstracted as a file considering that ioctl() breaks that abstraction. For example, what is it to "read from /proc/{pid for Chrome}"?
Why must every file support read() and write()? Some device files or other special files might only support ioctl(), and maybe also select() and poll(), and I see nothing in principle wrong with that.