Strict Posix Nonconformance (www.tedunangst.com)
69.0 points by ctoth | karma 9744 | avg karma 7.64 2013-08-04 19:31:49+00:00 | 45 comments




Oh no mkstemp is already obsolete (in Linux), now we have O_TMPFILE [1] for creating temporary files that do not even have names...

[1] https://lwn.net/Articles/557314/


I don't think that makes mkstemp obsolete as it may be important to have named temp files.

(and of course though there is a slight race you can unlink the file you get through mkstemp while keeping it open, that seems to be the strategy used by python's tempfile.TemporaryFile)
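
A minimal sketch of that create-then-unlink pattern, in Python for brevity. The helper name `anonymous_tempfile` is made up for illustration. The name still exists for a moment between the two calls, so this narrows the race rather than removing it.

```python
import os
import tempfile


def anonymous_tempfile():
    """Create a temp file with mkstemp, then unlink it while keeping it open.

    Hypothetical helper for illustration. The name exists briefly between
    mkstemp() and unlink(), which is the slight race mentioned above.
    """
    fd, path = tempfile.mkstemp()  # created exclusively, readable only by the owner
    os.unlink(path)                # drop the name; the open descriptor stays valid
    return fd, path


fd, old_path = anonymous_tempfile()
os.write(fd, b"scratch data")      # the file is still usable through the descriptor
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 100)
os.close(fd)                       # the storage is released once the last fd closes
```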


I can't think of any case, except if you are too lazy to pass a file descriptor to another process and it's easier to pass the name (which cannot be ruled out; it is easier!). You can't safely close and reopen the file, because it might have been modified in between.

There is still a possibility that an attacker opens the file before you unlink it and starts writing to it, which O_TMPFILE gets rid of as it never has a name so can never be opened.


Does every program which accepts filenames also accept file descriptors? I can definitely see the use of having a name to pass around.

On Linux at least (not sure about OpenBSD), every open file has a name somewhere in /proc, so you could pass that.

Which defeats having a flag for an "invisible" temp file, since it'll still be visible (and accessible) in /proc.

The case where a name is required is where you want to rename or link the temporary file to a new, permanent name in the filesystem.

It's also useful for debugging. If the file has a name, you can look at the file while the program is stopped in a debugger.

How about whenever you're writing a non-temporary file? I do this all the time:

1. Create temporary file in the same directory as the target path.

2. Write and fsync the temporary file

3. Rename the temporary file to the target filename
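
The three steps above can be sketched roughly as follows, in Python. `atomic_write` is a hypothetical helper name, and error handling is kept to a minimum.

```python
import os
import tempfile


def atomic_write(target, data):
    """Write bytes to `target` via a temp file in the same directory.

    Hypothetical helper; error handling is kept minimal on purpose.
    """
    directory = os.path.dirname(os.path.abspath(target))
    # Step 1: temp file in the same directory, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            # Step 2: write and fsync before the final name points at the data.
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Step 3: atomically replace the target with the finished file.
        os.replace(tmp_path, target)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Note that the resulting file keeps the restrictive permissions mkstemp gave it, and that this sketch does not fsync the containing directory, which some callers also do so that the rename itself survives a crash.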

Oh, and on Linux you can open even deleted files by using /proc/<N>/fd/<M>. It may look like a symlink, but it's not.


OK that's a good reason!

It turns out you can create files from the fd you get from O_TMPFILE: http://lwn.net/SubscriberLink/562488/3c6d56bef275c0b4/

Even ignoring the fact that these are not exactly the same... How does introducing a new interface necessarily "obsolete" the old one? Particularly when the old one is part of a standard, and the new one is not.

Some part of me would love to see a new group - one with a more pragmatic approach - fix the POSIX spec in a way similar to what happened with HTML5 and WHATWG. The other part of me wonders if it's even worth the trouble - this would involve taking the time of knowledgeable people away from projects worthy of that time to do this. What value would we get out of it? Would it be worth the cost?

So you want many broken implementations with quirks of their own? (HTML5 + WHATWG's output).

There needs to be a single dictator with a single standard. The problem so far is that the dictator and the standard need to be hit with the clue stick many times first, rather than pushing corporate features and agendas. POSIX, HTML5, Java are all design-by-committee crapfests.

Golang is about right on this. One core standard and reference implementation which is opinionated and built by people who know their shit.


As I understood it, HTML5 was specifically about getting away from the design-by-committee mode that was causing problems in XHTML. It was a group of people that got together and said, "Okay, what are people actually doing right now? That is what needs to be documented in the standard." They did not want a prescriptive standard, but rather a descriptive one.

Am I wrong about this?


You're not.

And it's not like they wanted a descriptive standard, it's that they knew a prescriptive one would not fly and would just be completely ignored due to the conflicting interests of the half-dozen actors.

A standard everybody ignores is utterly pointless.


Golang's stdlib is not the best example to use to back up your argument.

It's a type-casted ugly hackfest that serves only to demonstrate the mis-design of the language itself.


Examples?

Unlike modern web browsers, Unix variants are a very unequal market. Linux can and does introduce new APIs that programs then depend on, without consulting anyone else or trying to standardize them, rather like IE in the bad old days (see recent cgroups/systemd). Until another Unix-like achieves Firefox-like success we're unlikely to see any change - it's not in Linux's interest to cooperate, and Linux doesn't even bother staying source-compatible with previous versions of itself, never mind other systems.

There is another family of Unixlikes that has significant market share, and isn't chasing Linux's API changes.

One that people use a POSIXy API to program for?

  ~ » uname
  Darwin
  ~ » man shm_open
  SHM_OPEN(2)                 BSD System Calls Manual

There's another platform with a larger marketshare that has a POSIX API available too. The question is whether it's the API that developers for that platform actually use.

It doesn't really have much of a POSIX API though. It could but it is very incomplete, deliberately, having just enough to bootstrap the other environment...


I was curious whether the fact that shm_open() essentially calls open() is true, and in fact it is, at least with eglibc on Linux:

http://sources.debian.net/src/eglibc/2.17-3/sysdeps/posix/sh...


Cute, it doesn't even try to use the race-free O_CLOEXEC variant, which was introduced "only" 6 years ago. http://www.eglibc.org/cgi-bin/viewvc.cgi/trunk/libc/sysdeps/... is the trunk version. Seems glibc trunk is the same

“It” (that is, the POSIX fallback) doesn't try it because O_CLOEXEC is not POSIX. And you don't know how to look properly:

http://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix...



If you want a C library that takes that sort of thing seriously you should be using Musl[1] as they do care about things like that.

[1] http://git.musl-libc.org/cgit/musl/tree/src/mman/shm_open.c


Strangely this post managed to give me the exact opposite impression of what the poster intended.

Based on the description in this post, it seems to me that shm_open() has been a successful addition to the standard. It's basically a semantic annotation: "I intend to use this file descriptor to share memory between processes". The OpenBSD people looked at projects that use this API and discovered that they're overwhelmingly using it to share memory between processes owned by one user (e.g. WebKit), so they solidified that practice into a hard limit as part of their implementation.

Isn't this perfectly in the spirit of why we have all these Unix variants in the first place? Some of them will be at the bleeding edge of implementing new APIs, whereas others like OpenBSD take the cautious and security-minded approach. Thanks to the work of the OpenBSD crew and the (granted, apparently largely implicit) semantics of shm_open(), WebKit is now a bit safer to use in a multi-user scenario on OpenBSD than elsewhere. Maybe others will adopt this interpretation of shm_open() and everyone wins.


Posix is not the bleeding edge though, it is mostly fairly trailing edge...

I think bleeding edge was in reference to the Posix standard. The Posix standard has been updated as recently as this year (http://pubs.opengroup.org/onlinepubs/9699919799/).

Yes but it is codifying old stuff largely not new things...

It seems like this is only really a problem if someone uses mode = 0777. But not every caller is going to do that. Is there really such a large group of bad programmers who are still using low-level C interfaces? And can't OpenBSD just audit the handful of programs that are using shared memory?

[note: if your reply is going to be some variant of "all programmers are shit"-- not interested, heard it before.]


I think the attack goes something like: You figure out how you should name a temp file by checking to see if files with the same name already exist. Between when you decide on the name and when you open the file, a nefarious user creates her own temp file with mode = 0777. You open your file and write to it, not realizing that another user can now read all your temporary data. Because the file was already created when you opened it, whether your umask is set properly doesn't matter.

It sounds like the OpenBSD implementation would throw an error if the file was owned by someone else when you tried to shm_open() it, which mitigates this race attack. mkstemp mitigates this attack by atomically determining the name and opening the file without the opportunity for a nefarious process to touch the file system in between.
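
To make the ownership idea concrete, here is a rough user-space sketch in Python. It is not OpenBSD's code; `open_if_owned_by_me` is a hypothetical helper that only illustrates refusing a file that someone else created first.

```python
import os
import tempfile


def open_if_owned_by_me(path):
    """Open an existing file, refusing it unless the caller owns it.

    Hypothetical helper illustrating the ownership check described above;
    it is not OpenBSD's implementation. fstat() runs on the open descriptor,
    so the check applies to the file actually opened rather than to whatever
    the name points to later.
    """
    fd = os.open(path, os.O_RDWR | os.O_NOFOLLOW)
    st = os.fstat(fd)
    if st.st_uid != os.geteuid():
        os.close(fd)
        raise PermissionError(f"{path} is owned by uid {st.st_uid}")
    return fd


# Usage: a file we just created ourselves passes the check.
tmp_fd, tmp_name = tempfile.mkstemp()
os.close(tmp_fd)
checked_fd = open_if_owned_by_me(tmp_name)
owner_matches = os.fstat(checked_fd).st_uid == os.geteuid()
os.close(checked_fd)
os.unlink(tmp_name)
```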


I suppose it depends on how good you are at coming up with random file names. You could also use O_EXCL | O_CREAT to fail if the file already existed. The more I think about it, though, the worse this whole interface is starting to smell.
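
For reference, a small sketch of the O_EXCL | O_CREAT approach in Python. `create_exclusive` is a hypothetical helper name; the flags themselves are the standard open() flags.

```python
import os
import tempfile


def create_exclusive(path, mode=0o600):
    """Create `path` only if it does not already exist.

    Hypothetical helper. O_CREAT | O_EXCL makes the existence check and the
    creation one atomic step, so a file planted by someone else causes
    FileExistsError instead of being silently reused.
    """
    return os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, mode)


workdir = tempfile.mkdtemp()
name = os.path.join(workdir, "scratch")

fd = create_exclusive(name)       # first creation succeeds
try:
    create_exclusive(name)        # the name is now taken
    second_attempt_failed = False
except FileExistsError:
    second_attempt_failed = True
os.close(fd)
```

With this, a predictable name is still a denial-of-service risk (the attacker can make the create fail), but you never end up writing into a file someone else set up.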

As I see it from my own programming experience, the main value of the "everything is a file" philosophy is in the unified namespace. Except sockets are file descriptors but not in the filesystem namespace... except when they are "unix domain". The process identifiers have their own namespace, but they are sorta in the filesystem via the Linux-specific /proc. And of course there's the sysv shared memory, which comes with its own namespace (except if you mmap an actual filesystem node and just... ugh... open it from two different processes). Sigh.

From what I've always understood, the philosophy isn't "everything is a file" (the unified namespace variant), but "everything is a file descriptor" (the API kind).

Well my point was that I personally find the idea of a unified namespace for global system objects (files, pipes, sockets, processes etc) more powerful than the idea of a unified namespace of fd's within a single process.

/proc is not Linux-specific; it came from Solaris and is pretty widespread now.

Linux has bizarre named Unix domain sockets of two flavours: one with names in the filesystem and one with names in another namespace...


The point of shm_open() is the relaxed guarantees as opposed to plain open() - primarily these two:

It is unspecified whether the name appears in the file system and is visible to other functions that take pathnames as arguments.

It is unspecified whether the name and shared memory object state remain valid after a system reboot.

That gives the implementation wider latitude in how it implements the function, for example shm_open() objects may be entirely in-memory.


I really liked QNX, when it fit on a couple floppies, and before it swallowed the POSIX anchor, and so .. sure.

Perhaps I should expand the history, for those who don't know. In the early 80s, QNX was the first fully multitasking operating system available for the IBM PC architecture. It was small, efficient, real-time, and somewhat idiosyncratic. It was fully ten years ahead of its time.

At the same time, in another sphere, POSIX was shaping up as a factor in the emerging Open Systems wars. It drove compatibility between big vendors like IBM and HP and Sun (over other idiosyncratic offerings, like Apollo).

Perhaps POSIX as a user response to corporate power had value, but there was also collateral damage. As I understand the story, Canadian colleges standardized on POSIX, and so a great little Canadian OS (QNX) was left out in the cold. They had to become like others to survive.

Conformity served to reduce innovation and developer choice.

Now fast forward to 2013, and leaving aside shared memory details, what is POSIX driving today? Don't forget that ultimately a non-POSIX OS killed them all. And don't forget that users have their own means of bubbling up new features and architectures in the Open Source projects. Their control is far beyond what users got out of POSIX in the Open Systems age.

