It's not really a 'bit beyond'. The proof is about the machine code that runs on the CPU! That's a huge abstraction gap from C, and bridging it is the result of years of formal proofs and PL research.
About the effort being worth it or not, I believe that security-critical programs MUST have formal proofs. Vulnerabilities in this kind of software are extremely costly. The same goes for software whose failure can endanger humans (cf. the Toyota debacle).
The proof states that the C code of the kernel is a refinement of a high-level "abstract" specification of the kernel. This means that the C code only exhibits behaviours that are also exhibited by the high-level specification. The high-level specification (by construction) doesn't have certain bad behaviours that you often see in C programs: NULL pointer dereferences, buffer overruns, signed integer overflow, etc. Thus this class of bugs is proven to be absent. A good primer is available at: http://www.ertos.nicta.com.au/research/l4.verified/proof.pml
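To make those bug classes concrete, here is a hypothetical C fragment (my own illustration, not anything from the kernel source) showing behaviours that the abstract specification simply cannot exhibit, and which the refinement therefore rules out in the C code:

    #include <stddef.h>

    /* Hypothetical fragment, not seL4 code: each line is an example of a
     * bug class the refinement proof excludes, because the abstract spec
     * has no corresponding behaviour. */
    int lookup(int *table, size_t len, size_t idx, int delta)
    {
        int *entry = &table[idx];  /* buffer overrun if idx >= len           */
        int value  = *entry;       /* NULL pointer dereference if table == 0 */
        return value + delta;      /* signed integer overflow if the sum
                                      exceeds INT_MAX                        */
    }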
The original proof didn't make any claims about the high-level behaviour of the kernel, such as security. So, for instance, the original kernel might have had a bug where you set up the system so that two processes shouldn't be able to communicate with each other, yet after some unexpected sequence of API calls they could form a communication channel anyway.
In the years following the initial proof, NICTA carried out two security proofs: an integrity proof and an information-flow proof. These proofs state that the capabilities possessed by a process determine (i) what other processes a particular process can modify; and (ii) what other processes it can communicate with. Like the original proof, these security proofs (especially the second) have assumptions. In particular, they can only talk about parts of the system that are modelled. Timing channels, hidden CPU state, etc. not modelled by the proof may still allow information flow between processes, for instance.
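To make the two properties concrete, here is a hypothetical sketch (my own illustration, not seL4's actual data structures or code) of the underlying idea that a capability's rights bound what a process may do to another object:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical illustration only; not seL4 code. Integrity: the kernel
     * never modifies an object on a process's behalf unless the process
     * holds a capability authorising it. Information flow: communication
     * is likewise bounded by the capabilities held. */
    typedef struct {
        uint32_t object;   /* which kernel object this cap refers to */
        uint32_t rights;   /* bitmask of authorised operations       */
    } cap_t;

    enum { RIGHT_WRITE = 1u << 0, RIGHT_SEND = 1u << 1 };

    static bool cap_authorises(const cap_t *cap, uint32_t object, uint32_t right)
    {
        return cap->object == object && (cap->rights & right) != 0;
    }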
Sorry, I don't think you can claim you have a mathematical proof of the correctness of your kernel when you assume this much.
You need to assume something. Every operating system kernel makes these assumptions: seL4 is just explicit about them. Some of the assumptions are bigger than others (management of caches is a pain to get right and hard to test, for instance). Some assumptions remain simply due to a lack of time, such as the kernel's initialisation code, which hasn't been proven yet.
Additionally, as time has gone on the assumptions have been reduced. For instance, the work of Thomas Sewell and Magnus Myreen means that the compiler no longer needs to be trusted: https://www.cl.cam.ac.uk/~mom22/pldi13.pdf
And where is this proof? I can't seem to find it anywhere on their web site.
The proof will be part of the open-source release. It consists of around 200,000 lines of Isabelle/HOL proof script, which is machine-checked. Anyone will be able to inspect/modify/check the proofs after the release.
The end result is generated C code, so you could simply add the generated C code to the kernel like you would any other C code, and keep the proof chain separate from the kernel. My guess is that this is what Microsoft is doing with its kernel code.
Oh, I see. You said that it is “written in C”, and then later denied the claim that it is also written with that formal verification language. If you delete those annotations it will still adhere to the specification,[1] but it’s still false that it was purely written in C—your statement that “there are formally verified security guarantees” in fact relies on the existence of that proof code.
But fine. With that kind of dishonesty I could claim that the Linux kernel is written in nothing: I have compiled it and deleted the source code. Look ma, it's a no-code kernel.
[1] Kind of a bummer though that you cannot change any of the C code any more now that the verification code is gone.
Do you have a piece of code you feel confident about, such that you'd be able to say "here is an existence proof of carefully-written resilient C code of some significant length"? I'd like to take you up on an offer to validate that proof.
The entire QK[1] kernel, including all library code, is less than two thousand lines of C (and most of the library code isn't even used by the kernel itself, which is just 234 LOC). QK is a single-processor kernel (although still pre-emptive) and very well written and tested (in, literally, hundreds of millions of devices).
It was not difficult at all to run QK and its supporting code through KLEE and exhaustively verify the properties of each function, thanks to a super-simple design and the many included assertions, preconditions, and postconditions, which KLEE helpfully proves are satisfied automatically. If I wanted a certified optimizing C compiler, I'd use CompCert[2] to compile QK, which would give me a certified kernel all the way to machine code. (I actually use Clang.)
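As a rough sketch of what that looks like (the function and its invariants below are made up for illustration; they're not actual QK code), KLEE explores every feasible path through a symbolic harness and reports any assertion it can violate:

    #include <assert.h>
    #include <klee/klee.h>

    #define MAX_TASKS 8   /* hypothetical; not QK's real task limit */

    /* Hypothetical priority lookup with explicit pre/postconditions. */
    static int next_ready(unsigned ready_mask)
    {
        assert(ready_mask != 0);                 /* precondition  */
        int prio = 0;
        while (!(ready_mask & (1u << prio)))
            ++prio;
        assert(prio < MAX_TASKS);                /* postcondition */
        return prio;
    }

    int main(void)
    {
        unsigned ready_mask;
        klee_make_symbolic(&ready_mask, sizeof ready_mask, "ready_mask");
        klee_assume(ready_mask != 0);                 /* caller obeys precondition */
        klee_assume(ready_mask < (1u << MAX_TASKS));  /* only valid task bits set  */
        next_ready(ready_mask);                       /* KLEE checks every path    */
        return 0;
    }

Compile to LLVM bitcode (e.g. clang -emit-llvm -c -g) and run klee on the result; if no assertion can fail on any path satisfying the assumptions, the properties hold for every such input.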
I am familiar with the verified L4 kernel, and it is far more complex than the QK kernel I have verified. That people have already verified a kernel far more complex should be sufficient proof that a much simpler kernel can also have its correctness verified.
Stepping back, it's 2012: people should no longer be surprised when a small-but-meaningful codebase is certified correct. It's common enough now that you don't get published in a journal just because you did it.
Why all the skepticism? Of course you can't prove that it works the way people imagine it to, but the spec can be a lot higher level than the program itself.
You can already prove a lot of things about a large class of programs, depending on the language and tools, like the absence of buffer overruns, etc.
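For example (my own sketch, using one such toolchain; it's not what this project uses): with ACSL annotations, Frama-C's WP plugin can prove that the loop below never reads outside the array, given the stated precondition:

    /*@ requires n > 0;
        requires \valid_read(a + (0 .. n-1));
        assigns \nothing;
        ensures \forall integer j; 0 <= j < n ==> \result >= a[j];
    */
    int max_of(const int *a, int n)
    {
        int m = a[0];
        /*@ loop invariant 1 <= i <= n;
            loop invariant \forall integer j; 0 <= j < i ==> m >= a[j];
            loop assigns i, m;
            loop variant n - i;
        */
        for (int i = 1; i < n; i++)
            if (a[i] > m)
                m = a[i];
        return m;
    }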
This project has a spec written in Haskell and then a proven 1:1 mapping at a certain level of detail between that spec and the kernel itself. You can prove an awful lot of things that way that do make a kernel like this more trustworthy and stable.
A quick search shows a bunch of papers about RCU possibly containing proofs. I don't know if there is a Coq proof, but there are other dedicated tools for modelling lock-free algorithms, or possibly the proofs were done by hand.
Of course that's still not proving the actual C code; the translation probably still needs to be checked by hand.
What seL4 shows is that it's entirely preventable with a rewrite from scratch by a team of formal-methods PhDs over many years, provided they invent a design that keeps the kernel to only a few thousand lines of code, so that the titanic effort of formal verification becomes feasible, barely. It's not something you can do with a codebase of millions of lines, or with a codebase that wasn't written from scratch with formal verification in mind. Yet, anyway.
Those are interesting and extremely good ideas, thanks for pointing them out.
I do wonder if OpenBSD would accept a C compiler built on a small functional language amenable to these kinds of proofs. I suppose fulfilling the portability requirements is priority one, then stability. I just wonder if they would accept something not written in C at all.
The goal is to prove that the source code matches the machine code, not to prove that the code implements some intended higher-level semantics. This has nothing to do with formally proving the correctness of the Linux kernel.
It sure makes it a lot easier to audit that the code conforms to an expected design, which is where most security bugs come from. This is a fantastic design choice for a security-focused kernel.
I will grant that proving something was implemented as designed does not rule out design flaws, so fair enough.
The paper is talking about rigorous formal mathematical proofs of correctness, in the sense of the seL4 kernel (https://sel4.systems/Info/FAQ/), where they have confirmed the kernel is completely free of bugs such as buffer overflows, null pointer dereferences, use-after-free, etc.
"Maybe when we make more progress with formal proof-generating languages, we can create a "friendly" C where the compiler refuses to compile the code until it's accompanied by formal proofs of UB-avoidance."
That's entirely feasible. I headed a team which did that for a dialect of Pascal over three decades ago.[1] It's since been done for Modula III, Java, Spec#, and Microsoft Windows drivers.
One reasonable thing to do is to have the compiler generate run-time assertions for every statement which has a precondition for defined behavior. All pointer dereferences get "assert(p != 0)". The shift problem in the original article gets "assert(n > 0); assert(n <= 32);" Now everything has run-time checks.
Then you try to optimize out the run-time checks, or at least hoist them out of loops. A simple prover can remove about 90% of the checks without much effort.
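A hypothetical sketch of the idea (hand-written, not actual compiler output), for a direct shift by n of a 32-bit unsigned value inside a loop:

    #include <assert.h>
    #include <stddef.h>

    /* Naive instrumentation: the checks run on every iteration. */
    unsigned sum_shifted(const unsigned *p, size_t len, int n)
    {
        unsigned total = 0;
        for (size_t i = 0; i < len; i++) {
            assert(p != NULL);           /* pointer dereference check               */
            assert(n >= 0 && n < 32);    /* shift-count check, 32-bit unsigned shift */
            total += p[i] << n;
        }
        return total;
    }

    /* After optimisation: the checks are loop-invariant, so a simple prover
     * can hoist them out and run them once (or discharge them statically). */
    unsigned sum_shifted_hoisted(const unsigned *p, size_t len, int n)
    {
        unsigned total = 0;
        if (len > 0) {
            assert(p != NULL);
            assert(n >= 0 && n < 32);
        }
        for (size_t i = 0; i < len; i++)
            total += p[i] << n;
        return total;
    }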
That's why you need to prove the whole chain end to end, from spec to C code. Hand-translating a proof to working C code will most likely introduce bugs. There are now proof tools that can prove the complete chain from spec to machine code(!)