Some remotely exploitable Linux kernel WiFi vulnerabilities (lwn.net)
348 points by gundamdoubleO | 2022-10-14 01:13:08 | 153 comments





I'm surprised there weren't compiler warnings for unreachable code in the cases where the code returned directly before a goto.

Edit: Looks like GCC removed the warning because it was unreliable. Clang and MSVC seem to be in better shape. https://gcc.gnu.org/legacy-ml/gcc-help/2011-05/msg00360.html
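For illustration, the pattern in question looks roughly like this (a hypothetical function, not the actual kernel code); clang -Wunreachable-code flags the dead goto, while GCC stays silent:

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical function, not the actual kernel code. */
    int parse_thing(const unsigned char *buf, size_t len)
    {
        unsigned char *copy = malloc(len);
        if (!copy)
            return -1;

        if (len < 4) {
            return -1;   /* early return left above the goto (and it leaks 'copy')... */
            goto out;    /* ...so this goto is dead: clang -Wunreachable-code warns, gcc does not */
        }

        memcpy(copy, buf, len);
    out:
        free(copy);
        return 0;
    }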


> Edit: Looks like GCC removed the warning because it was unreliable. Clang and MSVC seem to be in better shape. https://gcc.gnu.org/legacy-ml/gcc-help/2011-05/msg00360.html

That is an odd position, given that -Wstringop-overflow also depends heavily on the optimizer (and frequently generates false positives!) yet not only remains in GCC but is enabled by default (even without -Wall/-Wextra).

Things like this are why it pays to compile your project with as many compilers as possible (and to run static analysis tools as well).


Don't forget ASan and UBSan. Those require good unit test coverage to work, though.
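For example, the use-after-free below is only reported if a test actually drives execution through the offending line, which is why coverage matters (a minimal sketch; build with something like cc -g -fsanitize=address,undefined):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *p = malloc(sizeof *p);
        if (!p)
            return 1;
        *p = 42;
        free(p);
        printf("%d\n", *p);   /* use-after-free: ASan prints a full report and aborts here */
        return 0;
    }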

There are a lot of bugfixes there...

And some are obviously correct... But others would require a lot more understanding of the code to be sure they're correct.

Someone should go through this with a keen eye to check the fixes are actually correct, and aren't just making the fuzzer stop alerting while leaving a more subtle vulnerability open.


Yeah, the fact that the kernel accepts changes like this with such minimal testing is why we see regressions of these kinds of bugs all too often.

> anybody who uses WiFi on untrusted networks

So is this for public/open Wifi networks only? Or is it for any wireless network where you do not control the gateway?


At least one of the RCE vulns seems to be exploitable even without connecting to any network (it's reachable via probe response handling).

Recommend that people click through and read the comments, in particular the (now) top thread, in part:

https://lwn.net/Articles/911071/

>> anybody who uses WiFi on untrusted networks

> It's actually worse than that - you just have to be scanning (though one of the issues requires P2P functionality to be enabled).

> So basically it's just

>> anybody who uses WiFi

> unfortunately.

And:

> Sorry, it took me longer than expected but I just posted PoCs + logs here: https://www.openwall.com/lists/oss-security/2022/10/13/5

> Most of the vulnerabilities were introduced in 5.1/5.2.


> > anybody who uses WiFi

It's worse than that - android kernels process beacon frames even if wifi is disabled.

So you should be worried about this if you have an android 11/12 phone, even if you don't use wifi.

Linux desktop/laptop users should be worried if they have wifi enabled, even if not connected to a network.


>It's worse than that - android kernels process beacon frames even if wifi is disabled.

>So you should be worried about this if you have an android 11/12 phone, even if you don't use wifi.

Is this issue (RCE even with WiFi off, across a huge swathe of devices) common to many vulnerabilities, and we're just discussing this one because it hit the front page, or is this vulnerability especially... egregious?


> this vulnerability especially... egregious

This. The typical vulnerability requires an obscure hardware or software config, the user to do something unusual or foolish, or an attacker on the local network. This requires none of that.


>The typical vulnerability requires an obscure hardware or software config, the user to do something unusual or foolish, or an attacker on the local network. This requires none of that.

Thanks for the explanation. I usually abhor how the word "wormable" is thrown around but it sounds like it might apply here, especially since many devices running this software may be difficult to patch? Yikes.

I actually just put in my two weeks notice to spend the rest of spooky season focused on my art rather than infosec, but I hope folks don't have this... abused.


> So you should be worried about this if you have an android 11/12 phone, even if you don't use wifi.

Android 12 is 4.14


That's odd, mine is on 5.4

It looks like each Android release allows for a selection of kernel versions (for the OEM):

https://source.android.com/docs/core/architecture/kernel/and...

Mine is at 4.19.


My Pixel 4a says it is on Android 13 but kernel 4.14. It seems they break their own rules.

Those are only requirements for newly launched devices. Post launch, devices tend to stay on the same major kernel version even when they get new OS versions.

The linked page says "kernel upgrades aren't generally required when updating the platform release", so the relevant entry is the one for Android 10, which the Pixel 4a originally shipped with.

Motorola Edge 30 Ultra, Android 12, kernel 5.10.

Why 11/12? I have 13 and my kernel is 4.14. They said these got added in 5.1/5.2 right? Android seems to have wildly varying kernels within versions.

Pixel 3a XL running Lineage 19-20220917, Android 12.

/proc/version says 4.9.327.

Assuming not impacted, hopefully the November patch will address any remnants.


Motorola Edge 30 ultra running the stock android 12 ROM is on 5.10.

Guess I'm unlucky.


Kernel versions are going to be dependent on the whims of manufacturers.

Guess it's gonna be easier than ever to root one's Android phone.

Only if that phone runs Linux 5.1 or 5.2; obviously most phones will be running Linux 4.14.

As long as the vulnerable changes weren't backported to it (I hope they weren't).

Could someone more knowledgeable than me comment on whether this is as bad as it looks?

As I understood the issues, this will probably be lots of "fun". You can broadcast the pcap files with any monitor-mode-capable WiFi router. Luckily it's 5.1+, so most devices run very old vendor-patched kernels and are probably not affected, but at least for causing havoc this is really bad. As one issue uses beacon frames, just a scan for networks should be enough for a crash. So you can at least crash, and maybe exploit, any device running a recent Linux that scans for WiFi networks.

I'm not sure how it's possible to do over the air remote code execution but I guess people are working on this.


I found the vulnerabilities, but am no expert on the WiFi stack.

DoSing is now "easy", as you say: just send those frames, and a Linux computer that is currently listening to the network (e.g. scanning for networks) and thus processes the beacon frames will at least crash. It may be that some WiFi chips filter those invalid frames, or crash themselves; that depends on the actual hardware/firmware.
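To illustrate the class of bug (a made-up sketch, not the actual mac80211 code): beacon and probe-response bodies are a list of variable-length information elements, and a parser that trusts the attacker-controlled length byte without re-checking the remaining buffer reads past the end of the frame:

    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative sketch only, not the real mac80211 parser. */
    struct element {
        uint8_t id;
        uint8_t datalen;
        uint8_t data[];
    };

    static void handle_element(uint8_t id, const uint8_t *data, size_t datalen)
    {
        /* A real handler would copy 'datalen' bytes somewhere; if the length
         * byte lies, that copy reads (or writes) out of bounds. */
        (void)id; (void)data; (void)datalen;
    }

    void parse_elements(const uint8_t *buf, size_t len)
    {
        size_t pos = 0;

        while (pos + 2 <= len) {
            const struct element *elem = (const void *)(buf + pos);

            /* BUG (for illustration): nothing checks that elem->datalen fits
             * in the remaining len - pos - 2 bytes of the frame. */
            handle_element(elem->id, elem->data, elem->datalen);

            pos += 2 + elem->datalen;
        }
    }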

The victim does not need to be connected to a malicious AP or anything similar, so there is no requirement to trick the user into anything.

RCE is not trivial at all, but due to the nature of the different faults, it might be possible. On that, see e.g. Mathy Vanhoef, who discovered several impressive WiFi vulnerabilities in the past:

https://twitter.com/vanhoefm/status/1580675615992451072


FYI: fixes are now in the OpenWrt master, 21.x and 22.x branches. New bin files will be posted soon, or you can build from git.

From a quick look at the OpenWrt home page, they had really bad timing this time: they had just released an important security update two days ago (on the 12th), one day before this new set of vulnerabilities was announced yesterday (on the 13th).

Looks like these are all in mac80211. I'm not 100% familiar with the intimate details of 802.11, but I have read the relevant parts of the standard, at least enough to RE some drivers. A lot of things were clearly designed to be of fixed, definite size so as to be implementable in a highly constrained embedded environment, so seeing things like use-after-frees appear is a little disappointing.

Weekly news of a memory-related CVE.

Keep using unsafe langs.

What will it be next week? A CVE in Chromium?

At this point betting sites should add a category for this kind of game.

I do wonder what people of the future will think about this:

"So they had research indicating that a lot of issues were related to memory, had technology which significantly reduces this issue, but they still kept doin mess for years?"

https://msrc-blog.microsoft.com/2019/07/22/why-rust-for-safe...

https://microsoftedge.github.io/edgevr/posts/Super-Duper-Sec...

https://www.chromium.org/Home/chromium-security/memory-safet...

Memory issues and JIT (in browsers) are two things responsible for a disgusting amount of security issues.


This is naive.

You cannot rewrite the entirety of the Linux kernel in another language overnight. You'd need years at least before it becomes production-ready. Not to mention the performance and memory use will be worse.

Certainly the situation can and should be better, but adopting this "it's so easy, how does nobody see it?" attitude helps no one.


I mostly agree, but I think I would have read the comment differently. I've seen C++ people have a strong distaste towards Rust for various reasons and don't exactly care too much about the "memory safety" part. Which is... unfortunate. So while it might be "beating a dead horse", the horse isn't even dead.

You're right.

People need to be aware of how memory safety affects security in critical software like Chromium or Microsoft's.


In fairness, a shitton of these issues would be solved by following C++ patterns like RAII, instead of the defer/goto style the kernel seems to be proud of.
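For context, the goto-unwind idiom looks roughly like this (an illustrative sketch, not real kernel code): every resource gets its own cleanup label, and every error path has to jump to exactly the right one, which is where the leaks and double-frees creep in; RAII automates that bookkeeping.

    #include <stdlib.h>

    static int do_step(char *a, char *b) { (void)a; (void)b; return 0; }

    int do_work(void)
    {
        char *a, *b;
        int ret = -1;

        a = malloc(64);
        if (!a)
            goto out;

        b = malloc(64);
        if (!b)
            goto out_free_a;    /* jump to the wrong label here and you leak or double-free */

        if (do_step(a, b) < 0)
            goto out_free_b;

        ret = 0;

    out_free_b:
        free(b);
    out_free_a:
        free(a);
    out:
        return ret;
    }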

This is kernel space, not userspace.

Rust's memory safety is not 100% protection here.


Rust's memory safety is never a 100% thing, but that is no argument, as long as it's a significant enough improvement...

Let me explain: in the kernel memory is not always simply "memory".

Sometimes writing to memory you own has huge side effects. And that part is not handled by Rust.
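To make that concrete, here is an illustrative sketch (not real driver code; the register offsets and semantics are invented): both stores below are plain writes to memory the driver owns, but the second one tells the hardware to start a DMA transfer, a side effect no borrow checker models.

    #include <stdint.h>

    /* Register offsets and semantics are invented for illustration. */
    #define REG_DMA_ADDR  0x10u
    #define REG_DMA_CTRL  0x14u
    #define DMA_START     (1u << 0)

    static inline void mmio_write32(volatile uint8_t *base, uint32_t off, uint32_t val)
    {
        *(volatile uint32_t *)(base + off) = val;
    }

    void start_dma(volatile uint8_t *base, uint32_t phys_addr)
    {
        mmio_write32(base, REG_DMA_ADDR, phys_addr); /* program the target address */
        mmio_write32(base, REG_DMA_CTRL, DMA_START); /* this write makes the hardware act */
    }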


Perhaps instead we could train an ML model to find these use-after-free bugs?

>Not to mention the performance and memory use will be worse.

How much?


You obviously cannot rewrite things in another language overnight. But I do wish that the industry saw this as an emergency rather than something ranging from "well, we will get to it when we get to it" to "ugh, I'm tired of these people talking about memory safety - don't you know that you can write correct C programs?"

The Linux kernel in particular is perhaps the single most important piece of software on the planet. And we see vulns like this all the time. Hundreds per year. And there are billions more lines of C and C++ out there handling all sorts of untrusted input.

The path off C and C++ is complicated as shit. Interop with Rust is messy and there aren't effective tools for automatic translation. Carbon is barely a language at this point (they don't even have a compiler) and doesn't yet provide safety. The story for the other alternative languages isn't any better. But I really wish the industry was throwing billions at this across dozens of major companies and open source organizations.


Rust is not a panacea. You can't just claim that this one emergency project somehow solves all the future bugs. Would Linux being rewritten in mostly-Rust help with some classes of memory bugs some of the time? Sure. Would there be a lot of other tradeoffs to consider, are there risks, would there still be plenty of kernel CVEs going forward? Yes to all of these.

Rust is absolutely not a panacea. Other kinds of vulns can absolutely exist. But vulns caused by memory-safety errors are so incredibly common in Linux and other critical software that it should be embarrassing. If we could get to a world where all of our kernel-level vulns were logical errors rather than UAFs I would weep with joy.

I'm not saying that using memory-safe languages (or a different kernel design that at least isolates bugs like this) fixes security forever. I'm saying that it would dramatically increase the cost of developing an exploit for the world's most important piece of software.

I'll even soften my request. Let's forget about memory safety. Let's just talk about regression tests. How the fuck is it possible for a vuln to regress in the kernel because nobody added a test when it was first fixed? This is a disaster. Everybody has just somehow decided that the current state of things is tolerable and I feel like I'm taking crazy pills.


If we can't even get people to put static analysis in their pipelines, how are we going to get them to switch to Rust? If everyone that used C++, for instance, built with both Clang and g++, ran cppcheck and clang-tidy, and ran both ASan and UBSan, we still wouldn't get rid of all the memory bugs Rust eliminates simply by compiling, unless you have 100% code and branch coverage to make sure the sanitizers are doing their jobs.

The easiest path to sanity is probably Rust, but we can't even get static analysis to be a norm...


> If we can't even get people to put static analysis in their pipelines how are we going to get them to switch to rust?

I don't know. Somehow we need to shift industry culture. The good news is that this has been done. In the past, things we now take for granted like source control and unit tests weren't norms. Maybe someday tools like static analysis, fuzzing, and considerations for memory safety will be industry norms. I hope so.


Nobody is claiming Rust would prevent all bugs, just a huge proportion of them. At least 70%, I would argue even more.

But yes it should also be accompanied by better security architecture, i.e. not running network drivers in a monolithic kernel.


Okay, so switch to Redox?

Perhaps it's time to reduce our dependence on the Linux kernel then.

I think that's a possibility too. This is also hard as hell. Android seems to have some long term strategy with Fuchsia but that's been in progress for what seems like forever. And webservers basically don't have an alternative right now.

Most of the other options are written in C or C++ themselves.

I can understand the argument for new code, but what do you want people to do, recode the entirety of the Linux kernel in Rust? The kernel is allowing Rust for new drivers.

Faster adoption

Raise awareness


Easy to sit on the sidelines and tell everyone else to do better, isn’t it?

It is, but remember that people's lives and freedom are at stake.

So go contribute code instead of demanding others do it?

Well, also posted on LWN today for subscribers: A first look at Rust in the 6.1 kernel: https://lwn.net/Articles/910762/

(If you don't have a subscription, the article will become freely available to everyone on Oct 27th.)

tl;dr: it doesn't do anything interesting yet, but the infrastructure is getting there, starting the process of evolving the kernel towards using a safe language.


Fortunately, on Qubes OS, only the networking VM can be exploited like this, and it will be clean again after its reboot.

I installed 4.1 and only my firewall VM is disposable. Wouldn't that mean my net VM could still have an exploit that leaves something in the home directory? (Would be nice if it was easier to trash and rebuild it).

You can choose sys-net to be a disposable during the install. It's not the default. You can also make it a disposable manually: https://www.qubes-os.org/doc/disposable-customization/#using...

Beware that your WiFi password will be forgotten every VM reboot (but there is a workaround on the forums).


Thank you! (Yes I forgot that it was an option I chose; I probably went with the defaults, not knowing the implications of deviating)

It's possible, but I believe the design is such that sys-net is untrusted, so an exploit there is no more risk than any other use of an unencrypted connection on the network.

But it sure looks like it was a wise idea to spend the resources on isolating network hardware!


How's GPU support in Qubes these days? (Nvidia, CUDA)

Eh, it didn't get a cutesy name like BadWiFi, so it won't be that bad /s

“beacown” apparently: https://github.com/PurpleVsGreen/beacown

Though that may not be a generally used name as yet.


Can we please stop running network drivers and network stacks in kernel mode by default? It's 2022 and we've got more than enough compute power nowadays that the performance hit for running these in user-land is negligible for most use cases. Smartphone, tablet or laptop users usually do not need the level of performance that requires running that stuff in the kernel when browsing the web.

I get that there are some use cases where performance really matters to the point where kernel network stack and drivers make a difference (high-throughput and/or low-latency services running on servers, high-performance routers...), but that should not be the default for everyone.



Note that the Tanenbaum-Torvalds debate was in 1992, thirty years ago. The security of computer systems might have improved since then, but the fallout of security issues has massively increased. Managing your bank accounts wirelessly on the Internet from anywhere in the world with a thin, battery-powered device that fits inside your pocket was a pipe dream (and no one could've possibly imagined that an Internet-ready smart toaster could compromise it and steal the money in your accounts).

1) Many will cry about that performance hit (including me).

For over a decade our computers have gotten faster marginally, but our software has gotten slower at a greater rate.

You can barely navigate the web now with a new low-end computer (that isn't a Chromebook). Most on this site won't care because our machines cost $2,000+ and the web is Fine(tm); many folks aren't buying anything over $300, though.

2) These are memory bugs, so the introduction of Rust into the kernel could help us here potentially, no need for an architectural revolution.


> Many will cry about that performance hit (including me) because for over a decade our computers have gotten faster marginally, but our software has gotten slower and bloatier at an increasingly rapid pace.

We're talking about network stacks and network drivers, not web browsers. Migrating the network stack from the kernel to a user-land process is not going to measurably slow down web browsers, especially on modern systems with gigabytes of RAM, multiple cores, IOMMUs and whatnots.

> These are memory bugs, so the introduction of Rust into the kernel could help us here potentially, no need for an architectural revolution.

That would require rewriting the network stack and network drivers in Rust (driver code is much more likely to have bugs than the rest of the kernel) for this to be effective, otherwise you'll still have a lot of C code in the network path. I'd argue that this would be a bigger architectural revolution than porting the existing code and running it in user-land. MINIX3 went through such a change when drivers were removed from the kernel (can't find the publication about it right now), and they only required reasonably small changes when porting them to user-land; they were not rewritten from scratch.

But this is not just about memory safety, Rust code can still be vulnerable in many other ways (memory leaks, unsafe blocks, wrong assumptions, incorrect algorithm implementation, buggy/compromised toolchains...). Code running inside the trusted computing base of a system is a liability, enforcing privilege separation and principle of least authority reduces it.


> We're talking about network stacks and network drivers, not web browsers.

Ah yes, the magic web-browser that doesn't do any kind of networking at all.

> Migrating the network stack from the kernel to a user-land process is not going to measurably slow down web browsers, especially on modern systems with IOMMUs and whatnots.

I don't know how you can possibly assert that; it contradicts computer science's current understanding of operating system design as it relates to kernel-mode/user-mode switching, unless you're doing weird shared-memory things in userspace... which is terrifying.

> That would require rewriting the network stack and network drivers in Rust

Not really, C and Rust can interop just fine, you can have network drivers that are rust but the actual networking stack itself can remain C, if you want.

> but this is not just about memory safety, Rust code can still be vulnerable in many other ways

The post is literally memory safety bugs.


> Ah yes, the magic web-browser that doesn't do any kind of networking at all.

They clearly didn't claim that. Your web browser being slow nowadays is not because it needs to do some networking.


They are claiming a loss in performance is ok.

I am claiming that people keep making this claim and it no longer holds true because software is already losing too much performance for the value we get back.

That's my whole thesis.


Your claim assumes that a small loss in performance in networking will lead to a loss in performance of the overall web browser, which is only true if networking is the bottleneck while browsing. And it usually isn't.

Ah, so you think the only thing I do with a computer is use the browser? That's weird; I was just giving an example of something that is so slow that it is literally unworkable in the modern day already.

Impacting networking affects the entire machine, especially in so far as a computer is increasingly just a dumb terminal to something else.

Look, if you make network requests potentially 20% slower, then browser performance will be impacted too; it's so obvious that I'm not sure how I can explain it more simply.

By how much? I am not sure, but you can't say it won't be slower at all unless we're talking about magic.

Pretending that it's trivial amounts of performance drop without evidence is the wrong approach. Show me how you can have similar performance with 20% increase in latency and I will change my stance here.

As it stands there are two things I know to be true:

Browsers rely on networking (as do many things, btw) and software is increasingly slow to provide similar value these days.


The point is that most users and use-cases of networking don't have high requirements on bandwidth or latency that warrant a network stack design focused on high performance. Let the ones who want to live on the edge do so if they want, but don't force your high performance, one-bug-away-from-total-disaster network stack design based on your own (probably overblown) requirements on everyone else.

Grandma doesn't care if her tablet can't saturate a WiFi 6 link. Grandma doesn't care if her bank's web page takes an extra 75µs to traverse the user-land network stack. But she will care a whole lot if her savings are emptied while managing her bank account through her tablet. Even worse if her only fault was having her tablet powered on when the smart toaster of a neighbor compromised it because of a remotely exploitable vulnerability in her tablet's WiFi stack.

Or are you suggesting that grandma should've known better than to let her tablet outside of a Faraday cage?

> Pretending that it's trivial amounts of performance drop without evidence is the wrong approach.

Amdahl's law begs to differ. If it takes 5s for the web site to arrive from the bank's server, spending 5µs or 500µs in the network stack is completely irrelevant to grandma. Upgrading her cable internet to fiber to cut these 5s down to 500ms will have much more positive impact to her user experience than optimizing the crap out of her tablet's network stack from 5µs down to 1µs.


What an incredibly weak argument, I'm disappointed to read it.

We're not talking microseconds; we're talking about a fundamental problem in computer science that, after 30 years, is no closer to being solved.

We're talking about a class of bugs which can be solved rather easily by other means that do not impose an unknown performance penalty on one of the slowest-to-improve components of modern computers.

Grandma isn't losing anything due to this. Heartbleed this ain't; Spectre this ain't. And crucially, we have the tools to ensure this never happens again without throwing our hands up in the air and saying "WELL COMPUTER NO GO".

If you're actually scared, I invite you to run OpenBSD as I did. You will learn very quickly that performance is a virtue you can't live without: a few extra instructions here, a lack of caching on gettimeofday() there, and suddenly the real lag of using the machine is extremely frustrating.

And again, for the final time I will say this: we can fix this and make it never happen again without any loss in performance.

That you keep advocating a loss in performance tells me that you've spent a career making everyone else's life worse for the sake of your own experience; I am not a fan of that mentality.

Or maybe I've worked in AAA game dev too long, where we don't get the luxury of throwing away performance on a whim.


Extrapolating your position makes me think your ideal operating system wouldn't be an offshoot of the Linux kernel. It would be a general-purpose, fully asynchronous, MMU-less, zero-copy, single address space operating system secured through static program analysis, where the web browser and the NIC driver are but a couple of function calls away. Kinda like Microsoft Research's Singularity, but probably without the garbage collection.

Maybe one day every phone, tablet and laptop will run such an operating system, but I doubt that we'll have this as a viable alternative anytime soon. In the meantime, I think there's a reason why Google with Fuchsia OS and other companies are hedging their bets mainly through micro-kernel-style approaches for their next-gen general-purpose operating systems.


I love that you just run to the extremes.

It is an excellent way of getting me to dismiss you entirely.

My position is that: The best system is an improvement on the one we have, not some mythical potential solution that has unknown consequences.

Though I have a fondness for rump kernels.


You are assuming copying buffers around won't consume any CPU? Maybe it's perfectly fine, maybe it's not. But it needs some experiments before we can handwave it away.

Taken from https://news.ycombinator.com/item?id=33200171#33203269

This publication (http://www.minix3.org/docs/jorrit-herder/asci06.pdf) claims that MINIX3 could saturate a 1 Gb/s Ethernet link with an user-space network stack, with separate processes for the stack and the driver, on a rusty 32-bit micro-kernel that can't do SMP. In 2006.


> Ah yes, the magic web-browser that doesn't do any kind of networking at all.

The web browser isn't Netflix trying to serve hundreds of gigabits per second of encrypted video streams from a single server. Do you really need the ability to reliably saturate a 40 Gb/s Ethernet link to browse Hacker News comfortably? You'll hit various other bottlenecks long before performance for practical usages of web browsers will be significantly impacted by a user-land network stack.

As I've said, there are use-cases where extreme throughput and latency requirements warrant a design focusing on performance. Smartphones aren't one of them.

> I don't know how you can possibly assert that, it's contradicting computer sciences' current understanding of operating system design as it relates to kernelmode/usermode switching, unless you're doing weird shared-memory things in userspace... which is terrifying.

Again, not everyone is Netflix. I'd rather have a computer capped at 1 Gb/s speed with a user-land network stack than a computer capable of saturating a 40 Gb/s Ethernet link with a kernel network stack when I'm managing my bank accounts. Most end-users don't need ludicrously fast network speeds to browse funny cat GIFs on their web browsers.

Also, I've contributed code to multiple operating systems (MINIX3, SerenityOS). Running a user-land network stack isn't going to turn your 1 Gb/s Ethernet card into a 10 Mb/s Ethernet card.

> Not really, C and Rust can interop just fine, you can have network drivers that are rust but the actual networking stack itself can remain C, if you want.

As far as I can tell, the bug is in the network stack itself. A network driver written in Rust wouldn't immunize your Linux kernel here from this bug.

> The post is literally memory safety bugs.

The consequence is about computer security, of which memory safety bugs are but one cause among many.


> The web browser isn't Netflix trying to serve hundreds of gigabits per second of encrypted video streams from a single server.

Ironically, server workloads are the ones that are increasingly moving to networking stacks that run in user space, using frameworks like DPDK, with performance as a motivator: https://en.wikipedia.org/wiki/Data_Plane_Development_Kit

Of course, there are some caveats - from my understanding, typical DPDK use cases would turn over the entire NIC to a single application, meaning you aren't contending with sharing the network between multiple, potentially adversarial user mode processes. This is fine for a server, but not really appropriate for a PC or smartphone.


Yes, the way Netflix and Co. are using Userspace drivers is by passing entire devices to a single application.

There's no general purpose IPC happening there.


Netflix interestingly (rather than focusing on DPDK/user-space techniques) seems focused on increasing the throughput of kTLS on their CDN appliance boxes so they can simply sendfile(2) right out of VFS cache in kernel space for the bulk of the data plane. An alternative pathway to the same goal of increasing throughput by colocating your general data and your network stack state in the same context.

I wonder if io_uring will be able to maintain competitive against DPDK-like approaches. Multi tenant solutions are more attractive and seem like they could be extremely competitive since they should be largely equivalent in the case that you have a single tenant.

> unless you're doing weird shared-memory things in userspace

Shared-memory things in userspace, i.e. buffers shared between 2 distinct user processes are no weirder than buffers that are shared between a user process and a kernel-mode driver. In both cases the buffers cannot be accessed by third parties.

Moreover, the transfer of data between 2 processes through a shared buffer can be done without any context switch (which could be slow), if the 2 processes are executed on distinct cores. Therefore having the network device driver as a distinct process does not have to cause any reduction in performance, if the means for inter-process communication are chosen wisely.

For any device driver that is implemented as a user process, the kernel can enable direct access to any I/O ports and memory-mapped I/O areas that are needed by the device, so the device driver can work in user mode without requiring any context switches.

Such direct I/O access cannot be enabled for ordinary processes, because those are not trusted enough and also because the direct I/O access could be enabled only for a single process at a time.

A dedicated device driver process solves both the trust problem and the multiple access problem equally well as a kernel-mode driver.
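For a concrete flavor, Linux's UIO framework already works roughly along these lines: the kernel maps the device's register region into the driver process and delivers interrupts through read() on the device node. A minimal sketch (the /dev/uio0 node and the 4 KiB mapping size are assumptions; the real values come from sysfs, e.g. /sys/class/uio/uio0/maps/map0/size):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/uio0", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, 0);
        if (regs == MAP_FAILED) { perror("mmap"); return 1; }

        printf("register 0 reads 0x%08x\n", regs[0]);

        uint32_t events;
        read(fd, &events, sizeof(events));   /* blocks until the device raises an interrupt */

        munmap((void *)regs, 4096);
        close(fd);
        return 0;
    }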


Things are more complicated. You can indeed have a very fast network driver in userspace (in fact for many use cases userspace networking is faster than the kernel). But where do you put the rest of the network stack?

> I'd argue that this would be a bigger architectural revolution than porting the existing code and running it in user-land. MINIX3 went through such a change

MINIX is a microkernel architecture - running drivers in userspace is one of its core features/selling points, and one that differentiates it from (modular) monolithic kernels such as Linux. So, this isn't a very solid line of reasoning.

It seems to me that the situation is the opposite - that moving drivers to userspace is an architectural change, which is more complex than porting an existing architecture to a new language.

> Rust code can still be vulnerable in many other ways

Sure, but not vulnerable in the way that the vulnerability under discussion is.

> memory leaks

Much harder in Rust than C, and also unlike in C, not going to result in security vulnerabilities.

> buggy/compromised toolchains

If you're going to assume that your toolchain is compromised, then anything is on the table, including the toolchain inserting a backdoor into the kernel and completely bypassing the proposed architectural change of moving drivers into user-space. And, needless to say, compiler bugs are rare in general, and compiler bugs that cause software vulnerabilities are nearly unheard of (I've literally never seen one).

Nobody thinking rationally is going to tell you that Rust is going to eliminate all your bugs or make your code secure. However, by far, the majority of security bugs in the Linux kernel are due to mistakes that the design of Rust either completely eliminates or massively reduces.

And security is intrinsically a tradeoff - the Linux kernel is not optimized for maximum security (which would be something formally-verified like seL4), but a compromise between security, performance, and development velocity. The claim is that Rust will provide significantly better security at basically the same performance and possibly modestly improved development velocity - the very least that one should do is rewrite the existing architecture in it (or, again, a language that meets or exceeds the specs of Rust) and then see what the bug rate is before deciding to take a guaranteed performance hit through an architectural change.


> MINIX is a microkernel architecture - running drivers in userspace is one of its core features/selling points, and one that differentiates it from (modular) monolithic kernels such as Linux. So, this isn't a very solid line of reasoning.

The first two versions of MINIX ran drivers inside the context of the kernel. The migration of drivers to user-space and overall emphasis on reliability didn't happen before MINIX3 in 2006.

Given that Linux nowadays has FUSE and UIO, still calling it a strictly monolithic kernel is probably a bit of a misnomer at this point. The same goes for Windows, NetBSD and others by the way.

> It seems to me that the situation is the opposite - that moving drivers to userspace is an architectural change, which is more complex than porting an existing architecture to a new language.

Trying to do a straightforward port of a C code base to Rust will quickly grind to a halt due to Rust's borrow checker. On a highly-optimized code base such as the Linux network stack, untangling every last optimization trick and shortcut to make the Rust compiler happy would require a large-scale refactoring that'll end up looking nothing like the original code base, at which point you might as well rewrite it from scratch.

In comparison, migrating that C code base to user-land would be less of a disruptive change to the code base (as was the case when MINIX3 did so with its drivers). It's still the same old network stack, adapted to run on a different environment.

> Sure, but not vulnerable in the way that the vulnerability under discussion is.

Rust isn't a silver bullet against every class of bug. Code that runs inside the trusted computing base means that it's one security bug away from system compromise. Writing that code in Rust doesn't change that.


> You can barely navigate the web now with a new low end computer...

You can barely navigate the web now with almost any computer. I have a high-end laptop and opening a few tabs from various sites on the Internet will cause the CPU usage and fan speed to spike. Just for a few tabs! Obviously not all sites do this, but the web has become a framework of advertising monstrosity and I can barely navigate and consume much of today's web content without enabling Reader Mode.


Well, you can navigate the web with older low end computers if you're running the full suite of blockers (NoScript & UBlock on Firefox seems to work pretty well), but that means many sites don't work unless you selectively allow certain scripts to work by fiddling with NoScript permissions and selective blocking some elements with Ublock. HTML + CSS sites are no problem, however.

How do the micro kernel folks get around this? Are they paying the syscall toll between kernel components?

Yes, exactly!

Redox-OS is based on this approach and is honestly extremely clean and easy to understand (when compared to MINIX which also adopts the same approach).

Wikipedia has a much better explanation than I can give right now: https://en.wikipedia.org/wiki/Microkernel#Performance


> How do the micro kernel folks get around this?

AIUI, in general, they don't. Avoiding the overhead of context switches is the research question in microkernels, and while there is progress being made, last I'd heard there wasn't a solution that didn't carry caveats. Now to be fair, sometimes those caveats are fine - ex. if you only write software in rust maybe you can get away with actually passing around memory without copies - but often they undermine the ability to run arbitrary software on general purpose hardware.

(If I'm behind and this has been solved in the general case, I'm happy to be corrected, but this was my understanding of things as of a few years ago and I haven't heard about a breakthrough.)


Regarding the first point: if I understand correctly, you say that there is an inevitable performance hit when not running in kernel mode. But is that really so?

I don't like that you were downvoted for curiosity.

Yes, there is a performance hit but how large it is depends a lot on what you're doing.

"Switching" is one of the most costly operations and in Kernel mode you do not need to do it unless interacting with something in user space.. which you would only do because something in Userspace requested it somehow.

For other things, such as virtual memory, Microsoft found that the protections needed for virtual memory could cost anywhere between 10 and 20%; but since there's no concept of virtual memory in kernel space, it's hard to say concretely that your program would be "20% faster". It would be too different a program.


The whole point of dpdk is that it bypasses the kernel to obtain speed. It almost certainly requires a lot of privileges, but the kernel itself isn't the source of the hit.

The cost is in the switch between driver->kernel->network-consumer as running TCP and routing packets to the correct process are done by the kernel.

If you run DPDK you get raw Ethernet frames and run TCP yourself. This means that a program can receive any data sent to the machine. More sophisticated cards can do routing in hardware and present multiple "virtual" cards, but this is not yet commonly available in consumer hardware.
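As a rough illustration of that division of labor (this sketch uses AF_PACKET rather than DPDK's poll-mode API, so it is far slower, but the idea is the same: the process gets whole Ethernet frames and everything above them is its problem; it needs CAP_NET_RAW):

    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0) { perror("socket"); return 1; }

        unsigned char frame[2048];
        ssize_t n = recv(fd, frame, sizeof(frame), 0);
        if (n >= 14)   /* 14-byte Ethernet header; parsing IP/TCP above it is up to us */
            printf("got a %zd-byte frame, ethertype 0x%02x%02x\n",
                   n, frame[12], frame[13]);

        close(fd);
        return 0;
    }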


dpdk is not general purpose networking. It's passing a device to a process and forgetting about it; the process itself needs to decode what is sent on the wire and make sense of it.

It's basically the kernel giving up on trying to do anything with the hardware, thus it's not available to any other process except the one that takes the hardware.

To have general purpose networking in user space you will end up with some other IPC which does not rely on sockets (because sockets are kernel) or shared memory which is dangerous as hell.


A faster network is going to have a marginal effect on software getting slower. There are a lot of factors at play: network speed, cpu power, local caches, etc. The speed of the network driver is one factor among many and I'd be surprised if were the most common bottleneck.

The operators of websites that derive most of their revenue from advertising are going to run their sites at whatever level users will tolerate. Where network drivers are faster they'll either cram in more ad tracking, or won't bother optimizing their existing trackers. If users stop accessing sites because they're too slow to load, operators will either cut down on ad tracking, or more likely, put some effort into optimizing the performance of their ad trackers.

Rust is itself something of an architectural revolution. I believe network drivers in userspace is already a thing, and eBPF may also have a role here. All of this is worth exploring. This is what progress in Linux looks like.


> A faster network is going to have a marginal effect on software getting slower.

I don't mean to be glib but: citation needed?

One of the slowest moving hardware improvements (compared to CPU/Memory speeds) is networking.

It's going to take some serious convincing to tell me that we should just give up performance when software is already getting slower and slower.

It's not fair to blame advertisers exclusively, we also have electron and the hundreds of JS frameworks, that's before we get down to the low level abstractions that hook into basic programs.


The network stack isn't written in javascript.

The slowness of the web will not get any worse by moving the network stack to userspace. Guaranteed.


You have the entire body of computer science research around microkernels against you here.

What has changed?


CPUs got faster, memory got larger, compilers got better.

Microkernels aren't a dead idea and they are even making their way into consumer electronics. (see Zircon).

In fact, even newer OSes are backing into microkernels. Windows adopted similar microkernel concepts in Vista with the HAL. Android adopted a HAL with Project Treble in Android P.

The steady march of modern OSes has been moving driver logic out of kernel space and into user space.


This is driver software, not relevant. Rust isn't a magic spell to be thrown out there when traditional solutions work quite well. The old reasons for hard kernel/userland separation are far less reasonable now.

1) For this very reason, it doesn't make sense to optimize low-level components from the OS down to the hardware anymore, and hasn't for over a decade, if an improved user experience is your goal.

The short-term effect may be a slight improvement, but it's a treadmill and the next wave of web crapware will more than nullify it.

If a microkernel architecture with worse performance got established on all mainstream devices, the experience would be worse in the short term but in the medium term, the crapware would have to adapt so that it again becomes just barely usable as it is today.

The problem is that the short-term gains create an incentive for users to buy/use the marginally faster hardware or kernel, which then forces everyone else to follow suit.


> Many will cry about that performance hit (including me).

Ok, we could make it optional, i.e. additional security for those who wish it (on top of other things we could do, like the ones you mentioned, which aren't a panacea either).


You really need to back up these assertions with some evidence.

Do you have benchmarks that show the impact of switching to userspace on a typical, loaded desktop system with all kinds of workloads? Or are you just guessing?


I did not anticipate a Hacker News discussion about a remotely exploitable Linux kernel WiFi vulnerability requiring some network benchmarks on an unusual network stack architecture, but I'll oblige:

This publication (http://www.minix3.org/docs/jorrit-herder/asci06.pdf) claims that MINIX3 could saturate a 1 Gb/s Ethernet link with an user-space network stack, with separate processes for the stack and the driver, on a rusty 32-bit micro-kernel that can't do SMP. In 2006.


What kind of latency would this introduce on packet processing? A context switch used to be measured in microseconds. I don't have a linux system here to run lmbench on, and I wonder what the level of latency would be on a modern system.

Current everyday networking has tens-of-microseconds latency [1], and on 2018-era CPUs the context-switch overhead seems to be 1-2.5 microseconds [2]. So in benchmark and specialized use cases, context-switch overhead would be measurable, but many orders of magnitude away from relevant when talking about communication over the internet.

[1] https://blog.cloudflare.com/how-to-achieve-low-latency/ - 30-45 microseconds on plain 10G ethernet - faster ethernet probably wouldn't improve much on this

[2] https://eli.thegreenplace.net/2018/measuring-context-switchi...
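If you want a number for your own machine, a crude ping-pong benchmark in the spirit of lmbench's lat_ctx looks something like the sketch below; each round trip is at least two context switches plus pipe overhead, so it gives an upper bound rather than an exact figure (pin both processes to one CPU with taskset for a cleaner number):

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define ROUNDS 100000

    int main(void)
    {
        int p2c[2], c2p[2];
        char b = 0;

        if (pipe(p2c) || pipe(c2p))
            return 1;

        if (fork() == 0) {                      /* child: echo each byte straight back */
            for (int i = 0; i < ROUNDS; i++) {
                read(p2c[0], &b, 1);
                write(c2p[1], &b, 1);
            }
            _exit(0);
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ROUNDS; i++) {      /* parent: one round trip >= 2 switches */
            write(p2c[1], &b, 1);
            read(c2p[0], &b, 1);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("~%.0f ns per round trip\n", ns / ROUNDS);
        return 0;
    }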


> It's 2022

So someone else should have done it for you by now?

Be the change you want to see in the world.

I'm sure you have an excuse for not doing it personally. Just as I'm sure the person to whom you've mentally assigned responsibility has at least as good an excuse too.


> Be the change you want to see in the world.

I have made dozens of commits to MINIX3, including a brand-new ISO 9660 file system implementation (https://github.com/Stichting-MINIX-Research-Foundation/minix...).

I have made more than a hundred commits to SerenityOS (https://github.com/SerenityOS/serenity/commits?author=boricj).

Just because I deplore the general state of security in mainstream operating systems doesn't mean that I demand that someone else does something about it for free.

I'm not paid to fix security bugs in the Linux kernel; do you expect me to fix these myself for free just because you want me to? No one is entitled to my free time spent hacking on random stuff.


Interestingly, the user-space way is also used at the performance-absolutist end of the spectrum, so userspace can talk to the hardware without kernel involvement (Snabb, DPDK, etc.).

See eg https://talawah.io/blog/linux-kernel-vs-dpdk-http-performanc...


Indeed, but AFAIK these more or less depend on dedicating a CPU thread to the software thread, because a cache eviction would completely wreck the performance.

I think with networking appliances built from e.g. DPDK building blocks, the main motivation for pinning and isolcpus usage is largely that users want a fixed topology of CPU cores, PCIe devices, and NUMA memory pools/hugepages. But this doesn't really mean it would be necessary if we started using userspace networking drivers for general-purpose Linux workloads.

For performance you want to reduce context switches between processes, including the kernel. With kernel bypass you do everything in one process in user space, but you also lose out on features like sharing hardware resources between processes. With general user-space drivers you will gain additional context switches. There are also ways to reduce the cost of those switches, though, like io_uring, which Linux recently got.
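For illustration, a minimal io_uring read via liburing looks roughly like this (build with -luring; the file name is just an example). The point is that submissions and completions go through rings shared with the kernel, so one syscall can cover a whole batch of I/Os:

    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>

    int main(void)
    {
        struct io_uring ring;
        struct io_uring_cqe *cqe;
        char buf[4096];

        int fd = open("/etc/hostname", O_RDONLY);     /* example file only */
        if (fd < 0 || io_uring_queue_init(8, &ring, 0) < 0)
            return 1;

        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof(buf) - 1, 0);

        io_uring_submit(&ring);                       /* one syscall for the whole batch */
        io_uring_wait_cqe(&ring, &cqe);

        if (cqe->res > 0) {
            buf[cqe->res] = '\0';
            printf("read %d bytes: %s", cqe->res, buf);
        }
        io_uring_cqe_seen(&ring, cqe);
        io_uring_queue_exit(&ring);
        return 0;
    }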

> I get that there are some use cases where performance really matters to the point where kernel network stack and drivers make a difference (high-throughput and/or low-latency services running on servers, high-performance routers...), but that should not be the default for everyone.

I’m on board so long as there is a choice. Routers with crappy hardware need as much help as possible. Also tangentially this is why the current darling of VPN tech, WireGuard, is implemented in kernel not userspace.


> It's 2022 and we've got more than enough compute power nowadays to run in user-land

Oh hell no! A user-land driver can barely handle WiFi 4 speeds (72 Mbps), and with terrible CPU performance (thousands of context switches and interrupts per second).

For WiFi 5 (ac) and WiFi 6 (ax), you need heavy WiFi firmware involvement, multiple DMA queues and a kernel driver, and even with all that it takes special care to reach the target performance. There is no chance in hell of reaching that kind of performance in user-land.


Good gracious.

"WEINBERG'S SECOND LAW: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization."


This is more of a disparaging analogy than a law.

In the last decade we have bumped link speeds from 100M and 1G to the point where some boards even have 2.5G RJ45 ports now. I am not sure that moving data between kernel and userland at 2.5 Gb/s is even a good idea.

And even worse, because the kernel still has to distribute the data to other userland programs, you actually need another round trip, so the impact needs to be multiplied by two.


Worse, IP packet sizes didn't grow at all. So it would require roughly 200,000 context switches per second (2.5 Gb/s divided by a ~1500-byte MTU) if you don't want to add code bundling packets kernel-side and dispatching them a la Nagle's algorithm. This is both error-prone and adds latency. Even worse, trampolines and other mitigations make it so switches clear the CPU cache.

The trick is to not copy the data, but to pass pages around.

It's the opposite: in the last decade networking connection speeds have plateaued. In the 90s we had 100M Ethernet, in the 00s 1G Ethernet, in the 2010s 10G Ethernet, and it stuck there for a long time. We had far fewer CPU cycles available per packet when 1G arrived, for example.

I don't think I have seen a 10G Ethernet network card in consumer-grade hardware even recently. 10G has been in core infrastructure for a long time, but only recently do you see 2.5 Gbps in endpoint devices. (Probably 2.5 makes more sense for an RJ45-terminated cable.)

And I also don't think 10G routers/switches ever use a pure software-based solution to handle the traffic. They are all hardware-based or mixed solutions.

It's amazing that memory bandwidth, CPU speed, and core counts have grown so much that this is even possible. But it still isn't a good idea.


"Drivers are exploitable so we should run them in userspace" is a hack, and not a good one.

The problem is that drivers are exploitable in the first place, so the solution is that we should make them not exploitable (using Rust, or a better language than Rust that fixes some of its problems) and try to preserve the performance that is rapidly being stolen away by bloated userspace software, rather than just shrugging our shoulders and saying "oh well, I guess drivers are just intrinsically insecure".


It's not a hack. Virtual address spaces and process isolation were built exactly for misbehaving code.

Uhm... no?

1. Date of year has nothing to do with doing things correctly, ever.

2. Ironically, pulling it out of the kernel and running it in user-land will probably bring about more bugs and issues. I would much rather we just fix the problem where it is and leave it at that, instead of potentially introducing new problems, like backdoors and exploits in the software provided. Not saying it WOULD happen, but the potential for it alone is just not worth the risk, in my honest opinion. Let's just fix it the right way and be done with it.


> Can we please stop running network drivers and network stacks in kernel mode by default?

No. I want the kernel to have as much functionality as possible. I have some zero dependency freestanding software that I boot Linux directly into. I really don't want to have to maintain additional user space in the form of C libraries. If I need to manage wifi connections, I should be able to make some system calls and be done with it without having to link to anything else.

Anything related to hardware belongs in the kernel so that all software can access it via Linux's amazing language agnostic system interface. If there are security problems, then that process should be improved without screwing up the interface by replacing it with user space C libraries. We have enough of that in the graphics stack.


I only have a very surface-level understanding of Linux; why does code running in user mode incur extra overhead compared to code running in the kernel?

Stupid question, but how come this has not been embargoed?

Seems like a pretty major vulnerability that affects tons of devices.


Is there reason to believe it wasn't? The oss-security mail message would seem consistent with the fixes having been prepared under embargo.

Nice. Just in time for a long weekend on public WiFi with my Linux laptop.

Seems like most of these got introduced in 5.1/5.2/5.8 and fixed in 5.19.14.

I don't see any of these fixes in 5.19.14; in fact, Fedora has just released a 5.19.15 with these fixes manually applied on top of it. The stable release with these fixes will probably be 5.19.16.

It seems like you're right. I misunderstood something. That's worrying.

Arch Linux also seems to have it patched in 6.0.1-arch2-1, though [0].

[0] https://security.archlinux.org/ASA-202210-2/generate


I think the Fedora 37 beta already uses 5.19.15; the stable release is in a few days, AFAIK.

All currently updated Fedora releases use the same kernel release, that is, every kernel update is released to all currently updated Fedora release at the same time (with slight timing differences for the migration from "testing" to "stable"). If you look at https://bodhi.fedoraproject.org/updates/?packages=kernel right now you see 5.19.15-x01 for Fedora 35, 36, and 37 (the 5.19.15-x00 packages didn't have these fixes, the 5.19.15-x01 packages have them).

> The stable release with these fixes will probably be 5.19.16.

Updating my own comment (too late to edit), that release is now out with the fixes. The full set of stable releases with these fixes is: 6.0.2, 5.19.16, 5.15.74, 5.10.148, and 5.4.218 (source: https://lwn.net/Articles/911272/).




Hmm does anyone know if there is a site/community/service that keeps track of backports fixing CVEs for different Linux distros?

> The 6.0.2, 5.19.16, 5.15.74, 5.10.148, and 5.4.218 stable kernel updates have all been released. Among other things, these updates contain the fixes for the recently disclosed WiFi vulnerabilities. ~~ LWN.net
