Okay, there's a lot to unpack here. First, arbitrary code execution doesn't really matter. We're talking about a client-server relationship. Just because you have arbitrary execution on your laptop doesn't mean my server's security is compromised. And just because you have arbitrary code execution in a sandbox, container, flatpak, VM, or other host and can talk to my compositor doesn't mean it's compromised.
The security model for Wayland is that it should enable you to create secure systems if you wish to plug the rest of the holes. You can't really use Xorg in a secure system because being able to talk to the X server at all exposes too much privilege.
Screen recording (and taking screenshots) is a bad fit for the Wayland protocol because then what is a very small and lean protocol suddenly needs to know about things like image and video encodings. And since it's a display-only protocol you either have to teach Wayland about audio streams and mixing or stitch together your audio out of band.
But fine, we somehow all agree that such things are in the scope for a display protocol. Recording your screen is still a privileged operation and Wayland doesn't have an authentication or permissions system. How should they authenticate? In a way that a malicious program can't fake?
We end up just reinventing dbus, polkit, pulseaudio, pipewire but in a way that ensures that you can't swap out those components for something else.
If you care about standardization then it's really not that bad: GNOME and KDE typically work in tandem when designing dbus interfaces and smaller DEs end up implementing those. So far the world hasn't ended for media controls and it won't end for screenshots.
Sorry, you are wide of the mark. Arbitrary code execution is game over because performing privilege escalation is usually pretty trivial after that. Also, Wayland is just a local IPC protocol.
To be more precise, its graphics related part is a GPU buffer management protocol only. No graphics data is passed between processes through that channel. All that happens is buffer allocation and buffer access brokering. This ties into what the wayland part of a screen capture tool is: read access to the frame buffer after composition. No image encoding, no video codecs or anything else, just the ability to get to the raw pixels.
This also means that sound is a non-issue for wayland because the audio stack is separate.
Also, I remember that adding authentication prompts to compositors was seriously considered when screen capturing came up on the Wayland mailing list. IIRC it was only rejected because it created weird synchronization issues, due to the unspecified time interval between the request by an application and the response by the compositor. And I seem to remember that there are some compositors that allow whitelisted applications to perform operations that are considered privileged.
Anyway, I consider a display server that doesn't allow an application to query window positions and screen resolutions and forbids explicit window placement to be broken as far as any serious GUI applications are concerned. This combination of restrictions makes implementing features like sane tooltips and context menus impossible.
I am neither fond of Xorg nor Wayland. I have used both and I see shortcomings of both.
Xorg is overly bloated and was done without giving much importance to security.
On the other hand, Wayland is technically superior and might be more secure than Xorg. But its feature set is not at parity with Xorg's.
Now, I don't think those features must be pushed into Wayland itself. I don't know, but some desktop-oriented Wayland implementation with those missing pieces filled in would be a good idea.
But will security suffer? Maybe. For example, Wayland is said to be secure because a window can't see other windows. But screen recording became necessary, so some API was created to access the screen. Didn't that make the security claim irrelevant, because each window can access the screen? I think it's still not thought out properly.
Xorg already provides a full suite of security protocols that allow fine grained control over every aspect of any application down to the single pixmap via access control hooks.
The fact that nobody really uses them should tell you that the sandboxing craze and whitelisting is mostly if not completely security theater and hostile to the general workflow typical for Desktop applications.
Wayland addresses none of those problems. Wayland is merely a protocol that can be used to blit some bitmaps together. The protocol says nothing about security except that those problems should be dealt with by other protocols which are not part of Wayland.
Wayland is like X11. Xorg implements the X11 protocol. There are other X11 server implementations — XWin32 is one example on Windows.
Nothing has changed with Wayland except we have a new thing and lots of groups writing compositors. And this is great: Mutter, Kwin, wlroots, Mir, and they will all speak a common protocol for putting stuff on the screen and handling input events. And projects with similar use cases (“desktops”) are standardizing on common dbus interfaces for non-display stuff.
This is genuinely so much better than the Xorg monoculture. Wayland’s design has made it possible for lots of different groups to implement display servers and have interoperability because what we had before was “X11 actually means do what Xorg does.”
There’s lots of in-fighting about the scope of Wayland and people that want to make a protocol for putting pixels on the screen also handle “desktop stuff” like audio, screenshots, screen recording, keybindings, input automation, authentication. I think this is misguided because it would effectively turn Wayland into a generic message bus between “apps with windows” and the display server when we already have a generic message bus for every application — dbus.
which after shaking out will be promoted to org.freedesktop.* after standardization. Despite the fact that notifications have been "DE specific" in the same way for years and years nobody seems to complain about org.freedesktop.Notifications.
Wayland does not care about security. In fact it does not even offer standardized access control protocols. The flaw here (among many others) is that the protocol that is responsible for negotiating display output for multiple processes (which is what Wayland tries to be) should also contain standardized interfaces to exchange that output among processes, but Wayland doesn't.
Yeah, Wayland is mostly a regression for my use cases: client-side decorations, breaking screen recording/sharing etc. I know there are theoretical issues with the X security model, but this replacement is a classic case of “security” producing something that’s basically useless for what people use the software for.
Also, because X works over a socket, it’s language-agnostic in a way that a library just can’t be: libraries like CLX allow Common Lisp graphical applications with no dependency on a C library.
The biggest beef I have with Wayland is its security model. OK, you don't want to let other applications listen to keystrokes, record the screen, etc., by default? Fine. There should be some low-level implementation that pops up a box asking the user if they wish to allow that app to do that, not outright breaking the practice!
Wayland feels like some ivory tower academic's ideal of what a display system should be. Everything usable is an 'exercise left to the reader'.
Look, if you're a dev working on an open source project meant to replace a critical piece of infrastructure, don't be surprised when people don't like or adopt it when it breaks a lot of their use cases.
I really don't get the security angle. If I'm already running arbitrary code then it doesn't matter that Wayland is more "secure" than X11. It's just silly security theater, because there are plenty of other ways you can access whatever other applications are doing and interfere with them, even if the display server doesn't allow you to do that directly.
The only security benefit I see is if you're running sandboxed applications and you want to connect them to your display server, then sure, there is a benefit here, but how many people are running untrusted GUI apps in a sandbox on a desktop?
Wayland is a protocol only. A wayland implementation may choose to incorporate the functionality of a window manager, or can create some way to outsource it to a separate executable.
The security policies are not questionable. The X server is beyond fixing. A single rogue extension can share everything on your screen with a remote server, and you would not even notice.
The Wayland developers have a security model [-1] that is hostile to "power-users" (those who like to use the Unix (or other OS) programming environment to its full potential) and the visually impaired (eg., blind). See [0] [1] [2] to see what features I am talking about.
It is possible to implement some of those features on a per-compositor basis, but the result of that will be graphical API fragmentation, as programs that interact with GUIs will need to have separate code for each compositor. And the work is not done even for Gnome (more precisely Gnome's Wayland compositor and the Gnome applications that use it) yet.
On the other hand one could say, eg. "Why not make a compositor accessibility protocol on top of Wayland?". End result of that, it is easy to guess, would be something worse than X Windows (because of even more layers of abstraction, and possibly even more incompatible standards/APIs/protocols), which the Wayland people were supposedly trying to escape from.
Edit: Another thing that makes Wayland (at least without an extension ...) unsuitable to replace X Windows is forced compositing. This means unavoidable double buffering and thus worse video performance (especially noticeable for interactive stuff like video games).
[-1] I prefer calling it security theater, because it does not bring any real security improvement in practice.
I’m sorry, I may not have the time to answer every point you have made:
> I was not convinced that there was anything inherently wrong with X11-the-protocol
There is: the nonexistent security model, which can’t really be retrofitted without breaking every program, in which case they can just as well fix all the bad parts.
> Is there a particular point in the video you want me to pay extra attention to that clarifies this?
I found the graphics of the client-compositor-Xserver vs client-compositor under Wayland really informative. In modern usage, the X server actually acts more like a library and IPC bus, and is bad at the latter. Also, related to the API thing, there is no way to signal that a buffer is ready. You may not be interested in “every frame is perfect”, but I like that I can watch a video in vlc without tearing.
Also, a Wayland compositor can be much more lightweight than the whole X server, because it is not as chatty (there is no useless communication to the X server that then communicates to the compositor for no reason).
It’s not without reason that wayland is/can be used in embedded systems.
> and did not see anything related to controlling which programs get to see which events
There is one-to-one communication between the compositor and each client. Keyboard events, window resizes and the like are sent only to a specific client. I may have worded it incorrectly when I said it is specified; I would rather say it has an inherent model for it, which can be changed with extension protocols when needed. But the default should not have been that everyone listens to everything and picks out what is interesting. (For example, a global hotkey now has to be registered, and the compositor reacts to it based on that registration. There can’t be a clash, and it works reliably.)
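To make the difference in defaults concrete, here is a toy model in Python. It is only a sketch of the two delivery policies described above, not real X11 or Wayland code: `Client`, `deliver_to_all` and `deliver_to_focused` are made-up names for illustration.

```python
class Client:
    """A toy client that just remembers which events it was handed."""

    def __init__(self, name):
        self.name = name
        self.received = []


def deliver_to_all(clients, event):
    """'Everyone listens' policy: every connected client sees every event."""
    for client in clients:
        client.received.append(event)


def deliver_to_focused(focused, event):
    """'One-to-one' policy: only the client holding keyboard focus sees it."""
    focused.received.append(event)


def type_text(text, deliver):
    """Feed one key event per character through the given delivery policy."""
    for ch in text:
        deliver(("key", ch))


# Broadcast policy: a second client sees the keys typed into the terminal.
terminal, snooper = Client("terminal"), Client("snooper")
type_text("hunter2", lambda ev: deliver_to_all([terminal, snooper], ev))
broadcast_leak = len(snooper.received)

# Focused policy: the second client receives nothing unless the compositor
# explicitly grants it something through an extension.
terminal2, snooper2 = Client("terminal"), Client("snooper")
type_text("hunter2", lambda ev: deliver_to_focused(terminal2, ev))
focused_leak = len(snooper2.received)
```

The point is only that the policy lives in the compositor: a client cannot opt itself into other clients' input.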
Also, in my opinion this flexibility (with which clients should not worry about) lets you create novel ways to interact with windows, that was not possible with X.
Also, you seem to think that there is all that much difference between compositor families: that is not the case. The core and many extensions, while implemented multiple times, work in the same way. Thus a traditional client with some windows will just work. Some compositors have custom extensions, e.g. for a specific status bar, which you may find bad since under X there could be cross-WM status bars etc. But realistically you could not have them under e.g. GNOME or KDE without tinkering, so the status quo doesn’t really change.
> Also, putting core functionality that everyone must implement the same way into extensions just so they can call Wayland "just a protocol" or "just a display manager" is disingenuous
How would you create that API of X you mentioned? Wayland is a protocol, and the core is mandatory. It lives in a repo so that it can have versions, which is yet another area where X is flawed. Even the core API can continue to evolve, and the compositor and client can both decide to support, for example, an older version, although in practice the core API is backward compatible. A fresh client can use a new feature when it is available, with a proper way to fall back, thanks to wl_registry.
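Roughly, the negotiation works like this: the compositor advertises each global with an interface name and the highest version it implements, and the client binds at a version both sides understand. Below is a small Python sketch of that idea, not actual libwayland code. The `Global` class, the `negotiate` helper, the version numbers and the interface `hypothetical_fancy_feature` are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Global:
    name: int        # numeric handle the compositor assigns to the global
    interface: str   # interface name, e.g. "wl_compositor"
    version: int     # highest version the compositor implements


def negotiate(advertised, client_max):
    """Return {interface: version to bind} for interfaces both sides know.

    The bound version is the lower of what the compositor advertises and
    what the client was written against.
    """
    bound = {}
    for g in advertised:
        if g.interface in client_max:
            bound[g.interface] = min(g.version, client_max[g.interface])
    return bound


# What a (made-up) compositor announces through the registry.
advertised = [
    Global(1, "wl_compositor", 4),
    Global(2, "wl_seat", 7),
    Global(3, "xdg_wm_base", 2),
]

# What a (made-up) client knows how to speak.
client_max = {
    "wl_compositor": 6,
    "wl_seat": 5,
    "hypothetical_fancy_feature": 1,
}

bound = negotiate(advertised, client_max)

# Anything the compositor did not advertise is simply absent, so the
# client has to take its fallback path for that feature.
needs_fallback = [i for i in client_max if i not in bound]
```

An older client on a newer compositor and a newer client on an older compositor both end up at a version they share, which is the fallback behaviour described above.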
> Does the wlroots project define what extensions are standard and required for a piece of software to call itself a Wayland compositor
That is the core protocol. You seem to have a misunderstanding around it. Otherwise, how would a Wayland app work on every Wayland compositor? wlroots can have some custom extensions, and it does, but you seem to misunderstand their point and scope. They are simple things like “a specific window that can work as a widget, e.g. doesn’t lose focus”.
Everything buffer related is core, and for example full screen WAS not part of the core initially, but an implementation that all compositors agreed on was merged and everyone implemented it many years ago.
> If you can't help me understand why Wayland could not have been implemented as an X extension
I’m trying to but you seem to have some grudge against the project.
I am no X developer so unfortunately I don’t have more knowledge on the topic than what I have already shared, but for example X developers tried to retrofit HiDPI to X, and things like mixed HiDPI over multiple monitors (hell, the whole multi-screen setup) simply can’t be done realistically, from what I gathered due to the X API’s lack of semantic information like scale. Wayland corrected the many, many failings of the API in a future-proof way. Also, why do you think that basically every OS already changed to a compositor-based display server two decades ago? It is simply the better abstraction. This is a simple answer, but it is the fundamental one.
This is an incorrect assessment of the claim, which is, "with Wayland it becomes possible to build a system where applications cannot record each other's events." To the best of my knowledge the only approach with X11 that actually works is running each app with a separate X server. The author points out that being able to register global handlers or programmatically interact with other apps is useful, which is why it's possible by asking the compositor nicely. The trick with libinput requires that the system already be pwnd.
Another way of saying the same thing, "you cannot meaningfully sandbox applications that talk to your X server."
> Network Transparency
Is a bad thing and a poor remote desktop solution. Let's say you have a desktop with an Nvidia graphics card and you've installed their drivers. Your laptop is using the OSS Intel drivers. You forward an OpenGL app from your desktop to your laptop -- it will crash. Why? Because Nvidia's libGL expects to talk to a server with Nvidia's GLX extension of the same version.
However, a ton of work is going into getting screencasting and remote desktop working under Wayland in a manner that will let you implement whatever protocol you care to speak on top, such as RDP or VNC.
And VNC currently works on Mutter so I'm not sure why the author is bemoaning its loss. It's the feature that's getting the most development time right now.
> Multiple clipboards
That's up to the implementor. Wayland has a very general API for exchanging data between clients, which can be used to implement clipboards in any way the compositor sees fit. Just because GNOME/KDE don't want to maintain it doesn't mean it's not there.
> Restarting ends your session.
If your X server crashes it ends your session too. So this is really an argument that X.org is more stable than Mutter/Kwin. I don't think the devs will argue with that but the point is that they'll get there.
"Works for me" is a risky defence; if you are a slightly demanding Wayland user then it is fine, if you have unusually simple needs it isn't a useful contribution.
It isn't anything to do with the hardware; it is the design assumption that isolating applications' input and output should be mandatory.
In hindsight; that was a design mistake. The correct design is probably something like isolation by default but optional (ie, allowing sharing). The current design means further protocols and de-facto standards are required to support, eg, streaming and screenshots. That is bad for an ecosystem that relies on low barriers to entry to get good software written.
Basically, there needed to be a security model but the developers skipped it because it seemed like it shouldn't be the compositor's job. And after a very painful couple of years, it seems quite likely that it was the compositor's job.
There's nothing wrong with the transport layer's security, as far as I can tell. (For one thing, it's usually tunneled over SSH). The X11 security model assumes that anything with access to the screen has at least as much privilege as the user account that owns the session.
In practice, that's true (things like Flatpak claim to support fake sandboxing, but whatever).
Hypothetically, in a universe where Wayland was running on top of something other than the Linux desktop ecosystem, you could run untrusted code from bash with su, and have it talk to wayland directly, and it'd be OK. Today, you have to run it in a web browser, or using a VM with display virtualization drivers, or use any of a dozen other sandbox choices.
Those all managed to get the correct security properties under X11 because they're properly modularized. Wayland isn't, so to get this one feature, they're expecting all software to be rewritten.
>This lists quite a few assumptions. Are there many distributions that provide these assumptions out of the box (specifically, the compositor running all applications, and all applications being sandboxed)?
It's not the Wayland compositor's responsibility to provide an entire security solution, just like it's not the deadbolt manufacturer's responsibility to install an alarm system and security cameras. This isn't a criticism of Wayland, at best it's a criticism of your Linux distro.
Flatpak does answer this question, but fwiw in the Sway camp we're still looking into lighter-weight solutions.
>A Wayland compositor needs to implement:
>- the display server
>- the window manager
>- the compositor
>- the hotkey daemon
>- the screenshotting tool
>- everything else that is now unimplementable in Wayland, but needs to be in the Wayland compositor because users need them and Wayland compositor implementers have no choice but to provide the features as API extensions.
This is only partially true. wlroots implements screenshots the same way that xorg-server does: by providing an API for exporting pixels to clients. The actual screenshot tool is a standalone program which can crash without affecting the compositor. This design is virtually identical to X, except easier to secure. The same can be said of many of your other points.
From what I've seen, Wayland seems to follow a very narrow set of use cases, and willfully ignores others outside its philosophy, no matter what their practical applications might be.
From what I remember reading previously:
- Wine is unimplementable in Wayland. By design.
- Programs cannot set their own hotkeys in Wayland. You must use the desktop environment's hotkey tool to set global hotkeys. (Desktop environments may provide APIs for this, but then you get the foreseeable issues with using e.g. KDE apps on Gnome. And what if you don't want a desktop "environment"?)
- Screenshotting and screen casting tools are unimplementable in Wayland, by design.
- Some Wayland implementations provide additional features, like screenshotting / hotkey registration, anyway, as API extensions. These extensions are not standardized and require individual support for each Wayland implementation in each program, thus causing further fragmentation.
- I use the X forwarding feature so often that I have a hotkey on my laptop to open an Emacs frame running on my desktop (on my laptop's screen). Being able to jump between machines while keeping the entire session, incl. open and unsaved files, is pretty great.
I can certainly think of situations where Wayland's advantages greatly outweigh its drawbacks, but its philosophy seems so much at odds with at least the way I interact with my computer, that the idea of switching to Wayland seems completely unfathomable. As far as I can see, at this rate, Xorg will live forever.
I've been instead working on my own little tool to fix the few annoyances in Xorg clients that I ran into (the ease with which I've achieved this and the flexibility of the tool speaks further in Xorg's favor):
I've done a spot of embedded linux programming before and it is a royal PITA to do anything but the simplest graphics. I can easily see a role for Wayland for that.
This is a great answer, thanks. Also, the comment about GUI isolation for security makes sense (though I suspect x11docker would solve the security issues as well). Seems like Wayland is on track to be a good solution for a niche use case.
My main issue with Wayland is how big of a missed opportunity it is. If you're going to introduce a massive breaking change like this to modernize the Linux desktop, the least you can do is make it an actually compelling design. Wayland was originally created to fix the parts of Xorg that made it difficult to use it for embedded devices (digital signage, car infotainment systems, etc), so the base protocol only defines enough functionality to make that use case possible. Anything more complex than that requires using a complicated patchwork of Wayland protocol extensions, some of which are specific to certain compositors and many of which duplicate the X11 protocol extensions they're intended to replace. I'd be surprised if Wayland isn't more complicated to use than X11 in a few years.
Additionally, Wayland is already showing its age. The example that first comes to mind is how events are handled. Device events are received from the kernel and placed in a 4KB buffer that clients are supposed to read from and process. If the buffer fills up without being processed (for example, if high CPU or I/O stress causes a program to get descheduled for a few seconds and the user moves his mouse over the window), the compositor will sever its connection with the program, causing it to crash. This wasn't a problem for Wayland's initial embedded use case, but if you're trying to use it on desktop with devices such as high-polling rate mice it makes it far more unstable than X11 unless you have enough system resources to guarantee that no program will ever have its event handler thread pause for any reason. See here for more information: https://gitlab.freedesktop.org/wayland/wayland/-/issues/159
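Here is a back-of-the-envelope Python model of that failure mode. It is a sketch of the behaviour as described above, not of libwayland internals: the 4096-byte capacity comes from the description, while the 16-byte event size, the `ClientConnection` class and the drain interval are assumptions made up for illustration.

```python
class ClientConnection:
    """Toy per-client queue with a fixed byte budget."""

    def __init__(self, capacity=4096):
        self.capacity = capacity   # bytes the queue can hold
        self.pending = 0           # bytes queued but not yet read
        self.connected = True

    def queue_event(self, size):
        """Queue one event; sever the connection if it does not fit."""
        if not self.connected:
            return False
        if self.pending + size > self.capacity:
            self.connected = False
            return False
        self.pending += size
        return True

    def drain(self):
        """The client woke up and read everything that was pending."""
        self.pending = 0


def events_until_disconnect(conn, event_size):
    """Count how many events a stalled client can absorb before the cut."""
    count = 0
    while conn.queue_event(event_size):
        count += 1
    return count


# A client that never gets scheduled: the queue fills and it is cut off.
stalled = ClientConnection()
absorbed = events_until_disconnect(stalled, event_size=16)

# A client that reads its queue every 100 events survives the same load.
responsive = ClientConnection()
for i in range(10_000):
    responsive.queue_event(16)
    if i % 100 == 99:
        responsive.drain()
```

Under these assumed numbers a stalled client absorbs 256 events, so a mouse reporting 1000 events per second would exhaust the budget in roughly a quarter of a second.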
I wish that the "X11 is showing its age and should be replaced, but Wayland isn't the right option" viewpoint was more mainstream.
One thing that a great majority of commenters here seem to be missing is that Wayland's issues that make it unsuitable to replace X Windows will not just be ironed out in a couple of years (at least not in a way that would be an improvement over X), because it is flawed by design.
The thing is, Wayland developers do not want you to take screenshots or automate input events (injection and interception). Those are both "power-user" and "accessibility" features. So respectively those who like to use the Unix (or other OS) programming environment to its full potential (hackers?), and the blind/visually impaired will have a hard time if Wayland gets forced on them.
It is possible to solve those problems with effort on a per-compositor basis (meaning less choice for users and more redundant programming effort - programs that interact with GUIs will need to have separate code for each compositor!), or with protocol extensions - that, of course, would not be universally accepted. For example, I think no compositor currently gives a Wayland user the option to mess with input events (key presses, etc.). This means no hot-keys!
Quoting Red Hat: "Furthermore, there isn’t a standard API for getting screen shots from Wayland. It’s dependent on what compositor (window manager/shell) the user is running, and if they implemented a proprietary API to do so."
An interesting Reddit discussion: "It has been almost a decade, why does Wayland not have a protocol definition for screenshots?" - answer - "Because security, dude! Wayland is designed with the thought that users download random applications from the interwebz which are not trustworthy and run them. Wayland actually makes a lot of sense if you don't think of Linux desktop distributions and desktop systems, but of smartphones. But for some reason we absolutely need this technology on the desktop, like we had not enough pain and lose ends over here without it." [7]
See [1] [2]. And my previous comments on the same topic: [3] [4].
Another thing wrong with Wayland is that forced compositing means noticeably (in interactive applications) more latency.
Small nitpick regarding the blog post: Chromium depends on GTK3.