>This lists quite a few assumptions. Are there many distributions that provide these assumptions out of the box (specifically, the compositor running all applications, and all applications being sandboxed)?
It's not the Wayland compositor's responsibility to provide an entire security solution, just like it's not the deadbolt manufacturer's responsibility to install an alarm system and security cameras. This isn't a criticism of Wayland; at best it's a criticism of your Linux distro.
Flatpak does answer this question, but fwiw in the Sway camp we're still looking into lighter-weight solutions.
>A Wayland compositor needs to implement: - the display server - the window manager - the compositor - the hotkey daemon - the screenshotting tool - everything else that is now unimplementable in Wayland, but needs to be in the Wayland compositor because users need them and Wayland compositor implementers have no choice but to provide the features as API extensions.
This is only partially true. wlroots implements screenshots the same way that xorg-server does: by providing an API for exporting pixels to clients. The actual screenshot tool is a standalone program which can crash without affecting the compositor. This design is virtually identical to X, except easier to secure. The same can be said of many of your other points.
> massive fragmentation issues across the graphical part of the Linux ecosystem
This is a fair criticism. It is true that we see each of the compositors rolling its own extensions to the protocol. But I believe Wayland is trying to standardize a lot of the protocols, which should cover common use cases.
> each new compositor implementation reinvents the wheel many times
This is not true. There are libraries like wlroots[0] which you can build on top of. You don't have to redo the work that is already done in other compositors.
> application that needs to grab the entire screen will need separate code for each compositor it supports screenshots
There is the PipeWire[1] effort to support screenshots and screen capture on Wayland in a unified way.
> Wayland is also, with its forced composition, hostile to interactive applications requiring low latency, e.g. video games.
Also not true. On a properly implemented compositor, video game frames shouldn't have longer latency to screen than on Xorg. The X server has to do composition too, whether you run a compositor or not.
> Seems to me that if there's a necessary 50k LoC that every compositor needs, it should actually be a part of Wayland.
Wayland is an IPC mechanism and a set of core protocols for input (passing keyboard, mouse, etc. events to applications) and presentation (communicating rendered surfaces back to the compositor), not a compositor or a desktop environment in its own right. There is a Wayland library but its only concern is implementing the IPC system. Those 50k lines of code would be out of scope.
The wlroots library implements low-level input processing and rendering functions and various other protocols and standards which are common to all desktop environments; you can think of those 50k lines as a replacement for the parts of the X server which are not directly concerned with input and presentation, such as clipboard support, screen recording, the system tray, negotiation of window roles, and much more. On top of wlroots (or its equivalent) you have the actual compositors, like Sway, which perform the function of an X11 window manager and determine the high-level look & feel of the desktop as a whole.
As part of the transition some of the roles have shifted between the various components, usually for good reason. For example, security is much more of a concern now than it was when the X11 protocol was designed, which is where most of the issues relating to screen recording, mouse/keyboard grabbing, and primary selection (middle-click paste) originate. These things could be implemented just as they are in X11, but the inherent security issues of e.g. any application being able to observe what is displayed or selected in another application are difficult to mitigate without impacting the user experience. The Wayland-adjacent project developers involved are attempting to come up with solutions that actually improve the status quo, not just copy forward all the issues that were present in X11. At the same time I believe it is well understood that simply eliminating screen casting or global key bindings is not on the table; we need to be able to accomplish the same goals even if we have to do it a slightly different way.
> Wayland is only supported by compositors that implement the wlr-layer-shell protocol. Typically wlroots-based compositors.
I'm really concerned about Wayland fragmentation. Will some tools work only on wlroots implementations and not others? X11 apps generally work across window managers, although the weird ones (tiling window managers like i3) may have some interesting things you have to work around.
If wlroots became the standard for all window managers on Wayland and everyone used it, I guess it would be fine. But if not, we're going to see a lot of apps that have to be adapted for each and every compositor.
>> Xorg [...] allowed arbitrary stuff to build on top.
> Same as wayland.
No, because Wayland puts everything on the compositor. X11 let me run the same screenshot tool on GNOME, KDE, and dwm. In Wayland, the only way to support GNOME, KDE, and Sway is to add multiple backends. In X11, window managers and compositors were interchangeable and orthogonal; in Wayland, the "window manager" is subsumed into the compositor (now mandatory), which is the display server. I really like keynav. AFAICT, implementing an equivalent in Wayland requires integration into each compositor. There's still a monolith, but it got moved up the stack to a place where it prevents piecemeal environments.
> That means the screenshot application is, to some degree, compositor specific
This is not the intention under Wayland. Wayland protocol development is geared towards compositors implementing protocols for experimentation, and then basically "upstreaming" the result. Even before upstreaming, compositors share work (e.g. wlroots implemented at least one KDE-specific protocol that has now been superseded by the resulting upstream work).
Also, all Wayland protocols are advertised in the registry, so applications don't do "I'm on KDE, so...", and instead just use the protocols available. It is therefore not a problem if KDE implements a GNOME protocol, for example. It'll just work.
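To make the registry point concrete, here's a minimal sketch of capability-based selection. It's a simulation, not real client code: the registry is modeled as a plain dict, the function name and the preference order are made up for illustration, and the two interface names are just examples of screen-capture globals a compositor might advertise.

```python
# Hypothetical sketch: pick a capture backend from whatever the compositor
# advertises, instead of checking "am I on KDE / GNOME / Sway?".
# The registry is modeled as a dict of interface name -> version; a real
# client would collect these from wl_registry "global" events.

# Preference order is an assumption made for this example.
PREFERRED_CAPTURE_INTERFACES = [
    "ext_image_copy_capture_manager_v1",
    "zwlr_screencopy_manager_v1",
]


def pick_capture_interface(advertised):
    """Return the first preferred interface that is advertised, or None."""
    for name in PREFERRED_CAPTURE_INTERFACES:
        if name in advertised:
            return name
    return None


if __name__ == "__main__":
    wlroots_like = {"wl_compositor": 5, "wl_shm": 1, "zwlr_screencopy_manager_v1": 3}
    bare = {"wl_compositor": 5, "wl_shm": 1}
    print(pick_capture_interface(wlroots_like))
    print(pick_capture_interface(bare))
```

The client never asks which desktop it is running on; if a second compositor starts advertising the same global, the same code path works there too.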
Screenshot and recording will be a shared protocol, but due to wanting a proper protocol with goodies like damage tracking for performance, it's a little bit more work than you would think. We certainly don't want X's "dumb" approach—we want something faster and safer (that is, functionality guaranteed without tearing or fun stuff like that).
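For anyone wondering what damage tracking buys you, here's a rough sketch of the idea: the recorder keeps its previous frame and copies only the rectangles the compositor reports as changed. The frame layout (lists of rows) and the function name are assumptions for illustration, not any actual protocol's data format.

```python
def copy_damaged(prev, new, damage):
    """Copy only the damaged rectangles from `new` into `prev`.

    Frames are lists of rows (each row a list of pixel values) of equal size.
    `damage` is a list of (x, y, width, height) rectangles.
    Returns the number of pixels copied (overlapping rectangles count twice).
    """
    height = len(prev)
    width = len(prev[0]) if height else 0
    copied = 0
    for x, y, w, h in damage:
        # Clamp the rectangle to the frame bounds.
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, width), min(y + h, height)
        if x0 >= x1 or y0 >= y1:
            continue  # rectangle lies entirely outside the frame
        for row in range(y0, y1):
            prev[row][x0:x1] = new[row][x0:x1]
            copied += x1 - x0
    return copied


if __name__ == "__main__":
    prev = [[0] * 4 for _ in range(4)]
    new = [[1] * 4 for _ in range(4)]
    print(copy_damaged(prev, new, [(1, 1, 2, 2)]))
```

With a blinking cursor on an otherwise static screen, this touches a handful of pixels per frame instead of the whole output, which is where the performance win comes from.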
> A lot of very basic features (remote desktop, screen sharing, exclusive fullscreen, keyboard/mouse shortcuts) are still in their infancy.
- remote desktop: good point. Considering that screen recordings work fine, I don't see a technical reason that a Wayland remote desktop setup couldn't work.
- exclusive fullscreen: what do you mean by this? Using Firefox, F11 and Super-F both work in Sway and F11 works with GNOME.
- keyboard/mouse shortcuts: maybe? The fact that a random process can no longer read all of your keystrokes seems like a plus, to me. Otherwise, just add a shortcut to your window manager or DE and run a command of your choice.
- losing all apps on compositor restart: Only an issue with GNOME and KDE. Pressing Super-Shift-C on Sway reloads the compositor and not your apps.
> That last issue could be solved with a reusable library that provided basic compositor functionality for window managers. Except every member of the Wayland community has independently had that idea and written their own
Actually the opposite is true.
Recently everyone is converging on re-using the same compositor library, written by, among others, Sway's author: wlroots.
> It is a display protocol, responsible for displaying things. A compositor is a must for that
Er, are we using different meanings? Obviously something has to display pixels, but I literally do run X without a compositor.
> a window manager, and there is no reason why that can’t be done with this model (and is pretty much what wlroots does).
No, you can't (excepting rootful Xwayland, obviously). Wayland forces the display server, compositor, and window manager to be a single program. All wlroots does is make it easier to write that compositor; it hardly changes the model.
>The problem with "alternative" Wayland compositors is that so much more functionality is pushed into the compositor under the Wayland architecture compared to window managers on X11.
That comparison is a bad one to make. Compared to just a window manager, that's correct. But you have to compare Wayland compositors to X11 compositors and for that it's about the same amount of work. Either one expects you to bring your own rendering because that's the main benefit of writing a compositor.
It also might make sense to compare X11 window managers to compositor plugins. The reason people seem to prefer wlroots is precisely because it offers more power and more control than just writing a window manager.
> wayland: if we move the compositor into the kernel we can solve several performance problems.
Where exactly does Wayland introduce a kernel-based compositor?
Existing Wayland implementations just consolidate what was often several userspace processes in the composited X11 model into a single monolithic userspace process responsible for the window management, input multiplexing/routing, and compositing. I'm not certain this is even a strictly specified aspect of Wayland either; it may just be convenient and simpler to implement this way while enjoying some natural performance advantages.
There are no kernel Wayland compositors AFAIK. Perhaps you're conflating KMS with Wayland since they kind of overlapped chronologically? KMS enabled running Xorg without requiring root using the modesetting driver; it's been a huge step towards cleaning up the graphics stack on Linux, independent of Wayland.
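As a toy illustration of why one process is convenient here: window management, input routing, and compositing all want the same stacking order. The class and method names below are hypothetical and have nothing to do with any real compositor's code.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    name: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


class ToyCompositor:
    """One process owning the stacking order used by all three roles."""

    def __init__(self):
        self.stack = []  # ordered bottom to top

    def map_surface(self, surface):
        self.stack.append(surface)

    def raise_surface(self, name):
        # Window management: move a surface to the top of the stack.
        for i, surface in enumerate(self.stack):
            if surface.name == name:
                self.stack.append(self.stack.pop(i))
                return

    def surface_at(self, px, py):
        # Input routing: the topmost surface under the pointer gets the event.
        for surface in reversed(self.stack):
            if surface.contains(px, py):
                return surface.name
        return None

    def paint_order(self):
        # Compositing: draw bottom to top so upper surfaces cover lower ones.
        return [surface.name for surface in self.stack]
```

In the composited X11 model, that shared state has to be kept in sync across the server, the window manager, and the compositing manager; here it's a single list.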
Preamble: This post is genuinely not meant in bad faith. I have glanced over the Wayland source, it is very nice, clean code, and I have no doubt at all that the Wayland developers are very competent and focused people, and I admire the work they have done and the motivation they have.
I won't rule out that I don't understand all relevant concepts in Wayland yet; if so, I apologize in advance.
Now to the things which - as per the title - frustrate me. The year of "Linux on Desktop" is almost there, as we all know - jokes aside, Linux really is getting more and more traction even outside of developers. And while developers are fine with tinkering, modifying the system, and doing workarounds, regular users don't like that and often can't even help themselves.
So, here is a list of some applications one expects from a modern desktop OS:
- Screenshots
- Screen recorders
- Password managers
- Accessibility helpers:
  - Screen readers
  - Application-sensitive keyboard augmentations (i.e. key reconfiguration based on currently focused application)
  - Magnifier
- Remote desktop software (both server and client)
Now, as far as I understand the Wayland model, _all_ of the above must be handled by the compositor, not by Wayland itself.
The compositor exists to keep the Wayland core small and to avoid over-generalizing concepts in order to make different window environments, like GNOME, KDE, Xfce etc., work. Instead, each window environment has to implement its own compositor.
This is per se obviously okay. But for all of the above points there is no protocol or standard defined, to the best of my knowledge. That means, and appears to be the reality, that all those functions are either missing or are implemented in a compositor-specific way¹.
This in turn means that all of the above tools need to be re-written and adapted to each possible compositor, or that each window environment provides the required tooling, or that some outside party defines a standard which all compositors implement which facilitates a common API for all of the above.
And this is the frustrating point: we had a really nice ecosystem of tools for all of the above points. Now we don't anymore. And it doesn't appear to me that fixing this is a priority for the Wayland team.
Now, please help me: am I missing something important? If so, please tell me. Otherwise, how would you propose supporting the Wayland team here? I have the impression that even if someone were to provide high-quality, working merge requests for parts of the above, they would be rejected due to "security considerations" or because they don't belong in Wayland core.
Some examples of related bug reports or discussions:
* https://gitlab.freedesktop.org/wayland/wayland/-/issues/326
(which links to a very interesting bug in xdg-desktop-portal, where several accessibility-related projects are involved in the comments)
>Wayland protocol is much more restrictive than X's -- things like screen capture and window managers don't work because the APIs they need aren't present
This is not quite true. There are protocols for these things in some implementations. You'd want to keep compatibility with these because otherwise you'd have no reason to be supporting Wayland. Keep in mind either way you're talking about using a specific compositor—it doesn't matter if the APIs aren't present in the protocol. In this situation you'd re-use one from another implementation or add another private protocol extension. That's always been what individual compositors are encouraged to do. It's no different here.
> without some extensions to the API that Wayland developers
Depends on what you mean by "Wayland developers". wlroots, and thus Sway, support such extensions and I think KDE is open to standardizing such extensions. GNOME supports remote desktop through an xdg portal. The problem is that, as with several other things, the compositors haven't all agreed on a single standard.
I can vouch for the complete opposite. I use Sway, an i3-like Wayland compositor (compositor = window manager in the confusing Wayland lingo), and it's completely problem-free. All that's needed is changing kern.evdev.rcpt_mask to get input working and specifying a sway socket file thingy. (I'm confused as to why package maintainers don't set these by default.)
Maybe the Wayland mode is broken in other window managers that offer both X11 and Wayland modes.
I have a lot of disagreements about things said here. For example, I suspect wlroots based compositors are not particularly inefficient, therefore I’m not sure I strongly agree with the ewaste issue. GNOME 3 doesn’t seem particularly lightweight either way; most low end devices would not be running GNOME and KDE to begin with. Also, wlroots is listed alongside GNOME and KDE, but it is a library or framework for Wayland compositors. It’s an ecosystem of compositors. wlroots is basically a logical conclusion of Wayland’s different approach versus X; since the DE/WM is the compositor, it makes sense to make the common display server components into a library, the same way you would hope GNOME and KDE would not reimplement Xorg (though the lower surface area of Wayland and shared components like libinput and libwayland make it more tenable that they do in fact implement their own Wayland compositors largely from scratch.)
> However, Xorg code can be secure! Just look at xscreensaver,
I am rubbed the wrong way by this, because it sounds a lot like “just don’t make mistakes!” — if there’s any way to crash xscreensaver, that is a potential security issue...
There are a number of counter arguments to the parent article in "Why I'm not going to switch to Wayland yet."[1]
On the subject of leaving the responsibility of taking screenshots to the Wayland compositor:
"for simple things using the compositor's screen shot tool is fine. But what if I don't like the screenshot tool for my compositor of choice? My experience with the GNOME screenshot tool (granted this was pre-wayland) was that it wasn't as good as, say, shutter, which has a lot of options, let's you easily crop and edit the screenshot from inside the screenshot tool etc. And then swaygrab doesn't even (currently) have an option to capture a rectangular region."
There are some other things "Why I'm not going to switch to Wayland yet" mentions which are important to me, like Wayland's lack of color picker tools and xdotool functionality.
The parent article says:
"Wayland doesn't have network transparency! This is actually true! But it's not as bad as it's made out to be. Here's why: X11 forwarding works on Wayland. Wait, what? Yep: all mainstream desktop Wayland compositors have support for Xwayland, which is an implementation of the X11 server which translates X11 to Wayland, for backwards compatibility. X11 forwarding works with it! So if you use X11 forwarding on Xorg today, your workflow will work on Wayland unchanged."
So why wouldn't I just use xorg to begin with?
Overall, Wayland just seems immature to me, and not a viable competitor to Xorg, which has had all of this functionality for decades. As an Xorg user, I really struggle to come up with compelling reasons to switch.