This looks really cool! Though it's a bit limited since it is a FUSE module and not a kernel driver, and it's unlikely to become a kernel module since it is written in C++ with large dependencies :-\
Would it be possible to take the core design changes here and apply them to squashfs, and maybe propose a next major version of the squashfs internal format to make all these things possible?
I would use this if it didn't depend on OS-specific features. Squashfs is not portable to Windows, unless you extract it to disk.
I actually prefer the jar/Tomcat model, where the read-only image gets distributed to servers, and when you run the app the image gets unpacked to disk as needed. You could also write I/O wrappers that would obviate the need to extract them to disk, and you could even make compression optional to reduce performance hits.
It seems like all you really need is a virtual filesystem implemented as a userspace I/O wrapper. Basically FUSE, but only for the one app. There's no need for the FUSE kernel shim because only the application is writing to its own virtual filesystem. So this would work on any operating system that lets applications overload system calls.
For example, I would start with this project http://avf.sourceforge.net/ and modify it to run apps bundled with itself. With FUSE installed, other apps could interact with its virtual filesystem, but without FUSE, it could still access its own virtual filesystem in an archive. I would then extend it by shimming in a copy-on-write filesystem to stack modifications in a secondary archive.
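In case it's useful, here's a rough sketch of the syscall-overload half on Linux via LD_PRELOAD; the /vfs/ prefix and the vfs_open_from_archive() helper are made up for illustration:

    /* Hypothetical LD_PRELOAD shim: redirect open() calls under a
       virtual prefix into the bundled archive. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <fcntl.h>
    #include <stdarg.h>
    #include <string.h>

    extern int vfs_open_from_archive(const char *path); /* hypothetical */

    int open(const char *path, int flags, ...)
    {
        static int (*real_open)(const char *, int, ...);
        if (!real_open)
            real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

        /* Paths under the virtual mount point are served from the archive. */
        if (strncmp(path, "/vfs/", 5) == 0)
            return vfs_open_from_archive(path + 5);

        mode_t mode = 0;
        if (flags & O_CREAT) {
            va_list ap;
            va_start(ap, flags);
            mode = va_arg(ap, mode_t);
            va_end(ap);
        }
        return real_open(path, flags, mode);
    }

The copy-on-write layer would then hook the write-side calls the same way and divert them into the secondary archive.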
However, since it's a FUSE-only file system, it's difficult to see how it would be used in embedded system firmware; instead, it could see use as a distribution mechanism. Similar to tar or zip files, but possibly with (much) better performance for random access, should you need only a small portion of the whole archive.
The author indicates a need to keep multiple similar copies of sets of unchanging files on their computer, and made this to reduce the space needed for them while retaining access through the file system. So that is also a use case.
DwarFS may be good, but it's not in the Linux kernel (it depends on FUSE). That makes it less universal, potentially significantly slower for some use cases, and also less thoroughly tested. SquashFS is used by a lot of embedded Linux distros, among other use cases, so we can have pretty high confidence in its correctness.
Awesome work. Did your team evaluate creating a virtual filesystem that could process the SquashFS images without involving the kernel? Having completely independent executables that could run on _any_ system with zero additional install would be sweet.
To clarify - a stub in each XAR would act as a filesystem driver and intercept calls to open/read/etc, redirecting them to the internal data blob.
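If it helps to make that concrete, here's a rough sketch of the redirection half, assuming (purely for illustration) that the build step appends the image to the binary and stores its start offset as a trailing 8-byte footer:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Assumed layout (made up for this sketch): the filesystem image is
       appended to the executable, followed by its start offset as an
       8-byte footer. */
    static off_t image_start(int fd)
    {
        uint64_t off = 0;
        off_t end = lseek(fd, 0, SEEK_END);
        pread(fd, &off, sizeof off, end - (off_t)sizeof off);
        return (off_t)off;
    }

    int main(void)
    {
        int self = open("/proc/self/exe", O_RDONLY); /* Linux-specific */
        off_t base = image_start(self);

        /* An intercepted read at some offset within a file in the image
           becomes a pread at base + that offset into the stub's binary. */
        char buf[4096];
        ssize_t n = pread(self, buf, sizeof buf, base);
        printf("read %zd bytes from the embedded image\n", n);
        return 0;
    }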
We actually started with using "real" squashfs files. This had three main disadvantages:
- We had to maintain our own setuid executable to perform the loopback setup and mount (roughly the steps sketched after this list), rather than relying on the far more tested and secure open source fusermount setuid binary that all FUSE file systems rely on
- Getting loopback devices to behave inside of containers (generally cgroup and mount namespace containers) was a little tricky at times in some of our environments
- We didn't want to have a huge number of extra loopback devices on every host in our fleet
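For the curious, here's a minimal sketch of roughly what such a setuid helper boils down to on Linux (error handling stripped; the privileges involved are exactly why we didn't want to own this code):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mount.h>
    #include <linux/loop.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s image.squashfs mountpoint\n", argv[0]);
            return 1;
        }

        /* Grab a free loop device; this and mount(2) need privileges,
           hence the setuid bit. */
        int ctl = open("/dev/loop-control", O_RDWR);
        long devnr = ioctl(ctl, LOOP_CTL_GET_FREE);

        char dev[64];
        snprintf(dev, sizeof dev, "/dev/loop%ld", devnr);

        /* Bind the image file to the loop device. */
        int img = open(argv[1], O_RDONLY);
        int loopfd = open(dev, O_RDWR);
        ioctl(loopfd, LOOP_SET_FD, img);

        /* Mount the loop device read-only. */
        return mount(dev, argv[2], "squashfs", MS_RDONLY, NULL) ? 1 : 0;
    }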
In fact, after implementing the loopback-based filesystem version, we almost abandoned XAR because the security considerations and in-container behavior weren't ideal. The open source squashfuse FUSE filesystem really is what made it possible.
Another side benefit is that we could iterate far faster with squashfuse -- this let us fix some performance issues, add idle unmounting, and implement zstd-based squashfs files, and then deploy all of that to our fleet faster than we could deploy a new kernel to 100% of hosts.
And squashfs works the same way - mksquashfs takes a directory as input and writes a file as output. That file can then be loopback-mounted to present the readonly filesystem.
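Concretely, something like this (the zstd option assumes squashfs-tools built with zstd support):

    mksquashfs rootdir/ image.squashfs -comp zstd
    mount -o loop,ro image.squashfs /mnt/image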
I haven't had a chance to use it yet, but https://github.com/mhx/dwarfs claims to be many times faster than squashfs, to compress much better, and to have full FUSE support.
Surprisingly, the article doesn't seem to mention SquashFS[1] or EROFS[2].
Both SquashFS and EROFS are filesystems specifically designed for this kind of embedded, read-only use case. The former is optimized for high data density and compression and is already well established. The latter is comparatively new and optimized for high read speed.[3] SquashFS as a rootfs can already be found in many embedded applications using flash storage and is typically combined with tmpfs and persistent storage mount points or overlay mounts.
For both those filesystems, one would build a rootfs image offline. In the Debian ecosystem, there already exists a tool that can bootstrap a Debian image into SquashFS[4].
What a coincidence! For a project that I maintain (Buildbarn, a distributed build cluster for Bazel), I recently generalized all of the FUSE code I had into a generic VFS that can be exposed over both FUSE and NFSv4. The intent was the same: to provide a better out-of-the-box experience on macOS. Here's a design doc I wrote on that change. Slight warning that it's written with some Buildbarn knowledge in mind.
Fortunately, fuse-t doesn't make any of my work unnecessary. Buildbarn uses go-fuse, which talks to the FUSE character device directly instead of going through libfuse. fuse-t would thus not be a drop-in replacement. Phew!
PS: A bit unfortunate that fuse-t isn't Open Source. :-(
Slightly related: we also recently switched to SquashFS for gokrazy.org's root file systems.
If you’re curious about how SquashFS works under the hood, check out https://github.com/gokrazy/internal/blob/master/squashfs/wri.... I also intend to publish a poster about it at some point.