Probably won't replace ZFS, but if the recovery/failure-handling story is better than btrfs's, maybe we'll see btrfs replaced instead. The two are going to be fairly similar in features and concepts.
If it just has all the features zfs has and none of the years of stability that zfs has... why would any current user (who is by definition ok with the licence) take the effort to switch? What's the extra benefit that zfs currently doesn't provide?
Big chunk of memory - that's not the case unless you're using deduplication. I have it running fine on a 4GB machine (10TB mirrored volume) without any issues.
What I believe the GP was referring to is that ZFS uses a different caching system than the main system: specifically, the ARC and the page cache do almost the same job, but are separate and may fight for resources. It's discussed in a few places, but here's an example with a summary of the behaviour: https://www.reddit.com/r/zfs/comments/o8xqzb/zfs_on_linux_ca...
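For anyone curious, a quick way to see the two caches side by side on OpenZFS-on-Linux (the kstat path below is the standard one the module exposes):

    # Current ARC size in bytes, held outside the kernel page cache
    awk '$1 == "size" {print $3}' /proc/spl/kstat/zfs/arcstats

    # Compare with overall memory use, including the page cache
    free -h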
Just being in the mainstream kernel, instead of an out-of-tree module, already gives a couple of benefits. The code will always be kept up-to-date with the internal kernel API, so you won't have a situation where the kernel is updated but the out-of-tree module fails to compile due to an API change. And if you're running a signed kernel (for SecureBoot), you don't need to have a complex setup to add an extra key and sign the module whenever it's rebuilt. Also, as others have mentioned, being in-tree means it's better integrated with the rest of the kernel; this includes making changes to core kernel code when they could help the module.
> What's the extra benefit that zfs currently doesn't provide?
Not being at the mercy of a company like Oracle is a huge plus in many ways, both for future development and for risk-free adoption.
I use ZFS on a server of mine, but I am one of those paranoid people who would switch just to get away from a project that could be hamstrung at any moment if Oracle has one of its episodes again.
ZFS (hopefully) never finding its way into the mainline kernel is kind of a meta-disadvantage.
They can't just relicense OpenZFS. The only attack vector is suing Canonical for distributing ZFS with Ubuntu (Canonical's lawyers think that's fine), but even then you're still free to use it by adding it to the system yourself, which is what you do on any other distribution anyway.
Running a new filesystem is far more dangerous than your theoretical legal concerns.
They can go after end users if they manage to find a business angle where they can start charging for the system. "But that's silly, Oracle wouldn't be bothered by small businesses, right?" https://www.reddit.com/r/sysadmin/comments/d1ttzp/oracle_is_...
OpenZFS is licensed under a free software license. There's no mechanism by which they could demand anything. At best they could try to sue Canonical for CDDL violations, though the legal arguments online suggest that Oracle wouldn't have a leg to stand on here because the license incompatibility comes from the GPL side (meaning that Linux copyright holders might be able to sue Canonical but even that is a stretch since it's hard to argue OpenZFS is a derivative work of Linux).
The thread you linked is someone who was using software with a proprietary license against its terms and is being asked to pay for its usage -- obviously I think Oracle is being scummy but it's not a comparable situation at all. This would be like saying that you won't use VS Code because Microsoft once demanded that someone who was using a cracked copy of Windows pay them -- it's a complete non-sequitur.
Because right now it "has no features", since it's not really widely available. No installation, no features. The proof of the pudding is in the actual shipping and installing and using it. :)
CoW (copy-on-write), deduplication of data blocks, replicas (RAID equivalent), file system snapshots and multiple, tiered caching layers, plus several other features.
On a sidenote, I integrated bcachefs as a vmx driver into VMware ESXi for a CI/CD build server a few years back. The build system ran in VMs and on containers, but the non-essential target directories sat on bcachefs volumes with the caching layer directed first at RAM, then at SSD, then finally at the HDD. Managing all the Unity3D and Unreal caches was amazingly fast across dozens of different SKUs of the same project.
From my project write-up on my LinkedIn:
Bcache-like caching layer for VMware ESXi
Reduces latency and read/write delays even when using SSDs as your storage.
Written in C and inserting itself as a storage tier into VMware ESXi to handle read & write storage requests, this caching system accelerates all accesses to the underlying backing store.
Can work in both write-back and write-through caching modes. (Native C kernel device driver.)
There's a ton of features, but really the most important one is that CoW filesystems tend to extend checksum protection to the actual data you're storing on them. Journaling filesystems generally only protect their own metadata, not your data.
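For example, on ZFS a scrub re-reads every block and verifies it against the stored checksums (pool name made up here):

    zpool scrub tank        # read and verify all data and metadata checksums
    zpool status -v tank    # lists any files with unrecoverable errors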
The only small advantage I can think of is that the filesystem knows which parts of the drive are in use, so it won't try to recover stale/unused blocks. Same goes for writing: dm-integrity needs to initialise the whole drive and update its tags on every single write, whereas the fs can issue some big TRIMs instead. (Or does dm-integrity know how to trim these days? Apparently it depends on the usage mode: https://www.kernel.org/doc/html/latest/admin-guide/device-ma...)
With dm-integrity you either have a small hole of a few milliseconds (bitmap mode) or write all data twice (journal mode). When integrating everything into a CoW file system you can sidestep the issue, as you basically journal the data through CoW anyway.
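A rough sketch of how you pick between the two standalone modes with integritysetup; exact flag names (notably --integrity-bitmap-mode) depend on your cryptsetup version:

    # Journal mode (default): data goes through the journal, so every write happens twice
    integritysetup format /dev/sdX --integrity sha256

    # Bitmap mode: no double write, but a crash window of a few milliseconds
    integritysetup format /dev/sdX --integrity sha256 --integrity-bitmap-mode

    # The algorithm isn't stored in the superblock, so repeat it on open
    integritysetup open /dev/sdX safe-disk --integrity sha256
    mkfs.ext4 /dev/mapper/safe-disk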
The thing that I'm most looking forward to: the ability to have multiple drives of different sizes and to expand the array with new oddly-sized drives (great for a homelab NAS built from spare parts; btrfs can do this, but ZFS can't), combined with being able to set some drives as caches, e.g. having the filesystem automatically store frequently-read data on an SSD and rarely-read data on an HDD (ZFS can do this, btrfs can't).
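For reference, roughly what that looks like at format time, going by the bcachefs manual (device names and labels are made up, and the option syntax has shifted between versions):

    # One SSD plus two differently-sized HDDs in a single filesystem:
    # writes land on the SSD first, cold data migrates to the HDDs in the background
    bcachefs format \
        --label=ssd.ssd1 /dev/nvme0n1 \
        --label=hdd.hdd1 /dev/sda \
        --label=hdd.hdd2 /dev/sdb \
        --foreground_target=ssd \
        --promote_target=ssd \
        --background_target=hdd

    mount -t bcachefs /dev/nvme0n1:/dev/sda:/dev/sdb /mnt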
I don't know about bcachefs' capabilities here, but with ZFS you can take an instant snapshot of a MySQL/PostgreSQL data directory and call it a backup, instead of fighting the tough fight that is database backup with their dump utilities, which take a good amount of space and time and don't do incremental backups easily.
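A minimal sketch of that workflow with stock ZFS commands (dataset and host names are hypothetical; the snapshot is atomic, so the database recovers from it the same way it would from a crash):

    # Instant, atomic snapshot of the database dataset
    zfs snapshot tank/pgdata@2024-06-01

    # Incremental replication: sends only the blocks changed since the last snapshot
    zfs send -i tank/pgdata@2024-05-31 tank/pgdata@2024-06-01 | \
        ssh backuphost zfs recv -F backup/pgdata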
I really want a filesystem with snapshots as the foundation of a better backup system. Bcachefs has other useful features like CoW and disk pooling (in fact snapshots weren't even on the table until recently), but in my mind snapshots are a must-have feature these days, like journaling was 20 years ago.
Unfortunately, this has been just around the corner on Linux for over a decade now, and the two filesystems that promised to deliver it are unlikely to reach mainstream support on Linux. ZFS has licensing issues, and uses too much memory for a desktop system. BTRFS tried to do too many things, and has had too many reliability issues for me to trust it.
The licensing issue only applies to distribution. You're free to use it without breaching the license, so I'm not sure why people won't just use it.
How is zfs using too much memory?
ZFS can run on a 2GB server (with some swap), and any laptop would have enough memory to run it. You might want to change zfs_arc_max, though: if you don't set it, ZFS will try to use half the memory available on the system, far more than it needs if you aren't using deduplication.
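For the record, capping the ARC is a one-liner (value in bytes; zfs_arc_max is the standard OpenZFS-on-Linux module parameter):

    # Runtime: cap the ARC at 1 GiB
    echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max

    # Persistent across reboots
    echo "options zfs zfs_arc_max=1073741824" >> /etc/modprobe.d/zfs.conf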