> Other than its native raid 5/6 story what major features is it lacking in comparison to zfs?
For example, native at-rest encryption. dm-crypt/luks is adequate, but has significant performance problems[0] on many-core hardware with highly concurrent storage (NVMe) due to excessive queuing and serialization. You can work around these things by editing /etc/crypttab and maybe sysctls.conf, but the default is pretty broken.
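For reference, the workaround looks roughly like this - a sketch only, assuming a systemd-based distro (the crypttab options need systemd 248+) and cryptsetup 2.3.4+ with LUKS2; the device path and mapping name are placeholders:

  # /etc/crypttab: disable dm-crypt's read/write workqueues for the fast NVMe device
  cryptroot  /dev/nvme0n1p2  none  luks,discard,no-read-workqueue,no-write-workqueue

  # Or apply the same flags to an already-open mapping and persist them in the LUKS2 header
  cryptsetup refresh --perf-no_read_workqueue --perf-no_write_workqueue --persistent cryptroot

See [0] for the background on why the workqueues hurt on fast storage.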
> The takeaway for me is that I'm OK with what's currently in Linux for the HDDs I use for my backups but I'd probably lose out if I encrypted my main SSD with LUKS.
Yep, when building my latest workstation, I went with a pair of ("regular") SSDs (RAID1) for my data. Later, I decided to add an NVMe for the OS for the additional speed.
However, I then went and encrypted all of the drives (via LUKS), which basically killed any additional performance I would've gotten from the NVMe drive. I would have been just as well off with only the SSDs and without the NVMe drive.
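If you want to verify that LUKS (and not something else) is eating the NVMe performance, one quick check - a sketch with placeholder device paths, running read-only - is to point the same fio job at the raw device and at the dm-crypt mapping and compare IOPS:

  # 4k random reads against the raw NVMe device
  fio --name=raw --filename=/dev/nvme0n1 --readonly --rw=randread --bs=4k \
      --ioengine=libaio --iodepth=32 --numjobs=4 --direct=1 \
      --time_based --runtime=30 --group_reporting

  # Same job against the dm-crypt mapping on top of it
  fio --name=luks --filename=/dev/mapper/cryptnvme --readonly --rw=randread --bs=4k \
      --ioengine=libaio --iodepth=32 --numjobs=4 --direct=1 \
      --time_based --runtime=30 --group_reporting

A large gap between the two usually points at the dm-crypt queuing overhead discussed elsewhere in this thread.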
This was one of my concerns when I first encountered this trade-off in Solaris ZFS. That is, you claim that Solaris ZFS encryption is better in this regard, but it's not.
> The benchmarks I've seen do not make ZFS look all that great.
The thing about ZFS that actually appeals to me is how much error-checking it does. Checksums/hashes are kept of both data and metadata, and those checksums are regularly checked to detect and fix corruption. As far as I know, it and filesystems with similar architectures are the only ones that can actually protect against bit rot.
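For anyone unfamiliar, that checking is exposed through scrubs. A minimal sketch ("tank" is a placeholder pool name):

  # Walk every allocated block, verify checksums, and repair from redundancy where possible
  zpool scrub tank

  # Show progress and any checksum errors that were found/repaired
  zpool status -v tank

Most setups run a scrub periodically (e.g. monthly) from a timer or cron job.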
> And as far as I can tell, it has no real maintenance behind it either any more, so from a long-term stability standpoint, why would you ever want to use it in the first place?
It has as much maintenance as any open source project: http://open-zfs.org/. IIRC, it has more development momentum behind it than the competing btrfs project.
> I have some archive data stored on GELI drives; this is a welcome development.
Sure, I have a few LUKS-encrypted drives (having run Linux, not FreeBSD).
But I'm inclined to move them over to zfs.
> GELI is a much more straightforward way to deliver an encrypted block device compared to an encrypted Zvol.
Maybe if you need an encrypted block device - but with ZFS in Linux and FreeBSD as standard, and experimental support on Windows and Mac, I'm not sure I agree GELI/cryptsetup is an easier way to deliver an encrypted (at rest) filesystem.
More to the point - between experimental GELI support on Linux and stable ZFS on Linux and FreeBSD, I would strongly prefer ZFS (or FreeBSD and GELI for accessing archives that are tricky to move over to ZFS for some reason).
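For comparison, native ZFS encryption is per-dataset rather than a separate block layer underneath. Roughly - pool, dataset and snapshot names here are placeholders:

  # Create an encrypted dataset; the key is loaded at mount time, not pool-creation time
  zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/secure

  # After a reboot/import, load the key and mount
  zfs load-key tank/secure
  zfs mount tank/secure

  # Raw sends replicate the ciphertext without ever loading the key on the target
  zfs send --raw tank/secure@snap | ssh backuphost zfs receive backup/secure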
> Whoever seems to be singing the praises of ZFS on Linux hasn't put it through its paces in modern, multi-tenant container workloads.
I ran a Hadoop cluster with it - does that count? Your problem is probably the ARC and memory pressure due to slow shrinking, or something like that. There is some work, or at least the intention, to use the page-cache infrastructure for the ARC to make things smoother. However, at the moment it's still vmalloc-based AFAIK.
You can reduce the ARC size, and you'll probably be fine with your containers if they need a lot of memory.
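Capping the ARC is just a module parameter; for example (the 4 GiB figure is arbitrary):

  # Limit the ARC to 4 GiB at runtime
  echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max

  # Make it persistent across reboots (append to /etc/modprobe.d/zfs.conf)
  echo "options zfs zfs_arc_max=4294967296" >> /etc/modprobe.d/zfs.conf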
The SPL isn't so bad; it's more or less wrappers.
Have fun with btrfs! It's a horrid mess! Looks like you never had the pleasure! btrfs is also a non-starter on basically anything that goes beyond a single disk - even their RAID-1 implementation is strange, RAID5/6 are considered experimental, and I could go on.
> Linux has also been developing its own ZFS-like filesystem, btrfs. Since it's been developed in the open (unlike early ZFS), people tried earlier ("IS EXPERIMENTAL") versions that had serious issues, which gave it something of a bad reputation. It's much better nowadays, and has been integrated in the Linux kernel tree (fs/btrfs), where it is maintained and improved along with the kernel code. Since ZFS is an add-on developed out-of-tree, it will always be harder to get the same level of attention.
So long as there exists code in BTRFS marked "Unstable" (RAID56), I refuse to treat BTRFS as production ready. If it's not ready, fix it or remove it. I consistently run into issues even when using BTRFS in the "mostly OK" RAID1 mode.
I don't buy the implication that "it will always be harder to get the same level of attention" will lead to BTRFS being better maintained either. ZFS has most of the same features plus a few extra and unlike BTRFS, they're actually stable and don't break.
I'm no ZFS fanboy (my hopes are pinned solidly on bcachefs) but BTRFS just doesn't seem ready for any real use from my experience with it so far and it confuses me. Are BTRFS proponents living in a different reality to me where it doesn't constantly break?
EDIT: I realize on writing this that I might sound more critical of the actual article than I really am. I think his points are mostly fair, but I feel this particular line paints BTRFS as having a brighter, more production-ready future than I believe is likely given my experiences with it. BTRFS proponents also rarely point out the issues I have with it, so I worry they're not aware of them.
> Not really. I've tried it, and it still has pain points I'd not like to have in my filesystem. It's like ZFS almost a decade ago (and I'm not talking about features)... although ZFS on Linux vs. btrfs on Linux... right now I'd still go with btrfs.
Care to elaborate? I've never tried ZFS, but I've been very happy with btrfs for my small-time personal usage; I'm wondering why people find it so painful in comparison.
> What is the purpose of ZFS in 2021 if we have hardware RAID and linux software RAID?
Others have touched on the main points, I just wanted to stress that an important distinction between ZFS and hardware RAID and linux software RAID (by which I assume you mean MD) is that the latter two present themselves as block devices. One has to put a file system on top to make use of them.
In contrast, ZFS does away with this traditional split, and provides a filesystem as well as support for a virtual block device. By unifying the full stack from the filesystem down to the actual devices, it can be smarter and more resilient.
The first few minutes of this[1] presentation do a good job of explaining why ZFS was built this way and how it improves on the traditional RAID solutions.
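To make the contrast concrete, a rough sketch of the two approaches (device, array and pool names are placeholders):

  # Layered approach: MD hands you a block device; the filesystem on top
  # knows nothing about the redundancy underneath it
  mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[b-e]
  mkfs.ext4 /dev/md0

  # ZFS approach: one tool manages devices, redundancy and the filesystem together,
  # so checksum failures can be repaired automatically from the redundant copies
  zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde
  zfs create tank/data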
> What is the purpose of ZFS in 2021 if we have hardware RAID
Hardware RAID is actually older than ZFS-style software RAID; ZFS was specifically designed to fix the issues with hardware RAID.
The problem with hardware RAID is that it has no idea what's going on on top of it, and even worse, it's mostly a bunch of closed-source firmware from a vendor. And the cards cost money.
You can find lots of terrible stories about them.
ZFS is open-source and battle tested.
> linux software RAID
Not sure what you are referring to.
> BTRFS does RAID too.
BTRFS basically copied many of the features from ZFS, but it has a history of being far less stable; ZFS is far more battle-tested. They say it's stable now, but they have said that many times before. It ate my data twice, so I haven't followed the project since. A file system, in my opinion, gets exactly one chance with me.
They each have some features the other doesn't but broadly speaking they are similar technology.
The new bcachefs is also coming up and adding some interesting features.
> Why would people choose ZFS in 2021 if both Oracle and Open Source users have 2 competing ZFS?
Not sure what that has to do with anything. Oracle is an evil company; they tried to take all these great open-source technologies away from people, and the community fought back. Most of the ZFS team left after the Oracle acquisition.
The open-source version is arguably better and has far more of the original designers working on it. The two code bases have diverged a lot since then.
At the end of the day, ZFS is incredibly battle-tested, works incredibly well at what it does, and has had a strong reputation for stability basically since it came out. The question, in my opinion, is "why not ZFS?" rather than "why ZFS?"
>The main issue with OpenZFS performance is its write speed.
>While OpenZFS has excellent read caching via ARC and L2ARC, it doesn't enable NVMe write caching nor does it allow for automatic tiered storage pools (which can have NVMe paired with HDDs.)
Huh? What are you talking about? ZFS has had a write cache from day one: the ZIL (ZFS Intent Log), with the SLOG as an optional dedicated device. Back in the day we'd use RAM-based devices; now you can use Optane (or any other fast device of your choosing, including just a regular old SSD).
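Adding a dedicated SLOG is a one-liner (pool and device names are placeholders; note it only accelerates synchronous writes):

  # Attach a fast NVMe/Optane device as a dedicated log device
  zpool add tank log /dev/nvme1n1

  # Or mirror it, since losing the SLOG with unflushed sync data in it is painful
  zpool add tank log mirror /dev/nvme1n1 /dev/nvme2n1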
>Did you miss the part where I mentioned "for personal use"?
Since ZFS is simpler to use than your setup and has been used to store 55 PB of data without a single bit error since 2012, I don't see why someone should use inferior stuff, even when it's "personal use".
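For scale, a perfectly serviceable personal-use setup is a couple of commands (names are placeholders):

  # Mirror two disks and carve out a compressed dataset for backups
  zpool create tank mirror /dev/sda /dev/sdb
  zfs create -o compression=lz4 tank/backups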
>But many small tools focused on just the functionality I need allows me to build a simpler system overall.
Sometimes monoliths are better - for example, the network stack and storage... maybe kernels (big maybe here).
Ahm, what? SLES provides Btrfs as the default choice.
> With SUSE Linux Enterprise 12, we went the next step of innovation and started using the copy-on-write file system Btrfs as the default for the operating system, to support system snapshots and rollback.
Also Facebook [1].
And yes, I use it everywhere.
Oh, and ZFS is full of bugs as well; it's far from done and is being actively fixed and improved by the ZoL people - that's why the FreeBSD people are switching to ZoL.
I've never heard many arguments for why ZFS is better than Btrfs; I'd like to hear some (save for RAID6).
For all the talk about the raid5/6 issues on btrfs, people don't seem motivated enough to actually spend time fixing this. It's almost as if mdadm was enough and there wasn't that much drive to make it happen.
Yes! Managing our files is the whole point of file systems! It's amazing how bad at it most of them are. Linux is still catching up with btrfs...
It's extremely aggravating how most file systems can't create a pool of storage out of many drives. We end up having to manually keep track of which drives have which sets of files, something that the file system itself should be doing. Expanding storage capacity quickly results in a mess of many drives with many file systems...
Unlike traditional RAID and ZFS, btrfs allows adding arbitrary drives of any make, model and capacity to the pool and it's awesome... But there's still no proper RAID 5/6 style parity support.
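For anyone who hasn't seen it, growing a btrfs pool with a mismatched drive looks roughly like this (mount point, device and profiles are placeholders):

  # Add an arbitrary extra drive to an existing btrfs filesystem
  btrfs device add /dev/sdd /mnt/pool

  # Rebalance so data and metadata spread across the new layout (keeping raid1 profiles here)
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/pool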
> If the FS reads data and gets a checksum mismatch, it should be able to use ioctls (or equivalent) to select specific copies/shards and figure out which ones are good. I work on one of the four or five largest storage systems in the world, and have written code to do exactly this (except that it's Reed-Solomon rather than RAID).
This is all great, and I assume it works well. But it is in no way generalizable to all the filesystems Linux has to support (at least at the moment); I could only see this working in a few specific instances with a particular set of FS setups. Even more complicating is the fact that most RAID deployments are hardware-based, so just using ioctls to pull individual blocks wouldn't work for many (all?) drivers. Convincing everyone to switch over to software RAID would take a lot of effort.
There is a legitimate need for these types of tools in the sub-PB, non-clustered storage arena. If you're working on a sufficiently large storage system, these tools and techniques are probably par for the course. That said, I have definitely lost 100 GB of data to bit rot on a multi-PB storage system in a top-500 HPC installation. (One bad byte in a compressed data file left everything after it unrecoverable.) This would not have happened on ZFS.
ZFS was/is a good effort to bring this functionality lower down the storage hierarchy. And it worked because it had knowledge about all of the storage layers. Checksumming files/chunks helps best if you know about the file system and which files are still present. And it only makes a difference if you can access the lower level storage devices to identify and fix problems.
> It really surprises me that zfs apparently cannot do this.
Likewise. I really want to like ZFS, but the 'buy twice the drives or risk your data' approach described above really deters me as a home user.
ZFS has been working on developing raidz expansion for a while now at https://github.com/openzfs/zfs/pull/8853 but I feel that it's a one-man task with no support from the overall project due to that prevailing attitude.
BTRFS is becoming more appealing, even though it has rough edges around the RAID write hole (which really isn't a big deal) and around free-space reporting. I can see my home storage array going to BTRFS in the near future.
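To illustrate the pain: until raidz expansion lands, growing a raidz pool means either adding an entire additional vdev or replacing every disk with a bigger one - a sketch, with placeholder names:

  # Option 1: add a whole new raidz2 vdev (another full set of drives)
  zpool add tank raidz2 /dev/sdf /dev/sdg /dev/sdh /dev/sdi

  # Option 2: replace each disk with a larger one, one at a time, then let the vdev grow
  zpool set autoexpand=on tank
  zpool replace tank /dev/sdb /dev/sdj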
> The post was literally about how ZFS compression saves them millions of dollars.
... relative to their previous ZFS configuration.
They didn't evaluate alternatives to ZFS, did they? They're still incurring copy-on-write FS overhead, and the compression is just helping reduce the pain there, no?
> The Let's Encrypt post does not describe how they implement off-machine and off-site backup-and-recovery. I'd like to know if and how they do this.
The section:
> There wasn’t a lot of information out there about how best to set up and optimize OpenZFS for a pool of NVMe drives and a database workload, so we want to share what we learned. You can find detailed information about our setup in this GitHub repository.
> Our primary database server rapidly replicates to two others, including two locations, and is backed up daily. The most business- and compliance-critical data is also logged separately, outside of our database stack. As long as we can maintain durability for long enough to evacuate the primary (write) role to a healthier database server, that is enough.
Which sounds like a traditional master/slave setup with failover?
[0]: https://blog.cloudflare.com/speeding-up-linux-disk-encryptio...