> I'm not an expert (but then, most people aren't experts in this area), but I have a feeling that ZFS needs huge amounts of memory (compared to ext). It does nifty stuff, for sure. But do I need all of that? Hardly.
ZFS needs more memory than ext4, but reports of just how much memory it needs are grossly overestimated. At least for desktop usage; file servers are a different matter, and that's where those figures come from.
To use a practical example, I've run ZFS + 5 virtual machines on 4GB RAM and not had any issues whatsoever.
> For example, i am wondering why i would want to put compression into the file system.
A better question would be: why wouldn't you? It happens transparently and adds next to no overhead. But you can have it disabled in ZFS (or any other file system) if you really want to.
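For what it's worth, compression is just a per-dataset property you can toggle either way; a quick sketch (`tank/data` is a placeholder pool/dataset name):

```shell
# Compression is a per-dataset property; LZ4 is cheap enough to leave on.
zfs set compression=lz4 tank/data    # enable (applies to newly written data)
zfs get compressratio tank/data      # see how much space it's actually saving
zfs set compression=off tank/data    # and you can just as easily opt out
```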
> Or deduplication.
Deduplication is disabled on ZFS by default. It's actually a pretty niche feature despite how widely it gets reported on.
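To illustrate the opt-in (pool and dataset names are placeholders):

```shell
zfs get dedup tank             # reports 'off' on a stock pool
zfs set dedup=on tank/vmdisks  # strictly opt-in, per dataset, at a real RAM cost
```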
> Or the possibility to put some data on SSD and other on HDD. If i have a server and user data, it should be up to the application to do this stuff. That should be more efficient, because the app actually knows how to handle the data correctly and efficiently. It would also be largely independent of the file storage.
I don't get your point here. ZFS doesn't behave any differently to ext in that regard. Unless you're talking about SSD cache disks, in which case that's something you have to explicitly set up.
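A sketch of that explicit setup, with a placeholder device path:

```shell
# Attach an SSD as an L2ARC cache device; nothing like this exists by default.
zpool add tank cache /dev/nvme0n1p1
zpool iostat -v tank    # the cache device now shows up in its own section
```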
> I've seen some cases where we had a storage system, pretending to do fancy stuff and fail on it. And debugging such things is a nightmare (talking about storage vendors here, though). But for example, a few years ago we had major performance problems because a ZFS mount was at about 90% of space. It was not easy to pinpoint that. After the fact it's clear, and nowadays googling it would probably give enough hints. But in the end I would very much like that my filesystem does not just slow down depending on how much filled it is. Or how much memory i have.
ZFS doesn't slow down just because the storage pools are full; the problem you described sounds more like fragmentation, and that affects all file systems. The performance of every file system is also memory driven (and obviously storage access times). OSes cache files in RAM (this is why some memory reporting tools say Windows or Linux is using GBs of RAM even with little or nothing open - they don't exclude cached memory from used memory). This happens with ext4, ZFS, NTFS, XFS and even FAT32. Granted, ZFS has a slightly different caching model to the Linux kernel's, but file caching is driven by free memory and applies to every file system. This is why file servers are usually spec'd with lots of RAM - even when running non-ZFS storage pools.
I appreciate that you said none of us are experts on file systems, but it sounds to me like you've based your judgement on a number of anecdotal assumptions: the problems you raised are either not part of ZFS's default config, or are limitations present in any and all file systems out there - you just happened upon them in ZFS.
> i am wondering if it is just too much. For example, nowadays you have a lot of those stateless redundant S3 compatible storage backends. Or use Cassandra, etc. Those already copy your data multiple times. Even if they run on ZFS, you don't gain much.
While that's true, you are now comparing apples to oranges. But in any case, it's not best practice to run a high performance database on top of ZFS (nor any CoW file system). So in those instances ext4 or XFS would definitely be a better choice.
FYI, I also wouldn't recommend ZFS for small capacity / portable storage devices, nor for many real time appliances. But if file storage is your primary concern, then ZFS definitely has a number of advantages over ext4 and XFS which aren't over-complicated nor surplus toys (e.g. snapshots, CoW journalling, online scrubbing, checksums, datasets, etc).
I'm not an expert (but then, most people aren't experts in this area), but I have a feeling that ZFS needs huge amounts of memory (compared to ext). It does nifty stuff, for sure. But do I need all of that? Hardly.
For example, I am wondering why I would want to put compression into the file system. Or deduplication. Or the possibility to put some data on SSD and other data on HDD. If I have a server and user data, it should be up to the application to do this stuff. That should be more efficient, because the app actually knows how to handle the data correctly and efficiently. It would also be largely independent of the file storage.
I've seen some cases where we had a storage system claiming to do fancy stuff and failing at it. And debugging such things is a nightmare (talking about storage vendors here, though).
But for example, a few years ago we had major performance problems because a ZFS mount was at about 90% of its space. It was not easy to pinpoint that. After the fact it's clear, and nowadays googling it would probably give enough hints.
But in the end I would very much like my filesystem not to slow down depending on how full it is. Or how much memory I have.
edit: Also, just to clarify: I think Sun had some of the best engineers in the whole industry. Everything they did has been great. Honestly, I have huge respect for them and also for ZFS. I still think that ZFS is great, but in the end, I am wondering if it is just too much. For example, nowadays you have a lot of those stateless, redundant, S3-compatible storage backends. Or you use Cassandra, etc. Those already copy your data multiple times. Even if they run on ZFS, you don't gain much. If you run ext4 and it actually loses data, the software takes care of that. That's just one case, and of course it depends on your requirements. Just saying, the cases are increasing where the software already takes care of keeping the important data safe.
ZFS is an industrial-scale technology. It's not a scooter you just hop on and ride. It's like a 747, with a cockpit full of levers and buttons and dials. It can do amazing things, but you have to know how to fly it.
I've run ZFS for a decade or more, with little or no tuning. If I dial my expectations back, it works great - much better than ext4 for my use case. But once I start trying to use deduplication, I need to spend thousands of dollars on RAM, or the filesystem buckles under the weight of it.
My use case is storing backups of other systems, with rolling history. I tried the "hardlink trick" with ext4, but the space consumption was out of control, because small changes to large files (log files, ZODB) caused duplication of the whole file. And managing the hard links took amazing amounts of time and disk I/O.
ZFS solved that problem for me. Just wish I could do deduplication without having to have 64GB of RAM. But, I take what I can get.
I think ZFS only needs lots of memory if you have certain features (notably, deduplication) enabled.
Meanwhile, you're still a lot better off, data-integrity-wise, with ZFS and non-ECC memory than you are with most other filesystems with non-ECC memory.
> I have an irrational fear of data loss, and ext4 has never failed me
You're a lot more likely to lose or corrupt data with ext4 than with ZFS. Ext4 will happily corrupt data silently. The core conceit of ZFS is that it doesn't trust the underlying hardware. ZFS even allows duplicated data on a single disk: you lose capacity but gain robustness.
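That single-disk redundancy is one dataset property (name is a placeholder; note it guards against bad sectors and bit rot, not whole-disk failure):

```shell
# Keep two copies of every block, even on a single-disk pool.
# Protects against bad sectors / bit rot, not against losing the whole disk.
zfs set copies=2 tank/important
```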
It's not up for debate or interpretation that ZFS relies on RAM more than most other filesystems. They all use RAM, because everything uses RAM, and so that much is a baseline that cancels out between anything and anything else.
And ZFS uses more, and relies on it more, than others, above that baseline.
My point was merely that your blanket statement doesn't really hold water, since any actual memory requirements by a filesystem would be implementation specific.
I will agree that ZFS should handle large pools once you clear the ~fixed minimum memory requirement.
I could very well have got it wrong, but the way I understood it, in a file system such as ext4, if a file is somehow corrupted in memory and written to disk, you might not be able to recover and that file might become useless - whereas with ZFS you could lose "all" your files in the file system. Again, I might have misunderstood.
> this can be avoided with ext4 by doing frequent backups
lol. Nothing is stopping you from doing manual backups with ZFS. One should never rely on just one backup anyway, if the data is critical. For me, snapshots are a great way to protect against "oh, I accidentally deleted this folder", which ext4 doesn't have. Yes, you can use replication to sync these snapshots somewhere else, but nothing is stopping you from continuing to do manual backups at the file level. It's just a file system, after all. So it doesn't really make sense as a justification for being hesitant to use ZFS. In fact, it's one of the reasons I liked it so much: while you can do all these cool things with it, you don't have to. It doesn't pressure you to use these features. If you're ready, they're there, but until then, it's just a file system, and a very robust one at that.
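A sketch of that "oops" workflow (pool, dataset and snapshot names are placeholders):

```shell
zfs snapshot tank/home@before-cleanup         # cheap, near-instant checkpoint
rm -rf /tank/home/some-folder                 # ...oops
ls /tank/home/.zfs/snapshot/before-cleanup/   # deleted files still readable here
zfs rollback tank/home@before-cleanup         # or roll the whole dataset back
```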
That 1GB of RAM per 1TB of disk is a recommendation from the ZFS documentation, but let's remember the audience, and the feature sets enabled, when we talk about it. In particular, that suggestion stemmed from running with file deduplication enabled and heavy amounts of caching, which ZFS is made to take advantage of.
The high memory usage profile definitely isn't from an extra 4 bytes on a pointer, but from design and features of the filesystem.
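A rough back-of-envelope for where figures like that come from, assuming ~320 bytes per dedup-table (DDT) entry and an average 128KiB block size - both numbers vary by workload, so treat this as an order-of-magnitude sketch, not a sizing rule:

```shell
# Estimate the RAM footprint of the dedup table for a pool of unique data.
pool_bytes=$((10 * 1024 * 1024 * 1024 * 1024))  # 10 TiB of unique data
block_bytes=$((128 * 1024))                     # assumed average block size
entry_bytes=320                                 # assumed DDT entry size
blocks=$((pool_bytes / block_bytes))
ddt_bytes=$((blocks * entry_bytes))
echo "$((ddt_bytes / 1024 / 1024 / 1024)) GiB of dedup table"  # prints "25 GiB of dedup table"
```

So a 10TiB pool full of small, unique blocks wants tens of GiB just for the DDT - which is exactly the configuration the big-RAM advice was written for.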
All file systems have metadata which is good to keep in memory. Having built several 50+TB NAS boxes recently, I can say it isn't just ZFS either. And the penalty for not having enough RAM isn't always some linear performance hit: it can be kernel panics, exponential decay in performance, etc.
ZFS doesn't use that much RAM if you don't turn on deduplication, which is of less use in a system that serves large compressed video files than one that handles tons of read/write traffic as an office file server.
> 4GB RAM which is considered the minimum to properly use ZFS
This is mostly a myth with origins in the high-memory requirements of ZFS deduplication (which few should use, anyway). Of course, more memory allows for more caching, but that’s true of any filesystem on a “modern” OS.
My understanding is that even beyond deduplication, ZFS performance relies heavily on caching data in memory, and without that it's actually quite slow. As in, it'll work with even 1 GB of RAM, but becomes less and less useful.
Yes, it will use whatever memory you allow it to, but this is purely dynamic just as caching is in Linux. If you're not talking about the ARC, but about deduplication, of course it uses more memory - how would it not?
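On Linux (ZFS on Linux / OpenZFS) the ARC ceiling is just a module parameter if you'd rather cap it; the path below is as I remember it, so verify against your version:

```shell
# Cap the ARC at 4 GiB at runtime (root required); the same value can be set
# as the zfs_arc_max module option at boot instead.
echo $((4 * 1024 * 1024 * 1024)) | sudo tee /sys/module/zfs/parameters/zfs_arc_max
```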
"IO performance is generally not great (lower than with "simpler" FS)"
This is sounding troll-ish, as that statement equates to: A filesystem that checksums all data and metadata and performs copy-on-write to protect the integrity of the on-disk state at all times is slower than a filesystem that does neither.
Well, of course.
But do those "simpler FS" have the ability to massively negate that effect by use of an SSD for the ZIL and L2ARC? There have been many articles showing higher throughput with large, slow 5400RPM drives combined with an SSD ZIL & L2ARC massively outperforming much faster enterprise drives.
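For reference, adding those two SSD helpers is a couple of commands (device names are placeholders):

```shell
# Mirrored SSD SLOG for the ZIL (sync writes), plus an SSD L2ARC read cache:
zpool add tank log mirror /dev/sdx1 /dev/sdy1
zpool add tank cache /dev/sdz1
```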
"managing your pools isn't that easy once you get serious about it"
I'm fairly stunned by this statement, as I've yet to see an easier, more elegant solution for such a task. Before ZFS, I liked VXVM with VXFS, but I now consider it obsolete. Linux's LVM is downright painful in comparison. I've yet to play with btrfs, so I'll bite my tongue on what I've read so far on it.
The deep integration of volume and pool management directly with the filesystem, essentially making them one and the same, is simply beautiful. Having these things separate (md, LVM, fs) after years of using ZFS seems so archaic and awkward to me.
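To make that concrete, the whole mdadm + pvcreate/vgcreate/lvcreate + mkfs + fstab dance collapses into something like this (pool, dataset and device names are placeholders):

```shell
# Pool, redundancy and mount in one step:
zpool create tank raidz2 sda sdb sdc sdd sde sdf
# "Partitions" are just datasets, created and mounted on the fly:
zfs create -o mountpoint=/srv/backups tank/backups
```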
Disclosure: 100% of my ZFS experience has been on Solaris (10-11.1) and OpenSolaris/Nevada. I've not tried it on Linux, yet.
Yes, but ZFS is probably more memory hungry, and somewhat overkill for compression alone. It's a pity simpler filesystems don't have transparent compression.
ZFS is fairly profligate with its RAM usage, though. There are some good reasons for this, but ZFS will be dog slow compared to other file systems in RAM-constrained situations, while behaving competitively in non-RAM-constrained ones.