On the contrary. The same principle applies to them. Spinning disks might not have NAND cells that wear out, but they have other parts that have manufacturing defects and can experience wear: bearings, actuators, heads, etc. If you had 100 identical cars that rolled off the assembly line in sequence, you would expect their failure dates to be clustered rather than evenly distributed.



On a tangent: wouldn't it be safer not to start a NAS with a bunch of identical drives, but to get them from different manufacturers and models?

They'll still fail, but hopefully they'd be less likely to do it at the same time.
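
A quick simulation makes the intuition concrete. This is just a sketch with made-up Weibull parameters (real drive-life curves vary a lot); it compares the window between first and last failure for drives from one batch versus a mixed fleet:

    import random

    random.seed(1)
    SHAPE = 4.0  # wear-out regime: failure rate climbs with age (illustrative)

    def failure_window(scale_choices_hours, n=10):
        # Hours between first and last failure among n drives; each drive's
        # Weibull scale is drawn from scale_choices_hours. One batch = one
        # scale; a mixed fleet draws from several.
        lives = sorted(random.weibullvariate(random.choice(scale_choices_hours), SHAPE)
                       for _ in range(n))
        return lives[-1] - lives[0]

    trials = 2000
    same = sum(failure_window([40_000]) for _ in range(trials)) / trials
    mixed = sum(failure_window([30_000, 40_000, 50_000]) for _ in range(trials)) / trials
    print(f"avg failure window, one batch:   {same:7.0f} hours")
    print(f"avg failure window, mixed fleet: {mixed:7.0f} hours")

The mixed fleet spreads its failures over a wider window on average, which is exactly the property you want for a rebuild.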


Yes, that’s generally good advice.

This is news to me, and it's taken some of the excitement out of having my first NAS. It's very good to know, though. I wish there were good-guy computer stores that mixed batches for naive customers like me, who order two identical drives thinking it's best. It's crazy: it really defeats the purpose of RAID 1 if both drives are likely to fail at the same time. I guess I should buy another drive from a different brand and keep one of the current pair in a drawer as the future replacement.

To take some further wind out, with current disk sizes there’s a decent chance of a second disk failing while you’re still replacing the first. Reading all the data off a disk is a stressful operation, you see.

But that means you need RAIDZ2 or erasure coding to be reasonably safe, which takes you well outside what most of these turnkey systems can handle.
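
Back-of-the-envelope on the rebuild risk, using the unrecoverable-read-error rates printed on spec sheets (assumed figures, not measurements). A URE mid-rebuild is game over for a single-redundancy array:

    import math

    # P(at least one unrecoverable read error while reading a whole disk).
    # URE rates below are the usual spec-sheet figures, not measurements:
    # ~1 per 1e14 bits (consumer), ~1 per 1e15 (enterprise).
    disk_tb = 12
    bits_read = disk_tb * 1e12 * 8

    for label, p_bit in [("consumer, 1e14", 1e-14), ("enterprise, 1e15", 1e-15)]:
        # P(no error) = (1 - p)^n, computed stably with log1p/expm1
        p_hit = -math.expm1(bits_read * math.log1p(-p_bit))
        print(f"{label}: P(URE during full read) ~= {p_hit:.0%}")

On a 12 TB consumer disk at the quoted 1e14 rate that works out to better-than-even odds of tripping at least one error during a full read.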


Maybe I'll start a data-recovery savings account instead, for whenever it happens.

Backblaze works well enough.

SSDs in RAIDs famously all failed at once. Spinning disks basically don't, outside of (typically) firmware faults. Out of the 100,000 spinning disks I've managed, I've never had multiple fail at the same time.

Mechanical things don't fail as evenly as digital things.


As someone who has set up a lot of NAS devices (we're talking thousands of disks over the years), I would not put my money on your statement.

There are a lot of simultaneous failure modes. HP had one in SMART firmware that killed disks at some fixed number of hours.

There are also temperature excursion events that can lead to arrays failing within tens of hours of each other, which is not enough time to do a full rebuild of a single disk.
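
For scale, the best-case rebuild arithmetic, with illustrative sizes and speeds:

    # Best-case full rebuild: the whole disk streamed sequentially with zero
    # competing I/O. Size and speed are illustrative.
    disk_tb = 16
    mb_per_s = 180  # optimistic sustained average across the platter

    hours = disk_tb * 1e6 / mb_per_s / 3600
    print(f"{disk_tb} TB at {mb_per_s} MB/s ~= {hours:.0f} hours minimum")

That's about 25 hours before you account for real workloads slowing the rebuild down.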

Seagate had some really crappy 3 TB and 4 TB disks that loved to fail rapidly and nearly all at once.


Intel also had a famous bug / 'feature' where the SSD would refuse to respond after a fixed number of writes.

If all your drives were Intel and part of the same array, they'd all fail at exactly the same time.
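
A sketch of why, with hypothetical numbers (the write cap and the load are made up; the point is that a mirror sends byte-identical write streams to both members):

    # A mirror sends byte-identical write streams to both members, so a
    # firmware write cap trips on both at essentially the same moment.
    # Both numbers below are hypothetical, just to show the arithmetic.
    write_cap_tb = 600       # assumed firmware limit on lifetime writes
    daily_writes_gb = 250    # assumed steady write load on the array

    days = write_cap_tb * 1_000 / daily_writes_gb
    print(f"both mirror members hit the cap after ~{days:.0f} days "
          f"({days / 365:.1f} years)")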

Reliable storage is hard.


So your examples of failures include a firmware fault, which I explicitly said was outside my claim, and a case where the drives were abused?

And another poster is talking about SSDs?

Sigh

