
> Found out a few years later that the drives were defective, and would often (but not always!) have buffer errors when burning discs close to capacity

There was a slew of software hacks across the industry; most CD-burning suites had something that attempted to deal with that problem.

Believe it or not: time-dependent FIFO buffers are capital-H Hard. Or at least they were back then. I suspect the damage done by poorly constructed FIFO buffers now is just more hidden, not exactly lessened.
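To make the failure mode concrete, here is a toy sketch in Python of what burners before BURN-Proof-style protection were up against: a fixed-rate consumer (the laser) draining a bounded FIFO that a bursty producer (the host) cannot always keep ahead of. All of the sizes, rates, and stall probabilities below are invented purely for illustration.

    # Toy simulation of the buffer-underrun failure mode: the laser consumes
    # data at a fixed rate and cannot pause, so if the host's bursty reads ever
    # let the FIFO hit empty, the burn is ruined.
    import random
    from collections import deque

    BUFFER_CAPACITY = 2 * 1024 * 1024      # pretend 2 MiB drive buffer
    CONSUME_PER_TICK = 600 * 1024          # fixed drain per simulated tick
    random.seed(42)

    buffer = deque([BUFFER_CAPACITY])      # burners pre-fill the FIFO before the laser starts
    buffered_bytes = BUFFER_CAPACITY

    def host_read():
        """Bursty producer: sometimes the OS stalls (paging, antivirus, defrag...)."""
        if random.random() < 0.05:
            return 0
        return random.randint(300 * 1024, 900 * 1024)

    for tick in range(1000):
        # Producer: top the FIFO up, never past capacity.
        chunk = min(host_read(), BUFFER_CAPACITY - buffered_bytes)
        if chunk:
            buffer.append(chunk)
            buffered_bytes += chunk

        # Consumer: drain a fixed amount every tick, no pausing allowed.
        need = CONSUME_PER_TICK
        while need > 0 and buffer:
            take = min(need, buffer[0])
            if take == buffer[0]:
                buffer.popleft()
            else:
                buffer[0] -= take
            buffered_bytes -= take
            need -= take

        if need > 0:
            print(f"underrun at tick {tick}: you just made a coaster")
            break
    else:
        print("burn completed without underrun")

On average the producer here delivers slightly less than the consumer drains, so the buffer only postpones the underrun; that is essentially the situation a slow or stalling host put the burner in.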




> In any case if you need reliable optical backups you should be testing the burn quality. Many don’t do this, but then how do you know the disc has sufficient signal margin?

I've got some original and newer Verbatim M-DISCs that I burned full, along with a few hashes. Every now and then I leave them out on the patio for a few weeks, and they often get used as coasters in my office. When it's time to test, I rinse them in the sink with soap and water, dry them with a kitchen towel, and verify them; I do this every few quarters. So far they've lasted a decade and haven't lost a bit. One of them has some slow spots, but otherwise they're all perfectly fine. Good luck storing a hard drive through thunderstorms and 100°F days in the sun for a decade and then reading the data off it. I'm not too worried about their durability. And since it's $10 or less for a replica, I can make a few and still not spend a ton.

And sure, that 20 TB drive is $280. For redundancy I'll need to buy at least two. But I've only got maybe a terabyte or two I actually care about, so per terabyte of data I care about it's a terrible price, especially after buying two or three.
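For the curious, the rough arithmetic with the numbers above, plus my own assumptions that a disc holds about 100 GB and that the "$10 or less for a replica" is per disc:

    # Back-of-envelope cost comparison using the figures in the comment above.
    # Assumed (not stated there): ~100 GB per M-DISC at ~$10 each.
    drive_price = 280          # USD per 20 TB hard drive
    disc_price = 10            # USD per disc (assumed per-disc price)
    disc_capacity_tb = 0.1     # 100 GB per disc (assumed)

    data_tb = 2                # data actually worth archiving
    copies = 2                 # simple redundancy: two independent copies

    drive_cost = copies * drive_price                            # must buy whole drives
    disc_cost = copies * (data_tb / disc_capacity_tb) * disc_price

    print(f"HDD:    ${drive_cost}   (${drive_cost / data_tb:.0f}/TB of data you care about)")
    print(f"M-DISC: ${disc_cost:.0f}   (${disc_cost / data_tb:.0f}/TB of data you care about)")
    # HDD:    $560   ($280/TB of data you care about)
    # M-DISC: $400   ($200/TB of data you care about)

Under those assumptions the discs come out cheaper as long as the amount of data you actually care about stays small; the drives only win once you need many terabytes.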

And then, since I'm talking offline archives, syncing new stuff would mean physically moving the drives around. It's kind of a pain to copy all the data to a drive sitting at a relative's place hundreds of miles away; I'd need them to plug it in. With discs, they can just stash them in a box for me when I mail a new batch every quarter or so.


> edit: ah, you just have them in a mirror. In that case, the magnitude of the risk may be less as you only need to copy data once to rebuild. I’m not sure though as I’m not an expert on the topic.

Sadly, the problem still exists even for a simple mirror.

It can be mitigated by configuring a slow rebuild rate, though, so the new drive has time to perform its maintenance.
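If the mirror is Linux md (an assumption; the comment doesn't say what it runs on), the rebuild rate can be throttled per array through sysfs. A minimal sketch:

    # Throttle an md resync/rebuild, assuming the mirror is e.g. /dev/md0.
    # Values are in KiB/s; run as root.
    from pathlib import Path

    ARRAY = "md0"                                  # hypothetical array name
    md = Path(f"/sys/block/{ARRAY}/md")

    (md / "sync_speed_max").write_text("5000\n")   # cap the rebuild at ~5 MiB/s
    (md / "sync_speed_min").write_text("1000\n")   # still guarantee some forward progress

    # Watch progress with:    cat /proc/mdstat
    # Restore the default:    echo system > /sys/block/md0/md/sync_speed_max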


> The tape drives themselves are much more of an issue than the tapes

This is absolutely the pain point. Having to either keep multiple generations of drive around and hope they don't die in storage, or continually migrate old tape to new tape, is a huge pain in the arse.


> Burning disks was a ceremony where you carefully shutdown anything running and defragged your HDD to ensure nothing would interrupt the burning process.

"You think that's bad?" We had to literally watch the burn process. Walk away and it would fail.

Took a while to realise it was the vibrations of the wooden floor that would kill the burn process if you didn't sneak away very carefully.


> In fact, the reality is exactly backwards to what he has written here: with multiple terabytes of data on the array, a single drive failure results in a long, intensive rebuild process that can serve to hasten the failure of the remaining drives.

Are you talking about Unrecoverable Read Errors? [1]

[1] https://news.ycombinator.com/item?id=8306499
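For reference, the back-of-envelope model behind that worry usually looks like the sketch below, assuming the commonly quoted 1-in-1e14 consumer URE spec and treating errors as independent (both simplifications that people argue about):

    # Rough URE math for a rebuild: probability of hitting at least one
    # unrecoverable read error while reading N terabytes.
    def p_at_least_one_ure(terabytes_read, ure_rate=1e-14):
        bits = terabytes_read * 1e12 * 8
        return 1 - (1 - ure_rate) ** bits

    for tb in (4, 12, 20):
        print(f"{tb:>2} TB read during rebuild -> ~{p_at_least_one_ure(tb):.0%} chance of a URE")
    # Roughly 27%, 62% and 80% with these (debatable) numbers.

Real drives usually do much better than the spec sheet, which is a big part of why this argument keeps coming back.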


> What we discovered was that these problems only began to manifest themselves once the client reached max drive capacity.

I think it is common knowledge that flash drives perform badly when full - I'm a bit surprised that the author ("[...] despite spending decades as a chief technology officer") was surprised by this.

That's also why it's important that discard/fstrim is supported through the entire stack: filesystem, operating system, hypervisor, storage controller, storage device. I managed to fry a flash disk in a Linux software RAID because I neglected to configure md to pass discard commands through to the underlying storage.
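A quick way to see which block devices actually advertise discard support, as a sketch; it only covers the devices visible to this kernel, not whatever the hypervisor or controller does underneath:

    # List block devices and whether they report discard support
    # (discard_max_bytes == 0 means "not supported").
    from pathlib import Path

    for queue in sorted(Path("/sys/block").glob("*/queue")):
        dev = queue.parent.name
        try:
            max_bytes = int((queue / "discard_max_bytes").read_text())
            granularity = int((queue / "discard_granularity").read_text())
        except (FileNotFoundError, ValueError):
            continue
        status = "discard OK" if max_bytes > 0 else "no discard"
        print(f"{dev:<10} {status:<12} granularity={granularity:<8} max_bytes={max_bytes}")

    # `lsblk --discard` and `fstrim --all --verbose` give a similar picture from
    # the CLI; with stacked devices (md, LVM, dm-crypt) every layer has to pass
    # discard down for it to reach the flash.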


> Can you sustain 750 MB/s? It’s not so easy in practice. You throw away tape capacity if you get buffer underruns.

The drive doesn't pause while the buffer fills again? Considering the price of the drives, I find it very surprising that they don't do that.
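For what it's worth, modern tape drives do speed-match over a range rather than running flat out, but below the slowest matched rate they have to stop and reposition ("shoe-shining"), which costs time and wear. A rough sanity check you can run against your source, with a placeholder value standing in for your drive's actual floor:

    # Rough check (a sketch, not a benchmark): can the source feed data fast
    # enough to keep a tape drive streaming? The minimum streaming rate is
    # model-specific; look yours up, the 100 MB/s here is only a placeholder.
    import time

    MIN_STREAM_MB_S = 100                  # placeholder: drive's slowest matched rate
    SAMPLE_BYTES = 512 * 1024 * 1024

    def source_throughput(path, sample=SAMPLE_BYTES, block=8 * 1024 * 1024):
        """Read a sample of the file/device sequentially and report MB/s."""
        read = 0
        start = time.monotonic()
        with open(path, "rb", buffering=0) as f:
            while read < sample:
                chunk = f.read(block)
                if not chunk:
                    break
                read += len(chunk)
        return read / (time.monotonic() - start) / 1e6

    rate = source_throughput("/path/to/backup.img")     # hypothetical source
    verdict = "should stream" if rate >= MIN_STREAM_MB_S else "expect shoe-shining"
    print(f"source delivers ~{rate:.0f} MB/s -> {verdict}")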


Quote:

"If a disc is broken in half, you've still got 99 percent of the data still there," Gogolin says. "The media is quite elastic and the data is pretty much intact up to the cut line. There is, of course, a region that is destroyed near where the disc has been cut. But for most part, you didn't destroy the data, you just made it unreadable because you can't spin the disc."


> Although one might hope they do accelerated aging tests on their production

The experienced storage admins I know share that hope but don't trust vendors not to get it wrong. It's just too easy to miss a factor which turns out to matter.

> I was merely responding to the assertion that CD-Rs are entirely unsuited for archival purposes, and backed it up with my own experiences over a decade and a half.

Question: have you done bit-level checksum validation on that old media or is that just the ability to read without errors? There's a little bit of error correction built into the format but I wouldn't trust it for anything important.
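For anyone wanting to do that kind of bit-level validation, a minimal sketch follows; the paths are hypothetical, and `sha256sum -c` from coreutils does the same job from the command line.

    # Write a SHA-256 manifest alongside the data before burning, then re-hash
    # the mounted disc later and compare against it.
    import hashlib
    from pathlib import Path

    def sha256(path, block=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(block):
                h.update(chunk)
        return h.hexdigest()

    def make_manifest(root, manifest):
        root = Path(root)
        with open(manifest, "w") as out:
            for p in sorted(root.rglob("*")):
                if p.is_file():
                    out.write(f"{sha256(p)}  {p.relative_to(root)}\n")

    def verify(root, manifest):
        root, failures = Path(root), 0
        with open(manifest) as lines:
            for line in lines:
                digest, name = line.rstrip("\n").split("  ", 1)
                if sha256(root / name) != digest:
                    print(f"MISMATCH: {name}")
                    failures += 1
        print("all files verified" if failures == 0 else f"{failures} file(s) failed verification")

    # make_manifest("staging/", "manifest.sha256")   # before burning
    # verify("/media/mdisc", "manifest.sha256")      # years later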


> so it seems we're talking about 4 drives failing, not just 2.

Yes—I'm a bit unclear on what happened there, but that does seem to be the case.


> today we seem too willing to put up with being sold broken stuff.

I remember reading that when hard disks first came onto the mass market they were so expensive that having some bad sectors was not such a big deal... and so hard disks would usually come with a sheet of paper listing the known bad sectors (detected at the QA stage, I guess).

Maybe someone older than me (I guess somebody in their 50s or 60s) could confirm that.


> Drive caches also used to not exist in the past. At that point, behavior was the same as it is on Linux today. It then regressed when drive caches became a thing.

You mean in the 1980s? Linux wasn't even in use before this was a concern for sysadmins and DBAs. This concern has been raised for years - back in the PowerPC era the numbers were lower, but you had the same arguments about whether Apple had made the right trade-offs, or Linux, or Solaris, etc.

Given how rarely filesystem corruption is actually a problem these days, one might conclude that the engineers were correct in assuming that batteries cover laptop users and that anyone who really cares about this will be using clustering / a UPS.
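For context, the application-visible half of that story looks roughly like this on Linux/POSIX (a sketch); whether the drive's volatile cache gets flushed as well is up to the kernel's barrier/FUA behaviour, which is exactly what the trade-off debate above is about.

    # Flush user-space buffers, fsync the file, then fsync its directory so the
    # new entry itself survives a crash.
    import os

    def durable_write(path, data: bytes):
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()                      # user-space buffer -> kernel page cache
            os.fsync(f.fileno())           # page cache -> device (plus a cache flush, normally)
        os.rename(tmp, path)               # atomic replace
        dir_fd = os.open(os.path.dirname(path) or ".", os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)               # make the rename itself durable
        finally:
            os.close(dir_fd)

    durable_write("important.db", b"bits I would like to still have after a power cut")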


> as if all problems can be solved in software.

Isn't this one true to some extent, though? We already have scrubbing to detect/fix bitrot. Why not make the wear (slightly) unbalanced on purpose, so you can avoid the simultaneous failure? I expect your time is worth more than the cost of making one drive fail a few weeks early.


> Non-consumer drives solve the problem with back-up capacitance.

I'm pretty sure they used to be on consumer drives too. Then they got removed, and all the review sites gave the manufacturers a free pass even though they're selling products that are inadequate.

Disks have one job: save data. If they can't do that reliably, they're defective IMO.


> I went through the extra effort of sourcing disks from as many different vendors as possible.

This is very good advice!

If you already built your array, consider this advice: "replace a bad disk with a different brand whenever possible".

Over time, you naturally migrate away from the bad vendors/models/batches. After following this practice, it seems ridiculous to me now to keep replacing the same bad disks with the same vendor+model.


>> Hard drives are much less reliable than tapes; hard drives aren't made to be stored on shelves;

I can back this up. For some reason, someone decided to back things up to hard drives, and even though they're stored in anti-static bags in a 600 lb fire safe, when I pull out the drives from a year ago to re-write them with that month's backup, I've had up to half the drives not spin back up. Ugh.


> Drive failures are so rare nowadays that I will reinstall more often because of hardware changes.

2 HD failures in the last 4 years. Not rare for me :-(


> So, what's the problem?

Eventually, sourcing replacement media and drives.


> I was using just regular desktop grade disks so that happened every 18 months or so.

That seems unusually bad. Up until earlier this month my old RAID array was mostly desktop drives (I just upgraded it to IronWolf NAS drives, though). I finally did the upgrade to new drives because one died, but that drive was manufactured in 2011, and the rest of the drives are from 2010, 2013, and 2015. To be fair, I've not gone five years without a failure; I think the 2015 drive was added about 3 years ago (it was my actual desktop drive before that), but otherwise I've had good luck. In fact, I think my most recent failure may have been accelerated by my server being put into a temporary case with poor airflow.

