
There's progress indeed, but 4x 16TB isn't cutting it. You'd need extra disks for redundancy, extra space for filesystem overhead (including data and metadata checksums), and a sizable amount of ECC memory.
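For a rough sense of how far 4x 16TB shrinks once redundancy and filesystem overhead are taken out, here's a back-of-the-envelope sketch; the parity layout and the ~5% overhead figure are assumptions for illustration, not measurements:

```python
# Rough usable-capacity estimate for a 4x 16TB box.
# The parity layout and ~5% filesystem/checksum overhead are assumptions.
def usable_tb(drive_tb, drives, parity_drives, fs_overhead=0.05):
    """Raw capacity minus parity drives, minus a rough filesystem/metadata slice."""
    data_drives = drives - parity_drives
    return drive_tb * data_drives * (1 - fs_overhead)

print(usable_tb(16, 4, parity_drives=1))  # single parity: ~45.6 TB usable
print(usable_tb(16, 4, parity_drives=2))  # double parity / mirrored pairs: ~30.4 TB usable
```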



Just curious: Did you consider XFS? It would avoid the 16TB limit with ext4.

Yev from Backblaze here -> That's a factor of time and power! The 16TB drives consume more power, so that's part of the calculation when we swap from the 4TBs, but we are monitoring them b/c we certainly want to avoid cluster failures. We have what we call "rolling migration" projects going on all the time where we perpetually migrate hard drives and hardware based on durability projections, power balancing, physical capacity, etc...

I'd be fine with 16 and a 1TB SSD, but only if both are upgradeable...

What order of magnitude? There are 60-drive chassis in 4U you can get[0][1], but even a 4TB -> 6TB drive + 15 more drives isn't exactly an order of magnitude.

And they have 2x 10GbE ports, though I'm sure you can work with them and get more.

It's disingenuous to call this not "state of the art" when it's really quite close, and it obviously meets a Backblaze design goal.

[0]: http://www.newisys.com/Products/4600.shtml [1]: http://www.aicipc.com/ProductDetail.aspx?ref=RSC-4H


It's like a docking hub. Personally I see no need for anything other than a larger disk (is it too much to ask for 4TB?)

> We'll be generous and say that 4TB can do 150MB/s. A single run through the data at maximum efficiency will cost you ~8 hours. Since we've restricted ourselves to a single box, we're also not going to be able to keep the data in memory for subsequent calculations/simulations/whatever.

I agree with your overall point, but of course the actual limits for a fairly cost-effective single box are fairly high these days. E.g. from my preferred (UK) vendor, 4x PCIe-based 960GB SSDs (each card is really 4 SSDs + controllers) add up to an additional $7500. We often get 800MB/sec with a single card.

Your point still holds - the actual thresholds are just likely to be quite a bit higher than what the performance of a single 4TB HDD might imply.

Tack on 40+ cores and 1TB of RAM, and the cost to purchase a server like that is somewhere around the $60k mark in the UK (without shopping around or building it yourself), but that adds up to "only" about $2k/month in leasing costs. That makes a single-server solution viable for much larger jobs than people often think.
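To put rough numbers on the scan-time argument quoted above, here's a small sketch using the quoted figures (150MB/s for a single spinner, ~800MB/sec per PCIe SSD card); the 4-card aggregate is an assumption based on the setup described, not a benchmark:

```python
# Scan-time estimate using the figures quoted above; the 4-card aggregate
# is an assumption based on the setup described, not a benchmark.
def scan_hours(dataset_tb, mb_per_s):
    """Hours to stream dataset_tb terabytes at mb_per_s megabytes per second."""
    return dataset_tb * 1_000_000 / mb_per_s / 3600

print(scan_hours(4, 150))      # single 4TB spinner @ 150MB/s  -> ~7.4 hours
print(scan_hours(4, 4 * 800))  # 4x PCIe SSD cards @ 800MB/s   -> ~0.35 hours (~21 min)
```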


A 96-bay 4U chassis is readily available http://www.raidinc.com/products/object-storage/ability-4u-96... but that's still not an order of magnitude.

Re: 4TB drives, I do the dollars-per-MB calculation before buying hard drives. The most recent time, I included the enclosure cost and found that it was actually cheaper to go huge. Granted the enclosure was a Synology, but buying 16TB drives is the closest I've been to 'solving' storage in a long time. Formatting them and adding them to the array was brutal, and they are noisy, but it has been worth it.
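For anyone wanting to reproduce that kind of comparison, here's a minimal sketch of the cost-per-TB calculation with the enclosure amortized in; all prices below are hypothetical placeholders, not quotes:

```python
# Dollars-per-TB with the enclosure amortized in; all prices are hypothetical.
def cost_per_tb(drive_price, drive_tb, drive_count, enclosure_price=0):
    """Effective dollars per raw TB once the enclosure cost is spread across the drives."""
    total_cost = drive_price * drive_count + enclosure_price
    total_tb = drive_tb * drive_count
    return total_cost / total_tb

# Four 8TB drives vs. four 16TB drives in the same (hypothetical) 4-bay NAS.
print(cost_per_tb(drive_price=180, drive_tb=8, drive_count=4, enclosure_price=550))
print(cost_per_tb(drive_price=330, drive_tb=16, drive_count=4, enclosure_price=550))
```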

I have a NAS that holds four drives. If I swap the 8TB drives for 16TB drives, I would have no further use for the 8TB drives. I also don't have half-TB drives lying around.

Certainly anything is possible if you're a large enough (or strategic enough) customer, but I tend to doubt that Dropbox had a year or two head start on these models. For one thing, I wouldn't build out my entire storage system with one model from one vendor that hasn't been field proven yet.

With 4TB drives, you can easily achieve 2.9PB of raw storage in a rack. The previous top of the line (10TB drives) yields 7.4PB in a rack, and 14TB pushes that up to 10.4PB.

Higher densities are possible as well. The most I'm aware of is 90x 3.5" in 4U; available from Supermicro, Dell and others. The depth of the chassis can be problematic, and the way Supermicro designed it isn't super great in my opinion.

Sacrificing hot swap capability could potentially yield higher density (~100-ish drives), which I think is a reasonable tradeoff if you're deploying a ton of these hosts and you can let dead drives hang out for a while. I'm not aware of anyone who makes a chassis like that that can be bought off the shelf though.

At these drive sizes though, I would be leery of having that much storage attached to a single host from a fault domain perspective. I think the next step is either a highly dense 2U chassis (HP has something like this, but only 28 bays) or a two server 4U chassis (Dell has something like this).
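A quick back-of-the-envelope sketch of the per-rack figures mentioned above; the chassis size, rack units reserved for compute/networking, and chassis count are assumptions for illustration:

```python
# Back-of-the-envelope rack capacity; chassis size and usable rack units
# are assumptions for illustration.
def rack_raw_capacity_pb(drive_tb, bays_per_chassis=90, chassis_u=4, usable_u=36):
    """Raw (pre-redundancy) capacity of one rack in decimal petabytes."""
    chassis_count = usable_u // chassis_u
    drives = chassis_count * bays_per_chassis
    return drives * drive_tb / 1000.0

for size_tb in (4, 10, 14, 16):
    print(f"{size_tb}TB drives -> {rack_raw_capacity_pb(size_tb):.1f}PB raw per rack")

# Note: the figures upthread (2.9PB @ 4TB, 7.4PB @ 10TB, 10.4PB @ 14TB) work out
# to roughly 740 drive slots per rack, i.e. a somewhat different chassis mix.
```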


There are Supermicro chassis out there with 106x 14TB drives in 4U, for super-deep racks.

1PB is nothing today.


I think you're forgetting the redundancy. You probably need to double it to 256 drives per day.

You can actually create multiple volumes and then stripe them together so you should be able to get above 16TB with a bit of work.

The reason we focus exclusively on SSD for most of our services is that there are obvious performance benefits, and as we look into the future, the prevalence of spinning disk will continue to decrease.

That being said, if we are only providing SSD there are going to be customer use cases where it's not going to make sense, but that's also part of looking into the future.

So you should be able to create a larger striped volume to run your batch processing against, and then, depending on how you plan to access that data afterwards, ship it off to an object store if it's going to be accessed infrequently.
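A minimal sketch of the striping approach described above, assuming the attached volumes show up as ordinary block devices and that mdadm is available on the instance; device names and volume count are hypothetical:

```python
# Minimal sketch: stripe several attached block volumes into one RAID-0 device
# with mdadm, then put a filesystem on it. Device names and volume count are
# hypothetical; run as root on the instance.
import subprocess

devices = ["/dev/vdb", "/dev/vdc", "/dev/vdd", "/dev/vde"]  # e.g. 4x 6TB volumes -> ~24TB raw

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# RAID-0 buys capacity and throughput but no redundancy -- acceptable for scratch
# batch data that can be re-derived or pulled back from an object store.
run(["mdadm", "--create", "/dev/md0", "--level=0",
     f"--raid-devices={len(devices)}", *devices])
run(["mkfs.xfs", "/dev/md0"])
run(["mkdir", "-p", "/mnt/scratch"])
run(["mount", "/dev/md0", "/mnt/scratch"])
```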


I agree with the theme of the article. My reply was to the parent comment, which has a 6TB working set.

It’s cheaper per TB to buy 16s rather than 2x 8 TB, or was when I did it a 2 weeks ago. The cost of more bays pushed me there more than anything.

Supermicro's got a top-loader 90-bay chassis that fits into 4U. Filling it with 16 TB drives will give you 1.44 petabytes (not pebibytes, because hard drive salespeople), minus whatever redundancy you'd like to build in.

I wouldn't consider that chassis to be an "off the shelf consumer part" though. Also comes in 45 and 60 bays. :)
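For the curious, the arithmetic behind the 1.44PB figure and the PB-vs-PiB gap the parent alludes to:

```python
# The arithmetic behind the 1.44PB figure, plus the PB-vs-PiB gap.
drives, tb_per_drive = 90, 16
raw_bytes = drives * tb_per_drive * 10**12  # drive vendors use decimal terabytes

print(f"{raw_bytes / 10**15:.2f} PB raw")   # 1.44 PB (decimal)
print(f"{raw_bytes / 2**50:.2f} PiB raw")   # ~1.28 PiB (binary), before redundancy
```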


Sort of - the network part of that ends up being a huge bottleneck then too. With 16 drives at 5GB/s each (the max I've seen so far), you've got 80GB/s that the network to each server needs to carry. You start getting into the really expensive side of things speed-wise.
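Roughly converting that aggregate drive throughput into network terms (ignoring protocol overhead and assuming 100GbE links):

```python
# Converting the aggregate drive throughput above into network terms.
import math

drives, gbytes_per_s_each = 16, 5              # GB/s per drive, as quoted above
aggregate_gbytes = drives * gbytes_per_s_each  # 80 GB/s
aggregate_gbits = aggregate_gbytes * 8         # 640 Gb/s, ignoring protocol overhead

links_100gbe = math.ceil(aggregate_gbits / 100)
print(f"{aggregate_gbits} Gb/s ~= {links_100gbe}x 100GbE links per server")
```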

Indeed - I remember hearing about someone setting up a 5-disk RAID set of floppy drives, but 16 seems a little excessive :)

Isn't 40TB just a paltry 40 hard drives (maybe 50 with redundancy)? A dedicated data center seems a bit overblown for that?
