I'm glad they're open-sourcing this, but I have to say that making 8 GPUs work together is not that big of a deal. Companies like Cirrascale are getting up to 16 GPUs to scale linearly within a single blade.
We have distributed implementations that allow us to use more than 8 GPUs to train a single network. These machines have 8 cards each, but we aren't limited to a single machine (though keeping more of the GPUs connected over PCIe rather than Ethernet helps).
Any particular reason you don't have Infiniband on these for interconnects?
Having this chassis with Infiniband and a local disk would be a dream, if the manufacturing cost comes in right, as we scale up the local datacenter I work in.
That's interesting. But from what I understand, the problem isn't getting many GPUs to train a single network -- the problem is getting them to scale linearly, i.e. getting each additional GPU to speed up training as much as the previous one did. Throwing boxes at neural net training is easy, but people run into serious plateaus once they scale up the GPU count.
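To put a number on "scale linearly": the usual yardstick is speedup divided by GPU count, something like this (the timings below are made up, purely for illustration):

    # Toy scaling-efficiency calculation; the timings are invented for illustration.
    def scaling_efficiency(time_1gpu, time_ngpu, n_gpus):
        """Fraction of ideal linear speedup actually achieved with n_gpus."""
        speedup = time_1gpu / time_ngpu
        return speedup / n_gpus

    # If 1 GPU takes 100 min/epoch and 8 GPUs take 16 min/epoch:
    print(scaling_efficiency(100.0, 16.0, 8))  # ~0.78, well short of linear

Anything much below 1.0 is exactly the plateau I mean: the 8th GPU buys you less than the 2nd one did.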
Raw GPU count is irrelevant if you can't balance it against cost, CPU, shared DRAM, and - possibly most importantly - PCIe and network connectivity in and out of the box. A typical mid-range Intel Xeon has 40 PCIe lanes; a high-end GPU wants 16 of them. You can do the math. Facebook has settled on a balance that works for them and their workloads; it's quite likely that going to 16 GPUs in the chassis resulted in overall worse utilization because of PCIe bandwidth limits, socket and QPI count, etc. What's a big deal is someone having gone and done the work to calculate and experiment their way to a balanced design that's also cost-effective and easy to maintain.
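To spell out the lane math (rough numbers, just to show where the budget goes):

    # Back-of-the-envelope PCIe lane budget; numbers are illustrative only.
    lanes_per_socket = 40   # typical mid-range Xeon
    sockets = 2
    lanes_per_gpu = 16      # a high-end GPU wants a full x16 link

    total_lanes = lanes_per_socket * sockets          # 80
    gpus_at_full_x16 = total_lanes // lanes_per_gpu   # 5

    print(gpus_at_full_x16)
    # Only 5 GPUs across both sockets get dedicated x16 links, and that's
    # before NICs and storage take their share -- so 8 (let alone 16) GPUs
    # per chassis already implies PCIe switches or narrower links.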
Actually, the PCI Express topology is configurable, which is one of the innovations. You can put all 8 GPUs on a single CPU's root complex, or split them 4 per CPU (at the cost of crossing QPI between the two halves).
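For anyone trying to picture the two layouts, here's a rough sketch of the trade-off (my own illustration of the idea, not Facebook's actual wiring):

    # Rough sketch of the two configurable topologies; illustration only.
    single_root = {"CPU0": ["gpu0", "gpu1", "gpu2", "gpu3",
                            "gpu4", "gpu5", "gpu6", "gpu7"],
                   "CPU1": []}
    dual_root = {"CPU0": ["gpu0", "gpu1", "gpu2", "gpu3"],
                 "CPU1": ["gpu4", "gpu5", "gpu6", "gpu7"]}

    def crosses_qpi(topology, gpu_a, gpu_b):
        # Peer-to-peer traffic crosses QPI iff the GPUs sit under different CPUs.
        home = {g: cpu for cpu, gpus in topology.items() for g in gpus}
        return home[gpu_a] != home[gpu_b]

    print(crosses_qpi(single_root, "gpu0", "gpu7"))  # False: same root complex
    print(crosses_qpi(dual_root,   "gpu0", "gpu7"))  # True: hops across QPI

On a live machine, "nvidia-smi topo -m" will show you which GPU pairs share a root complex and which have to cross the socket interconnect.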