It's 24 Nvidia DGX-1 servers, with 8 GPUs each (192 GPUs total). It's worth noting that Nvidia already has its own 124-node DGX-1 installation, which would have 992 GPUs.
This will give us deep-learning servers that can have 8 GPUs and a couple of NVMe disks on PCIe 4.0 (32 GB/s per x16 link). With very good inter-GPU I/O and access to NVMe, it will enable commodity servers that are competitive with Nvidia's DGX-1 or DGX-2, which use SXM2 (NVLink, with 80 GB/s between GPUs).
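The 32 GB/s figure is easy to sanity-check: PCIe 4.0 signals at 16 GT/s per lane with 128b/130b line coding, so a full x16 slot tops out just under 32 GB/s in each direction. A quick back-of-envelope in Python:

```python
# Back-of-envelope check of the PCIe 4.0 x16 figure quoted above.
GT_PER_LANE = 16e9    # PCIe 4.0: 16 gigatransfers/s per lane
ENCODING = 128 / 130  # 128b/130b line-coding overhead
LANES = 16            # a full x16 slot

bytes_per_sec = GT_PER_LANE * ENCODING * LANES / 8
print(f"PCIe 4.0 x16: {bytes_per_sec / 1e9:.1f} GB/s per direction")
# -> 31.5 GB/s, commonly rounded to "32 GB/s"
```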
I'm glad they're open-sourcing this, but I have to say that making 8 GPUs work together is not that big of a deal. Companies like Cirrascale are getting up to 16 GPUs to scale linearly within a blade.
Actually, the PCIe topology is configurable, which is one of the innovations. You can put all 8 GPUs on a single CPU's bus, or put 4 on each CPU (but then they have to use QPI between them).
> Nvidia uses a new NVLink Switch System with 36 NVLink switches to tie together 256 GH200 Grace Hopper chips and 144 TB of shared memory into one cohesive unit that looks and acts like one massive GPU
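The 144 TB number checks out if you assume the commonly quoted GH200 configuration, 96 GB of HBM3 on the GPU side plus 480 GB of LPDDR5X on the Grace side per superchip, and read "TB" in binary units:

```python
# Sanity-checking the quoted 144 TB of shared memory, assuming the
# commonly quoted GH200 config: 96 GB HBM3 + 480 GB LPDDR5X per chip.
HBM3_GB = 96
LPDDR5X_GB = 480
CHIPS = 256

total_gb = CHIPS * (HBM3_GB + LPDDR5X_GB)
print(total_gb)         # 147456 GB
print(total_gb / 1024)  # 144.0 -- matches if "TB" is binary (TiB)
```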
We're pretty excited near-term about getting to sub-second / sub-100ms interactive time on real GB-scale workloads. That's pretty normal in GPU land. More so, where this is pretty clearly going is using multi-GPU boxes like DGX-2s, which already have 2 TB/s of memory bandwidth. Unlike multi-node CPU systems, I'd expect better scaling b/c there's no need to leave the node.
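As illustrative arithmetic (the helper name is mine), the sub-100ms claim follows directly from bandwidth: time to stream a dataset once is just size divided by aggregate memory bandwidth, so a 100 GB scan at 2 TB/s lands around 50 ms before any compute overhead:

```python
# Rough interactive-latency estimate: time to stream a dataset once
# at a given aggregate memory bandwidth. Illustrative only.
def scan_ms(data_gb, bandwidth_tb_s):
    """Milliseconds to read `data_gb` once at `bandwidth_tb_s` TB/s."""
    return data_gb / (bandwidth_tb_s * 1000) * 1000

print(scan_ms(100, 2.0))  # 50.0 ms for a 100 GB scan at 2 TB/s
```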
With GPUs, the software progression is single GPU -> multi-GPU -> multi-node multi-GPU. By far, the hardest step is single GPU. They're showing that.
This actually seems like just a very clever market-segmentation solution, since the GPU was already limited to 8 PCIe lanes (it's a laptop GPU; see https://www.notebookcheck.net/NVIDIA-GeForce-RTX-4060-Laptop...). The 'addition' of the M.2 SSD makes it a unique offering. Limiting it to only one drive is another way to keep the thermal envelope down. Kudos to the Asus design and product development folks.
It'll be interesting to see if this goes the Larrabee route (a flat array of CPUs which share everything) or the traditional GPU route (multiple levels of shared resources).