That was just from some quick benchmarks I ran a few months back on 10,000-particle N-body simulations. The performance boost will depend on the task, though. For the kinds of computations I did in grad school it would have been less effective, since I was only looking at 3--5 objects, so there's just less parallelism to take advantage of.
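To make the "less parallelism" point concrete, here's a rough numpy sketch of direct-summation N-body accelerations (not my actual benchmark code; the sizes are illustrative). The pairwise work grows as N^2, so 10,000 particles give you ~5e7 interactions to spread across cores, while 3--5 bodies give you a handful.

    import numpy as np

    def accelerations(pos, mass, eps=1e-3):
        """Direct-summation gravitational accelerations, G = 1."""
        # pairwise displacements, shape (N, N, 3); memory grows as N^2
        dx = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]
        r2 = np.sum(dx * dx, axis=-1) + eps**2     # softened squared distances
        inv_r3 = r2 ** -1.5
        np.fill_diagonal(inv_r3, 0.0)              # drop self-interaction
        # a_i = sum_j m_j * (x_j - x_i) / |x_j - x_i|^3
        return np.einsum('ij,ijk,j->ik', inv_r3, dx, mass)

    # kept small here: at N = 10_000 the (N, N, 3) array alone is ~2.4 GB
    N = 2_000
    rng = np.random.default_rng(0)
    pos = rng.standard_normal((N, 3))
    mass = rng.uniform(0.5, 1.5, N)
    acc = accelerations(pos, mass)                 # ~N^2/2 unique pair interactions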
In 2011 I was building 40,000-atom models on a 2001 SGI Fuel and simulating them on a single 8-core node at 1 ns/week -> 8000 atoms/core. They're about 7 times faster than me 8 years later.
So from a computation standpoint, meh.
From a tools standpoint this is cool. It isn't trivial to build or analyze a model of 40,000 atoms, never mind 1E9 atoms.
That is pretty fast. Obviously not like InfiniBand or what you'd use for message passing, but pretty considerable. It would be difficult to scale to thousands of nodes that need to communicate, but it is definitely good for most applications. When they mentioned the biostatistics group, I thought they might have been making a play for scientific HPC-type applications, but I guess this is more specialized. Prices are really reasonable, too.
He compares - with snark - the papers' authors' 160-core cluster (etc.), as reported by those authors, with a re-implementation running on his laptop. That is, the papers get 160 cores to his 1-2, and he still wins.
I never got into high-performance scientific computing, but I believe the stuff that was done in my department at university was all MPI-based and required very high interconnect speeds (like with InfiniBand). It looks like your offering is much more standard. What's the thinking there, or am I just wrong/out of date?
I work at a university HPC centre and our new 72-core Ice Lake nodes have 512 GB of RAM, and even our older 40-core Cascade Lake nodes had 256 GB as standard, so I always scratch my head at who these libraries serve. That's pretty standard hardware for an HPC cluster, which is where people doing this stuff should be working anyway.
A supercomputer is a decade or two ahead of consumer-grade hardware of the time.
And it's not just a matter of processor speed. Parallel disk I/O, total memory size, cache, and a number of other tricks factor in.
Today's cluster supercomputers are mostly just ("just") racks and racks and racks of fairly standard-grade CPUs, with high-speed switches.
GPUs have introduced their own curve into things. They work well for highly-parallelisable problems. Unfortunately, the ones that seem to have caught many people's attention are ultimately somewhat boring.
In what sense? The ranking is still based on LINPACK scores, and it's intended to represent the fastest machines for scientific computing.
We can argue about whether modern simulation codes look like LINPACK (they don't) but neither Google's nor Amazon's cloud is competitive for that type of workload. Even if the clouds are physically bigger, nobody at Google or Amazon is running really big numerical jobs of the sort that DOE does. Many are running smaller scale HPC and ML jobs, but it's expensive, and for latency-bound physics simulations (most of what we run at LLNL, ORNL, etc.), cloud networks will not perform at scale.
The question of whether China's machines ever performed well for big multi-physics applications is up in the air, but they were certainly beating us on LINPACK. The Chinese set out to show that they're serious about HPC, and I think they accomplished at least that. I wouldn't be surprised if you saw this race go back and forth a few times in the coming years.
Yes, my calculations used far more time than DM, with the caveat that my jobs ran on CPUs while most of their training is on accelerators.
I believe we reported that Exacycle provided 700K high speed Xeon cores for over a year, and the actual number is far higher (I can't share it because you'd be able to make a reasonable estimate on how many computers Google has).
Seems like those guys knew nothing about HPC. Why didn't they run a LINPACK test? It's essential for measuring any parallel computing system, even one of two cores. Also, any first-year CS student knows that the most significant part of an HPC system is not the cores but the network. You need to connect hosts using InfiniBand or the like. Using regular ethernet is futile because of high latency; you will waste 90% of CPU cycles in data exchange/synchronization wait loops. I bet they could achieve way better results on just 1/3 the number of cores, or even fewer.
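The latency point is easy to demonstrate yourself with a two-rank ping-pong. A minimal mpi4py sketch (hypothetical code, not anything from their setup):

    # run with: mpirun -np 2 python pingpong.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    buf = np.zeros(1, dtype='d')   # tiny message: measures latency, not bandwidth
    reps = 10_000

    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=0)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=0)
    elapsed = MPI.Wtime() - t0

    if rank == 0:
        print(f"one-way latency ~ {elapsed / (2 * reps) * 1e6:.2f} us")

On gigabit ethernet the one-way latency typically comes out in the tens of microseconds, versus roughly 1--2 us on InfiniBand, and a tightly coupled code pays that cost on every halo exchange or reduction.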
Good question. Technically speaking, just running HPL on a cluster is sufficient to get it ranked on the TOP500 list.
In reality, most HPC sites use high-performance interconnects, such as InfiniBand or a proprietary high-performance ethernet. They are also less likely to use virtualization, and every node looks 'bare metal'. The software stacks are very different too, everything from the distributed memory model and compilers to the system schedulers and diagnostic software.
Nevertheless, there is a great deal of convergence happening between HPC and hyperscale data centers, particularly as hyperscale uses more machine learning, which has a similar flavor to HPC. Many believe that the FANG companies have exaflop capabilities already, but they just aren't well optimized for scientific workloads.
This is… amazingly misinformed. I am assuming you've never done scientific simulation work if you think that. Physical simulations in many fields get better due to increases in compute, memory, and bandwidth faster than they do from algorithmic improvements (there are only so many algorithmic improvements one can make to a PDE solver). And certain problems simply can't be simulated until a certain amount of compute (and more importantly memory) is available.
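To put numbers on the memory point, here's a back-of-the-envelope sketch (illustrative assumptions: a 3D structured grid with five double-precision fields per cell). Halving the grid spacing multiplies the memory by 8, and for an explicit solver the CFL condition roughly doubles the step count on top of that, so the work grows ~16x.

    GB = 2**30
    fields = 5                     # e.g. density, three momentum components, energy
    bytes_per_cell = fields * 8    # double precision

    for n in (256, 512, 1024, 2048, 4096):
        print(f"{n:5d}^3 grid: {n**3 * bytes_per_cell / GB:8.1f} GB")

    # 256^3  ->    0.6 GB  (fits on a laptop)
    # 1024^3 ->   40.0 GB  (fat workstation or single node)
    # 4096^3 -> 2560.0 GB  (distributed memory, no matter how clever the algorithm)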
And while some of the time the entire cluster will be given to a single large scale project, most of the time it will be acting as a massive GPU farm for all sorts of research. A win-win for everyone.
Nope, computational science and engineering tends to use commodity server hardware (Xeon or similar), with or without GPUs. There may be many nodes or even many racks, but it's quite different from Z-series, which is at a very poor price point.
It's not remotely a fair comparison, as typical scientific computing requires high-performance communication to run with any modicum of efficiency. It's not just a matter of scale: the architecture of a machine like this differs dramatically from an AWS deployment of similar size. [Never mind the national security concerns.]
Is that 150 GB/s between elements that expect to run tightly coupled processes together? Maybe the bandwidth between chips is less important.
I mean, in a cluster you might have a bunch of nodes with 8x GPUs hanging off each. If this thing replaces a whole node rather than a single GPU, which I assume is the case, it's not really a useful comparison, right?
> In due course, the team found that the Sun workstations that they attached to the supercomputer could run their software just as quickly as the Cray itself.