Was there actually a barrier at exascale? I mean, was this like the sound barrier in flight where there is some discontinuity that requires some qualitatively different approaches? Or is the headline just a fancy way of saying, "look how big/fast it is!"
Back in the 2010 timeframe, there were articles about how an exascale supercomputer might be impossible. It would be interesting if someone could go back and assess where those predictions were wrong and where they held, and how the architecture changed to get around the true scaling limits.
Exascale became the primary target agreed on across the HPC community in the mid-to-late 2000s because FLOPS (floating-point operations per second) is the unit used to benchmark systems; other variables like compiler, architecture, etc. are very difficult to account for.
It's functionally the same as arguing that GB or TB are arbitrary units to represent storage.
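For a rough sense of what the unit implies: theoretical peak is just sockets x cores x clock x FLOPs/cycle, so a back-of-envelope sketch (all per-node numbers below are assumptions for illustration, not any specific machine) looks like:

    /* Back-of-envelope peak FLOPS; compile with: cc -O2 peak.c
       All per-node numbers here are illustrative assumptions. */
    #include <stdio.h>

    int main(void) {
        double sockets = 2, cores = 64, clock_hz = 2.0e9;
        double flops_per_cycle = 32;   /* e.g. 2 FMA pipes x 8-wide double vectors (assumed) */
        double node_peak = sockets * cores * clock_hz * flops_per_cycle;
        printf("node peak: %.1f TFLOPS\n", node_peak / 1e12);
        printf("nodes needed for 1 EFLOP/s: %.0f\n", 1e18 / node_peak);
        return 0;
    }

With those made-up numbers, hitting 1 EFLOP/s of peak takes on the order of 100,000 such nodes.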
Out of curiosity, what makes this an exascale computer and not, say, an AWS or Azure datacenter? Just the fact that they are open about benchmarking the number of petaflops?
Good question. It's because some extreme-scale computer science and application work is only valid at the high core counts and networking available on what is now just one single supercomputer in the U.S. That machine is now fully booked, leaving many unable to complete their research, or forced to wait until 2024. Some research can only be done at that scope of system, and people have been waiting years to test exascale software on this machine. I hope this answers the question.
What? Summit had been in development for years, and is ridiculously efficient for a machine with so much performance. It's a major milestone on the path to exascale computing.
I found a much better source regarding this in the form of a November 2018 white paper[1]. Interestingly, they say it will take 3.2-3.5 years of Blue Waters computing time.
See this for example. Different applications have different scales at which they run into similar problems. Exascale is a decent upper bound for most of the fields. If you really dig in deep, the bound may be found at a slightly lower value (>500 PFLOPS), but it's a good rule of thumb to take 1 EFLOP/s to be safe.
You're arguing at a particular granularity and even that doesn't hold uniformly.
A64FX uses HBM2 at bandwidth equivalent to GPUs (with lower power). A CCX is much finer-grained than an entire GPU, so it's not a direct comparison. L3 bandwidth on EPYC is multiple TB/s in aggregate.
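For what it's worth, per-socket bandwidth claims like these usually come from a STREAM-style measurement; a minimal triad sketch (the array size and OpenMP setup are just illustrative, not a tuned benchmark):

    /* STREAM-style triad; compile with: cc -O2 -fopenmp triad.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define N (1L << 26)   /* ~64M doubles (~0.5 GiB) per array */

    int main(void) {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);
        for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];               /* triad touches 3 arrays */
        double t1 = omp_get_wtime();

        /* 3 arrays x N doubles moved; this GB/s figure is what HBM2 vs DDR changes */
        printf("triad: %.1f GB/s (check %.1f)\n",
               3.0 * N * sizeof(double) / (t1 - t0) / 1e9, a[N / 2]);
        free(a); free(b); free(c);
        return 0;
    }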
Fat GPU nodes can readily overload network interfaces so if bisection bandwidth is your concern, CPU nodes are good.
> HPC however, is specifically programmed to be bandwidth-bound instead.
This is wishful thinking. Lots of applications used to justify the US exascale program (and others) are latency-bound. Climate, weather, unsteady CFD, and much of mesoscale materials science and molecular dynamics are run at their latency limit in most scientific studies (one-off scaling studies notwithstanding). There's an unfortunate disconnect between what scientific computing actually needs versus what funders and the media portray.
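To make the latency point concrete, here's a toy alpha-beta model of a 3D halo exchange under strong scaling; the latency, bandwidth, and grid numbers are assumptions for illustration, not measurements of any real code or interconnect:

    /* Toy alpha-beta halo-exchange model; compile with: cc -O2 halo.c -lm */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        const double alpha = 1.5e-6;     /* per-message latency, ~1.5 us (assumed) */
        const double beta  = 12.5e9;     /* per-link bandwidth, ~12.5 GB/s (assumed) */
        const double global_n = 4096.0;  /* fixed 4096^3 global grid (assumed) */

        for (int ranks = 64; ranks <= (1 << 20); ranks *= 8) {
            double local_n = global_n / cbrt((double)ranks);  /* cells per side per rank */
            double face_bytes = local_n * local_n * 8.0;      /* one face of doubles */
            double t_lat = 6.0 * alpha;                       /* one message per face */
            double t_bw  = 6.0 * face_bytes / beta;
            printf("%7d ranks: latency term %6.1f us, bandwidth term %8.1f us\n",
                   ranks, t_lat * 1e6, t_bw * 1e6);
        }
        return 0;
    }

The bandwidth term shrinks as the local domain shrinks while the per-message latency term stays fixed, so a strong-scaled climate or CFD run ends up paying mostly latency per time step.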
There are different types of molecular dynamics that non-ex people run, and different modes of the same code may be more or less latency sensitive (e.g. two "standard" Gromacs benchmark cases I'd have to look up). While I'm unconvinced of the overall merits of exascale, it's clearly not true that you can generally do all the large-scale, high-resolution, or whatever, calculations in an embarrassingly parallel way. I haven't kept up with the public applications for CORAL et al, but there are definitely enough for petascale and above.
My bitter experience, particularly with "big data" people, is that they just won't be told by those with long relevant research computing experience in HPC and similar. It's hardly that HPC people don't know what systems can do, particularly if they live and breathe things like distributed linear algebra. I despaired, and largely gave up, when someone strode into a chair From Industry and assured us that the university didn't do Big Data, notwithstanding LHC, astronomy, sequencing, synchrotron work, etc., and was then going to build a Big Data Cluster from a few knackered PCs (and in a basement!) to better the HPC systems. Then I had MPI explained to me.
The article's comparison is invalid - in HPC, computing speed is as much about the network infrastructure as it is about the raw number of cores; simulations and graph processing require interaction between the processing elements. The scores on the Top500 reflect this. On an embarrassingly parallel workload such as Bitcoin mining, their aggregate peak performance is far higher.
In fact, the theoretical peak of Titan's 19k GPUs would be around 90 single-precision petaflops, comfortably higher than the estimated peak of 2 million reasonably recent x86 processors in Google's data centers (unlikely to be top-of-the-range number crunchers).
I've worked in HPC with a variety of actors for over a decade; I am in no doubt that classified machines exist with more power than Titan and Sequoia (#1, #2), which together make up most of the computing power in the top500 (it follows a power distribution, appropriately).
An exaflop is still a really, really big number though. I can hazard a few guesses at non-public machines in the US that would reach or beat Titan. Perhaps the US Govt commands an exaflop of power spread amongst several agencies, but I wouldn't place any bets on it.
That the bitcoin mining network has reached this scale is both astounding and depressing. That's an awful lot of computing power going to waste.
We'll probably end up using just MPI at exascale. I was at Supercomputing 2011, and I swear half the speakers might as well have gone up on stage, stuck their fingers in their ears, and yelled "NAH NAH NAH CAN'T HEAR YOU, MPI IS ALL WE NEED NAH NAH NAH".
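To be fair to those speakers, the flat model they were defending is at least simple; a minimal sketch of the kind of two-sided, nonblocking exchange it's built on (illustrative only, not from any production code):

    /* Minimal nonblocking MPI exchange between paired ranks.
       Compile/run with: mpicc -O2 exchange.c && mpirun -n 4 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double sendbuf[1024] = {0}, recvbuf[1024];
        int partner = rank ^ 1;                    /* pair ranks 0-1, 2-3, ... */
        if (partner < size) {
            MPI_Request reqs[2];
            MPI_Irecv(recvbuf, 1024, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &reqs[0]);
            MPI_Isend(sendbuf, 1024, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &reqs[1]);
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        }
        if (rank == 0) printf("exchange done on %d ranks\n", size);
        MPI_Finalize();
        return 0;
    }

Whether that model alone holds up at exascale node counts was exactly the question being shouted down.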
Skimming it, the answer appears to be "because they had that cluster". They say they only used 20 machines (each with 24 CPUs, so I guess these are fairly beefy).
Given the 224 GB (binary) size of the proof, memory usage might be a problem, too.