
Was there actually a barrier at exascale? I mean, was this like the sound barrier in flight, where there's some discontinuity that requires a qualitatively different approach? Or is the headline just a fancy way of saying, "look how big/fast it is!"



Back in the 2010 timeframe, there were articles about how an exascale supercomputer might be impossible. It would be interesting if someone could go back and assess where those predictions were wrong and where they held, and how the architecture changed to get around the true scaling limits.

The Exascale barrier is an actual barrier in HPC/Distributed Systems.

It took 15-20 years to reach this point [0]

A lot of innovations in the GPU and distributed ML space were subsidized by this research.

Concurrent and Parallel Computing are VERY hard problems.

[0] - http://helper.ipam.ucla.edu/publications/nmetut/nmetut_19423...


Exascale was the primary target settled on across the HPC community in the mid-to-late 2000s because FLOPS (floating-point operations per second) is the unit used to benchmark; the many other variables, like compiler and architecture, are very difficult to account for.

It's functionally the same as arguing that GB or TB are arbitrary units to represent storage.
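For intuition, these peak-FLOPS figures are usually just arithmetic over a machine's published specs: node count times cores per node times clock times FLOPs per cycle. A minimal sketch in C, where every hardware number is an illustrative assumption rather than any real system's spec:

    /* Rough sketch: how a theoretical peak-FLOPS figure is derived.
       All hardware numbers here are illustrative assumptions. */
    #include <stdio.h>

    int main(void) {
        double nodes = 10000.0;        /* assumed node count */
        double cores_per_node = 64.0;  /* assumed cores per node */
        double clock_hz = 2.0e9;       /* assumed 2 GHz clock */
        double flops_per_cycle = 16.0; /* e.g. one 512-bit FMA unit: 8 doubles x 2 ops */
        double peak = nodes * cores_per_node * clock_hz * flops_per_cycle;
        printf("Theoretical peak: %.1f PFLOP/s\n", peak / 1e15);
        return 0;
    }

Real machines never sustain that number on useful work; measured Linpack results come in below it, and HPCG results far below.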


Out of curiosity, what makes this an exascale computer and not, say, an AWS or Azure datacenter? Just the fact that they openly benchmark the number of petaflops?

In high-performance computing, the most recent #1 machine is counted in exaflops, so there's quite some talk about exascale computing.

Good question. It's because some extreme-scale computer science and application work can only be validated with the high core counts and networking available on what is now just one single supercomputer in the U.S. That machine is now fully booked, leaving many unable to complete their research, or having to wait until 2024. Some research can only be done at that scale of system, and people have been waiting years to test exascale software on this machine. I hope this answers the question.

Okay article. I wish it had gone into a bit more depth about the challenges, though.

What prevents deploying and running large-scale distributed systems at exascale that isn't already an issue at petascale?

My understanding is that power/heat issues can be managed (see: Cloud vendors).

Are there any software/PL challenges?


What? Summit had been in development for years, and is ridiculously efficient for a machine with so much performance. It's a major milestone on the path to exascale computing.

I found a much better source regarding this in the form of a November 2018 white paper[1]. Interestingly, they say it will take 3.2-3.5 years of Blue Waters computing time.

[1]: https://www.exascale.org/bdec/sites/www.exascale.org.bdec/fi...


It's still a lot faster than your average physics department's 32-node Xeon cluster.

It's not that uncommon in HPC codes, i.e. one of the places where speed matters.

https://www.osti.gov/servlets/purl/1902810

See this for example. Different applications have different scales at which they run into similar problems. Exascale is a decent upper bound for most fields. If you really dig in deep, the bound may sit at a slightly lower value, somewhere above 500 PFLOP/s, but 1 EFLOP/s is a good rule of thumb to be safe.

Also see this https://irp.fas.org/agency/dod/jason/exascale.pdf


You forgot this part: they used the Spartan HPC cluster.

"It consists of 72 nodes, each with four NVIDIA P100 graphics cards, which can provide a theoretical maximum of around 900 teraflops."

I don't know how much time and computational power they used, but it explains why the problem went unsolved for so long.


You're arguing at a particular granularity and even that doesn't hold uniformly.

A64FX uses HBM2 at bandwidth equivalent to GPUs (with lower power). A CCX is much finer granularity than an entire GPU, so it's not a direct comparison. L3 bandwidth on EPYC is multiple TB/s.

Fat GPU nodes can readily overload network interfaces, so if bisection bandwidth is your concern, CPU nodes are good.

> HPC however, is specifically programmed to be bandwidth-bound instead.

This is wishful thinking. Lots of applications used to justify the US exascale program (and others) are latency-bound. Climate, weather, unsteady CFD, and much of mesoscale materials science and molecular dynamics are run at their latency limit in most scientific studies (one-off scaling studies notwithstanding). There's an unfortunate disconnect between what scientific computing actually needs versus what funders and the media portray.
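To make the latency-bound point concrete, here's a rough illustrative sketch (not taken from any of those codes) of the per-timestep halo exchange a stencil-style solver does with MPI. Under strong scaling the local subdomain shrinks until each message is a handful of bytes, at which point network latency, not bandwidth, sets the cost of every step:

    /* Illustrative 1D halo exchange, one ghost cell per side.
       field holds n_local interior points plus ghost cells at
       field[0] and field[n_local + 1]. With messages this small,
       per-message latency dominates the cost of every timestep. */
    #include <mpi.h>

    void halo_exchange(double *field, int n_local, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        int left  = (rank - 1 + size) % size;
        int right = (rank + 1) % size;

        /* send first interior point left, receive right neighbour's ghost */
        MPI_Sendrecv(&field[1],           1, MPI_DOUBLE, left,  0,
                     &field[n_local + 1], 1, MPI_DOUBLE, right, 0,
                     comm, MPI_STATUS_IGNORE);
        /* send last interior point right, receive left neighbour's ghost */
        MPI_Sendrecv(&field[n_local],     1, MPI_DOUBLE, right, 1,
                     &field[0],           1, MPI_DOUBLE, left,  1,
                     comm, MPI_STATUS_IGNORE);
    }

On most interconnects an 8-byte message costs roughly the same wall time as a kilobyte one, so past a certain node count adding more nodes stops helping this kind of code.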


There are different types of molecular dynamics that non-ex people run, and different modes of the same code may be more or less latency sensitive (e.g. two "standard" Gromacs benchmark cases I'd have to look up). While I'm unconvinced of the overall merits of exascale, it's clearly not true that you can generally do all the large-scale, high-resolution, or whatever, calculations in an embarrassingly parallel way. I haven't kept up with the public applications for CORAL et al, but there are definitely enough for petascale and above.

My bitter experience, particularly with "big data" people, is that they just won't be told by those with long relevant research computing experience in HPC and similar. It's hardly that HPC people don't know what systems can do, particularly if they live and breathe things like distributed linear algebra. I despaired, and largely gave up, when someone strode into a chair From Industry and assured us that the university didn't do Big Data, notwithstanding LHC, astronomy, sequencing, synchrotron work, etc., and was then going to build a Big Data Cluster from a few knackered PCs (and in a basement!) to better the HPC systems. Then I had MPI explained to me.


The article's comparison is invalid: in HPC, computing speed is as much about the network infrastructure as it is about the raw number of cores; simulations and graph processing require interaction between the processing elements. The scores on the Top500 reflect this. In an embarrassingly parallel application such as bitcoin mining, peak performance is far higher.

In fact, the theoretical peak of Titan's 19k GPUs would be around 90 single-precision petaflops, comfortably higher than the estimated peak of 2 million reasonably recent x86 processors in Google's data centers (unlikely to be top-of-the-range number crunchers).
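For anyone checking that back-of-the-envelope figure: it's just GPU count times per-GPU single-precision peak. The per-GPU value below is an assumed nominal number chosen to illustrate the arithmetic, not a quoted spec:

    /* Back-of-the-envelope: peak ~= GPU count x per-GPU SP peak.
       The per-GPU TFLOP/s value is an illustrative assumption. */
    #include <stdio.h>

    int main(void) {
        double gpus = 18688.0;          /* Titan's published GPU count */
        double sp_tflops_per_gpu = 4.8; /* assumed nominal SP peak per GPU */
        printf("~%.0f SP petaflops\n", gpus * sp_tflops_per_gpu / 1000.0);
        return 0;
    }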

I've worked in HPC with a variety of actors for over a decade; I have no doubt that classified machines exist with more power than Titan and Sequoia (#1 and #2), which together make up most of the computing power in the Top500 (it follows a power distribution, appropriately).

An exaflop is still a really, really big number though. I can hazard a few guesses at non-public machines in the US that would reach or beat Titan. Perhaps the US Govt commands an exaflop of power spread amongst several agencies, but I wouldn't place any bets on it.

That the bitcoin mining network has reached this scale is both astounding and depressing. That's an awful lot of computing power going to waste.


We'll probably end up using just MPI at exascale. I was at Supercomputing 2011, and I swear half the speakers might as well have gone up on stage, stuck their fingers in their ears, and yelled "NAH NAH NAH CAN'T HEAR YOU, MPI IS ALL WE NEED NAH NAH NAH".

Does anyone here use it for HPC? It's very difficult to know whether it will be big in the future or whether it's a failed attempt.

Paper is at https://arxiv.org/abs/1910.03740.

Skimming it, the answer appears to be "because they had that cluster". They say they only used 20 machines (each with 24 CPUs, so I guess these are fairly beefy).

Given the 224 GB (binary) size of the proof, memory usage might be a problem, too.

