
We'll probably end up using just MPI at Exascale. I was at Supercomputing 2011, and I swear half the speakers might as well have just gone up on stage, stuck their fingers in their ears, and yelled "NAH NAH NAH CAN'T HEAR YOU, MPI IS ALL WE NEED NAH NAH NAH".



Ugh, MPI. I guess it uses MPI behind the scenes, but MPI is the wrong level of abstraction for most things today. This[0] convinced me of that.

[0] https://www.dursi.ca/post/hpc-is-dying-and-mpi-is-killing-it...


If only there was a standardized, robust, widely available cross-platform Message Passing Interface that could do this.

I don't grok why people outside of HPC seem to be shunning MPI. The shared-nothing memory model and asynchronous nature of MPI makes it very similar in spirit to a lot of the current web dev tech, AFAICT.
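To make the comparison concrete, here is a toy illustration of that shared-nothing model: two processes that share no memory and communicate only by messages. Python's multiprocessing pipes stand in for MPI_Send/MPI_Recv here -- this is a sketch of the programming model, not actual MPI.

```python
# Shared-nothing message passing, sketched with multiprocessing pipes
# standing in for MPI point-to-point calls (not actual MPI).
from multiprocessing import Pipe, Process

def worker(conn):
    # The worker owns its own memory; its only view of the parent's
    # data is whatever arrives as a message.
    data = conn.recv()           # roughly analogous to MPI_Recv
    conn.send(sum(data))         # roughly analogous to MPI_Send
    conn.close()

def round_trip(data):
    """Send `data` to a worker process and return its reply."""
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(data)
    result = parent_end.recv()
    p.join()
    return result

if __name__ == "__main__":
    print(round_trip([1, 2, 3, 4]))
```

Swap the pipe for a socket or a message queue and you essentially have the actor-ish, share-nothing style a lot of web backends use today.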


Well, we will.

Just not only MPI (at least not without significant extensions).


Too bad he didn't get into whether GPGPU is killing MPI too. I don't know enough to say.

I'm not familiar with the HPC space but I thought a lot of new work, at least in machine learning, was migrating to GPGPU instead of traditional CPUs. The compute per $ or per watt payoff is too large to ignore.


These things get run in phases. In phase 1 they insist on novel architectures, rethinking of basic premises, and required involvement by academics.

By the end of phase 3, when they actually procure machines, it's mostly "just give me one of what you're already shipping, and it had better run MPI well".

DOE exascale was supposed to be fundamentally different, because by the time you got there all the incremental improvements in power and latency management weren't nearly enough anymore... my guess is that they'll just build big GPU clusters and call it a day.

It's probably not that much money in the scheme of things, but the sad (or good, depending on your perspective) part is that HPC got so far out of the mainstream that when corporations finally got around to worrying about scaling, they just did their own thing. So I guess they can thank DARPA/DOE for InfiniBand? Not really?

And they're still running Fortran/MPI.


What do you use now instead of MPI?

MPI is the HPC horizontal scaling strategy, for when you need to go from 1 core to 500k.
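Concretely, "going from 1 core to 500k" in MPI's SPMD style usually means every rank derives its own slice of the problem from nothing but its rank and the communicator size. A minimal sketch of that decomposition arithmetic (plain Python; `block_range` is a hypothetical helper, not an MPI call -- in a real code `size` and `rank` would come from MPI_Comm_size/MPI_Comm_rank):

```python
def block_range(n, size, rank):
    """Contiguous slice [lo, hi) of n items owned by `rank` out of
    `size` ranks, spreading the remainder over the first n % size
    ranks -- the usual block decomposition an SPMD code computes
    locally on every rank."""
    base, rem = divmod(n, size)
    lo = rank * base + min(rank, rem)
    hi = lo + base + (1 if rank < rem else 0)
    return lo, hi

if __name__ == "__main__":
    # Each of 3 ranks figures out its own piece of a 10-element domain.
    for r in range(3):
        print(r, block_range(10, 3, r))
```

The same two lines of arithmetic work unchanged whether `size` is 4 or 500,000, which is a big part of why the model scales.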

"Running MPI programs on a cluster of commodity hardware" was perhaps a sufficiently novel idea to warrant a unique name back in the 90s, but these days it's more or less the standard for all but the most exotic and expensive of scientific computing... and even those expensive custom supercomputers tend to still be x86 processors running MPI code.

Basically you don't hear about them any more because it's all Beowulf clusters now.


I don't think it's because MPI is better; it's just that most of the supercomputers I have access to require the use of MPI-like constructs.

If it doesn't support MPI or a functional threading system, then it will never be used for HPC.

That's an excellent essay. I agree with just about every point. I worked with MPI over ten years ago. It seemed like an assembly language for cluster computing: something for raw efficiency, or to hold us over until we got a real language. I was playing with Cilk and other parallel tools then. Much better, but with little adoption or optimization.

The examples given in Spark and Chapel show how much better it can be. Such methods deserve much more investment. The bad thing is that the DOE labs are clinging so hard to their existing tools when they're in the best position to invest in new ones via the huge grants they have. They built supercomputing before anyone had heard of the cloud. Their wisdom, combined with today's datacenter engineers, could result in anything from great leaps in efficiency to something revolutionary. Yet they act elitist and value backward compatibility over everything else. They're starting to look like those mainframe programmers squeezing every ounce of productivity they can out of COBOL.

I think the change will have to come from outside. A group that works both sides with funding to do new projects. They either (a) improve MPI usage for new workloads or (b) get the alternatives up to MPI performance for HPC workloads. Then, they open the tool up to all kinds of new projects in both HPC and cloud-centered research (esp cloud compute clusters). Maybe the two sides will accidentally start sharing with each other when contributing to the same projects. Not much hope past that.


Are you saying that MPI is obsolete? I thought most legacy codes in HPC are based on MPI?

I'm helping to organize a workshop about alternatives to MPI in HPC. You can see a sample of past programs at [1].

But you're right: today, MPI is dominant. I suspect this will change, if only because HPC and data center computing are converging, and the (much larger) data center side of things is very much not based on MPI (e.g., see TensorFlow). Personally, I find it reassuring that many of the dominant solutions from the data center side have more in common with the upstart HPC solutions than they do with the HPC status quo. I'd like to think that at some point we'll be able to bring around the rest of the HPC community too.

[1]: https://sourceryinstitute.github.io/PAW/


I wonder if it would be better to scale down cluster paradigms (MPI stuff), rather than trying to somehow scale up shared-memory paradigms.

Again I'm not sure if I agree or disagree with this. My hatred of MPI is only outweighed by the fact that I can use it... and my code works.

I think a large part of the inertia behind MPI is legacy code. Often the most complex part of HPC scientific codes is the parallel portion and the abstractions required to implement it (halo decomposition, etc.). I can't imagine there are too many grad students out there who are eager to re-write a scientific code in a new language that is unproven and requires developing a skill set that is not yet useful in industry (who in industry has ever heard of Chapel or Spark??). Not to mention that re-writing legacy code means delaying getting results. It's just a terrible situation to be in.


The elephant in the room is that the DOE National Laboratories have a huge amount of code tied up in MPI, and they continue to spend millions of dollars on both hardware and software to support this infrastructure. If you look at the top 500 list:

http://www.top500.org/lists/2014/11/

Four of the top ten computers are owned by DOE. That's a pretty significant investment, so they're going to be reluctant to change over to a different system. And, to be clear, a different software setup could be used on these systems, but they were almost certainly purchased with the idea that their existing MPI codes would work well on them. Hell, MPICH was partially authored by Argonne:

http://www.mcs.anl.gov/project/mpich-high-performance-portab...

so they've a vested interest in seeing this community stay consistent.

Now, on the technical merits, is it possible to do better? Of course. That being said, part of the reason that DOE invested so heavily in this infrastructure is that they often solve physics-based problems based on PDE formulations. Here, we're basically using either a finite element, finite difference, or finite volume based method, and it turns out that there's quite a bit of experience writing these codes with MPI. Certainly, GPUs have made a big impact on things like finite difference codes, but you still have to distribute data for these problems across a cluster of computers because they require too much memory to store locally. Right now, this can be done in a moderately straightforward way with MPI. Well, more specifically, people end up using DOE libraries like PETSc or Trilinos to do this for them, and they're based on MPI. It's not perfect, but it works and scales well. Thus far, I've not seen anything that improves upon this enough to convince these teams to abandon their MPI infrastructure.

Again, this is not to say that this setup is perfect. I also believe that this setup has caused a certain amount of stagnation (read: a huge amount) in the HPC community, and that's bad. However, in order to convince DOE that there's something better than MPI, someone has to put together some scalable codes that vastly outperform (or are vastly easier to use, code, or maintain than) the ones for the problems that they care about. Very specifically, these are PDE discretizations of continuum mechanics problems using finite difference, finite element, or finite volume methods in 3-D. The 1-D diffusion problem in the article is nice, but 3-D is a pain in the ass, everyone knows it, and you won't get even a casual glance with anything short of 3-D problems. That sucks and is not fair, but that's the reality of the community.
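For readers who haven't seen it, the core communication pattern in those PDE codes is the halo exchange: each rank holds a block of the grid plus ghost cells mirroring its neighbours' edges, swaps those ghost cells each step, then updates its interior independently. A serial sketch of one explicit step of the article's 1-D diffusion problem (plain Python; each list entry stands in for one rank's memory, and the ghost-cell fill stands in for what MPI_Sendrecv would do):

```python
def diffusion_step(chunks, alpha=0.25):
    """One explicit step of 1-D diffusion on a domain split across
    'ranks' (here just list entries). Each rank first fills one-cell
    ghost halos from its neighbours -- the exchange MPI_Sendrecv would
    perform -- then updates its interior points independently.
    Zero-flux boundaries: the outermost ghosts copy the edge value."""
    nranks = len(chunks)
    new_chunks = []
    for r, u in enumerate(chunks):
        left = chunks[r - 1][-1] if r > 0 else u[0]            # ghost from left neighbour
        right = chunks[r + 1][0] if r < nranks - 1 else u[-1]  # ghost from right neighbour
        padded = [left] + u + [right]
        new_chunks.append([
            padded[i] + alpha * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)
        ])
    return new_chunks
```

In a real MPI code each rank would run this body in parallel, and only the one-cell halos cross the network each step; the 3-D version is the same idea with faces instead of endpoints, which is where the bookkeeping gets painful.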

By the way, the oil industry basically mirrors the sentiment of DOE as well. They're huge consumers of the same technology and face the same sort of problems. If someone is curious, check out reverse time migration or full waveform inversion. There are billions of dollars tied up in these two problems, and they have a huge amount of MPI code. If someone can solve these problems better using a new technology, there's a huge amount of money in it. So far, no one has done it, because that takes a huge investment and is hard.


I'm a sysadmin for a university HPC facility. We basically have users split into two groups: single node and multinode. We even have separate clusters for them: one with an InfiniBand interconnect and the other with gigabit Ethernet.

The multinode users love MPI because it screams over InfiniBand. These are people running genome mapping simulations or fluid mechanics simulations, etc. This is actually a minority of our users. Most of our users are perfectly content with our dual-socket hex-core Westmere cluster. They use various applications for their simulations, but most of them have difficulty scaling past 12 cores anyway.
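That kind of 12-core plateau is roughly what Amdahl's law predicts once a job carries a modest serial fraction. A back-of-the-envelope sketch (my illustrative numbers, not the parent's measurements):

```python
def amdahl_speedup(serial_fraction, cores):
    """Amdahl's law: speedup = 1 / (s + (1 - s) / n), where s is the
    fraction of the work that stays serial and n is the core count."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

if __name__ == "__main__":
    # With even 10% of the work serial, adding cores flattens out fast;
    # the speedup can never exceed 1/s = 10x no matter the core count.
    for n in (1, 12, 48, 500_000):
        print(n, round(amdahl_speedup(0.10, n), 2))
```

So unless the serial fraction is driven very close to zero, an application sees most of its achievable speedup by a dozen cores, which matches what those users observe.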

So, in my experience, MPI is great because the hardware becomes the limiting factor, and the other implementations are a little more software bound. So if you have a couple hundred cores available for a single job, you are stuck with MPI. If you are sticking to a multithreaded implementation, the other languages you mention might be a good solution.


In HPC?

An important secret in HPC is that MPI is rarely required to achieve your objectives. In many ways, vendors just use MPI as a way to sell expensive systems. If you can find any way to make your system scale using threads on a single machine, or use non-latency-sensitive networking, do so.
