The thing I finally realized, after being big on parallel algorithms ever since first learning about them in university, is that, honestly, most things don't need to be cleverly parallel.
The majority of tasks are "embarrassingly parallel," as they say, which just means there are a lot of standalone tasks that need to be done. For example, any web server is embarrassingly parallel. You don't need to be clever, you just need to have the processing power available to handle the requests.
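To make that concrete, serving independent requests is just fan-out and collect. A minimal C++ sketch (handle_request is a made-up placeholder; a real server would use a thread pool instead of a task per request):

    #include <future>
    #include <string>
    #include <vector>

    // handle_request is a hypothetical stand-in for whatever one request needs.
    std::string handle_request(int request_id) {
        return "response " + std::to_string(request_id);
    }

    int main() {
        std::vector<std::future<std::string>> pending;
        // Every request is independent, so just fan out:
        for (int id = 0; id < 100; ++id)
            pending.push_back(std::async(std::launch::async, handle_request, id));
        for (auto& f : pending)
            f.get();  // no coordination needed beyond collecting results
    }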
True, a lot of tasks are not embarrassingly parallel, as you point out, but most tasks that run on supercomputers are embarrassingly parallel. That's why those supercomputers have value in that problem domain in the first place.
Exactly, pretty much any parallel algorithm in the wild is embarrassingly parallel (gather or calculate a bunch of data, process it, and merge it together), so I question the need for continuously reaching for in-process parallelism instead of solving it trivially.
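That gather/process/merge shape is basically one library call these days. A sketch with C++17's std::transform_reduce (whether std::execution::par actually runs in parallel depends on your toolchain):

    #include <execution>
    #include <functional>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<double> data(1'000'000, 1.5);
        // "process it and merge it together": map each element, then reduce.
        double total = std::transform_reduce(
            std::execution::par, data.begin(), data.end(),
            0.0, std::plus<>{},                // merge step
            [](double x) { return x * x; });   // independent per-element work
        (void)total;
    }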
Embarrassingly parallel problems are relatively rare. Even GPU cores are quite big and they’re explicitly dedicated to mostly embarrassing parallelism.
The complexity is sadly mostly inherent in the problems being solved.
The term "embarrassingly parallel" doesn't refer to algorithms, it refers to problems which can be computed in such a manner, stating that they're not very interesting for parallel algorithms research. But yeah, it's kind of a stupid term. Would you be okay with "pleasingly parallel"? ;)
I'm really surprised at the arguing in this thread.
Embarrassingly parallel algorithms are trivial in pretty much every language. That is why it is embarrassing.
>And then you also need to implement mailboxes to receive new data
If you needed that, it wasn't embarrassingly parallel.
You know, a huge amount of parallel work is done in C/C++; they were the languages used for multiple parallel algorithms courses at my university. Being able to do basic MPI-type stuff in C does not make you a top programmer. And not being able to do an embarrassingly parallel algorithm in C would ring alarm bells.
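For what it's worth, the basic MPI stuff those courses cover really is this small. A sketch against the MPI C API (the per-rank loop body is a placeholder; compile with mpic++ and run under mpirun):

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, size = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // Each rank grinds through its own slice; no communication until the end.
        double local = 0.0;
        for (int i = rank; i < 1000000; i += size)
            local += 1.0 / (1.0 + i);  // placeholder for real per-element work

        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) std::printf("result: %f\n", global);
        MPI_Finalize();
    }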
People downvote this, so I will provide a bit more explanation: parallelism is 'easy' as long as you have naturally separated problems. This is the reason embarrassingly parallel problems are so great for many-core systems: every single sub-problem can be done without having to synchronize with other parts of the system. Unfortunately, most problems aren't that way, but instead are interlocked on multiple axes:
- Minimal granularity: You can never usefully run more cores than you have sub-problems at any given moment, but you also cannot keep splitting problems as long as you want, or the overhead of splitting will kill all the performance gains you'd have had (computing x+y in an extra thread/process is technically possible, in reality pretty stupid; see the sketch after this comment).
- Synchronization points: Most problems are not completely separate from each other, so you can do part of the problem in parallel but at some point you have to converge and do something with the combined result.
- Comparable task size: In an ideal world, all of your tasks would take exactly as long as each other, because if one task takes far longer than the others, you have to wait for it. The minimal run time of your parallel problem is bounded by your longest sub-problem, i.e. max(T_1, ..., T_n). If one of the problems takes far longer than the others (and they depend on each other), you've lost.
The last one is more of a "meta" point: complexity doesn't scale linearly for dependent problems. That's another reason embarrassingly parallel problems are so nice: you can not only run them all in parallel, you can think about each one as a completely separate problem. The moment you have dependencies, you have to think about how to bring them all together, and that gets hard very fast if you have many moving parts.
So, to sum it up: Many small things conspire against just scaling something which works great for two or four cores up to eight, 16, 32 or whatever. If you happen to have an embarrassingly parallel problem you're golden, but unfortunately only a small subset of interesting problems are that way and for other problems scaling is hard.
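To put the granularity and synchronization points in code, here's a sketch of a recursive parallel sum with a sequential cutoff (the 10000 threshold is an arbitrary guess; you'd have to measure it for your workload):

    #include <cstddef>
    #include <future>
    #include <numeric>
    #include <vector>

    // Below the cutoff, spawning a task costs more than the work it would do.
    double par_sum(const double* first, const double* last) {
        const std::ptrdiff_t n = last - first;
        if (n < 10000)                      // arbitrary granularity threshold
            return std::accumulate(first, last, 0.0);
        const double* mid = first + n / 2;
        auto right = std::async(std::launch::async, par_sum, mid, last);
        double left = par_sum(first, mid);
        return left + right.get();          // the synchronization point
    }

    int main() {
        std::vector<double> data(1'000'000, 1.0);
        double s = par_sum(data.data(), data.data() + data.size());
        (void)s;
    }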
If you do not understand that "embarrassingly parallel" is a technical term and that it's generally understood that most programs are not easily parallelizable, there is not a discussion we can have here.
That's called an 'embarrassingly parallel' problem, and it's not the general case. I guess, just like 'big O', people have to take a class to appreciate the space of what's possible to parallelize and how much it buys you.
Not to offend, but yeah, if you've already partitioned the work to be done, then of course it's not hard.
I'm pretty sure that anyone who's come into even vague contact with parallel programming understands that the hardest part is communication and synchronization.
It doesn't matter how much of an expert you are if your algorithm fundamentally isn't parallel though. We just don't know how to parallelise some things.
It's just we see so many parallel collections, parallel streaming, parallel map, parallel for-loop efforts, and they always rely on the problem being embarrassingly parallel in the first place!
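Right, and you can see it in the API shape itself. A C++ sketch with std::for_each over a parallel execution policy; it's only correct because each element's update is independent (share an accumulator and you're straight back to locks):

    #include <algorithm>
    #include <execution>
    #include <vector>

    int main() {
        std::vector<int> v(1'000'000, 2);
        // Safe only because each iteration touches nothing shared.
        std::for_each(std::execution::par, v.begin(), v.end(),
                      [](int& x) { x *= x; });
    }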
Parallelizing a turd across hundreds of machines doesn't mean you're doing something genius, it just means you now have a hundred machines that have to deal with your shit.
Most programs don't use anywhere near enough CPU to make parallel algorithms pay off, so you may as well keep it simple, especially if you aren't hitting the limits of your current algorithms. Simplicity means less code, which means fewer bugs.
I think the idea is that your locks and threads are so close latency-wise that you only need a few to get the job done, versus trying to figure out how to parallelize things, some of which are inherently non-parallelizable. But sometimes you don't know until you try! Ain't life grand?