> OS-level multitasking won't be able to achieve the same level of concurrency
Do you have a source for this claim? I've seen it repeated many times, especially in the node.js community but I've yet to see any evidence to back it up. From what I've read, a synchronous threaded model can be just as fast as an event-based system [1].
To be fair, that link is almost 15 years old. Back then we had 32-bit address spaces, and that was the main limiting factor for threads (because you'd often allocate 2MB of address space for each stack). And we didn't have multi-core processors.
These days you could actually reasonably have 10k threads. In theory switching between threads shouldn't be much different performance-wise than switching between callbacks in an event loop (either way you take some cache misses), and the thread stack is probably more cache-friendly than scattering objects all over the heap (and certainly easier to use).
But now you have the problem that synchronization between threads (whether by mutex locking or lock-free algorithms) is complicated and surprisingly slow, specifically because you have to worry about all the ways simultaneous memory access might confuse the CPU or its caches. Whereas with single-threaded async, each callback is effectively a transaction, without requiring any slow synchronization.
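The overhead of synchronization shows up even when there's no contention at all. A toy micro-benchmark in Python (my own sketch, not from the thread; absolute numbers are illustrative only, and Python's GIL hides the real multi-core cache effects):

```python
import threading
import time

N = 200_000

def plain_increments():
    # Single-threaded "async style": nothing else can touch the
    # counter, so no synchronization is needed.
    n = 0
    for _ in range(N):
        n += 1
    return n

def locked_increments(lock):
    # Thread style: every touch of shared state pays for the lock,
    # even with zero contention.
    n = 0
    for _ in range(N):
        with lock:
            n += 1
    return n

t0 = time.perf_counter()
plain_increments()
plain_s = time.perf_counter() - t0

t0 = time.perf_counter()
locked_increments(threading.Lock())
locked_s = time.perf_counter() - t0

print(f"plain: {plain_s:.3f}s  locked: {locked_s:.3f}s")
```

Both loops compute the same result; the locked version just pays the acquire/release cost on every iteration.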
Of course if you're doing single-threaded async then you probably aren't fully utilizing even one core. You see, even if you think you are doing everything in a non-blocking way, that's not really the case all the way down the stack. If you try to access memory that is paged out, guess what? You are now blocked on disk I/O. And because you aren't using threads, the OS can't schedule any other work while you wait. And even if you're pretty sure you never touch memory that is paged out, you surely do sometimes touch memory that is not in the CPU cache, which also takes a while. If your CPU supports hyper-threading, it could be executing another thread in the meantime... but you don't have any other threads.
And then multicore. The previous paragraph was a lot more interesting before multicore, but now it's just obvious that you can't utilize your CPU with a single thread.
The heavy-duty high-scalability servers out there (like nginx and, I'd guess, HAProxy) actually use both threads and async, but while this gets the best of both worlds, it also gets the worst: complicated synchronization and callback hell.
A big problem with one-thread-per-connection is that you open yourself to slowloris-type DoS attacks.[1] Normal load (and even extreme load) is fine, but a few malicious clients can use up all of your threads and take down your server.
This is touched upon in the slides you linked to. On slide 62 (SMTP server) a point says, "Server spends a lot of time waiting for the next command (like many milliseconds)." A malicious client could send bytes very slowly, using up a thread for a much longer period of time. If the client has an async architecture, it can open multiple slow connections with little overhead. The asymmetry in resource usage can be quite staggering.
You seem to be imagining a case where you only allocate a small fixed thread-pool and when it runs out you just stop and wait. I think the slide deck is advocating that you just keep allocating more threads.
I'm talking about hitting OS or resource limits. Let's say a server is configured to time out requests after 2 minutes. A malicious client could do something like...
Every second:
1. Open 40 connections to the server.
2. For all open connections, send one byte.
Repeat indefinitely.
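The loop above can be sketched in a few lines of Python (my own illustration; `host`, `port`, and the helper names are placeholders, not from the thread):

```python
import socket
import time

# Numbers from the example: the server times out connections after
# 2 minutes, and the client opens 40 new connections per second.
SERVER_TIMEOUT_S = 120
NEW_CONNS_PER_SEC = 40

def steady_state_connections(rate=NEW_CONNS_PER_SEC, timeout=SERVER_TIMEOUT_S):
    # Each connection survives `timeout` seconds before the server
    # drops it, so the open-connection count plateaus at rate * timeout.
    return rate * timeout

def attack(host, port):
    # Hypothetical attacker loop -- only ever run against servers
    # you own.
    conns = []
    while True:
        for _ in range(NEW_CONNS_PER_SEC):   # 1. open 40 connections
            conns.append(socket.create_connection((host, port)))
        for s in conns:                      # 2. send one byte on each
            s.sendall(b"x")
        time.sleep(1)                        # repeat every second
```

`steady_state_connections()` returns 4,800, which is where the figure in the steady-state calculation comes from.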
Steady state would be reached at 4,800 open connections. At 1 byte of actual data per second per connection, data plus TCP overhead would use around 200KB/s of bandwidth. The server would have to run 4,800 threads to handle this load. Depending on memory usage per thread, this could exhaust the server's RAM.
There are ways to mitigate this simple example attack, but the only way to defend against more sophisticated variants is to break the one-thread-per-connection relationship.
What I am truly missing is a good benchmark comparing async vs. sync. Everybody seems to say that async is best, but I don't see much evidence.
For example, how would 4,800 threads exhaust the server's RAM when the thread stack size can be as small as 48KB? That's around 230MB of memory.
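For the record, the arithmetic (48KB per stack is the commenter's assumption; real default stack sizes are usually much larger):

```python
THREADS = 4800
STACK_KB = 48  # assumed per-thread stack size from the comment above

total_kb = THREADS * STACK_KB
print(f"{total_kb} KB ≈ {total_kb // 1024} MiB")  # → 230400 KB ≈ 225 MiB
```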
I'm not saying that the threaded approach is better, just that almost everyone offers some theoretical statement and nobody seems to bother finding hard evidence.
You are right to distrust these claims. The reality is that threads can be significantly faster than async -- async code has to do a lot of bookkeeping and that bookkeeping has overhead. OTOH, threads have their own kind of overhead that can also be bad.
The slide deck that bysin linked above is pretty good:
This is by Paul Tyma, who at the time worked on Google's Java infrastructure team with Josh Bloch and other people who know what they're doing. Apparently he found threads to be faster in a number of benchmarks.
Ultimately which is actually faster will always depend on your use case. Unfortunately this means that general benchmarks aren't all that useful; you need to benchmark your system. And you aren't going to write your whole system both ways in order to find out which is faster. So probably you should just choose the style you're more comfortable with.
Async is kind of like libertarianism: It works pretty well in some cases, pretty poorly in others, but it has a contingent of fans who think they've discovered some magic solution to all problems and if you disagree then you must just not understand and you need to be educated.
(Note: The code I've been writing lately is heavily async, FWIW.)
Why is 4800 threads a problem, and 4800 heap-allocated callbacks not a problem? Are you assuming a thread consumes significantly more memory than the state you'd need to allocate in the async case? This isn't necessarily true.
[1] http://www.mailinator.com/tymaPaulMultithreaded.pdf