
The experiment is about a Java app, but the tweaks are at the OS level. Does that mean any app (Java or not, Loom or not) can hit the same target given the right tweaks?

Also, why aren't these the OS defaults? What are we compromising by setting those values?




There are always trade-offs. It would be very rare for any server to reach even 100K concurrent connections, let alone 5M. Optimising for that would be optimising for the 0.000001% case at the expense of the common case.

Some back of the envelope maths: https://www.wolframalpha.com/input?i=100+Gbps+%2F+5+million
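Spelled out: 100 Gbps / 5,000,000 connections = 100,000,000,000 bits/s ÷ 5,000,000 = 20,000 bits/s, i.e. roughly 20 kbps per connection.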

If the server had a 100 Gbps Ethernet NIC, this would leave just 20 kbps for each TCP connection.

I could imagine some IoT scenarios where this might be useful, but outside of that? I doubt there's anyone who wants 20 kbps of throughput in this day and age...

It's a good stress test, however, for squeezing out inefficiencies, super-linear scaling issues, etc...


Open, idle websockets can be a use case for a large number of TCP connections with a small data footprint.

Also IMAP has this unfortunate property.

20 kbps should be sufficient for something like a chat app, provided you have the CPU power to actually process messages at that scale. Modern apps also need attachments, which take more bandwidth, but for the core messaging infrastructure, without backfilling message history, I think 20 kbps is enough. Chat traffic is bursty, after all, so in practice each connection gets more than just the average share.
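As a rough illustration (not from the article), this is what mostly-idle connections look like with Loom-style virtual threads: each connection gets its own thread that spends nearly all of its time parked on a blocking read. A minimal sketch assuming JDK 21+; the port and the plain-socket echo handler are placeholders for a real WebSocket/chat protocol:

    // One virtual thread per mostly-idle connection. Parked reads are cheap,
    // so lots of open-but-quiet connections cost little beyond memory.
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class IdleChatServer {
        public static void main(String[] args) throws IOException {
            try (ServerSocket server = new ServerSocket(9000);
                 ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
                while (true) {
                    Socket socket = server.accept();
                    pool.submit(() -> handle(socket));
                }
            }
        }

        static void handle(Socket socket) {
            try (socket;
                 InputStream in = socket.getInputStream();
                 OutputStream out = socket.getOutputStream()) {
                byte[] buf = new byte[512];
                int n;
                // The virtual thread spends most of its life parked here,
                // waiting for the occasional small chat message.
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n); // echo back; stand-in for real message handling
                }
            } catch (IOException ignored) {
                // connection dropped
            }
        }
    }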

I have a memory of some chat service, maybe Discord, sending attachments to a different server, trading the bandwidth problem for extra system complexity.

That's how I'd solve the problem. The added complexity isn't even that high: give the application an endpoint that pushes an attachment into a distributed object store of your choice, then submit a message with a reference to the object and persist it the moment the chat message is sent. This could be done with mere bytes for the message itself and some very dumb anycast-to-S3 services in different data centers.

I'm sure I'm skipping over tons of complexity here (HTTP keep-alives binding clients to a single attachment host, for example) because I'm no chat app developer, but the theoretical complexity is still relatively low.
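Purely as a sketch of that shape: the upload endpoint, the returned object key, and the ChatConnection type below are all hypothetical, but the point is that the blob goes out of band over HTTP while only a few bytes of reference travel over the chat connection.

    // Upload the attachment to an object store, then send only a tiny
    // reference over the chat connection. Endpoint and key format are made up.
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.file.Path;

    public class AttachmentSender {
        private final HttpClient http = HttpClient.newHttpClient();

        // Returns the object key under which the attachment was stored.
        String uploadAttachment(Path file) throws Exception {
            HttpRequest put = HttpRequest.newBuilder()
                    .uri(URI.create("https://attachments.example.com/upload")) // hypothetical endpoint
                    .PUT(HttpRequest.BodyPublishers.ofFile(file))
                    .build();
            HttpResponse<String> resp = http.send(put, HttpResponse.BodyHandlers.ofString());
            return resp.body(); // assume the service replies with the object key
        }

        // The chat message itself stays tiny: just text plus a reference.
        void sendMessage(ChatConnection chat, String text, String attachmentKey) {
            chat.send("{\"text\":\"" + text + "\",\"attachment\":\"" + attachmentKey + "\"}");
        }

        interface ChatConnection { void send(String json); } // stand-in for the real transport
    }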


No, it doesn't. The reason the tweaks are at the OS level is because, apparently, Loom-enabled JVMs already scale up to that level without needing any tuning. But if you try that in C++ you're going to die very quickly.

There have been userspace thread libraries for C++ for decades.

Sure, I wrote some myself. The question is what libraries you can use on top of the userspace thread package that are aware of the userspace threads, rather than just using OS APIs and thus e.g. blocking the current OS thread.

There are .so interposition tricks that can be used for that.

I think Pth used to do that, for example.


Could you elaborate?

For example: https://www.gnu.org/software/pth/pth-manual.html#system_call...

See the hard system call wrapping. This is just one option.


With C++ co-routines and a runtime like HPX, not really.

However, there are other reasons why a C++ application connected to the internet might indeed die faster than a Java one.


Both your operating system and your application environment need to be up to the task. I'd expect most operating systems to be capable, although they may need some settings tuned. Some of the settings control structures that are statically allocated in non-swappable memory, and you don't want to waste memory on being able to have 5M sockets open if you never go over 10k. Often you'll want to reduce socket buffers from the defaults, which will reduce throughput per socket, but the target throughput per socket is likely low or you wouldn't be cramming so many connections onto one machine. You may need to increase the size of the connection table and the hash used for it as well; again, it wastes non-swappable RAM to have it too big if you won't use it.
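The kernel-side knobs (connection-table size, default buffer sizes) are sysctls and not shown here, but the application side of "reduce socket buffers from the defaults" can look something like this sketch; the sizes are illustrative and the kernel is free to adjust what it actually grants:

    // Request small per-socket buffers so millions of connections don't each
    // pin default-sized kernel buffers. Sizes and port are illustrative.
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.StandardSocketOptions;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    public class SmallBufferServer {
        public static void main(String[] args) throws IOException {
            try (ServerSocketChannel server = ServerSocketChannel.open()) {
                server.bind(new InetSocketAddress(9000), 4096); // larger accept backlog
                while (true) {
                    SocketChannel conn = server.accept();
                    conn.setOption(StandardSocketOptions.SO_RCVBUF, 4 * 1024);
                    conn.setOption(StandardSocketOptions.SO_SNDBUF, 4 * 1024);
                    // ... hand the connection off to whatever concurrency model is in use
                }
            }
        }
    }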

For the application level, it's going to depend on how you handle concurrency. This post is interesting because it's a benchmark of a different way to do it in Java. You could probably do 5M connections in regular Java through some explicit event loop structure; but with the Loom preview, you can do it with a thread per connection. You would be unlikely to manage thread-per-connection without Loom, since Linux threads are very unlikely to scale that high (but I'd be happy to read a report showing 5M Linux threads).
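For contrast, a bare-bones version of the "explicit event loop structure" mentioned above might look like the following: one OS thread multiplexing every connection through a java.nio Selector, with the real per-connection protocol handling left out:

    // Single-threaded event loop: accept and read readiness via a Selector,
    // instead of one (virtual) thread per connection.
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class EventLoopServer {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9000));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buf = ByteBuffer.allocate(512);
            while (true) {
                selector.select();
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel conn = server.accept();
                        conn.configureBlocking(false);
                        conn.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel conn = (SocketChannel) key.channel();
                        buf.clear();
                        if (conn.read(buf) == -1) {
                            key.cancel();
                            conn.close();
                        }
                        // ... a per-connection state machine would process buf here
                    }
                }
            }
        }
    }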

