
> Node does not multithread requests, but it surely can process many requests simultaneously if these requests are waiting for async operations: database, external APIs or other types of I/O usually. That's the very core idea of evented servers.

Within a single request, Node can handle its calls to outside services (databases, APIs, etc.) asynchronously, but it is still only processing one request at a time. There is no 'synchronized' keyword in javascript. ;-)

Check this out:

https://devcenter.heroku.com/articles/http-routing#heroku-he...

There is an interesting header in there: X-Heroku-Dynos-In-Use. From what I understand, this header is the number of dynos that a router is communicating with. For us, this is always around 2-3.

I suspect that the router is just a dumb nginx process sitting in front of my app. It is set up to communicate with 2-3 of my dynos in a round-robin fashion. If any one of those dynos doesn't process the request fast enough, then requests start to back up. Once requests back up past 30s worth of execution, the router just starts killing those queued requests instead of leaving them in the queue or sending them to another set of dynos. Even worse is if you have a dyno that crashes (nodejs likes to crash at the first sign of an exception). I suspect that is why we see 2 or 3 in that header.

I think that part of the problem is that the routers don't start talking to more dynos even if you have them available. So it doesn't matter if you have 50, 200, or 500 dynos, because the router is always only talking to a small subset of them. Even if you add more dynos in the middle of heavy traffic, you are still stuck with H12s on the existing dynos. A full system restart is necessary then.




> Node handles one request at a time. It isn't multithreaded. It will receive a request, process that request and return a response. If another request comes in at the same time another request is in process, it is queued until the currently processing request is finished.

I'm missing something here. Node does not multithread requests, but it surely can process many requests simultaneously if these requests are waiting for async operations: database, external APIs or other types of I/O usually. That's the very core idea of evented servers.

So my model is that if a node process receives 100 requests over the period of 1 second, and each request takes 3 seconds to process but most of that time is spent waiting on async I/O, then the 100 responses will be sent back essentially 3 seconds after the requests arrived, with no queuing to speak of.
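
A minimal sketch of that model (self-contained; the 3-second setTimeout stands in for the async database/API wait):

    const http = require('http');

    const server = http.createServer((req, res) => {
      // stand-in for a slow database / external API call
      setTimeout(() => res.end('done\n'), 3000);
    });

    server.listen(3000, () => {
      const start = Date.now();
      let pending = 100;
      for (let i = 0; i < 100; i++) {
        http.get('http://localhost:3000/', (res) => {
          res.resume();
          if (--pending === 0) {
            console.log(`all 100 responses in ${Date.now() - start} ms`); // ~3000, not ~300000
            server.close();
          }
        });
      }
    });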

From your description, routers do not send multiple requests to the same dyno even if dynos could handle them, and only have a limited amount of dynos they talk to. So queuing is happening in the routers, while dynos idle away waiting for async.

This would be complementary to the problem described by Rapgenius, and mean that the Heroku architecture does not play well with any type of server, neither evented (Node, yours) nor sequential (Rails, as shown by Rapgenius) nor presumably multithreaded or multiprocess (which effectively behaves like evented to the outside world). A huge mess indeed!


Node handles one request at a time. It isn't multithreaded. It will receive a request, process that request and return a response. If another request comes in at the same time another request is in process, it is queued until the currently processing request is finished. I googled around; here is a good explanation for you: http://howtonode.org/understanding-process-next-tick

The way my application worked is that we had 'long' running things like saving data to a database happening before we returned a response to the client (in this case an iPhone app). Sometimes (mongo, the dyno, networking, phase of the moon, talking to the Facebook API, etc.) we would get 'slow' processing and it would take a few seconds for a response to make it to the client. As soon as this happened on a heavily loaded system, the heroku router would get backed up (since it only routes to 2-3 dynos at a time) and would start throwing H12 errors.

So, what we did was rewrite the entire app to do minimal data processing in the web tier and send the response back to the client as quickly as possible. At the same time, we send a rabbit queue message out with all the instructions needed to process the data 'offline' in a worker task. There is no spin-up since these workers are running all the time. We even have several groups of workers depending on the message type so that we can segregate the work across multiple groups of dyno workers. This also allows us to easily scale to more than 100 dynos to process messages. It works great. Rabbit is a godsend.
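
A hedged sketch of that pattern using the amqplib package; the queue name, payload shape and doTheSlowThing() are made up for illustration, and a real app would reuse the connection instead of opening one per publish:

    const amqp = require('amqplib');

    // web tier: acknowledge the client fast, push the heavy work onto the queue
    async function publishWork(payload) {
      const conn = await amqp.connect('amqp://localhost');
      const ch = await conn.createChannel();
      await ch.assertQueue('process-data', { durable: true });
      ch.sendToQueue('process-data', Buffer.from(JSON.stringify(payload)), { persistent: true });
      await ch.close();
      await conn.close();
    }

    // worker dyno: runs all the time, so there is no spin-up per message
    async function startWorker() {
      const conn = await amqp.connect('amqp://localhost');
      const ch = await conn.createChannel();
      await ch.assertQueue('process-data', { durable: true });
      ch.consume('process-data', async (msg) => {
        await doTheSlowThing(JSON.parse(msg.content.toString())); // mongo, Facebook API, etc. (hypothetical)
        ch.ack(msg);
      });
    }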

I say 'long' and 'slow' above because the longest amount of time we should be taking is a couple of seconds at most. Unfortunately, the way the heroku router is designed is fundamentally broken. As soon as a lot of 'slow' requests go to the same dynos, they start to stack up and the router just starts returning H12 errors. It doesn't matter how many dynos you have because the router only talks to 2-3 dynos at a time. We get H12s with 50, 100, 200, 300, etc. dynos.

We also saw very strange behavior with the dynos. We use nodetime to log how long things take, and we'd see redis/mongo take only a few ms, but we'd have >15s just for the request to complete... somewhere things are slow and we can't figure out where. Until this whole mess came out, Heroku just pointed fingers at everyone but themselves.

Oh, by the way, as soon as you get to around 200-300 dynos, deploys start erroring out as well, because heroku can't start up enough dynos fast enough and that whole process times out too. You can't tell if a deploy worked or didn't. They didn't seem to care about that at all either.

Anyway, I could keep going... but once again, I'll repeat that I'm glad the Rapgenius guys are calling Heroku out in public on this stuff. There are some big issues here that need to be addressed, and the H12/router stuff is the big one. I'm looking forward to seeing how they pull out of this one.


> Servers built with Node.js are typically super fast and can handle thousands, tens of thousands, even hundreds of thousands of concurrent connections — something very difficult to achieve with threaded servers. How can that be?

I've read similar sentiments in a bunch of places but whenever I push any real traffic at the nodejs version of our API server it falls over pretty quickly (even with ulimit increases). What exactly are people doing to handle tens or hundreds of thousands of connections?

Incidentally, if any nodejs gurus are interested we're looking for a remote freelancer to audit our nodejs stuff and help us with a deployment strategy (assuming we can get to the 100s of thousands of connections mark).


So let's assume that's $500 in dynos; that's approximately 14 dynos. You have 30k concurrent users per dyno using Node.js?

It's not that I don't believe you, I just think you don't understand what you're saying.

(Actually, it is that I don't believe you)

That said, if you're doing 500 requests/sec (which is very different from 500 concurrent users) per dyno, good for you. My main bottleneck was not so much CPU on the web machines (I hit memory limitations) as the database layer.


>Not any more than this is the case with Python, PHP, Rails, etc -- which also don't do multi-threaded (or don't do it well and not by default), and which on top of that don't have asynchronous capabilities (again, not by default) and are even less performant than a single Node app.

uWSGI makes running a threaded or multi-proc python webapp trivially easy (and as of Python v3.6 async comes as standard)...

>Which is why a simple Node running with its single process and single threaded execution can e.g. beat a Python server with two dozen workers (e.g. gunicorn) in handling simultaneous connections (assuming Node code is properly async for the most part).

Node can beat a threaded python app for sheer volume of concurrent connections to clients, yes. But for a lot of traditional backend work (e.g. talking to a DB) async is no faster (indeed it's often slower) than a threaded approach.

Node (or async in general) is great for terminating inbound client connections; talking to local, in-memory caches or making backend calls to remote, non-local REST services.

For making local DB connections or doing any CPU work (e.g. parsing XML docs returned from an API) single-threaded async rarely yields better performance over threads/multi-proc. A good illustration of this is pgbouncer (async on the client facing end; threaded - i think? - on the db facing side).

Basically, all node really does is reduce the number of front end app servers you need to serve X incoming client connections. Just because node can handle a high concurrent connection count doesn't mean the rest of your backend services can. Regardless of whether connections originate from a single node instance or a large fleet of php/python/rails servers, you still need reverse proxies like pgbouncer/haproxy/twemproxy/squid to manage and shape those connections before they get to things like your DB or internal micro-service APIs.

Because node is single-threaded you also need to keep a very close eye on any CPU bound activity to avoid blocking all your connections. This is not always obvious and can crop up in unexpected ways (see: https://news.ycombinator.com/item?id=15477419)


> As node.js is not multi-threaded, we spin up 4 instances of node.js per server, 1 instance per CPU core. Thus, we cache in-memory 4 times per server.

And why not use a shared memory server?

> Operations started adding rules with 100,000s of domains, which caused a single set of rules to be about 10mb large ... If we weren’t using node.js, we could cut this bandwidth by 4 as there would only be one connection to the Redis cluster retrieving rule sets.

Maybe a 10mb json string isn't the best design decision.....

Or you know, you could have one node process connect to the Redis server, and have the local processes read from a shared memory server.. Or you could not store your rules as a 10mb friggin JSON string..

> When rule sets were 10mb JSON strings, each node.js process would need to JSON.parse() the string every 30 seconds. We found that this actually blocked the event loop quite drastically

Well then do it in another thread and save it to shared memory. Maybe, just maybe, JSON strings aren't the tool for the job here.
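
A minimal sketch of that suggestion using Node's worker_threads module (everything beyond the 10mb string and the 30-second refresh from the quoted post is assumed). Spawning a worker per parse has its own cost, and copying the parsed object back to the main thread isn't free either, so treat it as an illustration rather than a drop-in fix:

    // parse-rules.js
    const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

    if (isMainThread) {
      // call this every 30 seconds with the raw rule-set string fetched from Redis
      module.exports = function parseRulesOffThread(rawJson) {
        return new Promise((resolve, reject) => {
          const worker = new Worker(__filename, { workerData: rawJson });
          worker.once('message', resolve); // the parsed object, structured-cloned back
          worker.once('error', reject);
        });
      };
    } else {
      parentPort.postMessage(JSON.parse(workerData)); // the heavy parse happens off the event loop
    }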


>I assume you can imagine some cpu bound tasks that would have node.js at 12 connections/sec or less.

Certainly. That would fall under "newb/clueless" design, though. Anyone who would throw a CPU-bound task into a primary Node server shouldn't be allowed near architectural design decisions. Whereas code in PHP written using best practices can easily end up with a server that can barely hit 100 queries per second.

Imagine, for instance, a situation where the client needs to do 50 requests to the server to render a page [1][2], and each query ends up with 20ms of latency on the PHP side; assuming you're running 8 threads (and the client makes 8 concurrent requests), a single page query could block your server for 125ms. A slow client or network might even block your PHP threads for longer. Node could crunch through ten thousand requests like that per second when running on four CPUs, meaning 125 of these bloated pages rendered per second, compared to ... 8 or less.

Even with decent client pages that can render with ONE server query, a couple dozen database lookups are par for the course, sometimes including an authentication lookup on a third-party OAUTH server. That could be 125ms all by itself, and in PHP your thread is blocked while the lookup happens. With the async model, once the query is off, the server is doing work on other requests until the query has returned data.
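
A minimal sketch of that async model, assuming the pg package and made-up table names; while the three lookups are in flight, the event loop is free to work on other requests:

    const { Pool } = require('pg');
    const db = new Pool(); // connection details come from PG* environment variables

    async function loadPage(userId) {
      const [user, prefs, posts] = await Promise.all([
        db.query('SELECT * FROM users WHERE id = $1', [userId]),
        db.query('SELECT * FROM prefs WHERE user_id = $1', [userId]),
        db.query('SELECT * FROM posts WHERE user_id = $1 LIMIT 20', [userId]),
      ]);
      return { user: user.rows[0], prefs: prefs.rows, posts: posts.rows };
    }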

Many CPU-bound tasks like "convert an image" are already coded in Node to happen in the background, triggering a callback when they're done so you can then send the result to the client. And in Node it's absolutely trivial to offload any likely CPU-bound task to a microservice, where the NodeJS server just queries the microservice and waits for the result. Which you'd want to do, of course, if a task is CPU-bound, because you would want a faster server than V8 running it anyway. Go would be a likely candidate, and Go handles threading either through light/async threads or via actual threading, as necessary. It's quite awesome.
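
A hedged sketch of that offload pattern, assuming Node 18+ (global fetch) and a hypothetical internal resize service; the URL and parameters are invented:

    async function resizeViaMicroservice(imageBuffer) {
      const res = await fetch('http://resize-service.internal/resize?width=200', {
        method: 'POST',
        headers: { 'content-type': 'application/octet-stream' },
        body: imageBuffer,
      });
      if (!res.ok) throw new Error(`resize failed: ${res.status}`);
      return Buffer.from(await res.arrayBuffer()); // event loop stays free while the service works
    }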

And if you really can't trust your developer to write code without extensive time-consuming calculations, then make them use Elixir or Erlang. It will use preemptive multitasking at the VM level if a thread takes up too much time, and even if they foolishly write a task that takes hundreds of milliseconds to complete, it will still task swap and serve other clients.

But arguing that pathologically bad code in Node can make it perform as badly as PHP does all the time isn't exactly a ringing endorsement for the language.

[1] In 2014 the average number of objects a web page requested was 112, and seemed to continue to be going up, though I'm assuming a lot of those are static resources and third party requests, like for analytics and ads. http://www.websiteoptimization.com/speed/tweak/average-web-p... I've personally seen pages with 70-80 requests against a PHP backend to render one page.

[2] And I wouldn't call a client page needing 50 requests a best practice, but I'm assuming that we're talking about the server side here, and that we are being forced to deal with an existing client that behaves that way. So call it "best practices on the server."


> you can run multiple processes on a single core system just fine — even if one of those processes hangs.

Yes, I can. Node.js doesn't do that.


> Does this mean that a web server written in node is running single-threaded?

Yes.

> But doesn't running on a single thread put an upper limit on the amount of work a server can handle?

It's not about how much work it can handle, it's about how much it can offload. Async servers can handle a much greater volume of I/O-bound tasks, so they can handle more connections. When a task is CPU-bound you can either create a thread (which does not really scale well) or offload it to other servers that can scale horizontally (i.e. microservices).

> And, if the solution is to spin up more servers with access to the same database, doesn't that mean that we are now having multiple threads accessing the database concurrently? Much like, say, Python Django?

Yes, to take advantage of all CPU cores you have to create more server instances. But why would they all talk to the same database instance? It could be a replica or a shard. You can even have a pool of shard connections per server instance.
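
A minimal sketch of one instance per core using Node's built-in cluster module (cluster.isPrimary is called isMaster on older Node versions):

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isPrimary) {
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();
      cluster.on('exit', () => cluster.fork()); // replace a crashed worker
    } else {
      // workers share the same listening socket
      http.createServer((req, res) => res.end(`handled by pid ${process.pid}\n`)).listen(3000);
    }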


>Does this mean that a web server written in node is running single-threaded?

Yes -- node is by default a single threaded, single process server.

>But doesn't running on a single thread put an upper limit on the amount of work a server can handle?

Not any more than this is the case with Python, PHP, Rails, etc -- which also don't do multi-threaded (or don't do it well and not by default), and which on top of that don't have asynchronous capabilities (again, not by default) and are even less performant than a single Node app.

Which is why a simple Node running with its single process and single threaded execution can e.g. beat a Python server with two dozen workers (e.g. gunicorn) in handling simultaneous connections (assuming Node code is properly async for the most part).

>And, if the solution is to spin up more servers with access to the same database, doesn't that mean that we are now having multiple threads accessing the database concurrently?

Databases take care of serializing multiple queries for you -- and for more complex cases, transactions give you fuller control.
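
A minimal sketch with the pg package (table names made up): concurrent queries from many Node instances are serialized by the database, and a transaction gives fuller control when several statements must succeed or fail together:

    const { Pool } = require('pg');
    const pool = new Pool();

    async function transfer(from, to, amount) {
      const client = await pool.connect();
      try {
        await client.query('BEGIN');
        await client.query('UPDATE accounts SET balance = balance - $1 WHERE id = $2', [amount, from]);
        await client.query('UPDATE accounts SET balance = balance + $1 WHERE id = $2', [amount, to]);
        await client.query('COMMIT');
      } catch (err) {
        await client.query('ROLLBACK');
        throw err;
      } finally {
        client.release();
      }
    }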


The article has a point, but it's poorly told. The problem is this. Consider the following workload:

- 1 request to http://server/fiboslow

- 1000 requests to http://server/fast-response

The point is that node will not process any of those 1000 requests until the first one is finished, while any multithreaded/multiprocess server will do just fine and process those 1000 requests in parallel.
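
A minimal sketch of that workload (route names from the comment above; fib(45) is just an arbitrary CPU burner): the synchronous Fibonacci call keeps the event loop busy, so the fast route cannot respond until it returns:

    const http = require('http');

    function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

    http.createServer((req, res) => {
      if (req.url === '/fiboslow') {
        res.end(String(fib(45))); // several seconds of pure CPU; blocks the event loop
      } else {
        res.end('fast'); // /fast-response requests queue behind the fib call above
      }
    }).listen(3000);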

<smug note> It's funny that the same people who criticized Java's AWT EDT as poorly designed are now praising the same thing in Node.js ;)


> Node.js can (when used correctly) handle loads of connections in a single process

Node uses an evented IO layer, which is completely orthogonal (and thus irrelevant) to streaming responses. You can stream responses with blocking IO, and you can buffer responses with evented IO.

> Phusion Passenger is clearly trying to evolve their model to achieve similar if not fully comparable results.

If you think that can happen, you're deluding yourself. Ruby+Rails's model means you need one worker (OS-level, be it a process or a thread does not matter) per connection. With "infinite" streaming responses this means each client ties up a worker forever. OS threads may be cheaper than os processes (when you need to load Ruby + Rails in your process) but that doesn't mean they're actually cheap when you need a thousand or two.
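
A minimal sketch of the kind of "infinite" streaming response being discussed (endpoint and payload invented): in Node each open connection costs a socket and a timer, not a dedicated worker:

    const http = require('http');

    http.createServer((req, res) => {
      res.writeHead(200, { 'content-type': 'text/event-stream' });
      const timer = setInterval(() => res.write(`data: ${Date.now()}\n\n`), 1000);
      req.on('close', () => clearInterval(timer)); // client went away
    }).listen(3000);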


The whole target of his dissatisfaction is this quote on the Node home page: "Almost no function in Node directly performs I/O, so the process never blocks. Because nothing blocks, less-than-expert programmers are able to develop fast systems." That's bullshit. Evented programming isn't some magical fairy dust. If your request handler takes 500ms to run, you're not going to somehow serve more than two requests per second, node or no node. It's blocked on your request handling.

And all that stuff Apache does for you? Well, you get to have fun setting that up in front of the nodejs server. Your sysadmin will love you.

Basically if you're doing a lot of file/network IO that would normally block, node is great. You can do stuff while it's blocked, and callbacks are easier to pick up and handle than threads. But how often does that happen? Personally my Rails app spends about 10% of its time in the DB and the rest slowly generating views (and running GC, yay). AKA CPU-bound work. AKA stuff Node is just as slow at handling, with a silly deployment process to boot.


> NodeJS is suitable for apps that do plenty of short lived requests.

I'm confused. Isn't Node's event-loop style programming ideal for long-lived requests? I.e. ones that, under a synchronous i/o, block other requests?


Node suffers similar problems, although I would describe them differently to the author.

Essentially:

1. All Node's async IO is lumped together into the same threadpool.

2. There is no distinction between the nature of each async IO task.

3. Async CPU tasks (fs.stat hitting the fs cache, async multi-core crypto, async native addons) complete orders of magnitude faster than async disk tasks (SSD or HDD), and these can be orders of magnitude faster than async network tasks (dns requests to a broken dns server).

4. There are three basic async performance profiles, fast (CPU), slow (disk), very slow (dns), but Node has no concept of this.

5. This leads to the Convoy effect. Imagine what happens when you race trucks, cars, and F1... all on the same race track.

6. The threadpool has a default size of only 4 threads, on the assumption that this reflects the typical number of CPU cores (and reduces context switches).

7. 4 threads is a bad default because it leads to surprising behavior (4 slow dns requests to untrusted servers are enough to DoS the process).

8. 4 threads is a bad default because libuv's memory cost of 128 threads is cheap (a sketch of raising the size via UV_THREADPOOL_SIZE follows this list).

9. 4 threads is a bad default because it prevents the CPU scheduler from running async CPU tasks while slow disk and slower DNS tasks are running. Concurrent CPU tasks should rather be limited to the number of cores available, while concurrent disk and DNS tasks should be given more than the number of cores available (context switches are better amortized for these).

10. Because everything is conflated, hard concurrency limits can't be enforced on fast, slow or slower tasks. It's all or nothing.
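
One mitigation available today is raising the pool size via UV_THREADPOOL_SIZE (a minimal sketch; it does not fix the conflation of fast and slow tasks, it just widens the pool):

    // libuv reads UV_THREADPOOL_SIZE the first time the pool is used, so it must
    // be set before any fs/dns.lookup/zlib/crypto work is queued; setting it in
    // the environment (UV_THREADPOOL_SIZE=64 node server.js) is the safer route.
    process.env.UV_THREADPOOL_SIZE = process.env.UV_THREADPOOL_SIZE || '64';

    const dns = require('dns');
    const fs = require('fs');

    dns.lookup('example.com', () => console.log('dns done')); // threadpool
    fs.stat(__filename, () => console.log('stat done'));      // threadpool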

There are efforts underway to support multiple threadpools in Node (a threadpool for fast tasks sized to the number of cores, a threadpool for slow tasks sized larger, and a threadpool for slower tasks also sized larger). The goal is to get to the point where we can have separate race tracks in Node, with F1, cars and trucks on separate race tracks, controlled and raced to their full potential:

https://github.com/libuv/libuv/pull/1726


> Your DNS example is a corner case

One of the many corner cases that will kill your application or open it to DoS (malicious or not).

I.e. you can DoS any nodejs application if

    * you can trigger it in making 4 DNS queries
    * and it does disk i/o (or uses any other core module using the thread pool)

> There are discussions around it

I've seen tickets open for more than a year on this, without anything showing a willingness to improve it. Version 0.9 even removed the possibility of increasing the number of threads (which they re-added in 0.10).

> such issues impact all frameworks

When you start using node, you don't expect that your bottleneck is a thread pool.

In non-async frameworks you know you'll have this kind of problem, you can design around it, and a DNS query in some module can't block I/O for the whole application.
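
A minimal sketch of the Node failure mode being described (the hostnames are invented and assume a slow or unresponsive resolver): four in-flight dns.lookup calls occupy the default 4-thread pool, so the unrelated file read has to wait for a free thread:

    const dns = require('dns');
    const fs = require('fs');

    for (let i = 0; i < 4; i++) {
      // hypothetical hosts behind a slow/unresponsive resolver
      dns.lookup(`host${i}.slow-resolver.example`, () => {});
    }

    console.time('readFile');
    fs.readFile(__filename, () => console.timeEnd('readFile')); // waits for a free thread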

> If you want to scale node, you would use multiple processes.

By "unscalable" I meant libraries using O(n) or O(n^2) algos, with 'n' the number of users or the size of your data, where it would have been easy to do it in O(log) or O(1).

> Promises not being in core is a good thing

Why ?

> Eventually many of those use cases will switch to using ES6 generators

I hope it will improve, but we are discussing the current state of nodejs.


> And Node is perfect because it'll be working with Javascript on the clientside, and not dealing with the http protocol.

I'm confused. What?

On the results, node didn't budge from 30 to 1000 concurrent connections, while the Lua servers just plain crashed, despite being faster at first. Conclusion: Lua is perfect for servers (?).


My first post was talking about the C API. A sync C API can be made into an async node.js API using the thread pool. That's what node does internally for DNS resolution, fs and other APIs.

99% of Node applications do not need any load balancing. 1% are Twitter or something. And even a lot of the 1% would do it at the application layer using multiple VMs.

And I bet that most of the Heroku apps on Node that use multiple dynos really only need them because Heroku's dynos aren't giving them realistic resources, i.e. less than one small AWS or Linode instance or whatever.

Forever will not keep your Node application running. If your Node application crashes, forever will let it crash and then run it again; if it crashes again, it will run it again, and it will crash again.

There used to be basic gotchas in HTTP/Express that made it really easy to have an exception thrown that would take down a Node web server. I am not sure whether those are still there, but anyway I do what every Node expert advises against and include an uncaughtException handler in my Node web servers. That keeps them running. If I take it out, the effect is that they go down briefly and forever restarts them. So honestly I see no benefit in most cases to using forever, except to make it a little bit harder to figure out how to launch your application.
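
A minimal sketch of that (officially discouraged) handler:

    process.on('uncaughtException', (err) => {
      // log and keep serving; state may be inconsistent after this, which is
      // why the usual advice is to exit and let a supervisor restart the process
      console.error('Uncaught exception, continuing anyway:', err);
    });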

And the thing with automatically restarting when a server reboots: honestly, my servers very rarely reboot. If they are going to reboot, then the VM (VPS) provider gives me warning and lets me control it. So for 99% of applications out there, an upstart script or whatever to restart your Node application really isn't that important.

Contrary to popular belief, you do not need to install nginx in front of every Node application.

I will give you the SSL one; that could easily take a good UI developer 2 or 3 days to find the right instructions on Google.
