
This is my one big problem with Postgres. Its heavy process-per-connection model prevents this sort of architecture, where you may have a large number of connections. PGBouncer helps a lot, but it's not quite the same.



FWIW, if you're using 75+ connections, it's probably a good idea to add some kind of Postgres proxy in the middle, like pgbouncer. Postgres doesn't handle lots of connections well because each connection forks off its own full OS process, and performance can degrade noticeably as a result when there are too many of them.

That’s been my experience, at least.
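For the record, "in the middle" just means the application dials the proxy instead of the database; here's a minimal hedged sketch in Go (hostnames, credentials, and the lib/pq driver choice are placeholders, not anything from the comment above):

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/lib/pq" // any Postgres driver works; pgbouncer speaks the same wire protocol
    )

    func main() {
        // Direct:        postgres://app:secret@db.internal:5432/app?sslmode=disable
        // Via pgbouncer: same URL, pointed at the pooler's listen port (6432 by default).
        db, err := sql.Open("postgres", "postgres://app:secret@pgbouncer.internal:6432/app?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        if err := db.Ping(); err != nil {
            log.Fatal(err)
        }
        log.Println("connected through the pooler")
    }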


Pgbouncer is your friend for Postgres DBs with lots of client connections.

A larger pool size only helps if connections are freed up quicker than they are used.
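To make "freed up quicker than they are used" concrete, here's a rough Go sketch (table and column names invented, assuming database/sql): the underlying connection only goes back to the pool when the rows are closed or the transaction ends, so the trick is not to hold either open while doing slow work.

    package example

    import (
        "context"
        "database/sql"
        "time"
    )

    // fetchPending grabs work ids while holding a pooled connection for as short
    // a time as possible; the slow processing happens after the rows are closed.
    func fetchPending(ctx context.Context, db *sql.DB) ([]int64, error) {
        ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
        defer cancel()

        rows, err := db.QueryContext(ctx, "SELECT id FROM jobs WHERE state = 'pending' LIMIT 100")
        if err != nil {
            return nil, err
        }
        defer rows.Close() // without this, the connection stays checked out of the pool

        var ids []int64
        for rows.Next() {
            var id int64
            if err := rows.Scan(&id); err != nil {
                return nil, err
            }
            ids = append(ids, id)
        }
        return ids, rows.Err()
    }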


Shouldn't you use a Postgres connection pooler like pgbouncer for such a scenario? AFAIK it can be switched to transaction mode, which lets you have way more connections from clients to pgbouncer than there are available Postgres connections.
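For reference, a minimal sketch of that setup (every value below is a placeholder, not a recommendation): pgbouncer accepts thousands of client connections and, in transaction mode, hands one of a small number of real server connections to a client only for the duration of each transaction.

    [databases]
    app = host=127.0.0.1 port=5432 dbname=app

    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; hand a server connection to a client per transaction, not per session
    pool_mode = transaction
    ; how many app-side connections pgbouncer will accept
    max_client_conn = 10000
    ; how many real Postgres connections it keeps behind the scenes
    default_pool_size = 50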

I agree. PgBouncer (or something better) should be baked into Postgres.

Connection management in Postgres is definitely one of its sore spots. Each connection, even an idle one, consumes roughly 10 MB of overhead and introduces additional coordination overhead on queries. If you're running at any real scale in production, you want pgbouncer in place to manage the idle connections so they're not a hit to your system. Hopefully one day this improves directly in core.
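If you want to see how much of your connection budget is just sitting idle, a quick hedged sketch (Go's database/sql wrapping a standard pg_stat_activity query; the function name is made up):

    package example

    import (
        "context"
        "database/sql"
    )

    // countIdle reports how many backends are currently idle. Each of those is a
    // full OS process holding memory while doing nothing useful.
    func countIdle(ctx context.Context, db *sql.DB) (int, error) {
        var n int
        err := db.QueryRowContext(ctx,
            "SELECT count(*) FROM pg_stat_activity WHERE state = 'idle'").Scan(&n)
        return n, err
    }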

Out of curiosity, if the problem of connections being expensive is solvable by PGBouncer-style connection multiplexing, why doesn't Postgres just do that by itself?

You may be right that it's easy to set up, but pgbouncer doesn't help with this problem most of the time. It's a problem that needs to be solved within postgres.

There are three pooling modes:

- Session pooling. Doesn't help with this issue since it doesn't reduce the total number of required connections.

- Transaction pooling / statement pooling. Breaks too many things to be usable (e.g. prepared statements; there's a quick sketch of the breakage below).

See the table at https://www.pgbouncer.org/features.html for what features cannot be used with transaction pooling.
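To make the breakage concrete, a hedged Go sketch (table names invented; this is the general pattern, not an exhaustive list): with transaction pooling, consecutive statements from one client can land on different server connections, so anything relying on session state gets flaky, and SET LOCAL inside a transaction is the usual workaround.

    package example

    import (
        "database/sql"
    )

    func sessionStatePitfalls(db *sql.DB) error {
        // Risky behind transaction pooling: the prepare and the later executes can
        // land on different server connections, and the new backend won't know the
        // statement.
        stmt, err := db.Prepare("SELECT name FROM users WHERE id = $1")
        if err != nil {
            return err
        }
        defer stmt.Close()

        // Also risky: a plain SET changes one server session, but this client's
        // next query may run on a different backend (and another client may
        // inherit this one, setting included).
        if _, err := db.Exec("SET statement_timeout = '5s'"); err != nil {
            return err
        }

        // Safer: scope the state to a single transaction with SET LOCAL; the whole
        // transaction stays on one server connection even in transaction mode.
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        defer tx.Rollback()
        if _, err := tx.Exec("SET LOCAL statement_timeout = '5s'"); err != nil {
            return err
        }
        var name string
        if err := tx.QueryRow("SELECT name FROM users WHERE id = $1", 1).Scan(&name); err != nil {
            return err
        }
        return tx.Commit()
    }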


MySQL can handle many more connections per instance than postgres. Often when you have a high-transaction database in prod with postgres, you need something like pgbouncer to handle connection pooling or you'll have a bad time.

I agree though that postgres has some fantastic features.

edit: I think they may have addressed some of this in a recent version? I'm basing my knowledge on postgres 12


PgBouncer just pools the connections, but each connection still needs its own process in PostgreSQL itself, and each query blocks the whole process. That limits the number of queries that can run in parallel/concurrently to the number of connections. Long-running queries can easily clog up everything. No tool can fix this; you need to be aware of it and account for it in your design.
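You can at least bound the damage from any single query; a hedged Go sketch (timeout and query are arbitrary) of putting an upper limit on how long one statement can pin one of the scarce backends:

    package example

    import (
        "context"
        "database/sql"
        "time"
    )

    // boundedQuery caps how long a single query may occupy a backend; when the
    // context deadline passes, the client gives up and most drivers also send a
    // cancel request to the server.
    func boundedQuery(db *sql.DB) (int64, error) {
        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()

        var total int64
        err := db.QueryRowContext(ctx, "SELECT count(*) FROM events").Scan(&total)
        return total, err
    }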

The real problem is working with Postgres' "each connection is a process" model. Pgbouncer puts an evented pool in front of Postgres to deal with this. Apps that are aggressive with the DB won't benefit much from having an evented pool in front, but web apps (think Rails) will have connections checked out even when they don't need them, and Pgbouncer helps there. If your app instead recycles DB connections when not in use, that leads to connection thrashing and higher latency, which pgbouncer can also help with. But you're right that at some point the DB is the bottleneck. For most people, it's the number of connections, because Postgres makes a process for each connection.

Just curious, how does that happen? I've never found myself in a situation where PGBouncer can't handle the load but Postgres can.

I'm guessing it's a situation where you are running a large volume of very easy queries, for example primary key lookups with no joins?


We run a similar setup at Kloudless [1]. We use PgBouncer [2] for connection pooling, which connects to pgpool2 to load balance between our Postgres servers. We've noticed PgBouncer is more performant at handling thousands of simultaneous connections.

[1] https://kloudless.com [2] http://wiki.postgresql.org/wiki/PgBouncer


Indeed.

Also, the server the DB is on is RAM-constrained, and Postgres making a process per connection brings it right down, versus using pgBouncer to pool connections.

It seems complicated but it's actually quite simple.


You would likely need to use something like PgBouncer, which sits in front of Postgres, to keep connections under control.

On-topic tangent: reminder or heads-up (depending on if you’ve already seen this) that Postgres is experimenting with thread-based instead of process-based connections (which would pretty much obviate the need for pgBouncer if it works out and becomes the connection model going forward).

HN discussion from a few months ago, with lots of commentary relevant to any pgBouncer scenarios: https://news.ycombinator.com/item?id=36393030


Are you using a connection pooler for postgres? Like pgbouncer or something.

I've been running pgBouncer in large production systems for years (~10k connections to pgbouncer per DB, and 200-500 active connections to postgres). We have so many connections because microservices breed like rabbits in spring once developers make the first one, but I could rant about that in a different post.

We use transaction-level pooling. In practice, this means we occasionally see problems when per-connection state "leaks" from one client to another: someone issues a SQL statement that changes session-wide connection state, and it affects the queries of a subsequent client that inherits that connection. It's annoying to track down, but now that the behavior is understood, developers generally know how to keep their queries within those limits. Some queries aren't appropriate for going through pgbouncer, like cursor-based queries, so we just connect directly to the DB for the rare cases where that's needed.
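For what it's worth, the "connect directly for the rare cases" part can be as simple as keeping two handles; a hedged sketch with made-up DSNs (not our actual code):

    package example

    import (
        "database/sql"

        _ "github.com/lib/pq"
    )

    // pools keeps one handle that goes through pgbouncer (transaction mode) for
    // normal traffic and one that connects straight to Postgres for
    // session-dependent work such as long-lived cursors.
    type pools struct {
        pooled *sql.DB
        direct *sql.DB
    }

    func openPools() (*pools, error) {
        pooled, err := sql.Open("postgres", "postgres://app:secret@pgbouncer.internal:6432/app?sslmode=disable")
        if err != nil {
            return nil, err
        }
        direct, err := sql.Open("postgres", "postgres://app:secret@db.internal:5432/app?sslmode=disable")
        if err != nil {
            return nil, err
        }
        // Keep the direct pool tiny; these are the expensive per-process connections.
        direct.SetMaxOpenConns(5)
        return &pools{pooled: pooled, direct: direct}, nil
    }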

Why so many connections? Say you make a Go-based service which launches one goroutine per request, and your API handlers talk to the DB. The way sql.DB connection pooling works in Go is that it grows its own pool to be large enough to satisfy the working parallelism, and it doesn't yield connections for a while. Similar things happen in Java, Scala, etc., and with dozens of services replicated across multiple failure domains, you get a lot of connections.
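That growth is at least tunable; a hedged sketch of the standard sql.DB knobs (the numbers are arbitrary) that keep one service instance from quietly holding dozens of backends:

    package example

    import (
        "database/sql"
        "time"
    )

    // tunePool caps how many connections one service will hold and gives idle
    // ones back instead of keeping them around "for a while".
    func tunePool(db *sql.DB) {
        db.SetMaxOpenConns(20)                  // hard ceiling per service instance
        db.SetMaxIdleConns(5)                   // how many to keep warm between bursts
        db.SetConnMaxIdleTime(2 * time.Minute)  // release idle conns after a couple of minutes
        db.SetConnMaxLifetime(30 * time.Minute) // recycle long-lived conns periodically
    }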

It's a great tool. It allows you to provision smaller databases and save cost, at the cost of some complexity.


I'm glad that pretty much the whole thing is 'use pgbouncer.'

This was my exact situation at a previous position. We needed up to 20k connections to a single pgsql master. Even the most monster postgres server falls over around ~600 connections (up to 1k depending on usage).

Using a pgbouncer-per-webserver we easily got to 20k.

I will say this was with 9.3 though, things may have changed on that front. Nowadays I use pgsql primarily for analytics, so there's only ever 20-30 connections tops, albeit doing quite heavy querying.


IMO the biggest reason folks use pgbouncer is not for load balancing (which it can do, -ish) but for connection pooling. Postgres connections are expensive for the DB server (one process per connection, not one thread), so if you have, say, thousands of web application pods, you need pgbouncer or something similar as a proxy to multiplex those thousands of connections down onto a more manageable number (~200). So no, not really.

(EDIT: if you don't know this already - the _establishment_ of connections is also super expensive, so another reason to pgbounce is to keep connections persistent if you have app servers that are constantly opening and closing conns, bursting open conns, or suchlike. Even if the total number of conns to pg doesn't go super high, the cost of constantly churning them can really hurt your db.)
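A quick hedged sketch of the churn pattern versus the persistent one (handler names, DSN, and driver are invented for illustration): opening a connection per request pays the dial/auth/fork cost on every hit, while one long-lived pool (or pgbouncer in front) pays it once and reuses the connection.

    package main

    import (
        "database/sql"
        "net/http"

        _ "github.com/lib/pq"
    )

    const dsn = "postgres://app:secret@db.internal:5432/app?sslmode=disable" // made up

    // Anti-pattern: a fresh connection (a fresh Postgres process) per request,
    // torn down right after, so the full connection cost is paid on every hit.
    func churnyHandler(w http.ResponseWriter, r *http.Request) {
        db, err := sql.Open("postgres", dsn)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer db.Close()
        var now string
        if err := db.QueryRowContext(r.Context(), "SELECT now()").Scan(&now); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.Write([]byte(now))
    }

    // Persistent version: one long-lived pool shared by every handler, so
    // established connections get reused instead of constantly reopened.
    var shared *sql.DB

    func persistentHandler(w http.ResponseWriter, r *http.Request) {
        var now string
        if err := shared.QueryRowContext(r.Context(), "SELECT now()").Scan(&now); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.Write([]byte(now))
    }

    func main() {
        var err error
        if shared, err = sql.Open("postgres", dsn); err != nil {
            panic(err)
        }
        http.HandleFunc("/churny", churnyHandler)
        http.HandleFunc("/persistent", persistentHandler)
        http.ListenAndServe(":8080", nil)
    }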

