
To some extent, Postgres just follows the path that Oracle did to optimize performance. To see what Postgres needs to do next, just look at what Oracle had to do.

Postgres forks a backend process for each connection, which is relatively heavyweight. Oracle used to do the same, but implemented a separate multi-threaded server mode for performance reasons. MySQL has been multi-threaded from the beginning.




Postgres forks an OS process for each connection, which is relatively heavyweight. Oracle has a similar architecture to Postgres, and they solved it with a "multi-threaded listener". MySQL, in contrast, uses threads, which makes connections lighter weight.

It's fundamentally a question of how the connection listener communicates with the rest of the database, e.g., using shared memory or some other IPC mechanism, work queues, etc. Having too many connections results in problems with concurrent access and lock contention independent of how heavyweight the actual listening process is.


Oracle, in the past, also used a multi-process model on Linux. It looks like multi-threaded execution became an optional mode at some point around Oracle 12.

To get around the inefficiencies of spawning many processes, I put pgbouncer in front of PostgreSQL.
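
For reference, a minimal pgbouncer.ini sketch of this setup (the database name, paths, and pool sizes are illustrative, not recommendations):

```ini
[databases]
; route "mydb" through the pooler to the real server behind it
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling lets many client connections share a few server processes
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

Clients then connect to port 6432 instead of 5432, and pgbouncer multiplexes them onto at most 20 real Postgres backends.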


They mention this in the article. But to sum up, each connection in PG is handled by its own OS process. Behind the scenes, Postgres is composed of multiple single-threaded processes.

This comes with the advantage for PG developers (and us!) that they don't need to deal with tons of data-race issues, but the trade-off is that, memory-wise, a process takes way more memory than a thread.


Oracle doesn't handle TCP connections in the server itself; the Oracle Net Listener (a separate process) is responsible for that, and it supports connection pooling.

I'm no expert on how this scales compared to Postgres as it is hard to find benchmarks, thanks to Oracle licensing terms.


PostgreSQL uses processes rather than threads, and processes tend to be heavier.

I am more familiar with MySQL as well, but I heard that Postgres uses forks, whereas MySQL uses threads. So for heavy load (many users), like with public websites, threads or thread pools scale better. Can someone chime in and answer how you handle such a workload with Postgres? Maybe some cache or worker pool in front of the DB?

Postgres has a limitation on the number of open connections. This is because 1 connection = 1 process. MySQL uses threads, which scale better but have other downsides. Pooling a smaller number of Postgres connections is also possible using third-party connection pooling apps, e.g. pgbouncer.
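
The pooling idea itself is simple enough to sketch with a toy queue-based pool (`ToyPool` and the dict "connections" below are made-up stand-ins, not a real driver API): many client threads share a fixed, small set of backend connections.

```python
import queue
import threading

class ToyPool:
    """Toy connection pool: hand out a fixed set of 'connections' so an
    unbounded number of client threads share only pool_size backends."""

    def __init__(self, pool_size, connect):
        self._q = queue.Queue()
        for _ in range(pool_size):
            self._q.put(connect())

    def acquire(self):
        return self._q.get()   # blocks when all connections are in use

    def release(self, conn):
        self._q.put(conn)

# Stand-in for a real DB connect (a real app would open a socket here).
pool = ToyPool(pool_size=2, connect=lambda: {"queries": 0})

def worker():
    conn = pool.acquire()
    try:
        conn["queries"] += 1   # pretend to run a query on this connection
    finally:
        pool.release(conn)

# Ten "clients" end up sharing just two backend connections.
threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
```

The queue guarantees each connection is held by at most one thread at a time, which is exactly the exclusivity a pooler like pgbouncer enforces for real server processes.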

MySQL can handle many more connections per instance than postgres. Often when you have a high-transaction database in prod with postgres, you need something like pgbouncer to handle connection pooling or you'll have a bad time.

I agree though that postgres has some fantastic features.

edit: I think they may have addressed some of this in a recent version? I'm basing my knowledge on postgres 12


From what I gather, Postgres isn't doing conventional IPC; instead it uses shared memory, which is the same mechanism threads use, but with much higher complexity.

That's one area where having multiple completely independent processes is a disadvantage compared to just having threads. I've never benchmarked this, but I always heard that Postgres processes are expensive and you should always use a connection pool. This is probably most noticeable on the DB server side, where each process uses memory. I never used it myself, but PgBouncer might be useful to you in this case.

The number of requests still feels slow, and it isn't clear to me from the blog post whether that is DB limited or CPU limited on the application servers. Even with writes on each access Postgres should still be bored at that kind of load.


It’s not higher-level concurrency mechanics, it’s different, and moving from assumptions of single-process concurrency to multi-threading is generally non-trivial for a long-established C project with a vibrant ecosystem (note the evaluation of the Tcl, Perl, and Python runtimes). PostgreSQL is, in general, a portable user-space application, albeit a database. With respect to the metal, this is a few layers removed, but yeah, most databases care about their concurrency model and its trade-offs across x clients, n transactions, y queries, z nodes. In this case the difference between processes and threads is what causes the concern about switching. In most OSS projects, this type of change, if it happens, takes some time. Per my other comment in this thread, PG has a vibrant ecosystem community, both OSS, commercial, and within orgs, that this will also impact.

Not the author, but blaming PG's connection scalability on the per-process model is somewhat of a misunderstanding. You can easily have many hundreds of thousands of processes; they are mostly equivalent to threads on Linux anyway (with some minor differences, of course).

Instead, the poor scalability stems from the amount of resources allocated to each connection. If Postgres were to switch to threads without changing this architecture, scalability likely wouldn't be majorly improved.

By the same token, a connection pooler that is fork-based but only allocates a very small amount of resources per connection is still likely to be a net win, as it can do statement-based pooling of a smaller number of Postgres connections.


I'm more familiar with MySQL, but if PostgreSQL does not use threads, I'd eat my hat.

They're just too useful in helping to spread the load of all the IO required to return even one result.


Yes, I was just reading through that. The server is still single-threaded though -- they are getting the parallelism by starting multiple processes to do independent chunks of work. This makes sense for PostgreSQL, but has some fundamental limitations (e.g., it requires duplicated copies of a hash table to parallelize a hash join).

Has anyone played with the threading model within Postgres?

I was reading the documents; it looks like for every client connection Postgres forks a new process and uses a shared-memory model.

Using threads/coroutines might be useful for scaling it further.


This is my one big problem with Postgres. Their heavy process-per-connection model prevents the sort of architecture where you may have a large number of connections. PGBouncer helps a lot, but it's not quite the same.

Postgres forks off a new process for each connection.

This introduces lots of overhead across the board.

* Cross process communication is a lot more expensive (done via shared memory or as here via the file system)

* Switching between processes is a lot more expensive because each process has its own memory space, hence a switch flushes the TLB. Also more bookkeeping for the OS.

This is especially bad for a DB, which usually spends most of its time waiting for IO and therefore switches execution context all the time.

* Each process also has a distinct set of file descriptors, so those need to be cloned as well

* A DB needs lots of locks, and cross-process locks are more expensive.

* ...

These things add up.
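
One of these points, the cost of cross-process locks, is easy to probe with a rough micro-benchmark (numbers vary wildly by machine and Python version; this only contrasts an in-process lock with a cross-process semaphore-backed one):

```python
import threading
import multiprocessing
import timeit

# Uncontended acquire/release of an in-process lock vs a cross-process lock.
# multiprocessing.Lock is backed by an OS-level semaphore so it can be shared
# across processes; threading.Lock only has to work within one process.
tlock = threading.Lock()
plock = multiprocessing.Lock()

def bench(lock, n=100_000):
    def body():
        for _ in range(n):
            lock.acquire()
            lock.release()
    return timeit.timeit(body, number=1)

t_time = bench(tlock)
p_time = bench(plock)
print(f"threading.Lock:       {t_time:.3f}s")
print(f"multiprocessing.Lock: {p_time:.3f}s")
```

Under contention across real processes the gap grows further, which is the case a database actually cares about.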


> ...postgres server process listens to one network port, the default port is 5432

> Whenever receiving a connection request from a client, it starts a backend process. (And then, the started backend process handles all queries issued by the connected client.)

> To achieve this [server] starts ("forks") a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process. Thus, the master server process is always running, waiting for client connections, whereas client and associated server processes come and go. [1]

So, Postgres is using a process-per-connection model. Can someone explain why this is, and why not something like thread-per-connection?

[1] https://www.postgresql.org/docs/9.6/tutorial-arch.html
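
The pattern the docs describe can be sketched as a toy fork-per-connection server (hypothetical illustration in Python, not Postgres code): the parent owns the listening socket and forks one child per accepted connection, and from then on only the child talks to that client.

```python
import os
import socket

def make_listener(host="127.0.0.1", port=0):
    # the "postmaster" side: one socket, bound once, shared by nobody else
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, port))
    s.listen()
    return s

def accept_one(listener):
    conn, _peer = listener.accept()
    pid = os.fork()
    if pid == 0:                 # child: the per-connection "backend process"
        listener.close()         # the child doesn't need the listening socket
        data = conn.recv(1024)
        conn.sendall(b"echo: " + data)
        conn.close()
        os._exit(0)
    conn.close()                 # parent: hand the connection off entirely
    os.waitpid(pid, 0)           # sketch only; a real postmaster keeps accepting
```

Each `fork()` duplicates the parent's address space (copy-on-write) and file descriptor table, which is exactly the per-connection cost the thread-per-connection question is about.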


You might be thinking of

> In POSTGRES they are run as subprocesses managed by the POSTMASTER. A last aspect of our design concerns the operating system process structure. Currently, POSTGRES runs as one process for each active user. This was done as an expedient to get a system operational as quickly as possible. We plan on converting POSTGRES to use lightweight processes available in the operating systems we are using. These include PRESTO for the Sequent Symmetry and threads in Version 4 of Sun/OS.

From: The implementation of POSTGRES - Michael Stonebraker, Lawrence A. Rowe and Michael Hirohama

Hat tip to Thomas Munro. I think he pointed this quote out to me in the past.

