Hacker Read

manigandham · 2017-02-15 11:34:42

Scaling (horizontal + vertical) and HA are the biggest weaknesses of Postgres. It's modern in features and flexibility but still struggles with core infrastructure issues and performance.

endisneigh | karma 11792 | avg karma 2.65 · | 2022-08-16 21:26:11

Postgres doesn't scale well across machines. There are things like Citus that help, but still.

manigandham | karma 24990 | avg karma 2.69 · | 2016-03-26 02:19:30+00:00

It doesnt scale vertically either. Postgres is single-threaded meaning it can't make use of multiple CPU cores on the same machine. There have been some slow improvements to this and 9.6 seems to hint at some parallel aggregation changes but overall Postgres is strong in features but weak in scaling (in any direction).

jimbokun | karma 17789 | avg karma 3.01 · | 2024-03-29 16:21:34

Horizontal scaling is still challenging in Postgres. That’s where “NoSQL” systems still have their place. Or one of the SQL systems layered on to a NoSQL architecture, with the trade offs that entails.

manigandham | karma 24990 | avg karma 2.69 · | 2017-01-18 04:24:21+00:00

Postgres neither scales vertically or horizontally.

v9.6 finally brought some parallel scans but is still limited in scope and far behind the commercial databases. Same with replication and failover, although there are decent 3rd party extensions to get it working.

reply

tmountain | karma 3116 | avg karma 4.76 · | 2023-12-15 16:57:59

It’s Postgres running on AWS, so it should scale as well as one would expect with that combination. What issues are you referring to specifically?

manigandham | karma 24990 | avg karma 2.69 · | 2017-06-23 05:37:05

That postgres is behind in scaling (multi-core, replication, sharding, multi-master) against all modern relational databases is verifiable truth. Your judgement is clouded by your proximity to the project. Here's just 1 prior thread between us: https://news.ycombinator.com/item?id=13841496

And if you want more feedback:

-> Add (and enable) multi-core queries.

-> Build-in pgbouncer (or better connection management).

-> Enable 1-line command to setup basic replication (whole node or per db level) without having to edit anything else. Look at redis or memsql for inspiration.

-> Enable 1-line command to easily failover, remove hotstandby requirements.

-> Add DDL support to logical replication because I guarantee this will be the #1 source of problems for anyone that tries to use this.

-> Fix the poor defaults in all the configs.

-> Document the existence of pg_conftool and make it a first-class tool that can also edit pg_hba.conf automatically so I no longer have to deal with any config files. Store any other settings in the database cluster itself. Inspiration from sql server on linux or cockroachdb here.

IT/admin UX at this point is more important than the diminishing core upgrades. Postgres is good enough internally but has sorely fallen behind on usability with obsolete defaults, convoluted configs and way too many 3rd party tools needed for basic scaling. Saying use X tool is effectively the same as just use Y database that already does it. MySQL, MariaDB, even SQL Server are not standing still.

Solid in-cluster and cross-dc replication is the reason we stuck with memsql (+ built-in columnstore), not because it was possible but because it actually works easily and reliably with their fantastic memsql ops program. I've shared all of this feedback online over the years (along with many others) so the lack of understanding is really more a reflection of the postgres team and poor priority planning.

reply

Zanfa | karma 804 | avg karma 3.36 · | 2023-04-25 02:40:53

In my experience more often than not, Postgres performance problems aren't really caused by the database, but either badly designed schemas or queries. For a lot of developers, the thinking goes that 10s of millions of rows sounds like a lot like big data, so they must start building microservices, distributed systems, use K/V stores and horizontally scale a la Google, whereas their entire dataset could actually fit in RAM of a single server and they could avoid the majority of the complexity if they just scaled vertically, which is usually much, much easier.

franzwong | karma 157 | avg karma 0.82 · | 2016-07-18 15:13:50

I tried to scale Postgres few months ago but I found that it is quite tedious.

ThinkBeat | karma 3644 | avg karma 2.66 · | 2024-03-25 13:37:11

How far can one get these days with vertical scaling of Postgres? I dont know how well the engine could make use of the extra resources. If it scales well vertically, I expect it to cover 80% - 95% of use cases for people using Postgres. Probably with less complexity, and less overhead than scaling horziontally

e12e | karma 13838 | avg karma 1.49 · | 2016-05-10 00:32:49

I'm curious what you mean by Postgres not having a "scale-up" strategy. Are you saying that eg MS SQL works better with a few TBs of ram and 128 cores than Postgres does?

eljulio | karma 3 | avg karma 1.5 · | 2016-07-18 09:02:02

One missing point Is the scalability. Postgres is awesome until you need to setup a db cluster.

Thaxll | karma 4756 | avg karma 1.87 · | 2017-04-13 18:59:50

Well PostgreSQL is a single master for writes so it doesn't scale well to say the least.

manigandham | karma 24990 | avg karma 2.69 · | 2019-01-16 23:31:39+00:00

The main thing to compete on is distributed scale-out and high-availability which PostgreSQL does not have a good answer for.

yawaramin | karma 3520 | avg karma 1.85 · | 2023-12-16 21:06:07

But you don't need to horizontally scale if you use a beefy Postgres instance. Horizontal scaling is a problem created by using an inefficient DB in the first place.

gshulegaard | karma 1967 | avg karma 3.59 · | 2021-05-11 03:59:38+00:00

I am not sure I agree with the general idea that Postgres can't or even--albeit a bit less strongly--that it is hard to scale. Even in 2008 people were running petabyte-scale warehouses using Postgres:

https://www.toolbox.com/tech/data-management/blogs/2-petabyt...

Since 2008 improvements in parallel query execution (and numerous other improvements) in the core project plus the availability of forks/extensions which abstract and/or modify various bits for improving scalability (see Citus and Timescale) it's never been easier to scale Postgres to some truly staggering heights.

While I wouldn't want to speak in absolutes, there are very few applications where I think Postgres wouldn't be a viable choice as a data warehouse.

Emphasis on warehouse as I wouldn't want to suggest Postgres as an ideal candidate to be a data lake. The difference between them for me being whether or not the data is structured/processed. Similar in definition to this article:

https://medium.com/@distillerytech/data-warehouse-vs-data-la...

Personally, I have experience scaling core PostgreSQL (9.4) to handle ingestion of monitoring data for web servers to the tune of 2-3 terabytes a day. Not the grandest of scales, but enough to have seen a few bumps along the way...and, for what it's worth, I think it is surprisingly easy to scale.

I wouldn't want to sign up to scale Postgres to handle exabyte data loads, but single digit petabytes? Sure.

https://techcommunity.microsoft.com/t5/azure-database-for-po...

And at petabyte-scale, I personally think it qualifies as a data warehouse.

reply

orthecreedence | karma 6560 | avg karma 3.18 · | 2023-04-12 11:04:45

Have you used postgres for a work queue before? I have, and it did not scale even for smaller workloads. It's possible there are new primitives that make it more efficient, but back in 2014/2015 it didn't work.

coderzach | karma 738 | avg karma 3.32 · | 2016-02-05 20:36:46+00:00

Postgres isn't designed from the ground up for high availability or scalability. Most of the in-built functionality for setting up HA wasn't introduced until 9.0, and the process of setting up HA postgres with automatic failover requires third-party tools. As for scalability, postgres can only scale vertically.

If you need a solution that can scale writes to more than than one node, or a solution that has first party support for HA with automatic failover, you shouldn't be using postgres.

As an aside, I find the dogmatic "just use postgres for everything" choir just as bad as the marketing BS associated with noSQL databases.

reply

tomc1985 | karma 11141 | avg karma 2.76 · | 2021-12-17 22:31:45

Dude you are seriously underestimating postgres' versatility. It does so many different things, and well!

eivanov89 | karma 44 | avg karma 1.83 · | 2024-03-19 15:23:39

It's really interesting, that Postgres is extremely efficient, but same time can't scale vertically well enough, when there is synchronous replication. Unlike its distributed counterparts.