While there is some truth in this, I think you are missing the context.
NoSQL first became popular in 2008/09, when web traffic was exploding. In those days 500GB disks were the largest available and were very expensive in server form. CPUs were less powerful, and RAM was much smaller.
All this meant that many sites really were running into the limits of a single-server database. The most common solution back then was master-slave MySQL replication, which had a whole set of problems of its own. Don't forget Postgres replication was pretty rudimentary, and back then MySQL really had a performance edge if you could compromise on things like transactional integrity(!) - which many did.
Things like Hadoop solved the reliability issues with distributed MySQL, MongoDB (and CouchDB) tackled the horrible developer ergonomics and Redis tackled performance.
In that context they all made a lot of sense.
Now of course buying two big servers with a heap of RAM and storage and putting Postgres on them with replication is pretty easy.
NoSQL was trying out new ideas in data storage. It was exciting to try out new or re-imagined core concepts, and some of those young projects had teething issues. Several are still around and remain popular, though mostly for the niches they excel at (and those niches were largely discovered through trial and error).
In the SQL-sphere a lot of people skipped Postgres because MySQL had, at the time, the momentum in the free/cheap relational database space. Between then and now Postgres has grown more elegantly than MySQL, and people are rightfully looking to it.
I think the problem is that NoSQL has been the cool new thing, which causes people to use it without really understanding why it is better/worse than other solutions.
Tools like MongoDB can do things that are extremely difficult or impossible with MySQL/PostgreSQL and they are a great choice for those situations. Using them simply out of laziness or misunderstanding, though, is probably going to create problems later.
> Yet somehow, we continue to see PostgreSQL beating Mongo, Rethink, and other trendy "NoSQL" upstarts at performance, one of the primary advantages they're supposed to have over it.
Yeah, pg will beat the pants off of pretty much any NOSQL DB on a single node, but what happens when you need master-master replication of sharded data in each datacenter as well as between datacenters?
Sure, denormalize and enforce data integrity at the application layer, shard the data, use queues for data updates to ensure every data-center gets all the data (eventually), and so on.
Now you're essentially treating the RDBMS as an eventually consistent NOSQL data store, horizontal scaling within a DC is still going to be annoying once you can't scale the shards vertically anymore, interruptions of network connectivity (or just increased latency) between DCs create huge headaches, and the (right) NOSQL DBs will beat the pants off of it for an equivalent hardware or hosting budget.
That said, NOSQL datastores solve problems that your startup will only encounter once you achieve product-market fit and enter a hypergrowth phase. Your MVP does not need an eventually-consistent distributed NOSQL backend, dammit.
Heck, your MVP may not even need Postgres or MySQL. Just use SQLite, back up your server, and redeploy on pg only when (if) you start getting some traction.
A while ago I'd say! Last time I heard anyone seriously excited about NoSQL was several years ago. It still has its place, but it seems like Postgres is the hype these days.
The thing about the NoSQL trend is that it's still very valid in some areas. Even big and slow companies have some project or supplier that uses software with a NoSQL service in it.
You just rarely see those bundled redis or elasticsearch servers that are crucial for some app to do its session management and search engine.
The most disturbing part of the NoSQL trend is that people are treating it like a battle between two competing systems.
I have at least one major system under my own belt that uses relational SQL for backend data, ES for search engine and cassandra for TSDB.
I need and trust all those services to work as one unit.
From my POV the rise of 'NoSQL' some years back was tied into a number of things:
- Misunderstanding by most developers of the relational model (I heard a lot of blathering about 'tabular data', which is missing the point entirely).
- The awkwardness and mismatchiness of object-relational mappers -- and the insistence of most web frameworks on object-oriented modeling.
- The fact that Amazon & Google etc. make/made heavy use of distributed key-value stores with relatively unstructured data in order to scale -- and everyone seemed to think they needed to scale at that level. (Worth pointing out that since then Google & Amazon have been able to roll out data stores that scale but use something closer to the relational model). This despite the fact that many of the hip NoSQL solutions didn't even have a reasonable distribution story.
- Simple trending. NoSQL was cool. Mongo had a 'cool' sheen by nature of the demographic that was working there and the marketing of the company itself.
I remember going to a Mongo meet-up in NYC back in 2010 or so, because some people in the company I was at at the time (ad-tech) were interested in it. We walked away skeptical and convinced it was more cargo-cult than solution.
I'm _very_ glad the pendulum is swinging back and that Postgres (which I've pretty much always been an advocate of in my 15-20 year career) is now seeing something of a surge of use.
The number one reason for why people want NoSQL is horizontal scaling and for that PostgreSQL is terrible, with all available solutions being hacks that don't work.
Many people here are comparing this feature of Postgres with MongoDB or other document-oriented NoSQL dbs. I think the comparison is just wrong and unfair.
Even if I'm amazed by the performance of Postgres for this particular task (considering also that it is a relatively new feature), I don't think performance is the reason why people are using NoSQL dbs. The problem that NoSQL dbs are helping with is scaling. I don't see this as a priority for an RDBMS such as Postgres. Take for instance MongoDB (just because it was named by many of the other comments here, but I guess the same applies to Couch or others): it's relatively simple to deploy a cluster with automatic sharding, replicas, failover, etc., because these are all built-in features.
I mean... yeah MongoDB got a lot of hate, but I think the broader point is that it was one of the first technologies to popularize the domain of NoSQL. No one knew how to use it properly; so we adapted SQL-like schema design, and when it became obvious that didn't work well the hate started spilling over to the first technology to arrive at the party.
The elephant in the room is, I suppose, that the modern internet literally would not be possible without NoSQL. It may be possible without SQL; that seems likely to me. Part of that is because NoSQL is a big umbrella, and covers extremely critical databases like Redis or even databases like Cloudflare's proprietary edge cache. But, even document stores are extremely critical to enterprise scaling; during Prime Day 2022, DynamoDB peaked at 150M rps. There's no SQL setup on the planet that could handle volume like that while still maintaining all the things that Make It SQL; you could start throwing read replicas and removing joins and cross-table lookups and sharding data and ope, you just re-invented a document store.
Here are a couple of conclusions I have started operating by:
1. Document stores are, today, a strong choice at both low and high scales on the spectrum of systems scaling. They're great at low scales because you can avoid thinking about it too much. They're great at high scales because once you have the space to think about it you can attain substantially higher efficiency (performance+cost).
2. Making a document store operate more like SQL, for the things SQL is good at (joins, validation, etc) is a lot easier than making a SQL database operate like a document store for the things document stores are good at (Planetscale? there's a couple players in this game).
3. SQL-the-language sucks. There I said it; I'll die on that hill. The language was invented at a time +/- 2 years of Scheme, ML, Prolog, and Smalltalk. Our industry has rejected all of those. We haven't rejected SQL (yet). SQL is demonstrably, as a syntax, just as bad as those languages, evidenced by all the ORMs, injection attacks, etc. Databases tend to have a longer shelf life than programming languages, but SQL-the-language will die.
4. But, duh, SQL is fine. Go ahead and use it. In some situations it makes sense. In others it doesn't. It's a tool; one that has had 60 years to form around solving as many problems as possible.
> For the majority of use-cases out there, NoSQL databases offer enough consistency.
For applications with a nontrivial data model, ensuring that each logical operation only does a single document update (or that multiple nontransactional updates cause no conflict in maintaining consistency in the presence of other logical operations) is actually really challenging - and it adds a substantial design overhead to every new feature added. I think you're being extremely optimistic in your assertion that NoSQL systems are that widely safely applicable. My experience has been that NoSQL-based systems stay mostly-consistent because they happen to experience very little concurrent activity, not through understanding/design on the users' part.
This is not to make light of the situations where NoSQL systems shine, but the idea that higher levels of consistency are rarely useful does not match my experience at all.
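To make the multi-update point concrete, here is a minimal sketch of what a transaction buys you, using Python's built-in `sqlite3` purely as a stand-in for any transactional RDBMS. The `accounts` table, the names, and the `transfer` helper are all made up for illustration. The credit is deliberately applied before the debit, so a failed debit would leave a half-applied operation if the two updates were not wrapped in one transaction.

```python
import sqlite3


def make_db():
    """Create a hypothetical two-account ledger in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one logical operation."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # An overdraft violates the CHECK constraint and raises here,
            # which also undoes the credit applied just above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

Without the transaction you would have to write the compensation logic for the failed debit yourself, for every feature that touches more than one record, which is the design overhead described above.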
> NoSQL movement, if you wanna call it that, took off because most apps (including Google, including Financial Services, even Health Care) don't need some types of consistency offered by relational databases.
I'd say that Mongo (for example) took off because:
- They really nailed the setup experience for new users (which RDBMSs historically sucked at).
- The data model is much easier for simple apps.
- They had some fairly creative techniques for making their system look good - unacknowledged writes, and claims of support for replication which didn't really fully work.
- Most programmers don't really understand the ins-and-outs of maintaining consistency.
> The reason "NoSQL" dbs got popular are because in my experience Monolithic large relational databases are hard to scale.
I've met a lot of people who thought they had to scale that big. Very few handled anything that couldn't run off a beefy Postgres installation.
The purpose of a system is what it does. People don't use nosql to scale because they don't need to scale, so what does it do? People use nosql to not write schemas. That's what it's for, for the majority of users.
If I need a key value store, I use a key value store. There's no flashy paradigm there. If I need to put a container up on the interwebs, I do it. What's serverless? Nosql is an "idea", "paradigm", "revolution", or at least the branding of one. Just the same, serverless.
I will continue to ignore nosql and serverless.
The industry sure does change, but do you know how much of that is moving in a real direction and how much is a merry-go-round? Let's brand it "Carousel" and raise 10 million. And in 20 years we can talk about serverless being the new hotness, again.
My last few jobs were heavily Postgres-based. From what other stuff I've read on NoSQL, the benefits over relational databases just didn't outweigh learning a bunch of new NoSQL concepts. I know the MongoDB drivers had some data loss issues (you were required to check an error code each time) that they've since fixed.
But the general consensus from blogs I've read is that NoSQL should really be used more as a quick cache. Redis seems to replace memcache more than it acts as a real data-store. People who try to go the full NoSQL route tend to have to use several systems just to get what they had with a regular relational db.
What bothers me more than anything is that it seems like PostgreSQL is it. MySQL/MariaDB/Percona show the fragmented former MySQL landscape.
I've heard some people talk about Firebird, which I really need to check out. But is that really it? In the OSS world, are we down to PostgreSQL and Firebird (will MySQL really continue to grow under Oracle's control? Or is it on a dead end path?)
I think NoSQL works for a very limited number of cases. Most of the time you just end up moving all the schema checking and integrity checking (foreign key enforcement) to the application level and then you're squarely worse off.
Plus MongoDB, which is the poster child of the revolution, has been a technical pile of shit for a decade. It was slower than JSONB on Postgres, lost data in the default configuration, was insecure in the default configuration, and failed Aphyr's Jepsen distributed system consistency tests every time - in other words, the documentation promised things about consistency that were lies.
They have improved a lot, but I feel so sorry for people who jumped on that bandwagon early.
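On the point about moving integrity checks into the application: a rough sketch of what the database does for you when you let it, again using Python's built-in `sqlite3` as a stand-in. The `users`/`orders` schema and the `try_insert_order` helper are invented for the example. Note that SQLite specifically needs `PRAGMA foreign_keys = ON` per connection; Postgres enforces foreign keys without any switch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL REFERENCES users(id),"
    " total INTEGER NOT NULL)"
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'ada')")
conn.commit()


def try_insert_order(conn, user_id, total):
    """Insert an order; the database itself rejects orders for unknown users."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (user_id, total) VALUES (?, ?)",
                (user_id, total),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

In a schemaless store the equivalent check is a read-then-write in application code, and every code path that writes orders has to remember to do it.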
My interest lies in databases and database internals, and I spent 8 years building databases in my spare time. I always advise everyone that they should use a relational database unless they are very, very sure their requirements don't require it and won't ever require it. It's much easier to start with Postgres and switch to NoSQL later if you really need to go that way than to do it the other way around. Most times you find you don't need to switch to NoSQL. The imagined hordes of users never materialized to overwhelm your trusty relational database.
I highlighted the quote that bothered me; it may not be relevant to your point but I actually think it is.
Why is Node.js popular now? Because people want to create web applications that are responsive without requiring the client to constantly poll the server to see if there's new data, and Node.js does that better than existing tools.
Why are NoSQL databases popular now? Because right now, it's a lot cheaper to scale up by buying more computers than it is to scale up by buying better computers, and current relational databases are bad at scaling up that way (compared to other parts of the stack). Now, I think that NoSQL sometimes throws out the baby with the bathwater, and I do think there are cargo cult NoSQL users who are convinced that if they go around throwing out babies they'll eventually get some dirty bathwater. But NoSQL does solve real problems, and (to the point here) those are problems that are really difficult to predict that you'll need to solve from 15 years before now, because you need to know how the economics of buying computer hardware will change in the future.
I don't think it is possible, at this point in the development of these technologies, to future-proof your skills if you want to work on new projects. You can either keep learning the new things, or wait until other people find out which of the new things are actually going to last and then learn those things. You can use your current skills to maintain legacy systems. Or you can get into something like management, where you can leave specific technical details to other, younger people and focus on broader issues, like relations between your developers and the people they're serving.
It certainly seems that the NoSQL movement is largely fueled by two facts:
1. MySQL sucks
2. Oracle is expensive
I'd love to see PostgreSQL get more attention, as I feel that they have scaling up and scaling out handled fairly well, whereas MySQL/InnoDB has a hard time even scaling up (which is why the Drizzle project even exists).
"You make it sound like the NoSQL people were the first to observe this. That is very far from the truth -- IMS, Codasyl, Berkeley DB, object-oriented DBs, etc. have all been around for a long time."
Yes. The whole "NoSQL movement" doesn't make any sense until you recognize it not as a brilliant technical development, but as a backlash against hordes of people who always answered "What is the best way to store X?" with loud shouts of "SQL! And you're a moron if you disagree!" without ever examining the nature of X.
(Yeah, I'm obviously exaggerating. But only a bit. Don't even pretend that there haven't been people running around and shouting this at every available opportunity.)
"NoSQL" is ultimately more about the observation that relational databases indeed are not the be-all, end-all answer to every problem ever. In a technical sense they're not even remotely new; what's new is the cracking of the SQL dogma in common perception, brought on by an increasing number of workloads that SQL databases just can't handle economically. (Which is to say, even if there exists a SQL DB and a DB server that may meet some need, SQL doesn't win if the server is actually more expensive than using a "NoSQL" solution.)
Too much marketing was delivered and too little technology. Compare to something like CouchDB or Riak where they really are working hard to deliver innovative scalable distributed datastores.
Plus the fact that many people gave their heads a shake and realized that you CAN do a basic key-value NoSQL data store in an SQL database with a simple schema. And of course, both PostgreSQL and MySQL have responded with optimized key-value store capabilities in their databases right alongside all the SQL goodness.
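For anyone who hasn't tried it, the "simple schema" really is just one two-column table. A minimal sketch using Python's built-in `sqlite3`; the `KVStore` class, the `kv` table name, and the choice to JSON-encode values are my own assumptions for illustration, not any particular product's API.

```python
import json
import sqlite3


class KVStore:
    """Hypothetical key-value wrapper over a single two-column SQL table."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv ("
            " key TEXT PRIMARY KEY,"
            " value TEXT NOT NULL)"
        )

    def put(self, key, value):
        # Values are stored as JSON text; an existing key is overwritten.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return default if row is None else json.loads(row[0])

    def delete(self, key):
        with self.conn:
            self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
```

Usage is what you'd expect: `store = KVStore(); store.put("user:1", {"name": "Ada"}); store.get("user:1")`. You keep transactions and backups from the underlying database, and you can add real columns later if the data turns out to have structure after all.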