
> One of the selling points, which is now understood to be garbage, is that you can use different databases.

It was a major selling point back in the day. You can say that's legacy now, but it was definitely a thing pre-cloud / pre-SaaS.

Lots of software used an ORM to offer multi-database support, which was required when you sold a license and customers installed it on-premises. Some organizations strictly allowed only a certain brand of database.

You couldn't spin up a random flavor of database in AWS, Azure or GCP. There were in-house DBAs, and you were stuck with what they supported.




>There was a much-hyped database a few years ago that failed to live up to the hype because of exactly that architecture.

The only database I can remember that did something like this was FoundationDB. They had the core layer as a K/V store, and a layer above that was FSQL or similar. They got acquired by Apple, and most of their material was taken offline. I don't know if they were a failure (IIRC, some people were pretty impressed with the core NoSQL DB), but an exit to Apple may not have been all bad - although I would bet that Apple is not using the technology internally.

IME, my beef with the new NewSQL solutions is that they tend to be worse at both NoSQL and SQL workloads (specifically, none are "the best" at OLAP and OLTP) except at one specific use case - global replication. Getting global replication comes at a cost (either monetary, like Spanner, or performance-wise, like everything else), and global replication just doesn't seem like a killer feature to me. Most times I need data in two datacenters globally, I wouldn't replicate the data anyway (don't want EU data in the US), and other times my SLAs just aren't tight enough to warrant the cost.


> why not use different databases? They cost nothing and provide perfect separation.

I understand the sentiment, but this is a pretty simplistic take that I very much doubt will hold true for meaningful traffic. Many databases have licensing considerations that aren't amenable to this. Beyond that you get into density and resource problems as simple as IO, processes, threads, etc. But most of all, there's the time and effort burden of supporting migrations, schema updates, etc.

Yes, layered logical separation is a really good idea. It's also really expensive once you start dealing with organic growth and a meaningful number of discrete customers.

Disclaimer: Principal at AWS who has helped build and run services with both multi-tenant and single-tenant architectures.


>Probably an unpopular opinion, but I think having a central database that directly interfaces with multiple applications is an enormous source of technical debt and other risks, and unnecessary for most organizations. Read-only users are fine for exploratory/analytical stuff, but multiple independent writers/cooks is a recipe for disaster.

In my org I've felt the pain of having centralized DBs (with many writers and many readers). A lot of our woes come from legacy debt: some of these databases are quite old - a number date back to the mid-90s - so over time they've ballooned considerably.

The architecture I've found that makes things less painful is to split the centralized database into two databases.

On Database A you keep the legacy schemas, etc., and restrict access to the DB writers only (in our case we have A2A messaging queues as well as some compiled binaries which write directly to the DB). Then you have data replicated from Database A into Database B. Database B is where the data consumers (BI tools, reporting, etc.) interface with the data.

You can exercise greater control over the schema on B, which is exposed to the data consumers, without needing to mass-recompile the binaries, which can continue writing to Database A.
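For anyone trying to picture the A -> B copy, here is a minimal sketch of one way to do it (native logical replication is another). The connection strings, schema and column names, and the use of the `pg` npm client are my own assumptions, not from the parent:

```ts
// Minimal sketch of an A -> B projection: DB A stays write-only for the
// legacy binaries, and a periodic job projects a consumer-friendly shape
// into DB B. Table/column names and env vars are hypothetical.
import { Client } from "pg";

async function refreshReadModel(): Promise<void> {
  const a = new Client({ connectionString: process.env.DB_A_URL }); // legacy writers only
  const b = new Client({ connectionString: process.env.DB_B_URL }); // BI/reporting consumers
  await a.connect();
  await b.connect();
  try {
    // Read only the columns consumers need; the legacy schema on A stays untouched.
    const { rows } = await a.query(
      "SELECT id, partner_name, updated_at FROM legacy.partner_users"
    );
    await b.query("BEGIN");
    await b.query("TRUNCATE reporting.partner_users");
    for (const r of rows) {
      await b.query(
        "INSERT INTO reporting.partner_users (id, partner_name, updated_at) VALUES ($1, $2, $3)",
        [r.id, r.partner_name, r.updated_at]
      );
    }
    await b.query("COMMIT"); // consumers on B never see a half-finished refresh
  } catch (err) {
    await b.query("ROLLBACK");
    throw err;
  } finally {
    await a.end();
    await b.end();
  }
}
```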

I'm not sure how "proper" DBAs feel about this split, but it works for my use case and has helped control the ballooning legacy databases somewhat.


> * our enterprise db was bursting at the seams containing Literally Everything. Now, every part of the split-up monolith has its own self-contained data store tailored to what is appropriate for that particular thing. (some use MariaDB, others Redis etc etc)

Why do you consider an enterprise DB "bursting at the seams" to be a bad thing? Isn't that what enterprise DBs are built for? It seems like you traded having everything in one large database for having everything scattered across different databases. You probably sacrificed some referential integrity in the process.

> * developing, building, testing and deploying took ages. Eg if I only needed to capture some new detail about a business partner user (eg their mfa preference app vs sms) I would still have to deal with the unwieldy monolith. Now, I can do it in the dedicated business partner user service, which is much easier and faster.

You traded a clean codebase with a solid toolchain for, probably, a template repository that you hope your teams actually use - or else everyone reinventing some kind of linting/testing/deployment toolchain for every microservice.

> * the whole monolith, including business partner facing operations, could go down because of issues to do with completely unrelated, non critical things like eg internal staff vacation hours.

This could apply to any software. Sure, a monolith can have a large blast radius, but I can guarantee one of your microservices is critical path and would cause the same outage if it goes offline.

> The few callers that do need to obtain both pieces of data just make concurrent calls to both and then zip them into a single result.

Almost like a database join?
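A minimal sketch of that "concurrent calls, then zip" pattern, to make the comparison concrete; the service URLs and response shapes here are hypothetical:

```ts
// Two concurrent service calls, merged client-side into one result.
interface Profile { userId: string; name: string }
interface Billing { userId: string; plan: string }

async function getAccountView(userId: string): Promise<Profile & Billing> {
  const [profile, billing] = await Promise.all([
    fetch(`https://profile.internal/users/${userId}`).then(r => r.json() as Promise<Profile>),
    fetch(`https://billing.internal/accounts/${userId}`).then(r => r.json() as Promise<Billing>),
  ]);
  // The "zip": what `SELECT ... FROM profiles JOIN billing USING (user_id)`
  // would have done in one query if both tables lived in the same database.
  return { ...profile, ...billing };
}
```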


> Of course you run into issues with transaction occurring across multiple databases but these problems are hard but solvable.

The only thing you need to do to fix this is run all the services on the same DB.

> This sounds crazy. I don't know any large companies that have successfully implemented it. This is basically arguing for a giant central database across the entire company. Good luck getting the 300 people necessary into a room and agreeing on a schema.

You don't need every service to use the same schema. You only need transactions that span all services. They can use any data schema they want. A single DB is only used for the ACID guarantees.
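A minimal sketch of what that looks like with Postgres and the `pg` client, assuming two services that each own a schema (the `orders` and `billing` names are hypothetical): one transaction covers both because they share the instance.

```ts
// Each service keeps its own schema; the single DB is only there for ACID.
import { Client } from "pg";

async function placeOrder(orderId: string, userId: string, amountCents: number): Promise<void> {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();
  try {
    await db.query("BEGIN");
    await db.query(
      "INSERT INTO orders.orders (id, user_id, amount_cents) VALUES ($1, $2, $3)",
      [orderId, userId, amountCents]
    );
    await db.query(
      "UPDATE billing.accounts SET balance_cents = balance_cents - $2 WHERE user_id = $1",
      [userId, amountCents]
    );
    await db.query("COMMIT"); // both schemas commit or roll back together
  } catch (err) {
    await db.query("ROLLBACK");
    throw err;
  } finally {
    await db.end();
  }
}
```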


> It’s never one big database. Inevitably there are backups, replicas, testing environments, staging, development. In an ideal unchanging world where nothing ever fails and workload is predictable then the one big database is also ideal.

But if you have many small databases, you need

> backups, replicas, testing environments, staging, development

all times `n`. Which doesn't sound like an improvement.

> What happens in the real world is that the one big database becomes such a roadblock to change and growth that organisations often throw away the whole thing and start from scratch.

Bad engineering orgs will snatch defeat from the jaws of victory no matter what the early architectural decisions were. The one-vs-many databases/services question is almost entirely moot.


> I think having a central database that directly interfaces with multiple applications is an enormous source of technical debt and other risks, and unnecessary for most organizations.

I think that the operative phrase here is "over time". So what is meant is not necessarily supporting many applications at the same time, but rather serially.

So the message is supposed to be: apps come and go - they get rewritten for so many reasons - but there will be far fewer reasons to redesign / replace a "valuable" database.


> Use One Big Database.

I emphatically disagree.

I've seen this evolve into tightly coupled microservices that could be deployed independently in theory, but required exquisite coordination to work.

If you want them to be on a single server, that's fine, but having multiple databases or schemas will help enforce separation.

And, if you need one single place for analytics, push changes to that space asynchronously. (On the separation point, see the sketch below.)
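A minimal sketch of how schemas on one Postgres server can enforce that separation; the schema/role names and env vars are my own illustration, not from the comment. Each service gets a schema plus a role that can only touch that schema, so the separation is enforced by the server rather than by convention:

```ts
import { Client } from "pg";

async function provisionService(serviceName: string, password: string): Promise<void> {
  // serviceName is assumed to be a trusted, validated identifier here,
  // since it is interpolated into DDL (otherwise an injection risk).
  const admin = new Client({ connectionString: process.env.ADMIN_DATABASE_URL });
  await admin.connect();
  try {
    await admin.query(`CREATE SCHEMA IF NOT EXISTS ${serviceName}`);
    await admin.query(`CREATE ROLE ${serviceName}_rw LOGIN PASSWORD '${password}'`);
    await admin.query(`GRANT USAGE, CREATE ON SCHEMA ${serviceName} TO ${serviceName}_rw`);
    // No grants on any other schema: this service's credentials simply cannot
    // reach its neighbours' tables, even though they share the server.
  } finally {
    await admin.end();
  }
}
```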

Having said that, I've seen silly optimizations being employed that make sense when you are Twitter, and to nobody else. Slice services up to the point where they still do something meaningful in terms of the solution, and avoid going any further.


>I am curious what HN thinks are the major reasons why everyone seems to have moved away from 1 big SQL database

For the places I worked:

1. We transitioned to microservices

2. Performance - one big database slows things down

3. Ops/maintenance is very hard in a huge DB

4. In a huge DB there can be a lot of junk that no one uses, that no one remembers why it's there, but that no one is certain is no longer needed

5. We had different optimization strategies for reads and writes

6. Teams need to have ownership of databases/data stores so they can move fast instead of waiting for DBAs to reply to tickets.


> You need to evangelize a product on which the life of someone’s project or company depends.

I've actually seen the results of picking a database product whose company then ceased to exist. It wasn't pretty: at one point the product ceased to be supported, so no further updates/fixes were made, it wasn't available in the repositories for new OS distros, and eventually even the documentation for it went offline. Having to support a system that integrated with it was an unpleasant experience, all the way to it eventually being replaced with something else.

Therefore, it probably makes a lot of sense to base something as critical as your data storage layer on proven technologies that have demonstrated they'll probably be supported in one form or another for years or even decades to come, unless you have a good reason for choosing something else.

Those reasons might deal with particular workloads or requirements, e.g. clustering solutions for PostgreSQL/MySQL/MariaDB/..., geospatial extensions, solutions to integrate with it through REST interfaces or even GraphQL or something like that, with a stable and proven piece of software still at the core of it all.

In case anyone is wondering, the product in question was Clusterpoint, about which you can read a bit more here: https://en.wikipedia.org/wiki/Clusterpoint - a NoSQL database that actually predates MongoDB by a few years, as far as I know. Of course, now it seems like even their homepage is offline.


> When databases were shared by many apps, it made sense to think of them as their own services. When each database is built and run to service exactly one app, that makes less sense, plus it's also easier to push everything onto the devs (as above).

I think this is a really good point. The move towards microservices architectures means smaller and smaller databases. But more importantly, the data is distributed, so a lot of query complexity moves into services and the data is effectively modelled in a NoSQL style.

I wonder if NewSQL databases will bring back the database-as-service paradigm. At some point the costs of running hundreds of tiny databases are going to catch up with companies, and the prospect of having a single critical instance to care and worry about is going to look attractive. Maybe we will see the return of the DBA in a new avatar.


> The question that often popped up was where to store data. Deno Inc. provided several guides to connect to different cloud services.

Sure.

> But they want the friction reduced to a simple `await Deno.openKv()`.

Do “they”?

If so, who’s using it to solve that problem? …because it seems the big users of Deno Deploy are not using it (fine with that), and it’s pretty unclear who the “they” is in this circumstance.

Still, if it’s a thin layer over FoundationDB or some other established database product, and this is just part of the lock-in for their cloud offering, fair enough.

It’s not like others (eg firebase) don’t do the same thing.

The messaging “we’re building a database” and “we’re offering a hosted database service based on existing mature reliable technology” are different things though.

The latter all cloud vendors do.

The former is ridiculous, and it really really wasn’t clear that wasn’t what was happening.
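For reference, the "friction reduced" usage the quoted guide is talking about looks roughly like this with Deno KV's documented API (backed by FoundationDB on Deno Deploy, SQLite locally); the key layout and values here are just an illustration:

```ts
// Open the built-in KV store and do a basic write/read.
const kv = await Deno.openKv();

await kv.set(["users", "alice"], { name: "Alice", plan: "free" });

const entry = await kv.get<{ name: string; plan: string }>(["users", "alice"]);
console.log(entry.value?.name); // "Alice"

// Optimistic concurrency over the same API: the commit fails if someone
// else wrote this key after we read it.
await kv.atomic()
  .check(entry)
  .set(["users", "alice"], { name: "Alice", plan: "pro" })
  .commit();
```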


> They took great pains to keep data in sync across A and B datastores and I'm not so sure that extra cost was worth the perceived stability of this approach.

Such great pains come with huge systems. What's the alternative?

Taking the platform offline for a few hours? Management will say no. Or maybe Management will say yes once every three years, severely limiting your ability to refactor.

Doing a quick copy and hoping nobody complains about inconsistencies? Their reputation would suffer severely.


> Recently I see a lot of advocates of dropping NoSQL databases and moving back to Postgres or other SQL databases.

This is just from a vocal minority on HN. You just need to look at the facts.

Companies like MongoDB, DataStax, Aerospike, etc. are growing bigger by the day, with increasingly higher valuations. Old-school database companies like Teradata are now all about data lakes incorporating Hadoop and Mongo. And technologies like Spark and Impala are now on the front line for many data analytics and processing workloads.

In the enterprise at least SQL databases are increasingly being relegated to a small part of the whole data pipeline i.e. storing the consolidated, integrated data model.


> There was a much-hyped database a few years ago that failed to live up to the hype because of exactly that architecture. I can't even remember the name now when I was trying to find the post-mortem analysis for it in Google.

FoundationDB, acquired and taken internal by Apple. There were a few different blog posts about the layering aspect of it, with mixed opinions.

They built a transactionally consistent distributed database that was MySQL client-compatible and within 50% of MySQL single-node performance. If creating that and getting it bought by Apple is failing, I hope I'm lucky enough to fail like that.

> I'm also skeptical of a database claiming to be good at both OLAP and OLTP. One requires a column store, the other a row store. You can be half-decent as an OLAP store and good as an OLTP store.

This is a rigid and narrow view. There are lots of other design possibilities, such as fractured mirrors, column decomposition only within pages, etc.

I wouldn't call a single db that can do both OLAP and OLTP easy or a slam dunk, but I also wouldn't rule it out. VoltDB is well proven and worth looking at.


> It is also terrible design - having data in two places means you now have to implement access control, logging, auditing, data access and so on twice.

I've understood and implemented it differently. With Spectrum (or PolyBase for SQL Server / Synapse), you can extend into the data lake: copy over aggregated/curated data, or whatever you need for special use cases, and leave the structured, columnar data in the cheap storage. You pay per scan, but it is cheap (at least up to a point).
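To make that concrete, here is a rough sketch of the Redshift Spectrum side of it, written from memory; the bucket, role ARN, schema and column names are all made up, and Redshift is reached with the `pg` client only because it speaks the Postgres wire protocol:

```ts
import { Client } from "pg";

async function setUpSpectrum(): Promise<void> {
  const redshift = new Client({ connectionString: process.env.REDSHIFT_URL });
  await redshift.connect();
  try {
    // External schema backed by the Glue data catalog.
    await redshift.query(`
      CREATE EXTERNAL SCHEMA IF NOT EXISTS lake
      FROM DATA CATALOG
      DATABASE 'sales_lake'
      IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole'
      CREATE EXTERNAL DATABASE IF NOT EXISTS
    `);
    // External table over Parquet files that stay in S3 (the cheap storage).
    await redshift.query(`
      CREATE EXTERNAL TABLE lake.sales (
        sale_id BIGINT,
        amount  DECIMAL(10,2),
        sold_at TIMESTAMP
      )
      STORED AS PARQUET
      LOCATION 's3://example-bucket/sales/'
    `);
    // Lake data can now be queried alongside local, curated Redshift tables;
    // you pay per byte scanned on the lake side.
    const { rows } = await redshift.query(
      "SELECT date_trunc('month', sold_at) AS month, sum(amount) FROM lake.sales GROUP BY 1"
    );
    console.log(rows);
  } finally {
    await redshift.end();
  }
}
```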

Also, Databricks took the Lakehouse moniker and sprinted with it. AWS was late to the game from what I saw (at least for marketing terminology adoption).


> But too often people get in this mindset that old commercial products are meritless or whatever.

There is a lot of that assumption. And then there are the vendor lock-in licensing terms that commercial DB vendors utilize, where the cost of switching databases is just prohibitively high and impractical.


>Databases we have today aren't designed for the kind of consumer applications we use today.

Yes they are? Relational databases are very good for consumer applications with relational data. NoSQL databases are good for everything else.

>Data(bases) should be owned by users, and synced with friends and colleagues.

Why? You use words like should - why SHOULD it be that way? Users are able to create / buy / rent infrastructure to manage their own data - and yet they don't ... why?

>Shouldn't be centralized in this fashion. Data(bases) should be owned by users, and synced with friends and colleagues. So instead of a single centralized database, we need a million tiny little databases which are synced with peers.

Nice in theory (maybe). The problem is that centralization provides enormous quality and performance benefits. Distributed systems are resilient, but slow, and require a lot of care to maintain consistency and integrity across the network. Take email as an example. It took us 20 years to get email spam somewhat under control, and a major part of that was creating a centralized infrastructure (from using centralized email providers to third-party whitelists and blacklists). It's gotten to the point where hosting your own mail server is a huge hassle and pretty much impossible for regular people. I spent about a week trying to figure out why our Office365-originating mail wasn't being received by one of our customers. It turned out their mail provider matched a particular phrase in our automatically-inserted disclaimer and completely rejected the mail.

Decentralized systems also make controlling your data all but impossible. If I want to remove my data from Facebook, that's the only entity I need to deal with, and it's relatively easy for regulators to put sound policies in place. In your distributed system, once your data leaves your node, you've lost all control of it forever.


> This is a case where the db server should use the entire resources of a single server

They have thousands of clusters. They didn't design/architect anything.

They're likely just trying to consolidate databases because they are heavily underutilized and no one knows WTF they are running. And the organization will keep growing like that, adding new databases every day.

