
>>IBM and Oracle for years and I can tell you, that kind of behavior is exponentially worse

These dynamics are common in any system where you have a pyramid structure and resources are limited.




> A better example is frequency capping. Ever watch something on Hulu and see the same ad 4 times in a twenty-minute commercial? Or even, worse, back to back?

Yeah, but when that happens I usually don't think, oh hey, they're lacking an optimal in-memory distributed database solution.

I think, well... their engineers suck. Or they don't care. Pick one.

edit: His point is vague, so there is nothing technical to respond to. I am very much interested in a good technical example - but the things mentioned so far are by all appearances relatively straightforward and linear, so lack of effort or bad engineering are the only reasonable assumptions left.


> I’m in the data space and a company like Snowflake has a model where you pay for compute by the second and storage by the byte. Very simple, transparent and everyone is aligned.

Not sure everyone is aligned. Sounds like Snowflake has no incentive to optimize queries. They even have an incentive to do the opposite: they must keep their infrastructure as-is, without optimizing compute time or storage, to keep earning the same amount every month.


> Once you're at scale and deploying a redundant database, your hardware costs start to vanish anyway.

This is really, really false. It would make my life a lot easier if it were true.


>>What exactly would cause them to never deliver

Concurrency issues, latency issues, poor database schemas, scaling issues: there are a lot of reasons internet-scale things can grind to a halt.


> We maintain things like queue ordering by adding an ORDER BY clause in the query that consumers use to read from it (groundbreaking, we know).

The dude's being a bit too self-deprecating with that sarcastic quip.

But there's something valuable buried there: if it is possible to efficiently solve a problem without introducing novel mechanisms and concepts, it is highly desirable to do so.

Don't reinvent the wheel unless you need to.
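For what it's worth, here's a minimal sketch of what that "the queue is just a table" approach can look like, assuming a hypothetical PostgreSQL table jobs(id, payload, done) and psycopg2 as the client. The FOR UPDATE SKIP LOCKED part is my own addition so concurrent workers don't grab the same row; the quoted post only mentions the ORDER BY:

    import psycopg2

    def claim_next_job(conn):
        # Ordering comes from a plain ORDER BY; SKIP LOCKED (an assumption on
        # my part) lets several workers poll the same table without fighting.
        with conn, conn.cursor() as cur:
            cur.execute(
                """
                SELECT id, payload
                FROM jobs
                WHERE done = false
                ORDER BY id
                LIMIT 1
                FOR UPDATE SKIP LOCKED
                """
            )
            row = cur.fetchone()
            if row is None:
                return None
            job_id, payload = row
            cur.execute("UPDATE jobs SET done = true WHERE id = %s", (job_id,))
            return job_id, payload

    # Usage (connection string is made up):
    # conn = psycopg2.connect("dbname=app user=app")
    # job = claim_next_job(conn)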

> You could set the prefetch count to 1, which meant every worker will prefetch at most 1 message. Or you could set it to 0, which meant they will each prefetch an infinite number of messages.

Oh, what the actual fuck?

I'm really hoping he's right about holding it wrong, because otherwise ???
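For anyone who hasn't hit this: here's roughly what those two settings look like with RabbitMQ's Python client (pika). The queue and callback names are made up for the example:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()

    # prefetch_count=1: the broker hands this worker at most one unacked
    # message at a time. A value of 0 means "no limit", i.e. unbounded prefetch.
    channel.basic_qos(prefetch_count=1)

    def handle(ch, method, properties, body):
        print("got", body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="work", on_message_callback=handle)
    channel.start_consuming()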


> Oracle may be fast, but it's also expensive.

So? That's capitalism: you get what you pay for. And Oracle isn't just fast; it can do a lot, it can be configured to be paranoid about protecting data, and it has clustering technology meant for scaling, RAC.

Truth be told, they could have picked PostgreSQL and it would have still been a better solution than Cassandra and MySQL.


>Also, in my experience, the database is almost always the main cause of any performance issues

More generically, state stores are almost always bottlenecks (they tend to be harder to scale without some tradeoff).


> A cheap, slow, dirty solution today tends to win over a expensive, quick, elegant one next year.

I disagree with this platitude, one reason being the sheer scale of the hidden infrastructure we rely on. Just looking at databases alone (Postgres, SQLite, Redis, etc.) shows us that reliable and performant solutions dominate over the others. There are many other examples in other fields, like operating systems, protocol implementations, and cryptography.

It might be that you disagree on the basis of what you see in day-to-day B2B and B2C business cases where just solving the problem gets you paid, but then your statements should reflect that too.


> Sounds incredibly inefficient

Just like having redundant servers for your applications and backups for your data.


> 90% of our bill is database related currently.

Huh. With the context of the rest of the comment, I realize (it's a very obvious comparison) that a database engine designed to shard across many thousands of small workers could potentially be a very attractive future development path.

Iff the current trends in cloud computing (workers, lambda, etc.) continue and some other fundamental shift doesn't come along and overtake them.

Which is probably (part of) the reason why this doesn't exist, since I think I've basically just described the P=NP of storage engineering :)


> It's actually very rare that this happens

So I've had a little consultancy gig for a few decades now where I spend a few days a month optimising bad software for performance (it is what I like; I don't do anything else but 'make shit faster'). I can tell you that over the past 10 years, 99% of the optimisations I did were fixing MySQL queries and indexes that table scan. I had projects where table-scanning queries were literally over 50% of the queries run. The result, as you know but apparently is not very common knowledge, is that these sites and apps grind to a halt (after incurring bizarre bills on AWS RDS; I moved many apps from $100k/month bills to $10/month) when even a little traffic comes in.

Or: table scans should be rare, but they are not.

Edit: removed 'time' as that was not a good way of expressing this
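If anyone wants to check their own app for this, here's a quick-and-dirty sketch (assuming MySQL, a made-up orders table, and mysql-connector-python as an arbitrary client choice):

    import mysql.connector

    conn = mysql.connector.connect(user="app", password="secret", database="shop")
    cur = conn.cursor(dictionary=True)

    cur.execute("EXPLAIN SELECT * FROM orders WHERE customer_id = 42")
    for row in cur.fetchall():
        # type == 'ALL' means a full table scan; 'rows' is roughly how many
        # rows MySQL expects to read to answer the query.
        print(row["type"], row["rows"], row["key"])

    # The usual fix is an index on the filtered column, e.g.:
    # CREATE INDEX idx_orders_customer ON orders (customer_id);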


> Data pipelines are a real problem though

Can you please elaborate? Thanks.


> oracle database is dying an its niche is shrinking.

This is true, but misleading, because it's not the whole truth. On-prem RDBMS is dying and its niche is shrinking (at least in this part of the cycle)... having said that, Oracle's cloud offerings are doing very well.

Amazon made a herculean effort, and made some headlines, when they migrated most of their business off Oracle last year. That's wonderful, for Amazon. Most of Oracle's enterprise customers are not Amazon.

> I'd also prefer a galera or an innodb cluster

The thing is.. Oracle sells so much more than just a DB engine, and people are buying. Oracle RDBMS is not an inferior product to its open-source competitors (in most ways it's superior), and yet it is really just a proverbial loss leader.


> A key feature of these environments is that state is not generally persisted from one request to the next. That means we can’t use standard client-side database connection pooling.

So we introduced so many optimizations just because we can't persist state. I can't help but think this is penny-wise, pound-foolish; the problems this article is solving wouldn't have been problems in the first place if you chose a boring architecture.
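For contrast, the "boring" version is just a long-lived process holding an ordinary client-side connection pool. A sketch with psycopg2 (the DSN and names are made up):

    from psycopg2.pool import SimpleConnectionPool

    pool = SimpleConnectionPool(
        minconn=1,
        maxconn=10,
        dsn="dbname=app user=app password=secret host=db.internal",
    )

    def run_query(sql, params=None):
        conn = pool.getconn()
        try:
            with conn, conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()
        finally:
            pool.putconn(conn)  # hand the connection back instead of closing it

None of this works if every request runs in a fresh environment, which is the point of the quote.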


> second, by placing computers under constant demand, they were more likely to fail

Really? In what sense? Hardware fails or what? The complexity usually brings everything down.

Anyway, looking forward to CockroachDB; it seems to solve some high-availability / multi-site problems.


> This is a case where the db server should use the entire resources of a single server

They have thousands of clusters. They didn't design/architect anything.

They're likely just trying to regroup databases because they are heavily underutilized and no one knows WTF they are running. And the organization will keep growing like that, adding new databases every day.


> Whatever happened to use the proper data structures for the job?

This, so much. People too often treat databases as magical black boxes that should handle anything. The database is most often the bottleneck, and choosing the proper storage engine with appropriate data structures can be 100x more efficient than just using the defaults. 1 server vs 100 can definitely make a noticeable difference in costs and system complexity.

While premature optimization is bad, choosing the right tool for the job is still somewhat important and will usually pay off in the long run.
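As a toy analogue (in Python, with made-up numbers) of the index-vs-scan difference the parent is getting at:

    import time

    ids = list(range(1_000_000))
    ids_set = set(ids)

    def time_lookups(container, probe=999_999, n=1_000):
        start = time.perf_counter()
        for _ in range(n):
            found = probe in container  # O(n) scan for the list, O(1) for the set
        return time.perf_counter() - start

    print("list scan: ", time_lookups(ids))
    print("set lookup:", time_lookups(ids_set))

On a typical machine the set wins by several orders of magnitude, which is the same shape of win you get from an index instead of a table scan.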


> It used to be a given that nobody wanted their bank to run on anything, but mainframes. Now we'd rather they used cloud computing and Postgres. Mainframes have had their day. They may have a future, but they need modern databases and development tools.

I suspect that if you asked a random Joe on the street, they would say they would rather be on a mainframe.

Also, I would much rather my bank didn't host on AWS, GCP, Azure, etc.


> there were a handful of bottlenecks that you caught immediately

Exactly. I was responding mostly to the point that most CTOs/management believe you should just let hardware handle it while programmers should deliver as fast as they can. He says it is always a balance; you cannot pay for optimized assembly when writing a CRUD application, but I claim we have completely swung to the other side of the spectrum. For instance, a financial company I did work for had no database indices besides the primary key and left AWS to scale that for them.

And then we are not even talking about Mongo (this was MySQL); Mongo is completely abused because it is famous for 'scaling' and 'no effort setup', so a lot of people don't think about performance or structure at all; they just dump data in it and query it in diabolical ways. They just trust the software/hardware to fix that for them. I recently tried to migrate a large one to MySQL, but it is pure hell because of its dynamic nature; the complete structure changed over time while the data from all the past is still in there; fields appeared, changed content type, etc., and nothing is structured or documented. With 100s of GBs of that and no way to be sure things were actually imported correctly, I gave up.

They are still paying through the nose; I fixed some indexing in their setup (I am by no means a Mongo expert, but some things are universal when you think about data structures, performance and data management), which made some difference, but MySQL or PostgreSQL would've saved them a lot of money in my opinion. Ah well; at least the development of the system was cheap...

