Flamebait. You have cleaner and better examples of a world-class kernel and database engine? You want to do rocket surgery, you gotta bend a few scalpels.
Am I the only one who is getting tired of "I found a new thing! The old thing is dead!"?
Pretty much everything I've ever needed to do was easily accomplished with a DBMS with alternate indexing, either B-tree or bitmap. If I run into a jam, I'd be glad to take a look at additional options. I won't get rid of my DBMS any sooner than I'd get rid of my car just because of the advent of the personal helicopter.
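To put one concrete picture behind "alternate indexing": a minimal sketch using Python's built-in sqlite3 module (the table and column names are invented; SQLite's indexes are B-trees, and engines like Postgres layer bitmap index scans on top of the same idea):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    con.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                    [(i % 1000, i * 0.5) for i in range(100_000)])

    # Without an index, the planner has to scan the whole table.
    print(con.execute("EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchall())

    # A plain B-tree index turns that into an index search.
    con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    print(con.execute("EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchall())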
Literally anything. There is so much to do better: faster I/O, kernel bypass and async filesystems, new persistent data structures, alternative lock-free concurrency-resolution schemes…
Disclaimer: I am highly biased, as I am funding/researching/developing a DBMS myself. Out of necessity, though, as we constantly hit bottlenecks in the persistent I/O layer. We are not selling or offering anything, but will soon share some fresh internal results on the aforementioned topics.
Probably many huge codebases could be reduced to a handful of good old SQL queries and nothing more...
If you trash things like framework boilerplate, complex graphical UI libs, devops complications, auto-testing, etc.
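To make the "handful of SQL queries" point concrete, here is a rough sketch (Python's built-in sqlite3, invented schema) of the kind of aggregation loop that app tiers tend to reimplement by hand:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)",
                    [("north", 10.0), ("north", 5.0), ("south", 7.5)])

    # The hand-rolled application-tier version...
    totals = {}
    for region, amount in con.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount

    # ...is one good old SQL query.
    print(con.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())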
A famous P. Greenspun quote may be savagely hacked for the topic at hand: Any sufficiently complicated No/New/WhateverSQL engine contains an ad hoc, informally-specified, bug-ridden, slow implementation of a RDBMS.
Or, as a wise man once said, "Premature optimization is the root of all evil."
It's a different part of the field, of course, but so many people pay an opportunity cost chasing some difficult architectural problem that will likely never impact them, and if it does, they'll have the resources to throw at it.
Many startups won't even have production database replicas, much less have to worry about network-distributed sharding, complex caching strategies, or database optimization beyond query analysis.
Databases are severely constrained by the architectural choices made when they were designed; you can't backport modern database architecture and computer science to, e.g., an Oracle or DB2. To integrate new computer-science advances you often need to write a new kernel from first principles. I sunset the designs I license to companies every several years, starting over from a clean sheet.
Most new high-performance database engines are intended to give the developing company a qualitative uplift in capability, scale, or operational efficiency. No one sells public licenses these days. You've heard of the organizations that are buying or building these semi-bespoke database engines, but they are intended for internal use only.
The reason no one sells these capabilities as a product anymore is pragmatic: it is extremely expensive to design a database engine for general public consumption, and the economics are difficult to justify as an investment. But many large companies are willing to pay many millions of dollars for a narrowly focused database capability, and the reduced scope makes the development cost more palatable.
Databases are a miracle product. If you think of an application as a car, the database is the engine.
The idea of a platform that can do everything, without the abstraction of a separate data storage/query platform, exists too. I'd argue that FileMaker, Lisp, MUMPS, and a few others basically do this in different ways. I used to be a DBA at a company where the entire business ran on Informix 4GL code (which was sort of like the Informix version of PL/SQL) inside the database. Also a similar approach.
But... they also have significant drawbacks. You're permanently married to that app/database stack. If any component of the system doesn't scale... you're fucked.
By chunking the solution into databases, app tiers, etc., you gain complexity but lose a lot of risk. If you cannot afford Oracle anymore, you can invest in labor to move to Postgres. If you're hitting a limitation with MySQL, you can move to Oracle. If you wrote your app in PHP, it goes viral, and you cannot scale it, you can migrate to a Java application server layer.
"If you are working with a database, I would hope that you do not allow your expectations to be set by the marketing department. If you don't do your due diligence then you deserve to be bitten."
That's the real definition of "hard to use": you have to research everything yourself and send the product through QA just to use it.
There's very high value in products where you don't have to do a lot of research into implementation quality and caveats: if you start using one and it appears to work for your needs, you won't be bitten too badly later. In my opinion, PostgreSQL is an example of such a product.
Of course there is always some opportunity to do the wrong thing. It's a question of degree.
Following your advice would essentially mean "only big companies can ever release anything," because you'd need a team of full-time people sitting around doing research and QA for libc, the kernel, and everything else you depend on.
DBAs are done. A certain level of database proficiency is demanded of all the other engineers and non-engineers. And if you are really working on a super-awesomely-complex system that needs super awesome scaling, then..... you make shitty queries anyway and just increase the size of the memory cache or the number of database server instances. What!??? Heresy! Well okay then, go find a bad query and improve it so you get a nice bonus later. See, look, still no DBA on staff.
Major products are released and operating right now with incredibly poorly performing queries and database usage. The infrastructure people and services keep the databases running.
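As a hedged illustration of the "go find a bad query and improve it" exercise (Python's sqlite3, invented tables): the classic N+1 pattern versus a single join the planner can optimize as a whole.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)")
    con.execute("INSERT INTO users VALUES (1, 'ada'), (2, 'linus')")
    con.execute("INSERT INTO posts VALUES (1, 1, 'hello'), (2, 1, 'again'), (3, 2, 'hi')")

    # The "bad query" is often many queries: one round trip per user (N+1).
    for user_id, name in con.execute("SELECT id, name FROM users").fetchall():
        con.execute("SELECT title FROM posts WHERE user_id = ?", (user_id,)).fetchall()

    # The improvement is a single query.
    print(con.execute(
        "SELECT u.name, p.title FROM users u JOIN posts p ON p.user_id = u.id"
    ).fetchall())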
I have immense respect for and look up to anybody who can do something as complex and low-level as DB kernel development (memory management, persistence/file systems, caching), especially with the long list of features you have to support to make the DB desirable/useful (ACID transactions, connectors).
I also hope DB implementers know how to get acquired by big companies, because I don't see how they can really compete with the big legacy companies that have been developing their core DBs for decades and have an army of support/sales to back them up. Not to mention, where's the safety and accountability for mission-critical data with a new product versus, say, DB2 or Oracle? Maybe the new DB is a better implementation and more fault tolerant, but there's no big corporation to blame if something goes wrong, just a small company you gambled on.
As someone who works in the data-{engineering, science, etc.} space, I really don't understand the community's obsession with unwieldy tools like dbt.
It's like there's some obsession with ignoring any kind of practice in wider software engineering, or innovation in the PL theory space, in favour of… gluing SQL together using string templates?
It seems like, for the most part, there's very little innovation or progress in the space. It's just 15 variations on a theme: configure some YAML for your Jinja-interpolated strings to dump ever more data into the cost-centre black hole known as a "modern data lake".
I'm sure there are some interesting things going on in small corners, but they're difficult to find, and if they exist, they're being studiously ignored by mainstream tooling.
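For anyone who has not touched dbt: the "gluing SQL together using string templates" pattern looks roughly like this (a sketch using the jinja2 package directly; the model and column names are made up, and real dbt layers ref()/config() and YAML on top):

    from jinja2 import Template  # pip install jinja2

    # A dbt-style "model" is SQL with template logic interleaved.
    model = Template("""
        SELECT
            order_id,
            {% for col in metric_columns %}
            SUM({{ col }}) AS total_{{ col }}{{ "," if not loop.last }}
            {% endfor %}
        FROM raw_orders
        GROUP BY order_id
    """)

    print(model.render(metric_columns=["amount", "tax"]))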
It's a little dogmatic in places. For example, stored procedures and custom functions to improve database performance are a perfectly legitimate approach, but not the only one, and they come with limitations as well as benefits. :P
Also, way to totally omit NoSQL data stores and automated testing. Selenium, anyone? No one? Bah.
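On the stored-procedure/custom-function point above, a small sketch of pushing logic into the database layer, shown here with SQLite's Python UDF hook (create_function) rather than a real server-side procedure; the names are invented:

    import sqlite3

    def net_price(gross: float, tax_rate: float) -> float:
        # The calculation lives next to the data instead of in every caller.
        return round(gross / (1.0 + tax_rate), 2)

    con = sqlite3.connect(":memory:")
    con.create_function("net_price", 2, net_price)

    con.execute("CREATE TABLE line_items (gross REAL, tax_rate REAL)")
    con.execute("INSERT INTO line_items VALUES (119.0, 0.19), (107.0, 0.07)")
    print(con.execute("SELECT net_price(gross, tax_rate) FROM line_items").fetchall())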
Glad to see there's still some love for DB knowledge. Seems like every day someone is coming to me with an issue that could be quickly solved by leaning on a mature RDBMS, instead of an ORM/NoSQL/flavor-of-the-year tool.
Too bad most roles in this industry (that pay well and don’t work you to premature balding) need niche specialists, instead of deep generalists.
And all the faddery (see: MongoDB, MySQL, DocumentDB). “We don’t need a full-fledged RDBMS. We’ll ride this malnourished pony all the way past the finish line. We gotta stay lean; we gotta stay hungry. There’s no ‘OLTP’ in first to market.”
> Most enterprise systems start off with requirements similar to those you think of with a database - a lot of data with high expectations of performance.
Not where I work.
And even then, this is no excuse to stick to a zeroth-order heuristic and build big programs every time. Some systems can be cleanly separated into simple components. Failing to see that is a waste.