I am referring to the title that is generated and displayed on HN for the link (I don't know if it's done through scraping). Twitter has proper title tags on status pages by the way (e.g. https://twitter.com/wikileaks/status/235818483588407296)
That title is supplied by the submitting user according to some informal recommendations. Every so often, there will be an argument over whether the submitted title is appropriate for the content. HN doesn't scrape anything for it.
There is. It isn't an official policy, but the dupe check only compares against values in some cache somewhere. Duplicate articles are FINE on HN, given that after a time period the ability to comment on an article is removed (unless, of course, the suggestion is that there is nothing valuable to discuss about something n days after a conversation starts, ever).
Indeed, it's a shame that QUEL died. I recently daydreamed about Teaquel, a CoffeeScriptesque take on SQL. I'm afraid it would have little chance to take off, though...
HTSQL is a next-generation query language built on top of SQL - you can express in several lines of HTSQL what would be a screenful of SQL. www.htsql.org
QUEL was pretty clean. Far more consistent than SQL.
It died, mostly because SQL was used by Oracle and IBM DB2 which were better marketed than Ingres. The DB world ended up standardizing on SQL because of this.
It is cleaner, although it doesn't feel _that_ different to SQL to my eye. I guess if we're now thinking about replacing SQL then I'd want to get a lot in return. (I do know that I'm sort of moving the goalposts here - thank you for contributing the QUEL info)
It looks almost like SQL; the main philosophical differences:
a) it embraces order (that is, every query result has an implicit "running index" field, called i, starting at 0).
This single-handedly solves a lot of inconsistencies in practical SQL having to do with order, which is abhorred by the relational data model (and thus not a first-class concept) but is often required in practice, and is therefore inconsistently bolted on.
b) columns can nest, and do not have to have the same type - which means that aggregations like count, sum, max, min and distinct are not special in any way.
c) there's a simple underlying programming language, so if you have an intermediate select result that you need twice, you just give it a name by prepending a "name: " to the select.
temp: select from grades where age>10;
b1: select from temp where eyecolor=`blue;
b2: select from temp where eyecolor=`brown;
(compare to the mess that is correlated sub-queries, or alternatively, the horrible "create temporary table x as ...").
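For comparison, standard SQL can get a similar naming effect with a Common Table Expression. Here's a minimal sketch using Python's sqlite3; the grades table, its columns, and the data are invented to match the q example above:

```python
import sqlite3

# Hedged sketch: table/column names (grades, age, eyecolor) mirror the q
# example above; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (name TEXT, age INTEGER, eyecolor TEXT)")
conn.executemany("INSERT INTO grades VALUES (?, ?, ?)", [
    ("ann", 12, "blue"),
    ("bob", 9,  "blue"),
    ("cay", 14, "brown"),
])

# A CTE names the intermediate result once, like q's "temp: select ..."
rows = conn.execute("""
    WITH temp AS (SELECT * FROM grades WHERE age > 10)
    SELECT name FROM temp WHERE eyecolor = 'blue'
""").fetchall()
print(rows)  # [('ann',)]
```

Unlike the q version, the CTE only lives for the duration of one statement, so reusing temp in a second query means repeating the WITH clause.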
I definitely see why (a) is good for time-series data, but I (personally) think that it shouldn't be a first-class part of a relational query language; I think most databases do support the ROW_NUMBER() function now though. I think (c) has been introduced as Common Table Expressions in SQL.
I'm not sure I fully understand (b) - do you have a link so I can learn more?
As you can probably guess, I'm very interested in this stuff!
> good for time-series data, but I (personally) think that it shouldn't be a first-class part of a relational query language;
First of all, there's some tautology here - a "relational database" (and similarly, relational calculus, relational algebra, etc.) BY DEFINITION deals with sets of tuples ("relations"), which are (again, by definition) unordered. So let's just ignore the word "relational" in this discussion.
> I think most databases do support the ROW_NUMBER() function now though.
Many do. But do compare (sql 2005):
SELECT * FROM
( SELECT
ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS ROW_NUMBER,
COLUMNS
FROM tablename
) foo
WHERE ROW_NUMBER <= 10
To (kdb+ / q):
select from tablename orderby asc sort_key where i<10;
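As a runnable sketch of the ROW_NUMBER version, using Python's sqlite3 (window functions need SQLite 3.25+); the table, column names, and data are invented:

```python
import sqlite3

# Hedged sketch of the SQL-2005-style query above. "tablename" and
# "sort_key" follow the text; the data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (sort_key INTEGER, val TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)",
                 [(k, f"v{k}") for k in range(20, 0, -1)])

# Number the rows by sort_key, then keep the first ten.
rows = conn.execute("""
    SELECT val FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS rn, val
        FROM tablename
    ) WHERE rn <= 10 ORDER BY rn
""").fetchall()
print(rows[0])  # ('v1',)
```

(For this particular case most dialects also allow the shorter ORDER BY sort_key LIMIT 10; ROW_NUMBER earns its keep when the index is needed inside a larger expression.)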
And the usefulness of order goes way beyond time-series data (and sorting): let's say you have a tiered pricing scheme for widgets you sell:
a) duplicate the range data (have a "from-count", "to-count" fields for each record, risking that you might have holes or overlaps)
b) not duplicate data, but have crazy subselects (among all with count > from_count, select the one with maximum to_count) or stuff like that.
When you actually have order, you have operators that embrace that order - e.g. kdb+/q's "bin", which finds the "bin" (as in "bucket", not as in "binary") something fits in:
select unit_price[from_count bin order_count] from table
There are many other use cases involving running sums (e.g., you have a table of weights and priorities; select the list of highest-priority items whose weights sum to 100 lbs or less).
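A hedged sketch of that running-sum case, using SUM() OVER in Python's sqlite3 (SQLite 3.25+; schema and data invented for illustration):

```python
import sqlite3

# Pick the highest-priority items whose cumulative weight stays at or
# under 100 lbs. Priority 1 is treated as highest; schema is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, weight INTEGER, priority INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    ("a", 60, 1), ("b", 30, 2), ("c", 50, 3), ("d", 5, 4),
])

# SUM(...) OVER (ORDER BY priority) gives a running total in priority order.
rows = conn.execute("""
    SELECT name FROM (
        SELECT name,
               SUM(weight) OVER (ORDER BY priority) AS running_weight
        FROM items
    ) WHERE running_weight <= 100 ORDER BY running_weight
""").fetchall()
print(rows)  # [('a',), ('b',)]
```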
Order is really really missing in SQL, but it's one of those things people are not aware of because they've never used anything that does support it properly.
nested columns just means what it sounds like: that you can put anything in a cell (including lists, tables, lists of tables, lists of lists of lists of tables). Many one-to-many relationships in sql that need additional tables can just be done within the same table in kdb+/q
regarding Common Table Expressions - I wasn't aware of them, they do help a lot. The syntax is horrible, but I guess they do work...
edit: more info on kdb+/q can be found in http://kx.com/q/d/kdb+1.htm - you have to get to section 8 before they start discussing the query language, but it's short and to the point. There's a lot more in http://kx.com/q/d/ if you are interested.
a) duplicate the range data (have a "from-count", "to-count" fields for each record, risking that you might have holes or overlaps)
That's never how I have done it. Since this is a tiered pricing model, you just:
select * from widget_price where min_size < ? ORDER BY min_size desc limit 1;
We actually do something almost identical for sales tax rate changes in LedgerSMB. You look to the date of the transaction and take the most recent rate. No range types, etc.
Where range types are handy (and why I am looking forward to them in 9.2) is where you have to deal with things like a financial transaction that should be amortized over a period of time. This would allow you to adjust the transaction incrementally as a reporting, rather than an accounting, function. But I wouldn't use them for cases where you don't want overlaps or gaps. The best thing there is to just put in floor values and select the next highest floor.
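A minimal sketch of that floor-value approach, using Python's sqlite3 (tier data invented; note it uses min_size <= ? so an order exactly at a tier boundary gets that tier's price):

```python
import sqlite3

# Store only a minimum size per pricing tier; no to-count column, so
# there can be no holes or overlaps. Names follow the widget_price example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widget_price (min_size INTEGER, unit_price REAL)")
conn.executemany("INSERT INTO widget_price VALUES (?, ?)",
                 [(0, 1.00), (10, 0.90), (100, 0.75)])

def price_for(qty):
    # Take the highest floor at or below the order quantity.
    row = conn.execute("""
        SELECT unit_price FROM widget_price
        WHERE min_size <= ? ORDER BY min_size DESC LIMIT 1
    """, (qty,)).fetchone()
    return row[0]

print(price_for(5), price_for(50), price_for(500))  # 1.0 0.9 0.75
```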
There you go. Again order by and limit/offset do pretty much what you want without doing crazy inline views like you are doing in your example. BTW, I see inline views as an antipattern in SQL best avoided if you can.
nested columns just means what it sounds like: that you can put anything in a cell (including lists, tables, lists of tables, lists of lists of lists of tables).
PostgreSQL supports this btw although back in 7.3 or so if I remember, I found that tables with tuples in columns were write-only but that was fixed pretty quickly (first with a "don't do that" check and then with a real fix).
The point of CTE's is to give you a stable intermediate result set.
.... and you've just embraced order (in a limited, inconsistent way). Using "ORDER BY" is outside of the relational model, where order plays no part.
ORDER was there from the beginning, but CTEs, nested columns, window functions, connect-by recursive selects and similar stuff is being added because SQL and the relational model are actually quite limited when it comes to real world problems.
Of course, SQL will keep getting extended to solve real world problems; however, that does not mean SQL is "the best solution out there" (or "the worst solution except all others that were tried").
There are certain things I really like about SQL at least compared to other programming languages. I am not sure it is the best possible db query language. Personally I have always thought Quel was elegant. However compared to app languages, I would far rather spend my time debugging 200-1000 line SQL statements than 200-1000 line subroutines in Perl, Ruby, or C++.....
Regarding ordering. Relations are defined to not be in a meaningful order. This doesn't mean you couldn't define some derivative that is ordered or that the relations might not be ordered in some way that is ignored by the relational math.
Suppose we have a relation R. We may order this relation physically in order to help the computer retrieve data faster, clustering on an index for example. However, clustering on an index does not mean that we are guaranteed to get the same order back when we do a select * from.... (we probably will but we aren't guaranteed to). And if we add a join or other relational transformation, the order will probably not be the same. In other words, ordering is outside the scope of relational math per se.
However that doesn't mean you can't have pre-ordered relations. It just means the ordering is meaningless as far as the math goes. The ordering may however be of great practical importance as the computer goes about grabbing the relevant tuples from the relation.
Similarly I don't see a reason why ordering can't happen after the relational math is done either, in this case for humans.
What this tells me is that the relational model is not entirely complete in itself, in that ordering is an orthogonal consideration it largely ignores, which means there are certain questions you cannot answer directly with relational math (such as: select from R a relation L such that it includes the tuples with the five highest values of R(2) lower than 25). I think that's mostly what you are getting at. But that's a matter of relational math being incomplete for real-world scenarios, not SQL (since SQL implementations do provide for ordering).
(a) is accomplished by windowing functions. These can also do running totals among other things, which is really helpful in accounting environments.
(b) has been supported since at least 8.3. I think a column can actually be an array of complex types in 8.4 and higher (it can be a tuple in 8.0-8.3 at least, though I reported a bug in this in 7.3 which resulted in a "don't do that" check).
(c) is handled using common table expressions.
Examples for (b) and (c):
CREATE TABLE foo (id int, value text);
CREATE TABLE bar (id int, vals foo[]); -- "values" is a reserved word, so the column is named vals here
INSERT INTO bar (vals) VALUES (ARRAY[ROW(1, 'test')::foo]); -- not sure if this is quite the syntax. Might take some playing around with.
My point was (and still is) that standard SQL is horrible, and there are way better solutions.
Real world usage makes SQL vendors extend SQL to make it less sucky; some of these extensions were later encoded into standard, and some are still proprietary.
Windowing functions are nice and all, but are a complex solution to a problem that would hardly exist if you actually embraced order as fundamental.
Ok, how about the most useful kdb+ extension (which I forgot about earlier): foreign key chasing. If table t has field a which has a foreign key reference to table s (which has field b which has a foreign key reference to table r (which has field c which has a foreign key ...))
in kdb+, you do:
select a.b.c from t
Does pgsql have something similar? Or do you have to spell out all the joins?
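For reference, here's what spelling out the joins looks like in standard SQL, sketched with Python's sqlite3 (the t → s → r chain and column names follow the comment above; the data is invented):

```python
import sqlite3

# t.a references s.id, s.b references r.id; chasing t -> s -> r in
# standard SQL takes one explicit join per hop.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (id INTEGER PRIMARY KEY, c TEXT);
    CREATE TABLE s (id INTEGER PRIMARY KEY, b INTEGER REFERENCES r(id));
    CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER REFERENCES s(id));
    INSERT INTO r VALUES (1, 'leaf');
    INSERT INTO s VALUES (7, 1);
    INSERT INTO t VALUES (3, 7);
""")

# kdb+/q: select a.b.c from t  -- in SQL the chain becomes two joins:
rows = conn.execute("""
    SELECT r.c FROM t
    JOIN s ON t.a = s.id
    JOIN r ON s.b = r.id
""").fetchall()
print(rows)  # [('leaf',)]
```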
can foreign key chasing handle composite primary/foreign key joins?
You can build something to do this in PostgreSQL using stored procedures and a (a.b).c syntax but that's kind of advanced stuff. To do this you have to create a b function such that b(a) returns tuple of type b which has column or function c.
Example:
create table address (...)
create table employee (...., address_id);
create function address(employee) returns address as $$...$$;
select (employee.address).country from employee; will then return the country field from the address returned by address(employee).
So yeah, kinda, if you build your own.
edit: I would be willing to bet you could make an implicit join operator of this sort also but I haven't done so. I don't know what the performance ramifications would be of throwing this into the column list.
> can foreign key chasing handle composite primary/foreign key joins?
Yes. The only requirement for foreign key chasing to work is that it uniquely identifies one record in the foreign table. Whether that key is atomic or composite is of no consequence.
(internally kdb+ stores a pointer to the foreign record when it verifies the existence of said record on insert, so it doesn't have to do a join query - it always knows exactly which record to bring in. So in practice, it is very efficient regardless of what kind of indexes you might have in place, the size or the composition of the foreign key field)
> So yeah, kinda, if you build your own. I would be willing to bet you could make an implicit join operator of this sort also but I haven't done so.
pgsql is a wonderful beast. I really like it. And I would be even happier if they adopted some kdb+/q syntax and semantics, though I don't think that's likely to happen.
SQL is fundamentally about sets, thinking of it in any other way provokes unhappiness. Your differences are set-subverting add-ons which could be accomplished by any higher level front end to SQL.
SQL is comparable to assembly language. Most people don't need it and wouldn't know how to use it properly anyway. These are the sort of people who use PHP and MySQL.
Nope. "Relational Algebra" / "Relational Calculus" / "The Relational Model" is about sets.
SQL is about bags (orderless like sets, but each item might be repeated multiple times). It's also about order ("ORDER BY" clause) in a horrible inconsistent way.
> SQL is comparable to assembly language. Most people don't need it and wouldn't know how to use it properly anyway
No, SQL is not comparable to assembly in any meaningful way (you could replace "assembly language" with "Danish" in your statement and it would be equally true)
While assembly language is more verbose, it is more fundamental than everything else in the sense that eventually everything must be expressed in assembly language (machine code, actually, which is equivalent to a proper subset of assembly language) to be executed. Thus, going down to assembly language might be more up-front work, but it is guaranteed that you can match or improve on run-time results from any other language.
SQL is an inconsistent abstraction that makes some things simple, some things hard, and some things essentially impossible -- and many of the things it does do, it does in a way that's inherently inefficient. (And don't tell me about the possible smart query optimizer - it doesn't really exist any more than Intel's Itanium optimizer that makes code properly utilize the VLIW; or a Unicorn).
By the way, the fundamental concern I have with SQL is the issue of ambiguity regarding NULLs. We are taught that NULL means one of two things, but really it means one of three:
* Unknown value
* Not applicable value
* Value does not exist
This is a big issue, because you would expect operators to treat these cases differently. known || unknown is obviously unknown, but known || not_applicable should probably be known, and known || does_not_exist should be known. In sane RDBMSs there is the possibility of magic values, which provides a sane way to handle not_applicable (for example, an empty string as distinct from NULL; and yes, I am calling into question the sanity of Oracle). However, you still have the fact that the first and third cases are ambiguous; although you hope not in any given query (the third case implies a missing value from an outer join), the ambiguity could in fact happen.
This is a fundamentally broken aspect of SQL. The problem with ambiguity is that if your data is ambiguous mathematically, then it cannot be reliably transformed using math.
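A small demonstration of the ambiguity, using Python's sqlite3 (schema invented): a comparison against NULL yields NULL, and an outer join's "missing row" NULL is indistinguishable from a stored NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL is not true; it is NULL (three-valued logic).
assert conn.execute("SELECT NULL = NULL").fetchone()[0] is None

conn.executescript("""
    CREATE TABLE person (id INTEGER, nickname TEXT);
    CREATE TABLE pet (person_id INTEGER, name TEXT);
    INSERT INTO person VALUES (1, NULL);  -- nickname unknown? not applicable?
    INSERT INTO person VALUES (2, 'Bo');
    INSERT INTO pet VALUES (2, 'Rex');
""")

# Person 1's NULL pet name below means "no matching row", but a stored
# NULL in pet.name would look exactly the same in this result.
rows = conn.execute("""
    SELECT person.id, pet.name FROM person
    LEFT JOIN pet ON person.id = pet.person_id
    ORDER BY person.id
""").fetchall()
print(rows)  # [(1, None), (2, 'Rex')]
```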
You're getting downvoted, but I'll reply anyway -- I agree completely.
If you're typing raw SQL for getting reports out of a database then you're probably fine, but for web apps you're not typing queries, you're constructing them as strings using another language.
I've always hated the idea of writing one language in another, it feels like a giant eval() in JavaScript/PHP/etc. Not to mention it opens you up to injection attacks.
I like programmatic access like MongoDB has; it certainly has its downsides, but I prefer talking to a database via an API.
Surely any time you have a data store that can work with multiple programming languages and platforms you're going to end up with an abstraction layer because of different programming languages' data representation?
Even for things as simple as integers, do you have unlimited precision, unsigned value support and null values?
Opening you up for injection attacks? Only if you're using a language that doesn't support parametrized queries/prepared statements. Add ORMs to that, and you are using an API to talk to your database.
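A minimal sketch of the parametrized-query point, using Python's sqlite3 (schema and injection string invented): the placeholder binds the input as data, so the classic injection string stays inert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

evil = "x' OR '1'='1"

# String concatenation would splice the attack into the SQL text; a bound
# parameter compares against the whole string literally instead.
rows = conn.execute("SELECT name FROM users WHERE name = ?", (evil,)).fetchall()
print(rows)  # [] -- no user is literally named "x' OR '1'='1"
```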
There are certain cases I run into where PostgreSQL doesn't support parameterization. For example, utility statements in PostgreSQL have no associated plan, so they have no parameters. If you want to run a utility statement (like CREATE ROLE or DROP ROLE) you have to do so via string concatenation. This is true even in stored procedures which gives you the uncomfortable possibility of SQL injection occurring inside a stored procedure already running with elevated permissions.
This is actually one area where PostgreSQL really shines. With LedgerSMB we define our interfaces in the db (as stored procs) and then have a very simple query mapper function which looks up the stored procedure in the system catalogs and then figures out the arguments. We have a second function which then generates a query based on supplied args and runs it.
No other app code since we started this (at least code in the new framework) includes any SQL. All the SQL stuff is done by one simple function. The real programming is in the database for this interface. Our approach isn't fully developed. I expect we will be working on an object-oriented interface inside PostgreSQL soon which will make the queries look like:
SELECT (f).* FROM (select entity(?, ?, ?, ?, ?).save) f;
save(entity) will then handle actually saving the data.
But individual apps can change that on the session level, right?
So still no guarantees and you more or less have to audit every app connecting to your db to make sure it isn't tampering with the sql_mode if you value data integrity.
The point though is that you lose a set of guarantees with MySQL that you don't lose with PostgreSQL. Yeah, computers make it possible to make more mistakes faster than any invention since handguns and tequila but when you lose the ability to guarantee that your declarative constraints are actually followed, you lose the ability to prove that certain types of errors are not being made. That's a big deal.
I think ceejayoz took you a little too literally. MySQL continues to scare the hell out of me. It was bad enough before Oracle got a hold of it: the designers seemed completely unconcerned with data integrity. It has many patches to deal with individual issues, but when the creators just fundamentally don't care???
The GPL-not-LGPL license sounds like a booby trap gladly left in place by Oracle from before they got it. I can't see Oracle wanting to do anything other than make it light and fast (at the expense of correctness), either, since if you want "data integrity", they have a solution for you.
In all seriousness, Facebook's requirements are for high availability, with data integrity only mattering to the point at which it affects the usability of Facebook.
In other words, Facebook can tolerate a high degree of shit in the system before it becomes a problem. That isn't bad in and of itself, but you need to investigate your own needs before using something because Facebook/Google use it.
MySQL is at its roots an SQL-like database specializing in content management. The sorts of data integrity problems that can occur in MySQL are entirely tolerable in a content management environment for the most part. It was designed to be a fast backend for web sites with an SQL interface. A lot of the issues it has are entirely due to that legacy, but those issues don't matter at all when you are using MySQL for content management.
Every single one of those examples are probably using MySQL for some sort of content management.
So it isn't just whether they can tolerate some crap in their systems. It is what sort of crap they can tolerate. If the MySQL gotchas don't result in intolerable crap, then it doesn't matter.
Now, personally I wouldn't run an accounting system on MySQL particularly if I was expecting many apps to access the same db. This is because the sorts of issues that MySQL has are real show-stoppers in these environments. But content management? Why not? Heck many of these data integrity problems may be features in these environments.
For example, suppose MySQL@Twitter truncates your tweets to the maximum length silently without issuing an error. Bug or feature? Suppose it truncates numbers in your accounting software? Those are two completely different cases and while they may in theory be comparable, in practice they are not.
It is powering the core function of the sites i.e. it is the primary databases in all cases.
I am sorry, but are you that deluded as to think so many of the world's leading IT companies are going to choose a database that silently loses data? Do you really think they are that stupid? I mean, come on.
Are you sure? Do we have evidence that the accounting end of the advertisement network, etc. is handled on MySQL? Or is it the backend for the public side only?
Let me give you an example. I have a customer that uses MySQL for some processing on their web site and all data gets batched up nightly and entered into the PostgreSQL-based accounting system. The MySQL db is accessed only by the app which does the processing and the connectors to the PostgreSQL db. The PostgreSQL db has a lot more logic in it and is a lot more complex.
In their case, their environment is not subject to any of the MySQL gotchas. They can make sure the credit card processing software runs in an acceptable SQL mode, and if something goes wrong between the PostgreSQL export of reports and the MySQL public side (which has happened for reasons other than MySQL's fault), they can track down and fix the problem. That's what loosely coupled systems are about.
IT in this case is about risk management. How can you guarantee that specific important data will not be lost. You can do this with MySQL for some set of environments (either because the data truncation isn't a big deal, regarding your tweets, or because it can be retrieved from another source, or because there is only one app accessing the database so you don't have to worry about apps turning off strict mode), but you can do it with PostgreSQL for a much larger set of environments and that's what the complaint is.
> MyISAM was the default storage engine for the MySQL relational database management system versions prior to 5.5
This is what the anti-MySQL crowd is talking about - the dark days when MySQL lacked transactions, referential integrity, and concurrency. It was basically the SQLite of its day.
There's nothing wrong with all that if you just want a key-value store, which is what most web apps are.
And it generally doesn't "silently lose data" unless you are touching that data. It tends to just silently do stuff that doesn't quite make sense. Like if you post in a thread, and your "post count" goes up but your post itself fails, it's not a huge problem. Unlike in an accounting app, where you really don't want a transaction to partially work (e.g. money falling through the cracks).
When MySQL broke on a website, I think most people just assumed it was internet gremlins.
It's not just MyISAM though. The fact is that these things actually make sense when you think about MySQL as being a content-management backend for the web. The problem actually has to do with SQL modes, the fact that MySQL will silently substitute table backends for you, and much more. Again in some environments this is ok, and if you have total control and tight control over a small number (preferably only 1) of applications hitting your db it isn't the end of the world.
But PostgreSQL folks look at MySQL from a different perspective. It's a db-centric rather than app-centric perspective. In this perspective your db needs to guarantee that declarative constraints will be followed. In this perspective the db is the center of the environment, not the bottom tier of the app stack. In this perspective you could potentially have dozens of apps using a single db.
It isn't a matter of MySQL having grown up a bit (and it has). It is a matter of it not having outgrown the content management and/or single app per db environment in which it arose.
How much do you think Craigslist really cares if your ad to sell a futon goes missing at 2am? They'll apologize and help you create a new ad. They're not going to migrate databases over it.
For every one of those sites, I suspect the database integrity plan is "meh, restore from a backup."
So what you're saying is that all of the engineers and DBAs are sitting around going "Oh, we just lost another user's Facebook page. No worries. They can just create another one." Do you not think they would've switched databases already if they couldn't fulfil the primary purpose of their business, i.e. managing data?
That's what replication is for. :) Really, mysql makes a fine key value store. "Mostly ACID" is good enough for lots of purposes, but what people complain about is its "serious business" sql failures.
Look, I used to have to support mysql. Every couple months we had to increment our minimum required version because we found yet another query that didn't work right.
That's really interesting. The two folks from Facebook contradict each other when talking about joins. Aditya says "No joins in production.", while Mark says "Some of the stories I read about sharded SQL state that you don't do joins when you have sharded MySQL, and that has never been the case for the workloads that I watch. You're always doing some complex query processing within a shard."
Anyway, I can see how you might have come away with the notion that MySQL is a glorified key value store by watching the first video which only briefly touches on their MySQL usage.
I don't like Oracle either, but you're coming dangerously close to sounding like a conspiracy theorist. The GPL-not-LGPL issue with the client library hasn't been a problem for years now. The attentive have been linking their non-free mysql-be-using software with libdrizzle, which is a complete, clean-room implemented, Berkeley licensed MySQL client library. Before that they were using the public domain version that MySQL-AB released before Monty decided to get sneaky with the GPL.
I believe that MySQL has become better since Oracle got a hold of it, their stewardship I consider to be much better than Sun's. Since acquiring MySQL, Oracle has put out an extremely solid release, MySQL 5.5, and are making steady progress on 5.6. If you lived through the early releases of MySQL 5.1 you have an idea of what a botched MySQL release is like.
I don't believe you do anyone on HN a service by spreading your gut feeling FUD about Oracle and MySQL.
Yeah, I have also had multi-row inserts deadlock against themselves in MySQL.
However, you need to reword that slightly. MySQL does have such a working insert statement if you set strict mode. The problem is that apps can unset strict mode. Until that changes.....
So really you should word it as:
"One that can be guaranteed not to randomly truncate your data, or allow insertion of nulls into not null columns."
MySQL inserts in strict mode don't do these things. MySQL inserts cannot be guaranteed not to do these things however. Therefore this relegates MySQL, in my view, to a one-app-per-db environment because you cannot prove that your db constraints will be properly enforced and therefore have to independently verify this aspect in every app that connects.
If you're evaluating databases based on their replication capabilities, don't stop with what MySQL has built-in. Take a look at http://www.percona.com/live/mysql-conference-2012/sessions/h... The advice boils down to "You probably want to ignore the built-in replication and use Galera".
PostgreSQL may not be better on all possible points of comparison.
PostgreSQL replication alone isn't comparable to MySQL with Galera. I don't know enough about the various extensions to PostgreSQL to know which ones could give you
* Synchronous replication
* Active-active multi-master topology
* Read and write to any cluster node
* Automatic membership control, failed nodes drop from the cluster
* Automatic node joining
* True parallel replication, on row level
* Direct client connections
in a reliable and performant manner. Keep in mind I'm just starting to play with Galera; I haven't used it in production yet, but it's making my planned architecture much simpler to manage than the traditional MySQL or PostgreSQL replication approaches.
Good mention. A small correction though: repmgr is not so much a replication option for Postgres as a handy toolkit for using Postgres's own replication features in a more convenient way.
For write scalability there is the Postgres-XC project. I do not know if it is ready just yet to run in production, but if not, it is getting there soon.
It looks to me as if Postgres-XC is largely waiting for 9.2 to be released. It's basically PostgreSQL plus some patches that allow for write-scalable clusters. From the timing of their betas, it looks like they are largely following the 9.2 release schedule.
Because up until recently PostgreSQL replication didn't really exist, and you had to choose between 10 different trigger-be-using unscalable bag-on-the-side technologies to fill that void.
Still, I know of at least one quite large website (one of the largest in Sweden) that uses Slony despite the performance hit from trigger-based replication.
Because MySQL's history is one of specializing in content management. In fact if you look at things this way, many of the MySQL gotchas which make DB people cringe are actually content management features. Yes, this includes data truncation.
When you move off the web site/content management side, PostgreSQL has long been the open source DB for complex business tools.
Betamax/VHS (look it up). MySQL had earlier presence, perhaps because it was easier to install? Hence people expected to find it as it was what they knew.
MySQL had earlier presence because it was the only available solution. PostgreSQL came much later. There was something called Postgres95, which crashed on connection.
The same goes with PostgreSQL. Replication isn't a one-size-fits-all need, and companies like Afilias will find Slony much easier to work with than the built-in PostgreSQL replication because Slony allows for seamless, zero-downtime upgrades, partial replication, and a whole host of other features.
You also have Bucardo which can do master-master replication between two nodes, and a few other solutions out there.
Slightly OT, but I'm going to reiterate a comment from the older submission, this is one of the best written comparisons of two similar technologies I've ever read. Generally you either get one-sided pieces or supposed "fair fight" pieces which slant the view towards whatever the author has chosen as the better solution for them. Those can be helpful, but something divorced from justifying a decision like this is amazing. So thanks to the writer(s)!
Just about every single claim provides a reference. For example, the sentence "MySQL 5.1 natively supports 9 storage engines" links directly to the MySQL documentation where they are listed.
Depending upon its definition, you can add a column to a PostgreSQL table without locking it.
Maybe they changed it in later versions of MySQL, but adding a column to a table became so lengthy for some of our projects that we switched for that reason alone.
It's one of the big reasons I am looking at switching. I have been dumping the table, reloading it under a new name, and then renaming the tables to get around this issue, but it's hardly ideal.
It's not that I do it often, it's how long it takes. An alter on a 60 gig table takes hours, during which your table is locked, which is totally unacceptable.
Seems the risks are such that my approach works just as well:
"pt-online-schema-change modifies data and structures. You should be careful with it, and test it before using it in production. You should also ensure that you have recoverable backups before using this tool."
With that being the case, what I'm doing is fine, I guess.
Because there is a disclaimer, you are disregarding a thoroughly tested product that solves your exact problem? One which is released by one of the premier MySQL consultancies with the author of "the" MySQL book as the designer?
You could make the same exact disclaimer about any SQL statement and it would be true:
"INSERT modifies data and structures. You should be careful with it, and test it before using it in production. You should also ensure that you have recoverable backups before using this tool."
"ALTER modifies data and structures. You should be careful with it, and test it before using it in production. You should also ensure that you have recoverable backups before using this tool."
It's just good advice to test stuff before you use it in production. Discarding an awesome product simply because the authors (responsibly) mention that you should probably test it out seems overly paranoid and would severely limit your choices. Percona Toolkit is the most well known and reliable set of tools out there in the MySQL ecosystem.
I would say this site is too nice to mysql. For stored procedures, mysql had those later whereas postgresql got those right early on. Mysql stored procedures suck donkey balls compared to postgres. Postgres in general just feels better designed, from the beginning.
re: replication, slony is horrible yet they focus on that. The slony author says you can daisy chain things, but that's a setup nightmare. Also, slony's n^2 communication gives you consistency guarantees, something I'm pretty sure mysql can't do, but most people don't need that. I much prefer bucardo. It's simple, easier to configure, fewer guarantees, but replicates much faster. I just wouldn't run a bank on that. However, how many people design bank software?
I think it's a bad idea to evaluate by checklists of features you think you need. It's kind of like choosing where to live based on a checklist.
I started with MySQL, then used MySQL and PostgreSQL for a while. Then, when I started to do more "real" projects, I just got so frustrated with MySQL in so many ways at once. It wasn't that MySQL couldn't do it, it's that it was frustrating at every turn.
In my case, what caused me to drop MySQL almost entirely (around 2003 or 2004) was doing a few simple reports involving dates. Then I started using postgres and it was refreshingly consistent and flexible without so many caveats. And now I develop for postgresql, and the code is similarly consistent and flexible (and just all-around nice).
I have had a long string of positive experiences with postgres. It's hard to wrap them up into a feature checklist.
If something held you back from using postgres in the past, it's a good idea to watch the release notes to see if something new might solve that problem (or better yet, discuss on the lists so maybe it will be solved faster). But I tend to think that looking at long lists of features is a distraction.
Previous discussion: http://news.ycombinator.com/item?id=328257