I am referring to the title that is generated and displayed on HN for the link (I don't know if it's done through scraping). Twitter has proper title tags on status pages by the way (e.g. https://twitter.com/wikileaks/status/235818483588407296)
That title is supplied by the submitting user according to some informal recommendations. Every so often, there will be an argument over whether the submitted title is appropriate for the content. HN doesn't scrape anything for it.
There is. It isn't an official policy, but the dupe check only compares against values in some cache somewhere. Duplicate articles are FINE on HN, given that after a time period the ability to comment on an article is removed (unless, of course, the suggestion is that there is nothing valuable to discuss about something n days after a conversation starts, ever).
Indeed, it's a shame that QUEL died. I recently daydreamed about Teaquel, a CoffeeScriptesque take on SQL. I'm afraid it would have little chance to take off, though...
HTSQL is a next-generation query language built on top of SQL - you can express in several lines of HTSQL what would be a screenful of SQL. www.htsql.org
QUEL was pretty clean. Far more consistent than SQL.
It died, mostly because SQL was used by Oracle and IBM DB2 which were better marketed than Ingres. The DB world ended up standardizing on SQL because of this.
It is cleaner, although it doesn't feel _that_ different to SQL to my eye. I guess if we're now thinking about replacing SQL then I'd want to get a lot in return. (I do know that I'm sort of moving the goalposts here - thank you for contributing the QUEL info)
It looks almost like SQL; the main philosophical differences:
a) it embraces order (that is, every query result has an implicit "running index" field, called i, starting at 0).
This single-handedly solves a lot of inconsistencies in practical SQL having to do with order, which is abhorred by the relational data model (and thus not a first-class concept) but is often required in practice, and is therefore inconsistently bolted on.
b) columns can nest, and do not have to have the same type - which means that aggregations like count, sum, max, min and distinct are not special in any way.
c) there's a simple underlying programming language, so if you have an intermediate select result that you need twice, you just give it a name by prepending a "name: " to the select.
temp: select from grades where age>10;
b1: select from temp where eyecolor=`blue;
b2: select from temp where eyecolor=`brown;
(compare to the mess that is correlated sub-queries, or alternatively, the horrible "create temporary table x as ...").
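For comparison, standard SQL can get a similar naming effect with a Common Table Expression. Here's a minimal sketch using Python's sqlite3; the grades table, its columns, and the data are invented to match the q example above:

```python
import sqlite3

# Hedged sketch: table/column names (grades, age, eyecolor) mirror the q
# example above; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (name TEXT, age INTEGER, eyecolor TEXT)")
conn.executemany("INSERT INTO grades VALUES (?, ?, ?)", [
    ("ann", 12, "blue"),
    ("bob", 9,  "blue"),
    ("cay", 14, "brown"),
])

# A CTE names the intermediate result once, like q's "temp: select ..."
rows = conn.execute("""
    WITH temp AS (SELECT * FROM grades WHERE age > 10)
    SELECT name FROM temp WHERE eyecolor = 'blue'
""").fetchall()
print(rows)  # [('ann',)]
```

Unlike the q version, the CTE only lives for the duration of one statement, so reusing temp in a second query means repeating the WITH clause.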
I definitely see why (a) is good for time-series data, but I (personally) think that it shouldn't be a first-class part of a relational query language; I think most databases do support the ROW_NUMBER() function now though. I think (c) has been introduced as Common Table Expressions in SQL.
I'm not sure I fully understand (b) - do you have a link so I can learn more?
As you can probably guess, I'm very interested in this stuff!
> good for time-series data, but I (personally) think that it shouldn't be a first-class part of a relational query language;
First of all, there's some tautology here - a "relational database" (and similarly, relational calculus, relational algebra, etc.) BY DEFINITION deals with sets of tuples ("relations"), which are (again, by definition) unordered. So let's just ignore the word "relational" in this discussion.
> I think most databases do support the ROW_NUMBER() function now though.
Many do. But do compare (sql 2005):
SELECT * FROM
( SELECT
ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS ROW_NUMBER,
COLUMNS
FROM tablename
) foo
WHERE ROW_NUMBER <= 10
To (kdb+ / q):
select from tablename orderby asc sort_key where i<10;
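As a runnable sketch of the ROW_NUMBER version, using Python's sqlite3 (window functions need SQLite 3.25+); the table, column names, and data are invented:

```python
import sqlite3

# Hedged sketch of the SQL-2005-style query above. "tablename" and
# "sort_key" follow the text; the data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (sort_key INTEGER, val TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)",
                 [(k, f"v{k}") for k in range(20, 0, -1)])

# Number the rows by sort_key, then keep the first ten.
rows = conn.execute("""
    SELECT val FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS rn, val
        FROM tablename
    ) WHERE rn <= 10 ORDER BY rn
""").fetchall()
print(rows[0])  # ('v1',)
```

(For this particular case most dialects also allow the shorter ORDER BY sort_key LIMIT 10; ROW_NUMBER earns its keep when the index is needed inside a larger expression.)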
And the usefulness of order goes way beyond time-series data (and sorting): let's say you have a tiered pricing scheme for widgets you sell:
a) duplicate the range data (have a "from-count", "to-count" fields for each record, risking that you might have holes or overlaps)
b) not duplicate data, but have crazy subselects (among all with count > from_count, select the one with maximum to_count) or stuff like that.
When you actually have order, you have operators that embrace that order - e.g. kdb+/q's "bin", which finds the "bin" (as in "bucket", not as in "binary") something fits in:
select unit_price[from_count bin order_count] from table
There are many other use cases involving running sums (e.g., you have a table of weights and priorities; select the list of highest-priority items whose weights sum to 100 lbs or less).
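A hedged sketch of that running-sum case, using SUM() OVER in Python's sqlite3 (SQLite 3.25+; schema and data invented for illustration):

```python
import sqlite3

# Pick the highest-priority items whose cumulative weight stays at or
# under 100 lbs. Priority 1 is treated as highest; schema is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, weight INTEGER, priority INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    ("a", 60, 1), ("b", 30, 2), ("c", 50, 3), ("d", 5, 4),
])

# SUM(...) OVER (ORDER BY priority) gives a running total in priority order.
rows = conn.execute("""
    SELECT name FROM (
        SELECT name,
               SUM(weight) OVER (ORDER BY priority) AS running_weight
        FROM items
    ) WHERE running_weight <= 100 ORDER BY running_weight
""").fetchall()
print(rows)  # [('a',), ('b',)]
```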
Order is really really missing in SQL, but it's one of those things people are not aware of because they've never used anything that does support it properly.
nested columns just means what it sounds like: that you can put anything in a cell (including lists, tables, lists of tables, lists of lists of lists of tables). Many one-to-many relationships in sql that need additional tables can just be done within the same table in kdb+/q
regarding Common Table Expressions - I wasn't aware of them, they do help a lot. The syntax is horrible, but I guess they do work...
edit: more info on kdb+/q can be found in http://kx.com/q/d/kdb+1.htm - you have to get to section 8 before they start discussing the query language, but it's short and to the point. There's a lot more in http://kx.com/q/d/ if you are interested.
a) duplicate the range data (have a "from-count", "to-count" fields for each record, risking that you might have holes or overlaps)
That's never how I have done it. Since this is a tiered pricing model, you just:
select * from widget_price where min_size < ? ORDER BY min_size desc limit 1;
We actually do something almost identical for sales tax rate changes in LedgerSMB. You look to the date of the transaction and take the most recent rate. No range types, etc.
Where range types are handy (and why I am looking forward to them in 9.2) is where you have to deal with things like a financial transaction that should be amortized over a period of time. This would allow you to adjust the transaction incrementally as a reporting, rather than an accounting, function. But I wouldn't use them for cases where you don't want overlaps or gaps. The best thing there is to just put in floor values and select the next highest floor.
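A minimal sketch of that floor-value approach, using Python's sqlite3 (tier data invented; note it uses min_size <= ? so an order exactly at a tier boundary gets that tier's price):

```python
import sqlite3

# Store only a minimum size per pricing tier; no to-count column, so
# there can be no holes or overlaps. Names follow the widget_price example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widget_price (min_size INTEGER, unit_price REAL)")
conn.executemany("INSERT INTO widget_price VALUES (?, ?)",
                 [(0, 1.00), (10, 0.90), (100, 0.75)])

def price_for(qty):
    # Take the highest floor at or below the order quantity.
    row = conn.execute("""
        SELECT unit_price FROM widget_price
        WHERE min_size <= ? ORDER BY min_size DESC LIMIT 1
    """, (qty,)).fetchone()
    return row[0]

print(price_for(5), price_for(50), price_for(500))  # 1.0 0.9 0.75
```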
There you go. Again order by and limit/offset do pretty much what you want without doing crazy inline views like you are doing in your example. BTW, I see inline views as an antipattern in SQL best avoided if you can.
nested columns just means what it sounds like: that you can put anything in a cell (including lists, tables, lists of tables, lists of lists of lists of tables).
PostgreSQL supports this btw although back in 7.3 or so if I remember, I found that tables with tuples in columns were write-only but that was fixed pretty quickly (first with a "don't do that" check and then with a real fix).
The point of CTE's is to give you a stable intermediate result set.
.... and you've just embraced order (in a limited, inconsistent way). Using "ORDER BY" is outside of the relational model, where order plays no part.
ORDER was there from the beginning, but CTEs, nested columns, window functions, connect-by recursive selects and similar stuff is being added because SQL and the relational model are actually quite limited when it comes to real world problems.
Of course, SQL will keep getting extended to solve real world problems; however, that does not mean SQL is "the best solution out there" (or "the worst solution except all others that were tried").
There are certain things I really like about SQL at least compared to other programming languages. I am not sure it is the best possible db query language. Personally I have always thought Quel was elegant. However compared to app languages, I would far rather spend my time debugging 200-1000 line SQL statements than 200-1000 line subroutines in Perl, Ruby, or C++.....
Regarding ordering. Relations are defined to not be in a meaningful order. This doesn't mean you couldn't define some derivative that is ordered or that the relations might not be ordered in some way that is ignored by the relational math.
Suppose we have a relation R. We may order this relation physically in order to help the computer retrieve data faster, clustering on an index for example. However, clustering on an index does not mean that we are guaranteed to get the same order back when we do a select * from.... (we probably will but we aren't guaranteed to). And if we add a join or other relational transformation, the order will probably not be the same. In other words, ordering is outside the scope of relational math per se.
However that doesn't mean you can't have pre-ordered relations. It just means the ordering is meaningless as far as the math goes. The ordering may however be of great practical importance as the computer goes about grabbing the relevant tuples from the relation.
Similarly I don't see a reason why ordering can't happen after the relational math is done either, in this case for humans.
What this tells me is that the relational model is not entirely complete in itself, in that ordering is an orthogonal consideration it largely ignores, which means there are certain questions you cannot answer directly with relational math (such as: select from R a relation L such that it includes the tuples with the five highest values of R(2) lower than 25). I think that's mostly what you are getting at. But that's a matter of relational math being incomplete for real-world scenarios, not SQL (since SQL implementations do provide for ordering).
(a) is accomplished by windowing functions. These can also do running totals among other things, which is really helpful in accounting environments.
(b) has been supported since at least 8.3. I think a column can actually be an array of complex types in 8.4 and higher (it can be a tuple in 8.0-8.3 at least, though I reported a bug in this in 7.3 which resulted in a "don't do that" check).
(c) is handled using common table expressions.
Examples for (b) and (c):
CREATE TABLE foo (id int, value text);
CREATE TABLE bar (id int, vals foo[]); -- "values" is a reserved word, so the column is named vals here
INSERT INTO bar (vals) VALUES (ARRAY[ROW(1, 'test')::foo]); -- not sure if this is quite the syntax. Might take some playing around with.
My point was (and still is) that standard SQL is horrible, and there are way better solutions.
Real world usage makes SQL vendors extend SQL to make it less sucky; some of these extensions were later encoded into standard, and some are still proprietary.
Windowing functions are nice and all, but are a complex solution to a problem that would hardly exist if you actually embraced order as fundamental.
Ok, how about the most useful kdb+ extension (which I forgot about earlier): foreign key chasing. If table t has field a which has a foreign key reference to table s (which has field b which has a foreign key reference to table r (which has field c which has a foreign key ...))
in kdb+, you do:
select a.b.c from t
Does pgsql have something similar? Or do you have to spell out all the joins?
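For reference, here's what spelling out the joins looks like in standard SQL, sketched with Python's sqlite3 (the t → s → r chain and column names follow the comment above; the data is invented):

```python
import sqlite3

# t.a references s.id, s.b references r.id; chasing t -> s -> r in
# standard SQL takes one explicit join per hop.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (id INTEGER PRIMARY KEY, c TEXT);
    CREATE TABLE s (id INTEGER PRIMARY KEY, b INTEGER REFERENCES r(id));
    CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER REFERENCES s(id));
    INSERT INTO r VALUES (1, 'leaf');
    INSERT INTO s VALUES (7, 1);
    INSERT INTO t VALUES (3, 7);
""")

# kdb+/q: select a.b.c from t  -- in SQL the chain becomes two joins:
rows = conn.execute("""
    SELECT r.c FROM t
    JOIN s ON t.a = s.id
    JOIN r ON s.b = r.id
""").fetchall()
print(rows)  # [('leaf',)]
```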
can foreign key chasing handle composite primary/foreign key joins?
You can build something to do this in PostgreSQL using stored procedures and a (a.b).c syntax but that's kind of advanced stuff. To do this you have to create a b function such that b(a) returns tuple of type b which has column or function c.
Example:
create table address (...)
create table employee (...., address_id);
create function address(employee) returns address as $$...$$;
select (employee.address).country from employee; will then return the country field from the address returned by address(employee).
So yeah, kinda, if you build your own.
edit: I would be willing to bet you could make an implicit join operator of this sort also but I haven't done so. I don't know what the performance ramifications would be of throwing this into the column list.
> can foreign key chasing handle composite primary/foreign key joins?
Yes. The only requirement for foreign key chasing to work is that it uniquely identifies one record in the foreign table. Whether that key is atomic or composite is of no consequence.
(internally kdb+ stores a pointer to the foreign record when it verifies the existence of said record on insert, so it doesn't have to do a join query - it always knows exactly which record to bring in. So in practice, it is very efficient regardless of what kind of indexes you might have in place, the size or the composition of the foreign key field)
> So yeah, kinda, if you build your own. I would be willing to bet you could make an implicit join operator of this sort also but I haven't done so.
pgsql is a wonderful beast. I really like it. And I would be even happier if they adopted some kdb+/q syntax and semantics, though I don't think that's likely to happen.
SQL is fundamentally about sets, thinking of it in any other way provokes unhappiness. Your differences are set-subverting add-ons which could be accomplished by any higher level front end to SQL.
SQL is comparable to assembly language. Most people don't need it and wouldn't know how to use it properly anyway. These are the sort of people who use PHP and MySQL.
Nope. "Relational Algebra" / "Relational Calculus" / "The Relational Model" is about sets.
SQL is about bags (orderless like sets, but each item might be repeated multiple times). It's also about order ("ORDER BY" clause) in a horrible inconsistent way.
> SQL is comparable to assembly language. Most people don't need it and wouldn't know how to use it properly anyway
No, SQL is not comparable to assembly in any meaningful way (you could replace "assembly language" with "Danish" in your statement and it would be equally true)
While assembly language is more verbose, it is more fundamental than everything else in the sense that eventually everything must be expressed in assembly language (machine code, actually, which is equivalent to a proper subset of assembly language) to be executed. Thus, going down to assembly language might be more up-front work, but it is guaranteed that you can match or improve on run-time results from any other language.
SQL is an inconsistent abstraction that makes some things simple, some things hard, and some things essentially impossible -- and many of the things it does do, it does in a way that's inherently inefficient. (And don't tell me about the possible smart query optimizer - it doesn't really exist any more than Intel's Itanium optimizer that makes code properly utilize the VLIW; or a Unicorn).
By the way, the fundamental concern I have with SQL is the issue of ambiguity regarding NULLs. We are taught that NULL means one of two things, but really it means one of three:
* Unknown value
* Not applicable value
* Value does not exist
This is a big issue, because you would expect operators to treat these cases differently. known || unknown is obviously unknown, but known || not_applicable should probably be known, and known || does_not_exist should be known. In sane RDBMSs there is the possibility of magic values, which provides a sane way to handle not_applicable (for example, an empty string as distinct from NULL; and yes, I am calling into question the sanity of Oracle). However, you still have the fact that the first and third cases are ambiguous; although you hope not in any given query (the third case implies a missing value from an outer join), the ambiguity could in fact happen.
This is a fundamentally broken aspect of SQL. The problem with ambiguity is that if your data is ambiguous mathematically, then it cannot be reliably transformed using math.
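A small demonstration of the ambiguity, using Python's sqlite3 (schema invented): a comparison against NULL yields NULL, and an outer join's "missing row" NULL is indistinguishable from a stored NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL is not true; it is NULL (three-valued logic).
assert conn.execute("SELECT NULL = NULL").fetchone()[0] is None

conn.executescript("""
    CREATE TABLE person (id INTEGER, nickname TEXT);
    CREATE TABLE pet (person_id INTEGER, name TEXT);
    INSERT INTO person VALUES (1, NULL);  -- nickname unknown? not applicable?
    INSERT INTO person VALUES (2, 'Bo');
    INSERT INTO pet VALUES (2, 'Rex');
""")

# Person 1's NULL pet name below means "no matching row", but a stored
# NULL in pet.name would look exactly the same in this result.
rows = conn.execute("""
    SELECT person.id, pet.name FROM person
    LEFT JOIN pet ON person.id = pet.person_id
    ORDER BY person.id
""").fetchall()
print(rows)  # [(1, None), (2, 'Rex')]
```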
You're getting downvoted, but I'll reply anyway -- I agree completely.
If you're typing raw SQL for getting reports out of a database then you're probably fine, but for web apps you're not typing queries, you're constructing them as strings using another language.
I've always hated the idea of writing one language in another, it feels like a giant eval() in JavaScript/PHP/etc. Not to mention it opens you up to injection attacks.
I like programmatic access like MongoDB has; it certainly has its downsides, but I prefer talking to a database via an API.
Surely any time you have a data store that can work with multiple programming languages and platforms you're going to end up with an abstraction layer because of different programming languages' data representation?
Even for things as simple as integers, do you have unlimited precision, unsigned value support and null values?
Opening you up for injection attacks? Only if you're using a language that doesn't support parametrized queries/prepared statements. Add ORMs to that, and you are using an API to talk to your database.
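A minimal sketch of the parametrized-query point, using Python's sqlite3 (schema and injection string invented): the placeholder binds the input as data, so the classic injection string stays inert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

evil = "x' OR '1'='1"

# String concatenation would splice the attack into the SQL text; a bound
# parameter compares against the whole string literally instead.
rows = conn.execute("SELECT name FROM users WHERE name = ?", (evil,)).fetchall()
print(rows)  # [] -- no user is literally named "x' OR '1'='1"
```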
There are certain cases I run into where PostgreSQL doesn't support parameterization. For example, utility statements in PostgreSQL have no associated plan, so they have no parameters. If you want to run a utility statement (like CREATE ROLE or DROP ROLE) you have to do so via string concatenation. This is true even in stored procedures which gives you the uncomfortable possibility of SQL injection occurring inside a stored procedure already running with elevated permissions.
This is actually one area where PostgreSQL really shines. With LedgerSMB we define our interfaces in the db (as stored procs) and then have a very simple query mapper function which looks up the stored procedure in the system catalogs and then figures out the arguments. We have a second function which then generates a query based on supplied args and runs it.
No other app code since we started this (at least code in the new framework) includes any SQL. All the SQL stuff is done by one simple function. The real programming is in the database for this interface. Our approach isn't fully developed. I expect we will be working on an object-oriented interface inside PostgreSQL soon which will make the queries look like:
SELECT (f).* FROM (select entity(?, ?, ?, ?, ?).save) f;
save(entity) will then handle actually saving the data.
But individual apps can change that on the session level, right?
So still no guarantees and you more or less have to audit every app connecting to your db to make sure it isn't tampering with the sql_mode if you value data integrity.
The point though is that you lose a set of guarantees with MySQL that you don't lose with PostgreSQL. Yeah, computers make it possible to make more mistakes faster than any invention since handguns and tequila but when you lose the ability to guarantee that your declarative constraints are actually followed, you lose the ability to prove that certain types of errors are not being made. That's a big deal.
I think ceejayoz took you a little too literally. MySQL continues to scare the hell out of me. It was bad enough before Oracle got a hold of it: the designers seemed completely unconcerned with data integrity. It has many patches to deal with individual issues, but when the creators just fundamentally don't care???
The GPL-not-LGPL license sounds like a booby trap gladly left in place by Oracle from before they got it. I can't see Oracle wanting to do anything other than make it light and fast (at the expense of correctness), either, since if you want "data integrity", they have a solution for you.
In all seriousness, Facebook's requirements are for high availability, with data integrity only mattering to the point at which it affects the usability of Facebook.
In other words, Facebook can tolerate a high degree of shit in the system before it becomes a problem. That isn't bad in and of itself, but you need to investigate your own needs before using something because Facebook/Google use it.
MySQL is at its roots an SQL-like database specializing in content management. The sorts of data integrity problems that can occur in MySQL are entirely tolerable in a content management environment for the most part. It was designed to be a fast backend for web sites with an SQL interface. A lot of the issues it has are entirely due to that legacy, but those issues don't matter at all when you are using MySQL for content management.
Every single one of those examples are probably using MySQL for some sort of content management.
So it isn't just whether they can tolerate some crap in their systems. It is what sort of crap they can tolerate. If the MySQL gotchas don't result in intolerable crap, then it doesn't matter.
Now, personally I wouldn't run an accounting system on MySQL particularly if I was expecting many apps to access the same db. This is because the sorts of issues that MySQL has are real show-stoppers in these environments. But content management? Why not? Heck many of these data integrity problems may be features in these environments.
For example, suppose MySQL@Twitter truncates your tweets to the maximum length silently without issuing an error. Bug or feature? Suppose it truncates numbers in your accounting software? Those are two completely different cases and while they may in theory be comparable, in practice they are not.
It is powering the core function of the sites i.e. it is the primary databases in all cases.
I am sorry, but are you that deluded as to think so many of the world's leading IT companies are going to choose a database that silently loses data? Do you really think they are that stupid? I mean, come on.
Are you sure? Do we have evidence that the accounting end of the advertisement network, etc. is handled on MySQL? Or is it the backend for the public side only?
Let me give you an example. I have a customer that uses MySQL for some processing on their web site and all data gets batched up nightly and entered into the PostgreSQL-based accounting system. The MySQL db is accessed only by the app which does the processing and the connectors to the PostgreSQL db. The PostgreSQL db has a lot more logic in it and is a lot more complex.
In their case, their environment is not subject to any of the MySQL gotchas. They can make sure the credit card processing software runs in an acceptable SQL mode, and if something goes wrong between the PostgreSQL export of reports and the MySQL public side (which has happened for reasons other than MySQL's fault), they can track down and fix the problem. That's what loosely coupled systems are about.
IT in this case is about risk management. How can you guarantee that specific important data will not be lost. You can do this with MySQL for some set of environments (either because the data truncation isn't a big deal, regarding your tweets, or because it can be retrieved from another source, or because there is only one app accessing the database so you don't have to worry about apps turning off strict mode), but you can do it with PostgreSQL for a much larger set of environments and that's what the complaint is.
> MyISAM was the default storage engine for the MySQL relational database management system versions prior to 5.5
This is what the anti-MySQL crowd is talking about - the dark days when MySQL lacked transactions, referential integrity, and concurrency. It was basically the SQLite of its day.
There's nothing wrong with all that if you just want a key-value store, which is what most web apps are.
And it generally doesn't "silently lose data" unless you are touching that data. It tends to just silently do stuff that doesn't quite make sense. Like if you post in a thread, and your "post count" goes up but your post itself fails, it's not a huge problem. Unlike in an accounting app, where you really don't want a transaction to partially work (e.g. money falling through the cracks).
When MySQL broke on a website, I think most people just assumed it was internet gremlins.
It's not just MyISAM though. The fact is that these things actually make sense when you think about MySQL as being a content-management backend for the web. The problem actually has to do with SQL modes, the fact that MySQL will silently substitute table backends for you, and much more. Again in some environments this is ok, and if you have total control and tight control over a small number (preferably only 1) of applications hitting your db it isn't the end of the world.
But PostgreSQL folks look at MySQL from a different perspective. It's a db-centric rather than app-centric perspective. In this perspective your db needs to guarantee that declarative constraints will be followed. In this perspective the db is the center of the environment, not the bottom tier of the app stack. In this perspective you could potentially have dozens of apps using a single db.
It isn't a matter of MySQL having grown up a bit (and it has). It is a matter of it not having outgrown the content management and/or single app per db environment in which it arose.
How much do you think Craigslist really cares if your ad to sell a futon goes missing at 2am? They'll apologize and help you create a new ad. They're not going to migrate databases over it.
For every one of those sites, I suspect the database integrity plan is "meh, restore from a backup."
So what you're saying is that all of the engineers and DBAs are sitting around going "Oh, we just lost another user's Facebook page. No worries. They can just create another one." Do you not think they would've switched databases already if they couldn't fulfil the primary purpose of their business, i.e. managing data?
That's what replication is for. :) Really, mysql makes a fine key value store. "Mostly ACID" is good enough for lots of purposes, but what people complain about is its "serious business" sql failures.
Look, I used to have to support mysql. Every couple months we had to increment our minimum required version because we found yet another query that didn't work right.
That's really interesting. The two folks from Facebook contradict each other when talking about joins. Aditya says "No joins in production.", while Mark says "Some of the stories I read about sharded SQL state that you don't do joins when you have sharded MySQL, and that has never been the case for the workloads that I watch. You're always doing some complex query processing within a shard."
Anyway, I can see how you might have come away with the notion that MySQL is a glorified key value store by watching the first video which only briefly touches on their MySQL usage.
I don't like Oracle either, but you're coming dangerously close to sounding like a conspiracy theorist. The GPL-not-LGPL issue with the client library hasn't been a problem for years now. The attentive have been linking their non-free mysql-be-using software with libdrizzle, which is a complete, clean-room implemented, Berkeley licensed MySQL client library. Before that they were using the public domain version that MySQL-AB released before Monty decided to get sneaky with the GPL.
I believe that MySQL has become better since Oracle got a hold of it, their stewardship I consider to be much better than Sun's. Since acquiring MySQL, Oracle has put out an extremely solid release, MySQL 5.5, and are making steady progress on 5.6. If you lived through the early releases of MySQL 5.1 you have an idea of what a botched MySQL release is like.
I don't believe you do anyone on HN a service by spreading your gut feeling FUD about Oracle and MySQL.
Yeah, I have also had multi-row inserts deadlock against themselves in MySQL.
However, you need to reword that slightly. MySQL does have such a working insert statement if you set strict mode. The problem is that apps can unset strict mode. Until that changes.....
So really you should word it as:
"One that can be guaranteed not to randomly truncate your data, or allow insertion of nulls into not null columns."
MySQL inserts in strict mode don't do these things. MySQL inserts cannot be guaranteed not to do these things however. Therefore this relegates MySQL, in my view, to a one-app-per-db environment because you cannot prove that your db constraints will be properly enforced and therefore have to independently verify this aspect in every app that connects.
If you're evaluating databases based on their replication capabilities, don't stop with what MySQL has built-in. Take a look at http://www.percona.com/live/mysql-conference-2012/sessions/h... The advice boils down to "You probably want to ignore the built-in replication and use Galera".
PostgreSQL may not be better on all possible points of comparison.
PostgreSQL replication alone isn't comparable to MySQL with Galera. I don't know enough about the various extensions to PostgreSQL to know which ones could give you
* Synchronous replication
* Active-active multi-master topology
* Read and write to any cluster node
* Automatic membership control, failed nodes drop from the cluster
* Automatic node joining
* True parallel replication, on row level
* Direct client connections
in a reliable and performant manner. Keep in mind I'm just starting to play with Galera; I haven't used it in production yet, but it's making my planned architecture much simpler to manage than the traditional MySQL or PostgreSQL replication approaches.
Good mention. A small correction though: repmgr is not so much a replication option for Postgres as a handy toolkit for using Postgres's own replication features in a more convenient way.
For write scalability there is the Postgres-XC project. I do not know if it is ready just yet to run in production, but if not, it is getting there soon.
It looks to me as if Postgres-XC is largely waiting for 9.2 to be released. It's basically PostgreSQL plus some patches that allow for write-scalable clusters. From the timing of their betas, it looks like they are largely following the 9.2 release schedule.
Because up until recently PostgreSQL replication didn't really exist, and you had to choose between 10 different trigger-be-using unscalable bag-on-the-side technologies to fill that void.
Still, I know of at least one quite large website (one of the largest in Sweden) that uses Slony despite the performance hit from trigger-based replication.
Because MySQL's history is one of specializing in content management. In fact if you look at things this way, many of the MySQL gotchas which make DB people cringe are actually content management features. Yes, this includes data truncation.
When you move off the web site/content management side, PostgreSQL has long been the open source DB for complex business tools.
Betamax/VHS (look it up). MySQL had earlier presence, perhaps because it was easier to install? Hence people expected to find it as it was what they knew.
MySQL had earlier presence because it was the only available solution. PostgreSQL came much later. There was something called Postgres95, which crashed on connection.
The same goes with PostgreSQL. Replication isn't a one-size-fits-all need, and companies like Afilias will find Slony much easier to work with than the built-in PostgreSQL replication because Slony allows for seamless, zero-downtime upgrades, partial replication, and a whole host of other features.
You also have Bucardo which can do master-master replication between two nodes, and a few other solutions out there.
Slightly OT, but I'm going to reiterate a comment from the older submission, this is one of the best written comparisons of two similar technologies I've ever read. Generally you either get one-sided pieces or supposed "fair fight" pieces which slant the view towards whatever the author has chosen as the better solution for them. Those can be helpful, but something divorced from justifying a decision like this is amazing. So thanks to the writer(s)!
Just about every single claim provides a reference. For example, the sentence "MySQL 5.1 natively supports 9 storage engines" links directly to the MySQL documentation where they are listed.
Depending upon its definition, you can add a column to a PostgreSQL table without locking it.
Maybe they changed it in later versions of MySQL, but adding a column to a table became so lengthy for some of our projects that we switched for that reason alone.
It's one of the big reasons I am looking at switching. I have been dumping the table, reloading it under a new name, and then renaming the tables to get around this issue, but it's hardly ideal.
It's not that I do it often, it's how long it takes. An alter on a 60 gig table takes hours, during which your table is locked, which is totally unacceptable.
Seems the risks are such that my approach works just as well:
"pt-online-schema-change modifies data and structures. You should be careful with it, and test it before using it in production. You should also ensure that you have recoverable backups before using this tool."
With that being the case, what I'm doing is fine, I guess.
Because there is a disclaimer, you are disregarding a thoroughly tested product that solves your exact problem? One which is released by one of the premier MySQL consultancies with the author of "the" MySQL book as the designer?
You could make the same exact disclaimer about any SQL statement and it would be true:
"INSERT modifies data and structures. You should be careful with it, and test it before using it in production. You should also ensure that you have recoverable backups before using this tool."
"ALTER modifies data and structures. You should be careful with it, and test it before using it in production. You should also ensure that you have recoverable backups before using this tool."
It's just good advice to test stuff before you use it in production. Discarding an awesome product simply because the authors (responsibly) mention that you should probably test it out seems overly paranoid and would severely limit your choices. Percona Toolkit is the most well known and reliable set of tools out there in the MySQL ecosystem.
I would say this site is too nice to mysql. For stored procedures, mysql had those later whereas postgresql got those right early on. Mysql stored procedures suck donkey balls compared to postgres. Postgres in general just feels better designed, from the beginning.
re: replication, slony is horrible yet they focus on that. The slony author says you can daisy chain things, but that's a setup nightmare. Also, slony's n^2 communication gives you consistency guarantees, something I'm pretty sure mysql can't do, but most people don't need that. I much prefer bucardo. It's simple, easier to configure, fewer guarantees, but replicates much faster. I just wouldn't run a bank on that. However, how many people design bank software?
I think it's a bad idea to evaluate by checklists of features you think you need. It's kind of like choosing where to live based on a checklist.
I started with MySQL, then used MySQL and PostgreSQL for a while. Then, when I started to do more "real" projects, I just got so frustrated with MySQL in so many ways at once. It wasn't that MySQL couldn't do it, it's that it was frustrating at every turn.
In my case, what caused me to drop MySQL almost entirely (around 2003 or 2004) was doing a few simple reports involving dates. Then I started using postgres and it was refreshingly consistent and flexible without so many caveats. And now I develop for postgresql, and the code is similarly consistent and flexible (and just all-around nice).
I have had a long string of positive experiences with postgres. It's hard to wrap them up into a feature checklist.
If something held you back from using postgres in the past, it's a good idea to watch the release notes to see if something new might solve that problem (or better yet, discuss on the lists so maybe it will be solved faster). But I tend to think that looking at long lists of features is a distraction.
Previous discussion: http://news.ycombinator.com/item?id=328257