MySQL - Do Not Pass This Way Again (grimoire.ca)
224 points by craigkerstiens on Jan 26, 2013 | 156 comments



I'm really not sure what to think of that article. On the one hand, I definitely agree with it and I've experienced many issues with MySQL.

On the other, there are so many... strange points, it's hard for me to trust the author about the parts that are new to me. Things I've found weird so far are:

- "my favourite example being a fat-fingered UPDATE query where a mistyped = (as -, off by a single key) caused 90% of the rows in the table to be affected," - if I ever run "rm -rf . /" fat-fingering the space, I'm going to blame myself only - not fileutils or bash - this has nothing to do with the database

- (about backups) "Unless you meticulously lock tables or make the database read-only for the duration," - this is not trivial, but logging onto slave and doing "FLUSH TABLES WITH READ LOCK, sync, snapshot, UNLOCK TABLES" is not rocket science either. And it's well documented on their "backup methods" page.
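For the record, that slave procedure is roughly the following (a sketch; the snapshot command depends on your filesystem/LVM setup, and the client session holding the lock must stay open until UNLOCK TABLES):

```sql
-- On the slave, in a client session that stays connected:
FLUSH TABLES WITH READ LOCK;  -- flush table data and block writes

-- While the lock is held, from a shell on the same host:
--   sync                         # flush OS buffers to disk
--   lvcreate -s ...              # or your snapshot tool of choice

UNLOCK TABLES;                -- release the lock, let replication catch up
```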

- "It's unrealistic to expect every single user to run SHOW CREATE TABLE before every single query, or to memorize the types of every column in your schema, though." - ... yeah... we shouldn't ask them to remember the syntax either - just keep guessing until you get everything right ;)

- "Foreign keys are ignored if you spell them certain, common, ways" - another case of "I want to use the wrong syntax, but still get the right answer"

I really wish he had limited himself to hard facts - the main idea of the article wouldn't suffer at all. There are enough things to hate in MySQL without going into the subjective and "inconvenient, but still ok" parts.


"Foreign keys are ignored if you spell them certain, common, ways" - another case of "I want to use the wrong syntax, but still get the right answer"

Both forms of declaring foreign keys are SQL-standard compliant syntax. That MySQL silently ignores the "inline" version is the "wrong" here, not the syntax.

From MySQL's own docs:

"Furthermore, InnoDB does not recognize or support “inline REFERENCES specifications” (as defined in the SQL standard) where the references are defined as part of the column specification. InnoDB accepts REFERENCES clauses only when specified as part of a separate FOREIGN KEY specification. For other storage engines, MySQL Server parses and ignores foreign key specifications." [1]

[1] http://dev.mysql.com/doc/refman/5.5/en/create-table.html
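Concretely, on InnoDB only the second of these two standard spellings produces an enforced constraint; the first parses fine and creates nothing (a sketch with made-up table names):

```sql
CREATE TABLE parent (id INT PRIMARY KEY);

-- Inline REFERENCES: parsed, then silently ignored by InnoDB
CREATE TABLE child_a (
    parent_id INT REFERENCES parent (id)
);

-- Separate FOREIGN KEY clause: actually enforced
CREATE TABLE child_b (
    parent_id INT,
    FOREIGN KEY (parent_id) REFERENCES parent (id)
);
```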


I understand that and don't have an issue with what he most likely meant - only with what he wrote.

In short - MySQL does not always comply with the SQL standard. That's unfortunate / silly / annoying / inconsistent. I completely agree.

But I don't agree with "Foreign keys are ignored if you spell them certain, common, ways". This is not very specific. Common to what situation? People being used to it in another project? It's a documented behaviour. He seems to have a problem with the effect, not the cause. (or maybe I just read it that way...)


I can understand MySQL not being 100% compliant with the SQL standard. What I have trouble understanding is why it can't be bothered to throw an error. Imagine if your programming language silently ignored some constructs? But it seems to correlate with MySQL's philosophy of being good at silently ignoring problems :(


> It's a documented behaviour.

In other contexts, these are sometimes called "Known Bugs".


If it doesn't implement part of the SQL spec, that is a shame, but if part of the statement isn't understood or processed, there is no excuse for not raising an error. That is unacceptable. Documenting it just isn't enough.


I did some brief searching, but didn't come up with anything. Surely for such a problematic case there is at least a bug report filed requesting that an error be raised?


Interestingly, it's not a syntax error.


"I'm going to blame myself only - not fileutils or bash - this has nothing to do with the database"

I disagree.

  UPDATE MyTab SET column = 'value' WHERE otherColumn = 1
is a valid SQL statement, while

  UPDATE MyTab SET column = 'value' WHERE otherColumn - 1
is not and should throw a syntax error and not perform a mass update due to some bullshit auto conversion.

What would rile me most is that even if you're a very careful person and do a

BEGIN TRAN

before going for the update (so you can ROLLBACK if in doubt), you may not notice that something went wrong and COMMIT anyway.

A database should always be fail safe; period.
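To spell out the mechanism (a sketch with hypothetical column names): the typo turns the comparison into an arithmetic expression, and MySQL then coerces the resulting number to a boolean, so every row where the expression is nonzero matches.

```sql
-- Intended: update only rows where col_b = 1
UPDATE MyTab SET col_a = 'value' WHERE col_b = 1;

-- Fat-fingered: col_b - 1 is just a number; nonzero counts as TRUE,
-- so every row where col_b <> 1 gets updated instead.
UPDATE MyTab SET col_a = 'value' WHERE col_b - 1;
```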


This is actually a pretty good summary of the whole article. The main point is that MySQL works by slightly modifying what you meant into valid statements and doing the best it can with them. And because what it does depends on the (hidden) type of the data as well as what you actually ask it to do... it sometimes does things you didn't intend.

It reads a lot like a static typing versus dynamic typing debate.

It's easy to find yourself on either side of this debate. If I program in javascript for a few hours I'm ready to attack any dynamic typing proponent with an axe. If I attempt to write something bigger in Haskell (or try to make something run faster), I start to wonder if being bludgeoned to death with an axe is too easy a death. If I try to program anything in Coq it gets a lot worse.


This is not dynamic vs static. This is automatic type coercion vs explicit type coercion, which is not the same thing. Weak and strong typing are completely orthogonal to whether the language is static or dynamic. Weak typing means values will be automatically coerced into something they aren't, whereas dynamic typing only means that you will get your error at runtime, not at compile time.

Javascript is dynamic and weakly typed. C is static and weakly typed.


I think waps was talking about the general continuum of how systems behave towards software engineers.

At one end are systems which are demanding refuseniks, at the other are loosey-goosey, anything-goes systems. It is human nature to find the loosey-goosey system more attractive because it gives immediate positive feedback ("it works!") and conceals negative feedback ("what the hell does that error mean?").

The thing is that RDBMSes are ostensibly meant to be on a stricter end of that continuum. But MySQL directly subverts that presumption, which is the source of a lot of heartache in the longer run.


"I want to use the wrong syntax, but still get the right answer"

The point is that wrong syntax should error instead of letting the user think it worked. Problems like this mean the user has no idea if the FK was set up without testing it. IMHO, there is nothing okay about silently failing.


It's not the wrong syntax, but you're right that it should error if MySQL isn't going to bother supporting it. Silent failures are just plain failures.


Good point. It's not wrong to the sql standard, just wrong to mysql. That makes the silent fail even more brain dead.


As the nice man said:

> MySQL is riddled with these sorts of surprises, and apologists lean very heavily on the "that's documented" excuse for its bad behaviour.

For a database, defaulting to safe behaviour should be the default, not something you need to look up.


The first thing you found weird relies on selective quotation; you omitted "because of implicit string to integer conversion". The second is an apologist "but it's well documented" defence. Your third point is changing the subject: he was criticising implicit type conversion, which you again ignore. The fourth point is not about wanting to use the wrong syntax; it is about MySQL accepting valid syntax and ignoring it.

So your whole post is nothing more than simple fanboy apologism.


I'm about as far from being a MySQL fanboy as reasonably possible, so no, that's not it. What I'm trying to say is: mixing subjective and objective criticism makes the argument weaker. Don't complain that you lost your data due to a typo (the statement worked as designed, you had backups, and this was a test database you'd never do that to in production anyway - right?). A complaint about the lack of consistency and not adhering to standards is much stronger and still true.


Your last sentence was not needed and lowers the tone of debate around here. Please consider avoiding name calling in future.


I appreciate this well-argued piece of persuasive writing for not choosing MySQL, but the premise is surprising -- I don't recall seeing, on Hacker News or elsewhere, any writeups from companies that chose MySQL and then ran into significant problems they had to architect around, nor writeups from companies that chose MySQL and then had to rip everything out to switch to something in the same family of solutions (Postgres, Oracle, etc.).


We migrated Lanyrd from MySQL to PostgreSQL, primarily to reduce the pain involved in running schema alterations against large tables, but we've since been enjoying some of the more advanced PostgreSQL features (pg_trgm indexes for example). You can read about our migration here: http://lanyrd.com/blog/2012/lanyrds-big-move/


Thanks for being clear and detailed about your DB migration.

I've been involved in a few and they've generally been brought on by DBAs convincing the bosses that "everything is wrong"...once it was done the DBAs wanted to blame the developers again.


Just wanted to confirm something: you say you have no single point of failure, but you are running everything in one data center?


That was slightly loose wording in the article: yes, the single data center is currently a single point of failure (though we can spin up a replacement stack elsewhere very quickly). Andrew was referring to the way we had rearranged our software stack to reduce the SPOFs (our previous architecture had a single load balancer for example).


There is always a single point of failure if you zoom out far enough.


You don't tend to hit the kind of problems you have to rip out or architect around; it's more a constant set of nagging exceptions to remember, bugs that take longer than they should to fix, application improvements that you'd like to make but can't because you can't change a table definition - but never any single critical showstopper. The PHP comparison is apt (and I've never heard of anywhere choosing to rip everything out of PHP and start again - although when I worked at last.fm a lot of our effort went into replacing PHP code with better alternatives, one piece at a time. Rather like my job before that, where we spent about four years gradually getting our application into a state where certain MySQL tables became small enough that we could afford to alter them).


Yes, this is called "we're stuck with it" syndrome.


I worked at a large startup that was trying to migrate from MySQL to Postgres because of how long table migrations take on MySQL. Adding a column to a table in MySQL tends to be stop-the-world for the whole server, and it can take hours. They had some pretty serious workarounds for that, one of which was "try to never change the database structure". There was some thought that there were other problems Postgres would solve; some advocacy happened here: http://corner.squareup.com/2011/06/postgresql-data-is-import...


Percona's OSC [1] pretty much fixes the migration pain without locking.

[1] http://www.percona.com/doc/percona-toolkit/2.1/pt-online-sch...


We recently spent a few weeks trying to make it work in production, and couldn't. We always ran into deadlocks due to gap locking and auto_increment columns.

After a few weeks of failure, we finally gave up and did it the slow way: unweight a slave, do the alter offline, catch up on replication, and fail over. Repeat as needed.

Percona's OSC is great for the cases where it works, but there are still many cases where it doesn't.


pt-online-schema change is a great tool, but not a panacea.

I've seen a serious issue where under moderately heavy load, the OSC tool would trigger a deadlock on the final atomic table swap, blocking all transactions against the table, including the final `RENAME TABLE`.

All operations (including reads) would block, waiting on a table metadata lock. Killing the offending RENAME thread was the only way to restore access.

IIRC, the ultimate cause was due to a bug in our code that would occasionally leave stale transactions open in edge cases. Happened a few times, and caused a fairly severe outage the first time we hit it.

Not sure where the blame here lies -- and what can fundamentally be improved -- but OSC techniques and MySQL still leave plenty of room for a moderately savvy user to get into trouble.


At Facebook we use our own OSC, which may work better for some circumstances and less well for others.

http://bazaar.launchpad.net/~mysqlatfacebook/mysqlatfacebook...

Check out the README for examples.


> a large startup

I think the word for that is "company".


"Try to never change the database structure" works pretty well. For all other cases, the easiest solution requires that you have a good redundancy system in place, and some extra planning. But basically: run ALTER TABLE without replication on your out-of-rotation slaves and masters, and then swap. Depending on your backup situation, you could probably just run the ALTER TABLE on your backup systems and then restore from them.

Yes, alter table is inconvenient, but how often are you updating your schema and how often do you use it? In my experience, once major development was done (which is usually before there's a large amount of data in the tables), schema changes are rare, but queries are very common.


This has been one of the biggest changes I've noticed since our team switched from MySQL to PostgreSQL: our database design is now much, much better and we make changes to it much more often.

I had grown used to compromising on database quality to avoid having to make frequent changes to large tables: implementing a new feature as a join table, for example, rather than adding the columns it needs to a core, multi-million row table.

Being able to add a billable column to a large existing table without worrying about downtime frees you up to be much smarter about how you design your application, and much more agile about changes you make to existing functionality. This is a far bigger benefit than I was expecting from the change.


That should be nullable, not billable.


One of my colleagues has been involved in rearchitecting a reasonably large database on MySQL; his complaints ran along the same lines as the article. Many bad defaults, many subtle edge cases of wrong. E.g., the encoding-based-on-connection-string problem is very real. :-/


Perhaps most companies prefer to keep the press positive? Mine certainly does. (And it'll probably stick with MySQL until its dying day despite it causing problems repeatedly.)


This is my favorite MySQL "decision": that GROUP BY, by default (that is, unless you turn the behavior off with the late-added magic flag ONLY_FULL_GROUP_BY), will gladly select an essentially "random" row for you (well, the first row based on INSERT order, which in SQL is as good as random):

    mysql: create table data (token_a varchar(10), token_b varchar(10));
    Query OK, 0 rows affected (0.05 sec)

    mysql: insert into data (token_a, token_b) values ('A', 'A');
    Query OK, 1 row affected (0.01 sec)

    mysql: insert into data (token_a, token_b) values ('A', 'B');
    Query OK, 1 row affected (0.00 sec)

    mysql: insert into data (token_a, token_b) values ('B', 'B');
    Query OK, 1 row affected (0.00 sec)

    mysql: insert into data (token_a, token_b) values ('B', 'A');
    Query OK, 1 row affected (0.00 sec)

    mysql: select * from data group by token_a;
    +---------+---------+
    | token_a | token_b |
    +---------+---------+
    | A       | A       |
    | B       | B       |
    +---------+---------+
    2 rows in set (0.00 sec)
Note here that the value we get for "token_b" is based on whether "A" or "B" was inserted first. The second "token_b" for each "token_a" (as well as any number of other rows that might follow it for that "token_a") is just discarded.

The scary thing is that I semi-regularly come across applications in Very Important Industries that have large amounts of SQL that rely upon this behavior of "picking any old row" for you, rather than selecting a MAX() or MIN() of some column and then joining to a subquery of the GROUP BY + aggregate....because joining to a subquery in MySQL also performs like crap.
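For comparison, the deterministic version of the query above looks something like this (a sketch using MAX(); which aggregate you actually want depends on the application):

```sql
-- Pick a specific row per group instead of "any old row":
SELECT d.token_a, d.token_b
FROM data AS d
JOIN (
    SELECT token_a, MAX(token_b) AS max_b
    FROM data
    GROUP BY token_a
) AS grouped
  ON grouped.token_a = d.token_a
 AND grouped.max_b = d.token_b;
```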


I have encountered situations where this is convenient, and I have yet to see it cause any bugs or problems.

In the places I have seen it used, the 'arbitrary' column typically has the same value for the entire group, e.g. for efficiently selecting distinct texts based on their hash values.

PostgreSQL has a similar feature using SELECT DISTINCT ON: http://www.postgresql.org/docs/9.2/static/queries-select-lis...
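Using the table from the grandparent comment, the explicit PostgreSQL form would look roughly like this; the ORDER BY decides which row survives, deterministically:

```sql
SELECT DISTINCT ON (token_a) token_a, token_b
FROM data
ORDER BY token_a, token_b DESC;
-- one row per token_a: the one with the greatest token_b
```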


> In the places I have seen it used, the 'arbitrary' column typically has the same value for the entire group, e.g. for efficiently selecting distinct texts based on their hash values.

I agree that case is convenient. But I've seen it being used to actually pick a random row, where a different row would have a different result (such that the output of the program would definitely be different), and it's clear the developers who wrote it didn't fully understand this was going on. I wrote it up in a report for this particular client but they didn't seem to want to touch this particular very old and venerable code.


I have seen bugs caused by this and I have only worked a little with MySQL. I agree though that the behaviour is very convenient, but the PostgreSQL way of implementing it is superior since it is more explicit about what it does and it also can be combined with ORDER BY so it does not just give you a random row.


You say ONLY_FULL_GROUP_BY was late added. But there are posts talking about this setting in 2002.


but only added in version 5 (I was using it back with version 3 in the late 90s):

http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html#...

> Do not permit queries for which the select list or (as of MySQL 5.0.23) HAVING list refers to nonaggregated columns that are not named in the GROUP BY clause.


You quote version 3.23, yet talk about subqueries which didn't get added until version 4.1.

ONLY_FULL_GROUP_BY was added in 4.0 but only applied to select list columns, the 5.0.23 mentioned above applies to the HAVING clause list not allowing them as well:

http://dev.mysql.com/doc/refman/4.1/en/server-sql-mode.html#...


The point is, it's outrageous that MySQL even does GROUP BY this way, and that doing it the "correct" way is not even the default.


Ronald Bradford's http://www.slideshare.net/ronaldbradford/my-sql-idiosyncrasi... is worth reviewing for anyone who runs MySQL.

I especially like how he explains SQL_MODE bit by bit and ends up recommending

        SQL_MODE =
           ALLOW_INVALID_DATES, ANSI_QUOTES, ERROR_FOR_DIVISION_BY_ZERO,
           HIGH_NOT_PRECEDENCE, IGNORE_SPACE, NO_AUTO_CREATE_USER,
           NO_AUTO_VALUE_ON_ZERO, NO_BACKSLASH_ESCAPES, NO_DIR_IN_CREATE,
           NO_ENGINE_SUBSTITUTION, NO_FIELD_OPTIONS, NO_KEY_OPTIONS, NO_TABLE_OPTIONS,
           NO_UNSIGNED_SUBTRACTION, NO_ZERO_DATE, NO_ZERO_IN_DATE,
           ONLY_FULL_GROUP_BY, PAD_CHAR_TO_FULL_LENGTH (5.1.20), PIPES_AS_CONCAT,
           REAL_AS_FLOAT, STRICT_ALL_TABLES, STRICT_TRANS_TABLES
I also recommend "MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation" http://blog.mozilla.org/it/2013/01/17/mysql-5-1-vs-mysql-5-5... for anyone working with non-integer numerics.
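For anyone wanting to try a stricter mode right away, it can be set at runtime or in my.cnf; a sketch showing only a small subset of the recommended list:

```sql
-- Runtime (affects new connections); put the same value under
-- [mysqld] in my.cnf to make it permanent.
SET GLOBAL sql_mode = 'STRICT_ALL_TABLES,ONLY_FULL_GROUP_BY,NO_ENGINE_SUBSTITUTION';
```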


Ugh, maybe my Internet is slow, but I can't seem to get the page to load. Maybe there should be an unofficial rule that if you're going to write a worthwhile article about databases and, I assume since I can't yet read it, performance, you should enable caching on your blog.

Edit: here's the raw text version stored on github https://raw.github.com/ojacobson/grimoiredotca/master/wiki/m...


They can't have expected the page to end up on HN in the first place. Though, I agree, caching couldn't hurt.

Loaded for me just now, albeit after 30 seconds.


MySQL is the Visual Basic of SQL databases. Anyone can set one up and use it.

The problem is that many non-technical people use MySQL and then think they know all about DBs. Ask them what ACID is, or about foreign key constraints. You'll get blank stares. If you know what those things are and value them, you probably don't use MySQL.


Can you provide some citations to this claim? I know a lot of people who understand ACID, Foreign Keys, and have experience with other DBs (like PostgreSQL) and still often choose MySQL based on its merits.


I think he is talking about "people who use mysql only because it was the easiest thing to learn" don't know these things, not "experienced DBAs/devs who happen to be using MySQL".


Yeah, he's not talking about DBAs who work at Facebook, Yahoo, YouTube, Wikipedia or Twitter. Just because half the Alexa top 10 uses MySQL in production doesn't mean it's not just used by non-technical noobs who don't know any better.


I've been using MySQL for most of my career, and I did learn about relational algebra, normal forms and whatnot back in school, thank you. It's my experience that most programmers I have worked with know about this stuff too.

That aside, MySQL does have transactions and it does have foreign key constraints. InnoDB has been around for a long time, you know.


As I've said before, MySQL's data handling is very much like VB's "On Error Resume Next".


I really don't know where to begin with such a stupid comment.

How about you start with listing which of these companies needs educating: http://www.mysql.com/customers/

Facebook ? Twitter ? Amazon ? Flickr ?


In some of these cases MySQL is not being used as an RDBMS.

And in most of those cases it looks like path dependency rather than a selection based on the merits of this or that database.

Either way, this is actually an argument from authority ("So and so use Brand X, therefore Brand X has positive qualities Y").

I've sat in on presentations with DB2 engineers who bragged about the data centre IBM runs for UPS. ~12 billion transactions per day (and that was 5 years ago). Does UPS make DB2 better or worse than MySQL?

Answer: it's irrelevant. DB2 and MySQL would need to be picked on their merits, not their users.


It does invalidate claims that using product X is going to inevitably cause you huge problems once you "grow enough". I've seen a lot of places grow a lot while using MySQL, and the problems aren't really that different or more serious than in places using Postgres or Oracle.

In fact, I know enough places moving from Postgres to MySQL due to growing pains.


Right, but we're not talking here about scale. We're talking about ... trustworthiness, I suppose. Dependability.

What this article demonstrated is that MySQL will often give a false sense of security. That's not what you want from a system ostensibly meant to provide a number of important technical guarantees.

Multi-master is the one tickbox feature where MySQL is still clearly in front. Once the PostgreSQL team finish going about inbuilt multi-master in their usual meticulous, stepwise fashion (my WAG is that it'll land by 9.6), there really won't be any good technical reasons left to use MySQL.


But I didn't mean "grow" in an exclusively performance sense. I meant the whole package (as with growth, the requirements for reliability also ultimately show up, at least in some parts of the business). Even my current job, which involves a lot of MySQL, has important chunks of Must Never Go Away Or Else data, with a lot of complex insert/update action going on, and somehow it's working fairly well for us.


One of the persistent themes of the linked piece is that MySQL simply conceals many kinds of fault. "Somehow it's working fairly well for us" may be right. Or it might not. Either way, MySQL isn't going to warn you.


With default settings. I agree that keeping them as default is fairly bad, but in a growing scenario you need people who know the defaults very well anyway. After all, quiet database silliness is only one of the many ways you can quietly corrupt huge swaths of data.

It's similar to Rails – the defaults save your time, and in their case are arguably reasonable, but if you don't know how and why they work, they are going to bite you. There is no escaping from knowing about the complexity, but you may escape from typing it out every single time.


This is a mechanism vs policy argument.

Defaults count. Elsewhere I've said that I prefer to start with rigid guarantees and relax them. Default policy matters because in practice:

* The documentation isn't read.

* Even when it's read, the documentation may be incomplete.

* When it's read and complete, the crucial segment may be skimmed.

* When the documentation is complete and the crucial segment was read, it may have been misunderstood because of unclear writing.

* When the documentation is complete and the crucial segment was clearly written and read, it may be forgotten later on.

Then a new DBA or programmer arrives, and the whole thing starts all over again.

Safety mechanisms that require active effort above the baseline configuration do not work very well. Saying "there is a great mechanism" does not describe the actual properties of the actual system. The default policy is the policy that counts, because a single omission will reintroduce it.

Windows XP, buffer overflows, botched system deployments and so on all have a common property: they require positive effort by humans in order to rise above their baseline safety/security/reliability profile. There is no failsafe -- they only work by constant vigilance.

I don't see that as a good thing.


The analogy goes like this: if you're very rich, you can do horrible drugs and pay a veritable army of people to keep you presentable and working. Of course, this works until one day it doesn't. Being large and successful doesn't mean that there's nothing festering deep within. Companies and people can become so invested in previous solutions that the perceived benefit versus the cost of switching is almost always too low to gather enough political will to execute.

MySQL is a drug: when you start, it is cheap and available, but as you grow, as you hope you will, you start to need more and more. Of course, this exposes many of the flaws and problems of the drugs you were on and, if you're in a bad way, still are. Better to go through a bit of pain now and know what you're in for than to just use MySQL because Facebook, Twitter, Amazon, and Flickr can pay a huge team of people or large consulting firms to make their technology decisions actually work. If you really must, use MySQL to spike the product, but know the side-effects and know when to get the hell off it.


He's clearly talking about people new to databases who choose MySQL because of its ubiquity, not seasoned engineers and large companies operating legacy applications.

> Facebook ? Twitter ? Amazon ? Flickr ?

I imagine none of these companies would choose MySQL if starting from scratch today.


I worked at one data-heavy startup where things were on MySQL and even with a lot of consulting by Percona, eventually it just couldn't keep up with our needs, and the project was ported to PostgreSQL. I've worked on other projects that were less data intensive and MySQL worked fine though.

If given a choice, I'll take PostgreSQL any day, but I do understand that people are hesitant to change databases when they don't need to. If you are encountering trouble though, by all means, move on.


I read until the backup process and gave up reading the rest. For cold backups of online databases, you should use Percona's XtraBackup; it's open source and free and works perfectly: http://www.percona.com/doc/percona-xtrabackup/ If you are using MySQL, simply googling "hot backup mysql" will lead you to that; I guess the author didn't even bother to search.

Besides, you need to use replication and connect a couple of slaves if you care about staying online, and back up from one of the slaves - which is common practice for all databases, not just MySQL. If you are trying to dump from the master without slaves, good luck with any database.


XtraBackup has its limitations with respect to locking when you've got a mix of InnoDB and MyISAM tables.

The problem with taking a backup from a MySQL slave is that the data is not guaranteed to be identical to the master, thanks to the subtle problems and peculiarities of the MySQL master-slave replication (some of them are described in the OP). For precisely this reason I install an automated job that periodically checksums the tables and sends the results to the DBA role.


This person doesn't even offer a solution?

How is it that people who have blogs that take 30 seconds to load continue to give performance advice that gets upvoted?

The funny thing is that this blog's performance is based on some cookie. If I reload in Chrome? 2 sec. If I reload in "Incognito Chrome", it's again really slow.

So seriously, just stop with these authoritative blog posts when you don't even know what you're talking about.


It's not performance advice, though.

It's a list of known problems with MySQL, most of which violate the "Principle of Least Surprise".

Some of the problems have performance implications. The query planner stuff, for example.

> The funny thing is that this blog's performance is based on some cookie. If I reload in Chrome? 2 sec ... just stop ... when you don't even know what you're talking about.

It may interest you to learn that web browsers have local caches. Incognito Mode does not have such a cache, and so must refetch pages from scratch on each view.


If you have to be a jerk about it, it may interest you that relatively static blog content can be cached on the server.

I'm genuinely interested to learn which use cases are more dependent on query planning than caching methodologies.


> I'm genuinely interested to learn which use cases are more dependent on query planning than caching methodologies.

I'll reply to this separately, it's a good discussion to have.

My basic problem is that caching comes with a coordination cost and I prefer the originating data source to be as performant as possible.

My own use case is a small Wordpress multisite installation. Even with a relatively trivial amount of traffic and site data, it behaves abysmally on some simple requests. The linked article seems to explain why -- the query planner ignores indexes on certain kinds of joins. The same sort of joins as the Wordpress Recent Comments Widget.

Now, I can and have worked around this by using multiple layers of caching. There's the MySQL query cache, some memcache (PHP opcode caches would in theory be quicker but I've never been satisfied with their stability) and of course pumping gzipped HTML to disk for nginx to serve directly without hitting PHP or MySQL.

But like I said, caching comes with a coordination cost. One of my sites is used less like a blog and more like a chat room. Hundreds of comments per hour, every single one of which causes the query cache to be pruned of the exact query I most need to cache in the first place.

That is: I need to cache this query because they talk so much. But they talk so much that the cache is not that helpful.

What would be helpful is if MySQL was a bit smarter about using the indexes I put there in an apparently useless bit of chicken-waving.


As you described it, the query/caching strategy sounds too complicated.

MySQL Proxy might help you out, or MySQL triggers, or plenty of easy indexing strategies.

I agree, caching strategies are not easy, but that is not the fault of databases. IMHO keeping your data layers separate is best.

If you have a multi-site installation, set some kind of prefix within the app that makes sense to you for each application before it goes into whatever cache.

This will work for you whether it is shared memory, memcached, or some kind of file situation.
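A minimal sketch of that prefixing idea, using a plain dict as a stand-in backend (the class name and keys here are illustrative, not from any real cache library):

```python
# Sketch of per-site cache key namespacing, assuming a dict-like cache.
# The same idea applies to memcached, shared memory, or a file cache.

class PrefixedCache:
    """Wrap any dict-like cache so each site gets its own key namespace."""

    def __init__(self, backend, prefix):
        self.backend = backend
        self.prefix = prefix

    def _key(self, key):
        return f"{self.prefix}:{key}"

    def get(self, key):
        return self.backend.get(self._key(key))

    def set(self, key, value):
        self.backend[self._key(key)] = value

shared = {}  # stands in for memcached / shared memory / a file cache
blog = PrefixedCache(shared, "site1")
chat = PrefixedCache(shared, "site2")

blog.set("recent_comments", ["a"])
chat.set("recent_comments", ["b"])
# The two sites no longer collide in the shared backend:
# blog.get("recent_comments") -> ["a"], chat.get("recent_comments") -> ["b"]
```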


> As you described it, the query/caching strategy sounds too complicated.

But that was his whole point - needlessly complicating an otherwise simple setup to get around limitations in the DB engine.

> mysql proxy might help you out, or mysql triggers, or plenty of easy indexing strategies.

I genuinely fail to see how mysql-proxy [1] (which has a slew of long-standing, unfixed problems of its own), or triggers, would (elegantly) help in this situation.

[1] http://dev.mysql.com/doc/refman/5.1/en/mysql-proxy.html

edit: formatting


...or he could use a dbms that works properly and save himself the hassle.


> But like I said, caching comes with a coordination cost.

And so? Is that a reason not to use caching? Ridiculous.


> relatively static blog content can be cached on the server.

If you poke around, you'll find that this isn't a Wordpress installation. (Or any other dynamic blog engine, for that matter.)

It's Markdoc[1], a static site generator:

> Markdoc converts wikis into raw HTML files and media ... Markdoc need not be installed on the hosting site, since the resultant HTML is completely independent.

I believe the program at fault here is Apache, which in the default configuration loves nothing better than to choke on child processes.

[1] http://markdoc.org/


When criticizing a piece of software as popular as MySQL, offering a solution is an exercise in futility: it only leads people to think that the author is wrong because the alternative posited is wrong for them. See the endless MySQL vs. PostgreSQL discussions (which are not entirely without merit: the two systems do have overlap in targeted users).

Instead, this author has simply focused on numerous detailed aspects of MySQL that he/she seems to think most people would see as particularly insane, without need for comparison. I think there's a place for that.


You're right, offering a solution to MySQL problems in a thread is generally useless.

I don't really get what point the author is trying to make.

I also don't take detailed performance advice from people who can't keep a blog up. Yeah, must be the wrong backend.


How could you possibly not get the point? It's the first two (2) sentences: "Considering MySQL? Use something else."


I used Postgres on a project and my main concern with it is that whatever GUIs were available were seriously lacking. Phppgadmin is awful.

If you're really pushing its capabilities then there may be better solutions than MySQL but for bog standard applications and websites there is no alternative as far as I'm concerned.


You should've used pgAdmin [1]; it's much better than PhpPgAdmin. I have not used the MySQL GUI (MySQL Workbench [2]) that Oracle/Sun/MySQL sells with the Enterprise license, but by its feature list I suspect that it provides comparable functionality to pgAdmin (which is FOSS). I would not put either of the Php{Pg,My}Admin interfaces in the same category - they're too lacking IMO.

[1] http://www.pgadmin.org/

[2] http://www.mysql.com/products/workbench/


I'll second pgAdmin. I finally woke up to the fact that I should start using Postgres a few months ago, and pgAdmin made my initial foray into postgres very straightforward


I use MySQL and Postgres, depending on the project.

What other option is so obvious that you can't even name it?


SQLite is so obvious, and in many cases it's just enough. Or, hsqldb if you're into Java.


This is an ad hominem argument. His statements about MySQL should be judged on their own merits, regardless of the load times of his blog.


No, it's not.

If you're going to be a performance authority about web technology, you should know how to configure the open-source and very well-documented code you are using.


The OP is not trying to be "a performance authority about web technology"; in fact the article's main focus is not on performance, but on peculiarities in the RDBMS engine that force developers to constantly seek workarounds and be very careful about silent (and unexpected) behavior.

There are no "web technologies" in the article; the term "web" appears only once (!) in the whole text.


I don't see how using Apache, with what I guess are the default settings, has any bearing on remarks about MySQL.


didn't say anything about apache, so not sure what you're complaining about.


See my other reply for why I brought Apache into the discussion.


And so you assume that the author of the article doesn't know how to optimize his or her blog, ignoring the possibility that he or she does, but didn't do it for intervening reasons.

Still has no bearing on the veracity of statements made in the article.


Given such a detailed article, the solution is obviously another DB. Postgres lacks most of the MySQL flaws described.


It took less than a second for it to load on my browser. And why should he offer a solution? The point of his rant isn't that you should use a particular solution, but rather that you should avoid MySQL at all costs.


the solution is Postgres


I'd be very interested to hear from people using other databases on whether their DB of choice is much better. I've been using MySQL for a while, have been burned by a few things, but figured it was mainly my fault. If indeed there are better options I'd love to hear the details.

(Just to be clear, I know about other databases, just aren't sure if any are that much better in real world use).


I've never recommended MySQL for anything where the data is important, but I've had to help several businesses deal with existing installations. Here's a few teachable moments:

One customer who ran MySQL on windows experienced data loss after a historic MySQL crash left them with a corrupted MyISAM table, which caused mysqldump to SILENTLY exit before the dump was complete. Their dumps only had the tables up to the corrupted one. They had been saving those dumps for months, but never testing them. When the server eventually gave up the ghost they found that none of their dumps were good. Lesson: 1. ALWAYS test your dumps. 2. NEVER trust MyISAM.
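One cheap guard against that failure mode, as a hedged sketch: a complete mysqldump file ends with a `-- Dump completed` footer line, so its absence is a red flag. (Nothing substitutes for actually test-restoring the dump, per lesson 1.)

```python
# Minimal sanity check for mysqldump output: a complete dump ends with a
# "-- Dump completed" footer line. If that footer is missing, the dump was
# cut short (e.g. by the silent exit described above) and must not be
# trusted. This is only a sketch; a real check should also test-restore.

def dump_looks_complete(path):
    with open(path, "rb") as f:
        # Read only the tail of the file; dumps can be huge.
        f.seek(0, 2)
        size = f.tell()
        f.seek(max(0, size - 4096))
        tail = f.read().decode("utf-8", errors="replace")
    return "-- Dump completed" in tail
```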

Another customer, smart enough to use InnoDB but not smart enough to change the MySQL defaults, let one database get too large and found they couldn't shrink any of their databases: unless you use innodb_file_per_table, InnoDB never relinquishes space, even if you delete all the rows in your tables, even after "optimize table". Lesson: 1. ALWAYS use innodb_file_per_table if you plan to keep the database for a long time. 2. Periodically run mysqlcheck --optimize or OPTIMIZE TABLE to actually reclaim deleted space.

I've also helped customers rewrite their applications to avoid the "Total number of locks exceeds lock table size" problem. What is that? Well, even if you use database level locks the number of rows a transaction can alter is still limited by the server's memory buffer pool size. Lesson: 1. Know your working set. Make sure your application checks the server's buffer pool size via SHOW VARIABLES and is smart enough to avoid large transaction updates. 2. Use a large buffer pool size whenever you can.
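A hedged sketch of the batching approach from that rewrite: the helper below just slices work into bounded chunks; the batch size and the commented-out SQL are illustrative, and in practice you would derive the size from innodb_buffer_pool_size (via SHOW VARIABLES) and your row/lock sizes.

```python
# Split a large UPDATE into bounded batches so a single transaction never
# holds more row locks than the server's lock table (bounded by the buffer
# pool) can track. The 10_000 below is a made-up knob, not a recommendation.

def batches(ids, batch_size):
    """Yield successive slices of ids, each at most batch_size long."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]

# Hypothetical usage with a DB-API cursor:
# for chunk in batches(all_row_ids, 10_000):
#     placeholders = ",".join(["%s"] * len(chunk))
#     cursor.execute(
#         "UPDATE t SET flag = 1 WHERE id IN (%s)" % placeholders, chunk)
#     connection.commit()  # release the locks between batches
```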

Coming from a PostgreSQL background, these came as total surprises to me. I've never been completely happy with PostgreSQL's transaction id wraparound issues but I have yet to experience anything like the problems I've observed with MySQL.


Many of the complaints I hear people make about the entire concept of an "RDBMS" (often then to motivate why the NoSQL solution they decided to start using is better) are actually MySQL-specific issues that do not affect PostgreSQL (or Oracle, or usually SQL Server; I only mention PostgreSQL, as you wanted a concrete experience); one key example is "if you want to change your schema, it requires locking the entire system and rewriting the table" <- no, as the schema is just metadata; you should be able to do these things under first-class transactions, and PostgreSQL supports this just fine.


Or "joins are too slow". Well, yes, they can be. But some database systems have smarter plan builders than others.

Take, for example, this gem:

> Joining and ordering by rows from multiple tables often forces MySQL to dump the whole join to a temporary table, then sort it -- awful, especially if you then use LIMIT BY to paginate the results.

This describes just about every standard page-with-comments schema ever devised.

It struck me with particular force because perhaps I've been unfairly blaming Wordpress for the failings of MySQL, vis-a-vis the atrocious performance of the Recent Comments Widget.

I see horrible join performance on tables with about half a million rows, which is chicken feed. The query cache "solves" my problem, but it's amazing that it should be necessary for very trivial joins between 2 or 3 tables.


Is it MySQL's fault that Wordpress asks it to do something that's hard?

LIMIT OFFSET queries are always going to be hard because you have to generate lots of rows only to throw them away, but there are ways to organize the data and write the queries so that you avoid temporary tables. I don't think Wordpress is MySQL-specific (maybe they use an ORM layer anyway), but if you want to get the most performance out of MySQL, you need to know how it works and write your queries accordingly. Sometimes that means getting all the data unsorted from MySQL and sorting it yourself, which is unfortunate, but frontend CPU + memory is cheaper than the database I/O to build a temp table and sort it.


I know WordPress quite well. Most of these queries can be very easily executed from indexes, pulling from the tables only the data that will be output. It's just that mysql chooses to do it the dumb way.


> LIMIT OFFSET queries are always going to be hard because you have to generate lots of rows only to throw them away

In fairness, that often depends upon the intelligence of the planner. A lot of sort-limit-offset queries can be reasonably easy and not generate too many rows if the planner works correctly and you have the right indexes.

edit: Assuming you're mostly picking up early pages as opposed to getting late ones, anyway - which is usually the case for, say, a comment system.


> LIMIT OFFSET queries are always going to be hard because you have to generate lots of rows only to throw them away

There are ways to avoid that: http://www.slideshare.net/suratbhati/efficient-pagination-us...
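For the curious, the main trick in those slides is keyset (a.k.a. "seek") pagination. A small sketch using Python's bundled sqlite3 - the SQL pattern is the same in MySQL; the table and column names here are made up:

```python
# Keyset ("seek") pagination: instead of OFFSET, remember the last key seen
# and filter on it. The primary-key index satisfies the WHERE and ORDER BY
# directly, so no rows are generated only to be thrown away.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comments (id, body) VALUES (?, ?)",
    [(i, f"comment {i}") for i in range(1, 101)],
)

def page_after(last_id, size=10):
    return conn.execute(
        "SELECT id, body FROM comments WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()

first = page_after(0)               # ids 1..10
second = page_after(first[-1][0])   # ids 11..20, no OFFSET needed
```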


To be fair, there are cases where schema changes in PostgreSQL require re-writing the table, too. Like, for example, when you change the data type of an existing column.

Otherwise, you're good, though.


Yes, but that case comes up less often and isn't what I see NoSQL people complaining about ;P. (Also, when I've listed that as an explicit caveat recently, I often get corrected that they changed the ramifications of that recently, and it only sometimes has to happen, so I figured I'd just explicitly list the situations where you clearly should never have needed to rewrite the table.)

Generally, though, this is related to be able to do table changes under transaction locks: I have often enjoyed being able to, under an atomic transaction, replace a table with a view over that table, or make modifications to indexes and columns that I then rollback if there's a mistake.


To be fair, if you want to change the datatype of data in a NoSQL system, you have to rewrite it as well.


Actually, that depends on which change you make. varchar(2) -> varchar(4) does not require a rewrite, while text -> int and varchar(4) -> varchar(2) do.


Used MySQL for 10+ years. I found it great for most purposes. I don't feel I left MySQL because of MySQL's failings, but because of those of all monolithic RDBMSes.


Have you tried other RDBMSes?


OK I'll bite: Yes. Ultimately my problem with the old monolithic RDBMS is architectural.

Q: "How do you get 24x7x365 service on a mission-critical system that depends on a huge monolithic RDBMS datastore, preserving the capacity for RDBMS major-version software upgrades (protocol, on-disk format, etc.) and no downtime even between major versions?"

A: "While you could umm and ahh about it for awhile, basically, you don't. It's too big. Decentralization, plan-to-fail, clustering ('private cloud', hah), and commodity hardware are your friends. Embrace them and enjoy lower blood pressure."

If you don't have high availability requirements, RDBMS can still be a great and simple solution. Right tool for the job. (I have simply personally drifted from RDBMS as the requirements of the systems I tend to work on have grown. My current rule of thumb is roughly: 'Make all storage backends service-provider abstracted (swappable, easy to benchmark, unit test, etc.). When choosing a backend for a storage implementation - if you can't SQLite it, don't SQL it.')


Any central service is going to be a point of failure. This isn't unique to RDBMSes (the other classic SPOF is message buses).

The idea that RDBMSes are unsuitable for HA applications is ... well to be charitable, I'll call it "inaccurate".

Most of the techniques that are used for HA were first invented for conventional databases and/or their best friends, mainframes and midrange systems.

I agree however that it is a question of picking the right tool for the job.

I'm a data-safety bigot. I require a great deal of talking down from my tree. These days I can understand that, yes, OK, a consistent and durable model of Facebook comments probably isn't really that important.

But I will bet folding money that Facebook stores their financial data in a huge monolithic RDBMS datastore, like Amazon, Microsoft, Google and Apple do.


I agree with the historical point. But I think you miss mine: show me an open source RDBMS (I don't want to drop $annual_profit_margin on Oracle) with major-version upgrade capable HA while under load. My view: it doesn't exist. You could try to make it happen somehow, but it would be a huge project on its own.

On your somewhat loaded financial example, as someone who is designing some financial systems at the moment, I would instead argue that 'eventual consistency' is actually the de-facto model within the vast majority of business accounting, globally (credit card chargebacks, taxation systems, international (or domestic in the US) bank transfers, invoices/accounts receivable, etc.). Simultaneously truly real time and truly atomic requirements, particularly at scale, are rare.


Accountants invented the concept of a queue of transactions causing predictable updates. They also invented eventual consistency in the form of special and general ledgers.

However, ACID is four requirements and they're all still valuable and useful defaults. Atomicity comes from double-entry bookkeeping, consistency from the idea that data is meaningless without structure, isolation from the demands of consistency and durability because people get grumpy when you tell that that umpteen jillion dollars may ... or may not ... have been recorded.

Like I said, I'm a data-safety bigot. I greatly prefer to start with a safe default and then relax the guarantees. Retrofitting safety is harder, especially when you don't have a uniform statement of what your data is.

For something like a blog, an RDBMS is probably overkill. And MySQL got its start in life because it so thoroughly (silently) relaxed the standard guarantees that it was much faster than anything else.

> major-version upgrade capable HA while under load

The strategies are the same as for NoSQL.

Either you stop the world, or you run versions in parallel and drain traffic from the old versions.

I also find it slightly hilarious that just casually upgrading something without exercising great caution is seen as a good idea.


1) "24x7x365 service on a mission-critical system"

2) "(I don't want to drop $annual_profit_margin on Oracle)"

If it's critical, you're willing to pay for it.

If you're not willing to pay for it, it's not critical.

This comes naturally from the definition of mission critical.


I know Apple uses SAP/Oracle and imagine the others would be similar.

But they use Oracle Cluster i.e. multi-master so it isn't monolithic like say a giant PostgreSQL database would be.


I started 7 years ago with SQLite and still have all my sites and webapps running on that. Works wonderfully for me. Meanwhile I must have seen close to a dozen data storage systems become popular then be replaced by the next big thing, from MySQL to NoSQL and everything in between.


I love SQLite. It's simple, it doesn't make a fuss, and it does what you'd expect. And better yet, since it's a single-file application database instead of a database running on a server, support for it is built in to Python, and using it is as simple as an import statement and .connect() - no server to configure.


MySQL became popular rather more than 7 years ago, and still is.


>>It's good enough. No it ain't. There are plenty of other equally-capable data storage systems that don't come with MySQL's huge raft of edge cases and quirks.

Actually, it is good enough. Good enough to power billions of websites. Good enough to avoid paying for Oracle or DB2, or cramming some half-finished NoSQL mess in where a relational database works better.

MySQL isn't an end all, but please, don't pretend that NoSQL holds all the answers.


I imagine that the author had in mind other Free relational systems, rather than NoSQL or proprietary. For example, off the top of my head:

* Postgres

* Apache Derby

* SQLite

All of which I'd prefer to using MySQL in just about any context, and each of which has contexts for which it's overwhelmingly the best choice.


what's wrong with this picture?

    Oracle/DB2/SQL Server/$$$ <------------ (?) ------------- MySQL --------> NoSQL
someone's missing the elephant in the room.


Which elephant is in your room?


I've been hearing bad things about MySQL, so I've been avoiding it as of late.

So far my experience has been subpar.

PostgreSQL is pedantic with data insertion, almost to a fault. This costs me development time. (Also I have no idea what my users will do, and I'd rather have faulty data inserted than none at all. If it's for a client asking about a product, this could cost money). Yet purists claim this is a great feature. It's also about twice as slow as MySQL (admittedly this is likely due to the maturity of the environment I'm working in).

I personally like PostgreSQL. However I see it more as a guilty nerd pleasure rather than a development time effective solution.

MSSQL is very nice, and my experience with it has been the best. Microsoft's tools are top quality. You'll find yourself very productive creating advanced SQL views, mirroring, and snapshots. However, MSSQL reeks of vendor lock-in: I had to virtualise the MSSQL tools, and getting the drivers to work on Linux took almost two days of googling. Despite the ease of use, the vendor lock-in doesn't make MSSQL worth it.


I don't know where the "twice as slow" number for postgresql comes from. My experience is different. Wildly different. For simple key-value loads, both are in the same performance class. For anything more complicated - typical relational database scenarios - mysql folds and is one or two orders of magnitude slower.

The query planner's inability to push restrictions down the tree into subqueries alone rules out many typical real-world queries.

The way I see it, mysql is trapped between NoSQL and proper relational databases


"PostgreSQL is pedantic with data insertion, almost to a fault."

The PostgreSQL philosophy isn't about being "pedantic", but it is very different from MySQL. I assume that you have much more experience with MySQL; maybe you are trying too hard to use postgres in the mysql way rather than the postgres way?

Personally, I don't think it's a good idea to migrate usually for this reason. The entire project development always has lots of built-in assumptions about how the DB will be used, so the new system is almost never quite the right fit. It can only work in trivial cases, or when you have the right expectations.


Yes, I've been using MySQL in the PostgreSQL way. I learned to use databases on Oracle databases, which are similar to MySQL.


> oracle databases, which is similar to MySQL.

How did you reach that conclusion?


If you would rather have faulty data than none when something's wrong with your application, then MySQL is your DB of choice.


I disagree that MySQL is a better choice when you have "faulty" data:

* In postgres, you have PL/python (or PL/V8JS, or perl, or many other languages) to help you through the mess. For instance, you could write a canonicalization function to help you put the data into a more-queryable form. You can then even index on that function.

* Powerful triggers might help with post-processing, or putting data into some queue of "bad data" that needs to be cleaned up later. Maybe by doing so, you realize that the data isn't "bad", your schema just needs to be updated to reflect new interesting cases.

* You can pull data in from remote sources with foreign data wrappers, which might be necessary to clean the data up properly (e.g. one extra join against the company LDAP directory using the email might be able to canonicalize those employee names).

* You can catch errors using subtransactions and have a different processing path for data that doesn't fit in the schema.

Maybe some of these features exist in MySQL (I haven't been a real user since around 2003, aside from a bit of administration). But in postgres, these features all work together seamlessly along with all of the other features in postgres to make it all work nicely and without a pile of caveats. And that matters a lot when trying to wrangle strange data.
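To make the first bullet concrete, here's a hedged sketch of the sort of canonicalization logic you might wrap in a PL/Python function - the rules and names are invented for illustration, not a real schema:

```python
# Illustrative canonicalization: normalize messy user-entered emails into a
# queryable form. Inside postgres you could wrap this in a PL/Python
# function and build an expression index on its result.

def canonical_email(raw):
    """Lowercase, strip whitespace, and drop a trailing '.' typo."""
    email = raw.strip().lower()
    if email.endswith("."):
        email = email[:-1]
    return email

# Hypothetical in-database equivalent:
#   CREATE FUNCTION canonical_email(text) RETURNS text AS $$ ... $$
#     LANGUAGE plpython3u IMMUTABLE;
#   CREATE INDEX ON users (canonical_email(email));
```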


If you read pippy's statement carefully

  Also I have no idea what my users will do,
  and I'd rather have faulty data inserted than none at all.
you'll see we're in agreement - PostgreSQL's design and feature set make it clearly superior to MySQL in preventing the spread of faulty data. But pippy would rather have a garbage database than be slowed down by preventive measures. It's costing him development time, after all...

By the way - thanks for all the great work on PostgreSQL range types.


> PostgreSQL is pedantic with data insertion

Better to deal with the exceptions up-front, than to discover months down the line that your database has been silently chopping long strings, silently casting types and coercing values into useless junk.
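A toy simulation of the difference - plain Python, not real database behaviour: MySQL's historic non-strict default chops the value to fit, while strict behaviour (the postgres default, or MySQL with STRICT_ALL_TABLES) raises an error instead.

```python
# Simulated VARCHAR(n) insert. "strict=False" mimics MySQL's historical
# default: silent truncation, i.e. data loss with no error raised.
# "strict=True" mimics postgres / MySQL strict mode: reject the row.

def insert_varchar(value, limit, strict):
    if len(value) > limit:
        if strict:
            raise ValueError(f"value too long for varchar({limit})")
        return value[:limit]  # silent chop
    return value

# insert_varchar("abcdefgh", 5, strict=False) -> "abcde" (silently chopped)
# insert_varchar("abcdefgh", 5, strict=True)  -> raises ValueError
```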


"Better to deal with the exceptions up-front"

That may be, but when an application developer is developing, they primarily need to get data into the database. That may be right or wrong[1], but it has a big impact on the adoption of particular database systems or practices.

And it's somewhat irrelevant to this discussion, because there are many ways you can feed the data you want into postgres (like declaring columns BYTEA, TEXT, or HSTORE), no matter how "bad" it is. I would argue that it's much easier to deal with "bad" data in postgres than mysql because you have so much flexibility (e.g. write a function inside of postgres in PL/python to get some meaning out of the ill-specified data; or write some triggers to do post-processing).

[1] I personally think that a misalignment of incentives is responsible for the widespread focus on the ease of writing into a database without considering the utility of the data once it's in there.


If you must use MySQL, at least switch to MariaDB. Those Oracle folks cannot be trusted anymore, not even with a toaster.


But you can trust the fork started by the guy who sold it to Sun in the first place, then followed it to Oracle, then led the exodus from Oracle to capitalize on anti-Oracle feeling by starting a fork?

Riiiiiiiiiight....


Then there's always Percona or Twitter's fork, or the millions of other forks around.


Percona Server is good. The ability to monitor is great and the supporting tools are very helpful.


I agree that the MariaDB and Percona Server versions of MySQL are far superior in many ways, but Oracle, with 5.6 has proved their commitment to MySQL with some awesome new features. I admit there has been a reduction in transparency with the test cases fiasco, but beyond FUD, what is it that they "can't be trusted with?" Are you concerned that they are in there secretly making a bittorrent out of your database or something?


I originally was using MySQL to learn with rails because it was so simple to set up on my Mac.

The only reason I migrated (quite early I may add) is that at the time Heroku pretty much mandated I move to postgres.

I'm glad I made the move, but I'd say awareness of the alternatives is the limiting factor. The brand awareness MySQL has is pretty big compared with many others. I wonder how much Heroku's decision to support postgres has helped people like me drop mysql.


"I'm going to rant against MySQL, but I'm not going to suggest a better alternative."


In what entitled universe do you live where a guy who carefully and patiently points out problems is also obligated to solve every single one of them?

Also, do you really need someone to spell out the alternatives to MySQL? There are too many to list.


The article would have been much more credible if it had said something like "try postgresql instead". It has nothing to do with entitlement.


He does not need to be able to point to an extant, better alternative for his criticisms to be "credible."

BTW, I can't believe you're implying that his post is not credible. The practical outcome of your demand for a solution is to shut down legitimate criticism.


"Not MySQL" is good advice. Look into alternatives, there are many good ones.


This is a great article.

But I really wish it had sources for each of the claims. I would be interested to read the relevant documentation, because some of these directly describe problems I've had with running a Wordpress installation.

And I've been blaming Wordpress for it. There's possibly a big mea culpa brewing; but I'd really like to look at the specifics.


Oh just blame Wordpress. My self-written software has no performance issues on MySQL, but Wordpress loading times are over two seconds with a default installation.


> Oh just blame Wordpress.

I usually do. But in this case it looks as though MySQL's underwhelming query planner might be at fault.


MySQL's query planner is actually pretty good, but there are some things that are just not going to be fast; most of which is documented, almost all of which can be seen with describe select ...; if it uses a temporary table, it's probably necessary, and it's definitely going to be slow once you have enough rows. If WordPress uses any of these things, it's not MySQL's fault.


This may have been true ten years ago. Not anymore. MySQL's query planner is dumb as a rock. Easy example: instead of referring to a table in FROM, try using a subquery with SELECT * from the same table. It's an easy reproduction of the planner failing to process the query tree. It should produce the same execution plan; instead it creates a temporary table that is an identical copy of the original.

Before you attack the query itself: note that it is just an example.
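For contrast, here's the same shape of query run through Python's bundled sqlite3, whose planner flattens such subqueries: the results are identical either way, and with a decent planner the plan is too. (MySQL, at least through 5.5, instead materializes the derived table as a temporary copy.)

```python
# SELECT * FROM t vs. SELECT * FROM (SELECT * FROM t): logically the same
# query. sqlite flattens the wrapped form; MySQL 5.5 builds a temp table.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, str(i)) for i in range(5)])

direct = conn.execute("SELECT * FROM t ORDER BY id").fetchall()
wrapped = conn.execute(
    "SELECT * FROM (SELECT * FROM t) ORDER BY id").fetchall()
assert direct == wrapped  # same rows either way; only the plan can differ
```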


Fortunately there are big improvements coming to the specific case of subqueries in MySQL 5.6.3 [1]. MariaDB (a MySQL fork) also has several further optimizations [2].

[1] http://dev.mysql.com/doc/refman/5.6/en/from-clause-subquery-... [2] https://kb.askmonty.org/en/subquery-optimizations-map/


Its query planner may be good (I don't know about this), in the sense that it can typically choose the best of a bad lot. However, the query executor (of stock MySQL) is awful - no hash joins? No merge joins? It is good compared to what?


There is a bit of an elitist attitude in this point of view. Of course MySQL is not perfect. That's a strawman argument, nobody is claiming it is.

For 99+% of all applications that need a simple database, it is more than "good enough".


This looks similar to the MongoDB FUD from a year or two ago. Nevertheless, I don't like MySQL and prefer PostgreSQL for all my projects.


That's a detailed and well-documented article, sir. Nothing like FUD.


The article lost me at this.

> Already on MySQL? Migrate.

Got a silly little thing in the corner running just fine on MySQL. Go spend time on it? No.


Instead of writing these rants go and build a WAMP like package with PostgreSQL or other alternatives. That might actually do some good.


> Instead of writing these rants go and build a WAMP like package with PostgreSQL or other alternatives. That might actually do some good.

You mean like WAPP, the package installer provided by BitNami [1]?

I don't see a problem with a well-argued rant; IMO it's quite helpful, as it sparks discussions like this one, lets people improve their knowledge and make better-informed decisions, and, not least important, pushes a vendor to improve their software.

[1] http://bitnami.org/stack/wapp



