Making MySQL Better at GitHub

DigitalSea · on Sept 2, 2014

Anyone else disappointed the post didn't go into all that much detail? Scaling databases is hard if you're not privileged enough to have access to large pools of money to hire good DBA's.

Interestingly, I would love to see companies like Github opening up their database schemas to the public with mock data. Scaling is one aspect, but the best thing you can do in the beginning is to create a solid schema (normalise, denormalise...) it would be interesting to see what Github uses and why. Still awesome to see MySQL being the choice most large companies like Github choose in the face of new and untested NoSQL databases like MongoDB.

calinet6 · on Sept 3, 2014

Or better, more solid relational options like Postgres...

Honestly, is there any reason to use MySQL over Postgres at this point? Or is it sort of six of one half a dozen of the other as long as the data model is decent?

cwyers · on Sept 3, 2014

INSERT IGNORE and REPLACE are two pretty good reasons, in my opinion. Postgres also doesn't have real table partitioning. Yes, you can sorta kinda hack around the first with stored procs. And yes, you can do something that looks a lot like partitioned tables using table inheritance. And yes, Postgres now has replication support, if you don't mind using only row-based replication (MySQL lets you choose between row-based and statement-based replication) among other tradeoffs.

So yeah, Postgres is a better relational database than MySQL, if you ignore all the things MySQL does better. Another great thing about Postgres is that whenever a site like Hacker News gets a thread about MySQL, you get a bunch of people asking why you aren't using Postgres instead, and then whenever you try and answer the question you get a bunch of Postgres users to tell you you're wrong or that the features you care about don't matter (or that they're hard to implement, which... this is my problem why?) or that Postgres really has replication that's as good as MySQL's this time we pinky swear. So most MySQL users get a first impression of Postgres' user community that is quite frankly rather unfavorable.

Oh, and MySQL has a lot more third-party documentation, tooling and support available.

calinet6 · on Sept 3, 2014

Perhaps it has something to do with having worked in the pg codebase for a time. It's clean as a whistle, I respect that a lot. But that's certainly not the whole picture.

Alupis · on Sept 3, 2014

I was surprised (and a little disappointed) they were not using MariaDB, as it's made by the creator of MySQL, has more features, and generally is considered "upstream" of MySQL these days... seems the MariaDB project would be more receptive to making changes and working with the Github team in order to scale.

cwyers · on Sept 3, 2014

I really don't want to get into a debate over the merits of MariaDB versus MySQL, as a lot of it depends on your workload. I investigated moving a MySQL application to MariaDB and some of the features we were using just weren't there yet, and I don't think all of them made it into the most recent release (but don't quote me on that -- I don't maintain that code anymore, so my memory could be hazy on it.)

But the idea that MariaDB is "upstream" of Oracle MySQL is silly. Is Oracle even merging code from MariaDB?

Alupis · on Sept 3, 2014

Well, there are two versions of MariaDB -- the 5.x tree (which lags behind the 5.x tree of MySQL and merges changes from MySQL as they are released), and then the 10.x tree which is where MariaDB introduces new features that are not yet in MySQL, and MySQL has merged some of those changes into their project. So, both projects sort of merge each other at times...

I'm no DBA, but I'd think Monty (the creator of MySQL) and his smaller crew at his consulting firm (SkySQL and MariaDB Consulting) which makes MariaDB would be more open and flexible to working directly with Github's teams and needs than through the bureaucracy at Oracle.

cwyers · on Sept 3, 2014

So neither is an upstream of each other at this point, they're a fork. And contrary to what people are saying in this thread, as of 10.0 MariaDB is no longer a drop-in replacement for the most recent version of Oracle MySQL and MariaDB's developers are no longer committing to porting all Oracle MySQL features. So MariaDB is no longer a Percona Server-like "MySQL plus goodies" upgrade proposition (and it really hasn't been for a long time - the 5.x series is still at 5.5). MariaDB actually will tell you which MySQL 5.6 features they support:

https://mariadb.com/blog/mysql-56-vs-mariadb-100

So if you need, for instance, MySQL 5.6's partitioning improvements:

https://blogs.oracle.com/MySQL/entry/mysql_5_6_is_a

You're better off on Oracle MySQL. There's other tradeoffs, depending on what you use. What bugs the heck out of me is how a fair number of MariaDB advocates spread FUD about Oracle (if you judge them by their track record, they've committed to improving and maintaining MySQL -- they're not perfect, but that's no excuse to harp about stuff they COULD do when there's no evidence they WANT to sabotage MySQL to force you to switch to Oracle), and want to turn the debate into a holy war rather than focusing on letting everyone pick the best tool for the task.

Alupis · on Sept 6, 2014

I think you should research MariaDB some more -- there are a lot of reasons to use it, and a lot of companies are switching. In fact, just today we upgraded our Zimbra cluster and was surprised to see they had made the switch from MySQL to MariaDB. This isn't "fud" as you put it... but rather a better product for a lot of reasons.

> You're better off on Oracle MySQL.

Hardly true, given the two db's are mostly the same except that the creator is now making newer and better changes in the 10.x branch of MariaDB (Monty left Oracle just like most Sun employees due to inner-politics and fighting that is regular at Oracle).

btown · on Sept 3, 2014

MySQL historically had more support for HA/clustering than Postgres. Recently, there's been a lot of progress on integrating Postgres clustering into the core, to the point where it's mature, but perhaps not as battle-tested. Not a reason to choose MySQL for a startup, I think, but if you've got a cluster working on MySQL and a clear understanding of its pitfalls, there's no real reason to switch to Postgres.

calinet6 · on Sept 3, 2014

That makes sense, thanks. I guess as long as you have a reasonably optimized relational database of a given class, you're going to be about on the same order of magnitude of performance.

markwatson · on Sept 3, 2014

We chose MySQL where I work only because it's easier to hire people with a lot of MySQL knowledge. Technical reasons aren't the only ones when deciding what software/framework/libraries to use.

yangyang · on Sept 3, 2014

That can work both ways though. Sometimes if you chose a less popular option it can make it easier to find outstanding candidates.

vtal · on Sept 3, 2014

1. Monitoring and administration tools for MySQL are (better?) more polished. 2. WAY easier to find MySQL DBA's vs. PG DBA's 3. More resources in general around MySQL. Whatever problem/issue you have it's out there already.

samlambert · on Sept 3, 2014

There are a lot of reasons IMO. MySQL has a proven track record for stability and performance powering huge sites. I would say MySQL has a nicer replication story as well.

In general MySQL is a lot more widely used with a greater pool of knowledge out there.

Alupis · on Sept 3, 2014

MariaDB is a drop-in replacement for MySQL, is made by the creator of MySQL (after he left Oracle post-Sun acquisition), has more features and is now generally considered the upstream of MySQL. I also bet Monty is more receptive to working directly with your teams to scale the product or make changes as necessary.

erkkie · on Sept 3, 2014

Nicer replication story? Try recent postgres with wal shipping + wal streaming;)

d4mi3n · on Sept 3, 2014

Tooling. While I agree psql tends to be more performant, when you're dealing with groups of people having good tools (and good documentation for those tools) trumps many considerations.

As an example, the organization I work at is considering a move to psql, but our main barrier is a lack of good DB clients that are accessible to people who aren't software developers. The best we're aware of as far as DB GUIs for Postgres is pgAdmin, whereas if we're talking MySQL you have things like MySQL workbench, SQLPro, and a myriad of other applications available for whatever your operating system of choice is.

erkkie · on Sept 3, 2014

After Postgres got replication out of the box I'm pretty sure it's just inertia and the fact that it's installed everywhere keeping mysql usage up. Postgres is so much nicer when dealing with day-to-day things.

Eg disabling connections per database using sql and having built-in query stats (http://www.postgresql.org/docs/9.4/static/pgstatstatements.h...) just made me smile.

morgo · on Sept 3, 2014

The built in query stats is similar to Performance Schema in MySQL. The statistics are quite fine grained, and with a set of views on top (https://github.com/MarkLeith/mysql-sys) are very useful for observability.

erkkie · on Sept 3, 2014

Thanks, wasn't aware of that, quit mysql habits pre 5.6 :)

morgo · on Sept 4, 2014

Technically, it was introduced in 5.5, but off by default :)

5.7 is going to be amazing for observability. Memory, transactions, stored procedures, replication, meta data locking and prepared statements are all instrumented in P_S.

collyw · on Sept 3, 2014

MySQL is like PHP, its installed almost everywhere, and requires less set up. I did have one problem recently where MySQL turned out to be a better solution, as it has a native bitcount operation, and a larger set of numerical types.

ksec · on Sept 3, 2014

I thought it was more to do with Github is based on Rails, and MySQl is the default choice.

Yes, I really wish DHH and the Basecamp team would move to Postgre , therefore moving the community to it.

jeremykemper · on Sept 3, 2014

SQLite is the Rails default, in fact.

In October 2013, NewRelic reported that 53% of their customers are using PostgreSQL: http://blog.newrelic.com/2013/10/10/infographic-state-stack-...

In July 2014, Planet Argon's Rails survey showed a surge in preference for PostgreSQL since 2012: http://rails-hosting.com/Results/2014/index.html#Database

Neither captures the distribution of usage from small- to large-scale apps, but it does look like the community has already moved both in mindshare and number of production deployments. I'd wager that Heroku's PostgreSQL default is responsible for much of that.

samlambert · on Sept 3, 2014

I find it quite patronizing that you assume the only reason anyone would use MySQL is that it is the default choice.

Alupis · on Sept 3, 2014

>I find it quite patronizing that you assume the only reason anyone would use MySQL is that it is the default choice.

Really? Defaults are supposed to be sane and the best-fit-for-most-common-scenarios... so it's a little patronizing to automatically assume your project is just so special that defaults are not good enough.

Defaults should be good enough until they have been proven to not be good enough. Don't over-engineer your project.

calinet6 · on Sept 3, 2014

Given the needs of most applications (which don't press either database's capabilities whatsoever), that really is probably the main reason.

yangyang · on Sept 3, 2014

I think there is some truth in it though; that and the inertia it currently has.

If you're picking up a framework, do you use the default database or try to do something non-default on top of what you're already learning?

otterley · on Sept 3, 2014

One minor reason I noticed recently is that you can't change Postgres' listen(2) backlog. This can have a significant impact on responsiveness.

barosl · on Sept 3, 2014

I guess using a connection queue is preferable? (Although it complicates the software stack a bit.)

sjwright · on Sept 4, 2014

I was planning to move to Postgres, until TokuDB went open source.

samlambert · on Sept 3, 2014

The problem with going into too much detail is that it can become a list of recommendations that might not even apply to different people's infrastructure.

Most of the time MySQL optimization is workload specific.

threeseed · on Sept 3, 2014

Scaling databases is EASY if you pick a database designed for it. I've personally managed Cassandra, HBase and Riak on 30+ node clusters for companies. Almost no issues.

It shows how little you know about real world scalability when you actually suggest MySQL over something like Cassandra. It is a nightmare to scale unless you are sharding in your application layer.

Xorlev · on Sept 3, 2014

It can be easier, but saying it's ever easy belies the bevy of issues that can (and will) always crop up.

seabrookmx · on Sept 2, 2014

> new and untested NoSQL databases

MongoDB is new and untested? What rock do you live under?

It is new compared to many of the veteran Relational databases like SQL Server (and I don't mean that in a bad way), but it is a proven technology used by many. See http://www.mongodb.com/who-uses-mongodb

karlick88 · on Sept 3, 2014

Unfortunately, many of those who used mongodb regretted the decision

Mongodb is a good db but it is not good for many use cases that people are using them for. There are a lot of companies that realized the mistake and are actively migrating off it. I know we are planning a "Off-Mongo party" with another company once we both manage to migrate off

lucian1900 · on Sept 3, 2014

I think you're being too kind.

Many of us who used MongoDB have concluded that there is no task at which it is in any way better than all other existing alternatives.

seabrookmx · on Sept 7, 2014

That's a pretty bold statement. Care to enlighten?

seabrookmx · on Sept 7, 2014

I agree that document-oriented databases don't deserve a lot of the hype they get (use the right tool for the job and all that), but there is definitely some use cases where they make sense.

At my company most of our systems are SQL Server powered, but one of the newer systems that stores large blobs of metadata for products is using Mongo, and it is working quite well.

threeseed · on Sept 3, 2014

You are either being disingenuous or you're ignorant.

Many of the companies that switched off MongoDB were growing and ended up moving up to databases like Cassandra. MongoDB is a great database from when you're starting to when you're mid sized.

Cassandra destroys PostgreSQL in scalability but we don't say PostgreSQL is a crap database because of it.

kalleboo · on Sept 3, 2014

What makes Cassandra unsuitable for starting to mid sized where MongoDB is better?

ayrx · on Sept 3, 2014

Calling it new and untested is a bit too much. It's probably more accurate to describe it as being tested and found unworthy for most use cases.

josephlord · on Sept 3, 2014

MongoDB is web scale

http://www.mongodb-is-web-scale.com

kaeawc · on Sept 3, 2014

That has so much win. Thank you.

nodesocket · on Sept 2, 2014

Does GitHub use Percona or MariaDB? We (https://commando.io) switched to Percona, mainly to use their Xtrabackup feature which can do streaming type backups without bringing MySQL to its knees. Also, Percona supports an awesome custom configuration option:

    thread_handling=pool-of-threads

http://www.percona.com/doc/percona-server/5.5/performance/th...

morgo · on Sept 2, 2014

It is worth mentioning that commercial distributions of MySQL feature pool-of-threads as well. Percona Server features what is essentially a re-implementation: http://dev.mysql.com/doc/refman/5.5/en/faqs-thread-pool.html

samlambert · on Sept 2, 2014

I have tested thread pooling with our workload and it didn't go so well. Also

> Note This feature implementation is considered BETA quality.

nodesocket · on Sept 2, 2014

Interesting, love to see your results some day. How are you doing backups? I assume you are not just using mysqldump?

samlambert · on Sept 2, 2014

Both Xtrabackups and mysqldump for logical backups.

15characterlimi · on Sept 2, 2014

Are you using Percona XtraDB Cluster?

samlambert · on Sept 2, 2014

Nope.

nrbafna · on Sept 2, 2014

Do you some figures around improvement from changing thread_handling to a pool instead of one per thread?

The queries on my db server, fit the documented use case for this feature - lots of short live queries.

morgo · on Sept 2, 2014

The analogy I would use for thread-pool is to insert a waiter in front of the chefs in the kitchen.

It doesn't make sense for all workloads, but I have found thread pool to be useful in cases where application servers can overload database servers (either via misconfigured connection pooling, or no pooling).

An example where it might make less sense: a dedicated worker queue running in N threads connecting to MySQL.

walden42 · on Sept 3, 2014

Why not just run backups on a replica suited specifically for that purpose?

Thaxll · on Sept 2, 2014

Why won't you use the Oracle version? atm MariaDB isn't a replacement for the stock version of MySQL.

toomuchtodo · on Sept 2, 2014

MariaDB is a drop-in replacement for stock MySQL. It's performance is actually better than MySQL.

https://mariadb.com/kb/en/mariadb/mariadb-vs-mysql-compatibi...

icot · on Sept 3, 2014

I would suggest that anybody interested in performance test these claims in their own setup. I ran tests doing binary swapping binaries when the releases of both were around 5.5.34 and in my case MySQL CE had between 10%-15% better performance.

When I ran the tests for MariaDB 10.x and MySQL CE 5.6.x the advantage even went further for MySQL with around 20% better performance.

I find always funny how with every release you can get opposing claims from each camp regarding performance.

stock_toaster · on Sept 3, 2014

Not quite drop in. We have had a couple particularly wonky update statements with nested subqueries that ran on mysql, but only produced an error on mariadb.

Thaxll · on Sept 2, 2014

MySQL has years of proved stability, the performance is BS, one graph means nothing, it's like all the people saying they're running thousands of QPS on a server, you just don't know the benchmark and the type of queries running. I can run 3k SELECT in cache...

I'm sure one day MariaDB will replace stock MySQL but clearly it's not ready yet.

btw Oracle is doing a good job with MySQL atm.

heavenlyhash · on Sept 2, 2014

My bets are all squarely on MariaDB. MySQL under Oracle is stymied by conflicts of interest.

One concrete example is hashjoins, a fairly simple and efficient strategy for many general purpose workloads: MariaDB supports hashing as a join strategy since 5.3/5.5 [1] back in 2011. MySQL, to the best of my knowledge, still lacks any implementation of this. OracleDB, of course, supports hash joins. Oracle has every incentive not to implement hash joins in MySQL because hash joins are one of the performance features that they use to drive sales of OracleDB.

[1] https://mariadb.com/kb/en/mariadb/what-is-mariadb-53/#join-o...

sjwright · on Sept 2, 2014

> Oracle is doing a good job with MySQL atm.

Obvious troll is obvious.

> the performance is BS

We're extremely happy with our "BS" performance.

spudlyo · on Sept 2, 2014

Obvious troll is obvious.

MySQL professional here. I'm quite pleased with Oracle's 5.5 and 5.6 releases, and feel that they're doing a pretty good job. While they've not been perfect stewards, I feel they have done better than Sun -- perhaps you don't remember the fiasco that was the 5.1 release?

I don't feel that this is a trollish opinion. Mark Callaghan, a MySQL luminary who has done a lot of excellent work for the community also has positive things[0] to say about Oracle's stewardship of MySQL.

[0] https://news.ycombinator.com/item?id=7483079

viraptor · on Sept 3, 2014

Yes, they're ok-ish. But the original claim was "MariaDB isn't a replacement for the stock version of MySQL".

Actually yes, it is. It doesn't work the other way around (for example timestamp column has a different description, so dump cannot be loaded into MySQL). But you can use MariaDB in place of MySQL and there should be no performance degradation, since they come from the same codebase.

Reliability was mentioned too - in that case Oracle actually failed by holding back tests. In MariaDB you can reproduce testing if you want. In MySQL not anymore.

jeremycole · on Sept 3, 2014

No, not really. MariaDB is completely dependent on Oracle and Percona. While they are doing some good work, they are by no means a complete and independent fork, nor do they have the resources to be.

MariaDB is also impacted by the lack of tests, they are absolutely not making replacements for all those tests, and they continue to pull code from upstream. So, same problem there.

sjwright · on Sept 4, 2014

This might have been true in 2011-12, but today MariaDB is demonstrably NOT dependent on Oracle or Percona. If both disappeared tomorrow, it would still continue.

The fact that they continue to "pull" some code is because they're not idiots, and aren't going to duplicate effort. As far as I'm aware, they are now being very selective about what they pull.

Calling Oracle an "upstream" is a joke. They aren't publishing atomic changesets, which also means the Oracle fork of MySQL is no longer a morally Open Source software program.

MariaDB is NOT impacted by some vaporous lack of tests. They are building tests for every change they're making. As for the tests privately held by Oracle, well, those tests don't help anyone because they're not public and don't enjoy public scrutiny. Who knows if they're even running them?

sjwright · on Sept 4, 2014

> nor do they have the resources to be.

They have received large investments from companies like Intel, and have been granted extensive engineering (and probably financial) help from companies like Google and Facebook. Probably many others. And they have most of the MySQL brains-trust in their employ, Monty most famously. I don't know how anyone could argue they're under-resourced for the task of maintaining and improving a mature product.

Compare that with Oracle, who pulls dangerous stunts like this: http://ronaldbradford.com/blog/when-is-a-crashing-mysql-bug-...

spudlyo · on Sept 6, 2014

I don't know how anyone could argue they're under-resourced for the task of maintaining and improving a mature product.

You mean Jeremy Cole, the guy you're arguing with, who led the effort at Google to standardize on MariaDB[0], who worked for many years with Monty at MySQL AB and who is a recognized leader in the MySQL community?[1] Fuck that guy, I have no idea how he could have such an opinion.

[0]: http://www.theregister.co.uk/2013/09/12/google_mariadb_mysql...

[1]: http://openlife.cc/blogs/2013/april/mysql-community-awards-2...

sjwright · on Sept 4, 2014

I agree, Oracle aren't terrible stewards of MySQL, but that's not my point. Yes 5.5 and 5.6 are good releases. MariaDB 10 is an even better release and shows that Monty has still got value beyond what any large corporation can provide.

I know for a fact that if we were forced away from MariaDB back to stock Oracle MySQL releases, we'd have to expand onto more slaves and fix queries that are no longer optimized.

sjwright · on Sept 2, 2014

Our experience switching from Oracle's MySQL to MariaDB was a solid improvement in performance -- particularly on certain queries where MariaDB's superior query optimizations kick in.

We've also migrated many tables to TokuDB storage engine, and seen phenomenal improvements in performance and scaling. It's so good we were able to de-partition and de-archive our largest tables with no performance penalty.

If you haven't tried MariaDB yet, try it.

If your database is reasonably large (10GB+) and you haven't tried TokuDB yet, TRY IT.

shamsulbuddy · on Sept 3, 2014

Recently i have got the task to work on a 10 GB and 12 Million rows table to make it work with complex aggregation queries like SUM or COUNT.

Server - Ubuntu 14.0.4 Mysql - Percona XtraDB Cluster 5.6 RAM - 16 GB CPU - 8 Cores - 2.95 Ghz SSD - 100 GB

I tried many options --

1) Increased the innodb_buffer_pool_size to 8GB on a 16GB RAM machine -- It helped but nothing magical here 2) Add few more Keys on Date based columns and force the users in front end to select at least one date range column -- Seen some performance gain here 3) Tried MyISAM engine -- I would say in present days MyISAM is history since Innodb itself is pretty much comparable to MyISAM -- so didn't see amy much performance gain .. one disadvantage was that loading of this 12 Million rows took ages .. and also "SELECT table_rows FROM information_schema.tables" had hanged , so i was not able to to figure out that how many rows have been loaded in my Table. 4) Finally I tried Partitioning the Table based on a DATE column -- Massive Performance gain .. If the user has selected a date range which falls under one or two partitions , then you can get the results very fast , even if it falls in many partitions still the performance is acceptable .. Useful ref - http://www.slideshare.net/datacharmer/mysql-partitions-tutor... -- remember that its Mandatory that you add that date column as part of Primary Key if you are Partitioning on that date column .

--- For Ref -- my Innodb settings

# INNODB # innodb-flush-method = O_DIRECT innodb-log-files-in-group = 2 innodb-log-file-size = 512M innodb-flush-log-at-trx-commit = 2 innodb-file-per-table = 1 innodb-buffer-pool-size = 6G bulk-insert-buffer-size = 256M innodb_thread_concurrency = 16

Xorlev · on Sept 3, 2014

If it truly was only 10GB of data, something was seriously wrong with your table, schema, or query.

shamsulbuddy · on Sept 3, 2014

Do you think that 10GB data is easily to manage .. what all common practices do you suggest . any examples with use case

cr3ative · on Sept 3, 2014

Just an observation that the word "index" isn't found in your post - you do have those, right?

sjwright · on Sept 3, 2014

My suggestion is to try TokuDB storage engine on that table. Our (admittedly text heavy) workload got order of magnitude scalability improvements and a dramatic fall in query times.

shamsulbuddy · on Sept 4, 2014

I tried Tokudb as well and below are the stats comparison for complex SUM and COUNT query. Please note that the number of rows and size is exactly the same and there is a partition on a Date column

Table size 10GB and rows 12 Million

Also note that I am using the default settings of Tokudb

With Tokudb - 1 min 41.38 sec With Innodb - 58.17 sec

So almost double.

stonewhite · on Sept 2, 2014

I thought this 'GitHub making MySQL better' was more along the lines of 'Facebook making PHP better'. But impressive nevertheless.

electrum · on Sept 2, 2014

Facebook, Google, LinkedIn, and Twitter making MySQL better: http://webscalesql.org/

derengel · on Sept 3, 2014

Talking out of ignorance, but why is MySQL so dominant in big companies? why do these companies choose MySQL over Postgresql? I see people bashing MySQL all the time because of Oracle, but technically, is MySQL less capable than Postgres? Would it be wrong to start a new business on MySQL?

fulafel · on Sept 3, 2014

MySQL used to suck a lot more some years ago (tip of the iceberg: no transaction support!), the Oracle business is small potatoes compared to numerous earlier shortcomings that lead to data loss left and right.

Inertia explains the current MySQL position reasonably well, a better question is how did it climb to its current position during its years of technical incompetence.

dragonwriter · on Sept 3, 2014

> Inertia explains the current MySQL position reasonably well, a better question is how did it climb to its current position during its years of technical incompetence.

Existing, and having an better install story and windows support than PostgreSQL. When it established its dominance, it wasn't because it was the best open-source multi-user relational database in terms of spec sheet features, it was because it was the one people trying to start something could easily setup and get something running with, which quickly led to it being widely supported on shared hosts and having a large base of people with at least some experience, which then created a nice positive feedback loop to maintain its popularity.

rimantas · on Sept 3, 2014

Oh god, "no transaction support". I wish anyone mentioning that would check, what was the last version missing it and when was it released.

As for "how did it climb" the answer is simple: replication. Working replication was available in MySQL years ago too. Not perfect, but working.

Big companies usually do have competent people who make informed decisions, not based "I've read on the internet that PG is the real DB and MySQL is just a toy".

0xbadcafebee · on Sept 3, 2014

> why do these companies choose MySQL over Postgresql?

There's been a perception for a long time that MySQL was "more lightweight" than traditional RDBMS's and therefore "faster", the same thinking that perpetuates NoSQL solutions today.

Originally Postgres didn't even support SQL. mSQL was developed as an SQL interface to Postgres in the mid 90s. When it turned out Postgres was dog slow on the old-ass hardware the devs were using they just implemented their own lightweight db and mSQL became the top pick for new OSS-based systems. But mSQL was commercially licensed, so MySQL was created for personal use. Since it reused the same API as mSQL, everyone just adopted MySQL as a 'drop-in' replacement. So MySQL is lightweight and fast and free and Postgres is a dog slow incumbent.

I guess you could compare it to how many people feel Java is a humongous pig that can't scale and PHP is fast and lightweight. And obviously lots of sites use PHP. But some shops choose Java because they want something PHP can't offer. (Note: this is not a fair comparison to MySQL and Postgres in any way, but it shows the weird 'feelings' people get for different software)

But also: MySQL has more DBAs, a higher number of installations, more 3rd party support, bigger user/dev community, and in general is more popular.

> technically, is MySQL less capable than Postgres?

Each has individual technical benefits and drawbacks the other doesn't have.

> Would it be wrong to start a new business on MySQL?

What, like, ethically?

MySQL is just a tool. I could 'bash' a table knife by saying it's a dull, heavy piece of shit compared to some other knife, but guess what? Everyone uses table knives. They don't typically use them to debone fish, however. Look at your use case and pick the tool you feel comfortable with that fits it best.

samlambert · on Sept 3, 2014

> why do these companies choose MySQL over Postgresql?

Even though it is not without its faults it is a reliable and high performing database, which is what is needed at scale.

> is MySQL less capable than Postgres?

Definitely not.

yangyang · on Sept 3, 2014

> Definitely not.

Have you used both professionally? Not being arsey, seriously interested.

samlambert · on Sept 3, 2014

WebscaleSQL is a great project.

tsuraan · on Sept 2, 2014

Is there a document on how GitHub uses MySQL? Does it actually store git objects (either as binaries or in a logical equivalent with foreign keys, etc), or is the database just used for higher-level things, like users, organizations, etc?

technoweenie · on Sept 3, 2014

We use MySQL with ActiveRecord primarily (with a few exceptions). So that's all Users, Repositories, Issues, etc.

Git data gets stored with Git on separate file servers.

bluedino · on Sept 2, 2014

>> now featuring a massively updated DB cluster with SSDs and 10GBs networking!

Any further details on the DB servers, SSDs that were used, and the 10Gb networking infrastructure?

ck2 · on Sept 2, 2014

Did you remember to move the undo log outside of ibdata1 ?

http://dev.mysql.com/doc/refman/5.6/en/innodb-undo-tablespac...

Because in mysql, ibdata1 can never shrink, only grow.

And that's how you end up with massive ibdata1 files that cannot be managed.

Oh and undo can only be moved outside of ibdata when mysql is being initialized, not afterwards.

innodb has some serious flaws

http://www.percona.com/blog/wp-content/uploads/2010/04/InnoD...

lmz · on Sept 2, 2014

Can you actually shrink the undo logs once you move them out of ibdata1? From your link:

Users cannot drop the separate tablespaces created to hold InnoDB undo logs, or the individual segments inside those tablespaces.

The most important part to control ibdata1 growth is innodb_file_per_table (default now).

ck2 · on Sept 3, 2014

No you cannot shrink them but more importantly they do not have to be cached.

The rest of ibdata1 has to be cached for performance.

Impossible to isolate the caching needs unless undo is moved outside and 99% of servers out there have probably not been setup with external undo logs because most admin do not know of this limitation after it has been configured.

Only way around this is to rebuild the entire database, loads of fun.

vtal · on Sept 3, 2014

couldn't you use innodb_file_per_table and not worry about the huge ibdata1 file?

ck2 · on Sept 3, 2014

No file_per_table does not change anything about the unlimited growth of the undo log in the ibdata1 file.

_RPM · on Sept 2, 2014

"5am beer" Is it normal to stay in the office all night? (CS undergraduate, never had full time job)

roskilli · on Sept 2, 2014

If you are on call/scheduled for the deployment you may be required from time to time to come online and deploy in off-peak times. This type of maintenance should be extremely rare - in that it requires making your service entirely unavailable, most deployments should be able to be done in a rolling manner and not affect ongoing service availability.

Also usually you are on call every so-often so if your team/company happens to do this often it's not always you performing the deployment.

In regards to "being in the office": most people will do this from home after sleeping most of the night and waking up to an alarm and then coming in a little later the next day. There are a few hardcore ones out there that prefer to pull an all-nighter and do so from the office, although my mileage has been those are quite few and far between.

AYBABTME · on Sept 3, 2014

Experience tells me an allnighter would be a poor idea, as sleep deprivation is just as harmful to cognitive abilities than being drunk. Especially in the times between 5AM and 8AM, which I've found affects individuals the most.

If something goes wrong with your database, you'll be in a shitty condition to make fast and sound judgement calls.

One could say that not everybody's the same, but in those conditions I think that's not true. I've trained extensively under sleep deprived conditions in the military, where we were continually asked to make quick decisions in harsh conditions. Everybody's bad when sleep deprived and everybody's especially shitty at 5AM.

wastedhours · on Sept 3, 2014

The last sentence is interesting to me - I wonder if there are many sys/devops types who are morning people? I'm a developer and function much better at 5am than 11pm, but it'd be an interesting correlation if server-peeps were almost exclusively nightowls.

toomuchtodo · on Sept 2, 2014

> In regards to "being in the office": most people will do this from home after sleeping most of the night and waking up to an alarm and then coming in a little later the next day. There are a few hardcore ones out there that prefer to pull an all-nighter and do so from the office, although my mileage has been those are quite few and far between.

DevOps/Admin here! I prefer the all-nighter, but I've almost always (in 14 hours) done it from home unless physical hardware had to be moved (i.e. forklifted datacenter to datacenter).

showerst · on Sept 2, 2014

Not at any decently functioning company, but during a huge operation like a data center migration or a core technology change, having all hands on deck during non-business hours is fairly normal.

angersock · on Sept 2, 2014

Not at a company worth working at, no.

That sort of thing should be the extreme exception--anything else is just burning out employees.

pmontra · on Sept 3, 2014

I worked for a telco years ago. We had separate development and operations teams. Operations worked on 3 turns of 8 hours, 7x24. They took care of every installation (pre-production and production) and they usually did them in the night shift. The staff was rotated over the months and there was no particular employer burn out.

By the way, one full time equivalent (that is, a hypothetical person working 24 hours a day all year long) equalled to five real people. That is, if they wanted ten people to be always available they had to hire fifty. You can easily understand why these arrangements are not common for Internet companies. Furthermore telcos have different requirements. Devops wasn't there yet and I wonder if it is accepted by management even now. My bets are against it.

I also wonder if companies like Google, Amazon and Facebook are organized in that way too.

angersock · on Sept 3, 2014

Sounds like a sound way of managing it...as others have pointed out, occasional on-call duty or what have you is fine--it's the idea that staying at the office until 3 is "normal" and accepted as part of your equity payout is insane.

samlambert · on Sept 2, 2014

I think all of us where at home during this. I certainly was.

imbriaco · on Sept 3, 2014

I was too. And I had that beer when we finished. ;)

samlambert · on Sept 3, 2014

So awesome.

toomuchtodo · on Sept 2, 2014

During massive migrations, yes. Doesn't count if you're working remote though, you're already at home.

ojbyrne · on Sept 2, 2014

If you read carefully, it seems that it was 5am Saturday, not through the night, and it looks like it was done by 7:15am. The beer was just labelled with the start of the process.

15characterlimi · on Sept 2, 2014

Migrations like this don't happen often, and being on-site would be common when you are running in a physical data center rather than the cloud. When we switched backend DB systems we did it from home.

drstewart · on Sept 2, 2014

No, but it's also not normal to migrate your production database servers to new datacenters. And when you do, you want to do this in a way that will minimally impact your users (i.e. after hours).

nathan_f77 · on Sept 2, 2014

It's not normal, but this was an exceptional situation.

gatehouse · on Sept 2, 2014

They mentioned that their new config has a delayed replica. Can anyone comment on how useful this actually is?

I do snapshots + binlogs so I can do a point-in-time recovery to any time in the last month. So obviously a delayed replica would be a faster way to recover from human error at the MySQL prompt. But it would still require human intervention which is slow and can't really be automated. On the other hand, presumably a process already exists to bring a replica up from scratch, and that could be done and paused at a certain point. So it seems like a lot of extra effort and hardware for a really narrow and constrained benefit.

Anyone running a delayed replica -- is this wrong? Has it been used ever? often? Worth it?

morgo · on Sept 2, 2014

Delayed replicas is a MySQL 5.6 feature (and can be emulated in previous versions).

You are correct in that the main use-case is fast recovery from human error. But as a DBA, I can tell you that accidents like this cover 90% of disasters :)

samlambert · on Sept 2, 2014

It is certainly helpful for grabbing bits of data that you want quickly, it is also another line of defense for DR.

If we make a change and want to quickly compare against an older version of the data delayed slaves are super handy.

We can also restore to any point in time with binlogs.

sjwright · on Sept 2, 2014

We use a delayed replica for our off-site backups. It's a little bit of safety, plus it comes in handy for sanity checking previous state after making big global changes.

coops · on Sept 2, 2014

It would be interesting if someone from Github could discuss why they chose to do this migration by taking the whole site offline and doing the migration all at once. Did anyone investigate if this could be done without taking the site offline?

samlambert · on Sept 2, 2014

Doing this online would of been very tricky while maintaining 100% consistency. We perform major infrastructure changes often without ever having to take the site offline. In this case and at this time it was unavoidable.

I feel 13 minutes of maintenance at 5am PST was a good trade off for the benefits we gained.

coops · on Sept 2, 2014

Can you go into more detail regarding the prohibitive consistency issues? How do you maintain consistency in steady-state (ie. not during migrations?) Also, how do you make the call as to whether to bring your site down vs. attempting a live migration?

imbriaco · on Sept 3, 2014

http://en.wikipedia.org/wiki/Split-brain_(computing)

coops · on Sept 3, 2014

Yes, this is certainly a state you would desire to avoid. However, you didn't answer any of my questions.

why-el · on Sept 2, 2014

I think its a smart decision, given the nature of the product. An off time of 14 minutes on a Saturday very early morning is a price they were willing to pay to make this a one time operation with no (actually reduced) risks of losing data consistency and other pitfalls that come with a live migration.

_pmf_ · on Sept 3, 2014

A 20 point manual checklist for performing critical infrastructure updates does not sound very DevOpsy to me.

MaulingMonkey · on Sept 3, 2014

Obligatory http://xkcd.com/1205/ . Checklists are an insanely super cheap way to provide a repeatable process. They're not sexy, perhaps, but they're damn useful.

There are three factors here for me:

I automate things if it will save me time. I automate things if it will provide necessary reliability to the process. I automate some things which annoy me to do manually, even if I can't justify it on either of those basis.

The second one is tricky. When things go pear shaped in new and interesting ways 10 points into your automatic scripts, do they correctly and automatically recover? Mine generally don't. I'm going to react better to things going pear shaped.

But my coworkers don't necessarily have the same attention to detail with my checklist. This can be because they're not as familiar with the tools, or simply don't believe the rigor justified, or necessary.

BradRuderman · on Sept 2, 2014

I would love to know more about the specific setup they use? Do they use snowflake or something for sharding?

revskill · on Sept 3, 2014

Event Sourcing could help scale database infinitely. But why Github not using it ?

samlambert · on Sept 3, 2014

How do you scale the event store?

revskill · on Sept 3, 2014

It's easy to support sharding because there is one table needed to scale based on one column (AggregateId).

SEJeff · on Sept 2, 2014

Wait, you mean relational databases actually can scale? Say it ain't so! - end sarcasm

Personally, I've always been wowed at what youtube does with mysql. See the entire vitess[1] project for an idea. Thanks github for writing this up though, very neat.

[1] https://github.com/youtube/vitess

threeseed · on Sept 2, 2014

You do realise that most of people using MySQL at that scale aren't using it as a relational store ? They are sharding in the application layer and using it as a dumb key-value store.

No one would argue that MySQL's database engines can't scale. But you could argue that the relational model doesn't scale.

samlambert · on Sept 2, 2014

I think at massive scale a lot of models break down, and I would agree sharding is the key to huge scale.

I often see "MySQL doesn't scale" posts which simply isn't true. I just wish people would stick with it longer and iron out their problems.

jeffdavis · on Sept 3, 2014

"But you could argue that the relational model doesn't scale."

How would one make such an argument? The relational model is simply a combination of logic and set theory used for manipulating data. It's orthogonal to scalability concerns.

latch · on Sept 3, 2014

Pretty sure he means the relations of a normalized model that fail to scale (either due to complexity (too many joins), or size (too much data for a single node)).

jeffdavis · on Sept 3, 2014

Normalization is also a logical concept (and merely a suggestion when it comes to the relational model, not a requirement) orthogonal to physical scalability concerns. Sometimes people use it loosely, assuming a one-to-one correspondence between a relation and a physical file.

These are important distinctions, because a misconception here leads to entirely the wrong solution.

One thing that does have inherent physical constraints is consistency. That's usually what people mean when they say that the relational model doesn't scale, but it would be much less confusing to just say that. Then there would be no reason to dismiss a relational language when designing scalable systems.

SEJeff · on Sept 3, 2014

I was poking fun at people who say SQL doesn't scale when what they really mean and don't realize is that normalized schema doesn't work at large scale. I agree that you pretty much have to shard data and "join at the app level" at sufficient size, but the definition of "big data" changes every day.

10 years ago 1T would be "big data" whereas today, 1P would be "big data". I'm waiting for the time when you can get 1P ssd drives for your laptops :)

JoeAltmaier · on Sept 3, 2014

And they may be based on 'memristor' technology, who's advantage is very low power and 3D physical topology.

on Sept 2, 2014

[deleted]

toomuchtodo · on Sept 2, 2014

Didn't Google scale Adwords primarily with MySQL?

jeremycole · on Sept 3, 2014

Yes, as well as Youtube, Twitter, and Facebook...

opendais · on Sept 2, 2014

I'm tired of the "lol lets mock NoSQL fanbois" behavior on HN. You fail to realize you are acting exactly like the people you are mocking.

Generally the use case for "scale" with NoSQL isn't that MySQL isn't technically capable. It is a cost/benefit for a specific use case.

For instance, if you are storing counters that are purely tracked via key/value ... MySQL is a terrible choice from a server-cost-to-performance-perspective.

molixiaoge · on Sept 3, 2014

get it

EGreg · on Sept 2, 2014

I say it's not fully shipped until it's distributed