Anyone else disappointed the post didn't go into all that much detail? Scaling databases is hard if you're not privileged enough to have access to large pools of money to hire good DBAs.
I would love to see companies like GitHub opening up their database schemas to the public with mock data. Scaling is one aspect, but the best thing you can do in the beginning is to create a solid schema (normalise, denormalise...), and it would be interesting to see what GitHub uses and why. Still awesome to see MySQL being the choice most large companies like GitHub make in the face of new and untested NoSQL databases like MongoDB.
Or better, more solid relational options like Postgres...
Honestly, is there any reason to use MySQL over Postgres at this point? Or is it sort of six of one, half a dozen of the other, as long as the data model is decent?
INSERT IGNORE and REPLACE are two pretty good reasons, in my opinion. Postgres also doesn't have real table partitioning. Yes, you can sorta kinda hack around the first with stored procs. And yes, you can do something that looks a lot like partitioned tables using table inheritance. And yes, Postgres now has replication support, if you don't mind using only row-based replication (MySQL lets you choose between row-based and statement-based replication) among other tradeoffs.
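For readers unfamiliar with those two statements, here's a rough sketch of their semantics using SQLite, whose `INSERT OR IGNORE` / `INSERT OR REPLACE` behave analogously to MySQL's `INSERT IGNORE` and `REPLACE INTO` (this is an illustration of the behavior, not MySQL itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

conn.execute("INSERT INTO users VALUES (1, 'alice')")
# IGNORE: on a key collision, silently skip the new row.
conn.execute("INSERT OR IGNORE INTO users VALUES (1, 'bob')")
# REPLACE: on a key collision, delete the old row and insert the new one.
conn.execute("INSERT OR REPLACE INTO users VALUES (1, 'carol')")

print(conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0])
# -> carol
```

Emulating either of these in Postgres at the time means a stored procedure or a retry loop around the insert, which is the "sorta kinda hack around" mentioned above.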
So yeah, Postgres is a better relational database than MySQL, if you ignore all the things MySQL does better. Another great thing about Postgres is that whenever a site like Hacker News gets a thread about MySQL, a bunch of people ask why you aren't using Postgres instead, and whenever you try to answer the question, a bunch of Postgres users tell you you're wrong, or that the features you care about don't matter (or that they're hard to implement, which... is my problem why?), or that Postgres really has replication that's as good as MySQL's this time, pinky swear. So most MySQL users get a first impression of Postgres' user community that is, quite frankly, rather unfavorable.
Oh, and MySQL has a lot more third-party documentation, tooling and support available.
Perhaps it has something to do with having worked in the pg codebase for a time. It's clean as a whistle, I respect that a lot. But that's certainly not the whole picture.
I was surprised (and a little disappointed) they were not using MariaDB, as it's made by the creator of MySQL, has more features, and is generally considered "upstream" of MySQL these days... it seems the MariaDB project would be more receptive to making changes and working with the GitHub team in order to scale.
I really don't want to get into a debate over the merits of MariaDB versus MySQL, as a lot of it depends on your workload. I investigated moving a MySQL application to MariaDB and some of the features we were using just weren't there yet, and I don't think all of them made it into the most recent release (but don't quote me on that -- I don't maintain that code anymore, so my memory could be hazy on it.)
But the idea that MariaDB is "upstream" of Oracle MySQL is silly. Is Oracle even merging code from MariaDB?
Well, there are two versions of MariaDB: the 5.x tree (which lags behind MySQL's 5.x tree and merges changes from MySQL as they are released), and the 10.x tree, where MariaDB introduces new features that are not yet in MySQL; MySQL has merged some of those changes into their project. So each project merges from the other at times...
I'm no DBA, but I'd think Monty (the creator of MySQL) and his smaller crew at his consulting firm (SkySQL and MariaDB Consulting), which makes MariaDB, would be more open and flexible about working directly with GitHub's teams and needs than going through the bureaucracy at Oracle.
So neither is upstream of the other at this point; they're diverging forks. And contrary to what people are saying in this thread, as of 10.0 MariaDB is no longer a drop-in replacement for the most recent version of Oracle MySQL, and MariaDB's developers are no longer committing to porting all Oracle MySQL features. So MariaDB is no longer a Percona Server-like "MySQL plus goodies" upgrade proposition (and it really hasn't been for a long time -- the 5.x series is still at 5.5). MariaDB will actually tell you which MySQL 5.6 features they support.
You're better off on Oracle MySQL. There are other tradeoffs, depending on what you use. What bugs the heck out of me is how a fair number of MariaDB advocates spread FUD about Oracle (judge them by their track record: they've committed to improving and maintaining MySQL -- they're not perfect, but that's no excuse to harp on what they COULD do when there's no evidence they WANT to sabotage MySQL to force you onto Oracle), and want to turn the debate into a holy war rather than letting everyone pick the best tool for the task.
I think you should research MariaDB some more -- there are a lot of reasons to use it, and a lot of companies are switching. In fact, just today we upgraded our Zimbra cluster and were surprised to see they had made the switch from MySQL to MariaDB. This isn't "FUD" as you put it, but rather a better product for a lot of reasons.
> You're better off on Oracle MySQL.
Hardly true, given the two DBs are mostly the same, except that the creator is now making newer and better changes in the 10.x branch of MariaDB (Monty left Oracle, like most Sun employees, due to the internal politics and fighting that are routine at Oracle).
MySQL historically had more support for HA/clustering than Postgres. Recently, there's been a lot of progress on integrating Postgres clustering into the core, to the point where it's mature, but perhaps not as battle-tested. Not a reason to choose MySQL for a startup, I think, but if you've got a cluster working on MySQL and a clear understanding of its pitfalls, there's no real reason to switch to Postgres.
That makes sense, thanks. I guess as long as you have a reasonably optimized relational database of a given class, you're going to be about on the same order of magnitude of performance.
We chose MySQL where I work only because it's easier to hire people with a lot of MySQL knowledge. Technical reasons aren't the only ones when deciding what software/framework/libraries to use.
1. Monitoring and administration tools for MySQL are more polished.
2. WAY easier to find MySQL DBAs vs. PG DBAs
3. More resources in general around MySQL. Whatever problem/issue you have it's out there already.
There are a lot of reasons IMO. MySQL has a proven track record for stability and performance powering huge sites. I would say MySQL has a nicer replication story as well.
In general MySQL is a lot more widely used with a greater pool of knowledge out there.
MariaDB is a drop-in replacement for MySQL, is made by the creator of MySQL (after he left Oracle post-Sun acquisition), has more features and is now generally considered the upstream of MySQL. I also bet Monty is more receptive to working directly with your teams to scale the product or make changes as necessary.
Tooling. While I agree Postgres tends to be more performant, when you're dealing with groups of people, having good tools (and good documentation for those tools) trumps many considerations.
As an example, the organization I work at is considering a move to Postgres, but our main barrier is a lack of good DB clients accessible to people who aren't software developers. The best we're aware of as far as DB GUIs for Postgres is pgAdmin, whereas for MySQL you have MySQL Workbench, SQLPro, and a myriad of other applications for whatever your operating system of choice is.
After Postgres got replication out of the box, I'm pretty sure it's just inertia and the fact that it's installed everywhere keeping MySQL usage up. Postgres is so much nicer when dealing with day-to-day things.
The built-in query stats are similar to Performance Schema in MySQL. The statistics are quite fine-grained, and with a set of views on top (https://github.com/MarkLeith/mysql-sys) are very useful for observability.
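For example, assuming the sys views from that repo are installed, the most expensive statement digests can be pulled with something like:

```sql
-- Illustrative query against the mysql-sys statement_analysis view:
-- statements aggregated by normalized digest, worst offenders first.
SELECT query, exec_count, total_latency
FROM sys.statement_analysis
ORDER BY total_latency DESC
LIMIT 5;
```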
Technically, it was introduced in 5.5, but off by default :)
5.7 is going to be amazing for observability. Memory, transactions, stored procedures, replication, metadata locking and prepared statements are all instrumented in P_S.
MySQL is like PHP: it's installed almost everywhere and requires less setup. I did have one problem recently where MySQL turned out to be the better solution, as it has a native bitcount operation and a larger set of numerical types.
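For anyone wondering what a native bitcount buys you: MySQL's BIT_COUNT() counts set bits server-side, right in the WHERE or SELECT clause. Without it you end up pulling the values out and doing the equivalent of this in application code (a minimal sketch):

```python
def bit_count(n: int) -> int:
    """Count set bits in n, the operation MySQL's BIT_COUNT() does server-side."""
    count = 0
    while n:
        n &= n - 1   # Kernighan's trick: clear the lowest set bit
        count += 1
    return count

print(bit_count(0b101101))  # -> 4
```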
Neither captures the distribution of usage from small- to large-scale apps, but it does look like the community has already moved both in mindshare and number of production deployments. I'd wager that Heroku's PostgreSQL default is responsible for much of that.
>I find it quite patronizing that you assume the only reason anyone would use MySQL is that it is the default choice.
Really? Defaults are supposed to be sane and the best-fit-for-most-common-scenarios... so it's a little patronizing to automatically assume your project is just so special that defaults are not good enough.
Defaults should be good enough until they have been proven to not be good enough. Don't over-engineer your project.
The problem with going into too much detail is that it can become a list of recommendations that might not even apply to different people's infrastructure.
Most of the time MySQL optimization is workload specific.
Scaling databases is EASY if you pick a database designed for it. I've personally managed Cassandra, HBase and Riak on 30+ node clusters for companies. Almost no issues.
It shows how little you know about real-world scalability when you suggest MySQL over something like Cassandra. MySQL is a nightmare to scale unless you are sharding in your application layer.
MongoDB is new and untested? What rock do you live under?
It is new compared to many of the veteran relational databases like SQL Server (and I don't mean that in a bad way), but it is a proven technology used by many. See http://www.mongodb.com/who-uses-mongodb
Unfortunately, many of those who used MongoDB regretted the decision.
MongoDB is a good DB, but it is not good for many of the use cases people are using it for. A lot of companies have realized the mistake and are actively migrating off it. I know we are planning an "Off-Mongo party" with another company once we both manage to migrate off.
I agree that document-oriented databases don't deserve a lot of the hype they get (use the right tool for the job and all that), but there are definitely some use cases where they make sense.
At my company most of our systems are SQL Server powered, but one of the newer systems that stores large blobs of metadata for products is using Mongo, and it is working quite well.
You are either being disingenuous or you're ignorant.
Many of the companies that switched off MongoDB were growing and ended up moving to databases like Cassandra. MongoDB is a great database from when you're starting out to when you're mid-sized.
Cassandra destroys PostgreSQL in scalability but we don't say PostgreSQL is a crap database because of it.
Does GitHub use Percona or MariaDB? We (https://commando.io) switched to Percona, mainly to use their XtraBackup feature, which can do streaming-type backups without bringing MySQL to its knees. Also, Percona supports an awesome custom configuration option:
The analogy I would use for thread-pool is to insert a waiter in front of the chefs in the kitchen.
It doesn't make sense for all workloads, but I have found thread pool to be useful in cases where application servers can overload database servers (either via misconfigured connection pooling, or no pooling).
An example where it might make less sense: a dedicated worker queue running in N threads connecting to MySQL.
I would suggest that anybody interested in performance test these claims in their own setup.
I ran tests swapping binaries when the releases of both were around 5.5.34, and in my case MySQL CE had 10%-15% better performance.
When I ran the tests for MariaDB 10.x and MySQL CE 5.6.x the advantage even went further for MySQL with around 20% better performance.
I always find it funny how with every release you can get opposing claims from each camp regarding performance.
Not quite drop-in. We have had a couple of particularly wonky UPDATE statements with nested subqueries that ran on MySQL but produced an error on MariaDB.
MySQL has years of proven stability. The performance claims are BS; one graph means nothing. It's like all the people saying they're running thousands of QPS on a server: you just don't know the benchmark or the type of queries running. I can run 3k cached SELECTs...
I'm sure one day MariaDB will replace stock MySQL but clearly it's not ready yet.
My bets are all squarely on MariaDB. MySQL under Oracle is stymied by conflicts of interest.
One concrete example is hash joins, a fairly simple and efficient strategy for many general-purpose workloads: MariaDB has supported hashing as a join strategy since 5.3/5.5 [1], back in 2011. MySQL, to the best of my knowledge, still lacks any implementation of this. OracleDB, of course, supports hash joins. Oracle has every incentive not to implement hash joins in MySQL, because hash joins are one of the performance features they use to drive sales of OracleDB.
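For context, a hash join builds an in-memory hash table on one input and probes it with the other, instead of re-scanning one side per row as a nested-loop join does. A minimal sketch of the idea (illustrative Python, not MariaDB's actual implementation):

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dict rows: hash the build side once,
    then probe it once per row -- O(n + m) vs. O(n * m) for a
    naive nested loop."""
    buckets = defaultdict(list)
    for row in build_rows:                      # build phase
        buckets[row[build_key]].append(row)
    joined = []
    for row in probe_rows:                      # probe phase
        for match in buckets.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined

users = [{"uid": 1, "name": "ann"}, {"uid": 2, "name": "bo"}]
orders = [{"oid": 10, "uid": 1}, {"oid": 11, "uid": 1}, {"oid": 12, "uid": 3}]
print(hash_join(users, orders, "uid", "uid"))
```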
MySQL professional here. I'm quite pleased with Oracle's 5.5 and 5.6 releases, and feel that they're doing a pretty good job. While they've not been perfect stewards, I feel they have done better than Sun -- perhaps you don't remember the fiasco that was the 5.1 release?
I don't feel that this is a trollish opinion. Mark Callaghan, a MySQL luminary who has done a lot of excellent work for the community also has positive things[0] to say about Oracle's stewardship of MySQL.
Yes, they're ok-ish. But the original claim was "MariaDB isn't a replacement for the stock version of MySQL".
Actually yes, it is. It doesn't work the other way around (for example, the TIMESTAMP column type has a different definition, so a MariaDB dump cannot be loaded into MySQL). But you can use MariaDB in place of MySQL, and there should be no performance degradation, since they come from the same codebase.
Reliability was mentioned too -- and there Oracle actually failed, by holding back tests. With MariaDB you can reproduce the testing if you want; with MySQL, not anymore.
No, not really. MariaDB is completely dependent on Oracle and Percona. While they are doing some good work, they are by no means a complete and independent fork, nor do they have the resources to be.
MariaDB is also impacted by the lack of tests, they are absolutely not making replacements for all those tests, and they continue to pull code from upstream. So, same problem there.
This might have been true in 2011-12, but today MariaDB is demonstrably NOT dependent on Oracle or Percona. If both disappeared tomorrow, it would still continue.
The fact that they continue to "pull" some code is because they're not idiots, and aren't going to duplicate effort. As far as I'm aware, they are now being very selective about what they pull.
Calling Oracle an "upstream" is a joke. They aren't publishing atomic changesets, which also means the Oracle fork of MySQL is no longer Open Source in spirit.
MariaDB is NOT impacted by some vaporous lack of tests. They are building tests for every change they're making. As for the tests privately held by Oracle, well, those tests don't help anyone because they're not public and don't enjoy public scrutiny. Who knows if they're even running them?
They have received large investments from companies like Intel, and have been granted extensive engineering (and probably financial) help from companies like Google and Facebook. Probably many others. And they have most of the MySQL brain trust in their employ, Monty most famously. I don't know how anyone could argue they're under-resourced for the task of maintaining and improving a mature product.
> I don't know how anyone could argue they're under-resourced for the task of maintaining and improving a mature product.
You mean Jeremy Cole, the guy you're arguing with, who led the effort at Google to standardize on MariaDB[0], who worked for many years with Monty at MySQL AB and who is a recognized leader in the MySQL community?[1] Fuck that guy, I have no idea how he could have such an opinion.
I agree, Oracle aren't terrible stewards of MySQL, but that's not my point. Yes 5.5 and 5.6 are good releases. MariaDB 10 is an even better release and shows that Monty has still got value beyond what any large corporation can provide.
I know for a fact that if we were forced away from MariaDB back to stock Oracle MySQL releases, we'd have to expand onto more slaves and fix queries that are no longer optimized.
Our experience switching from Oracle's MySQL to MariaDB was a solid improvement in performance -- particularly on certain queries where MariaDB's superior query optimizations kick in.
We've also migrated many tables to the TokuDB storage engine and seen phenomenal improvements in performance and scaling. It's so good we were able to de-partition and de-archive our largest tables with no performance penalty.
If you haven't tried MariaDB yet, try it.
If your database is reasonably large (10GB+) and you haven't tried TokuDB yet, TRY IT.
I recently got the task of making a 10 GB, 12-million-row table work with complex aggregation queries like SUM and COUNT.
Server - Ubuntu 14.04
MySQL - Percona XtraDB Cluster 5.6
RAM - 16 GB
CPU - 8 Cores - 2.95 Ghz
SSD - 100 GB
I tried many options --
1) Increased the innodb_buffer_pool_size to 8GB on a 16GB RAM machine -- It helped but nothing magical here
2) Added a few more keys on date-based columns and forced users in the front end to select at least one date-range column -- saw some performance gain here
3) Tried the MyISAM engine -- I would say these days MyISAM is history, since InnoDB is pretty much comparable to it -- so I didn't see much performance gain. One disadvantage was that loading the 12 million rows took ages, and "SELECT table_rows FROM information_schema.tables" hung, so I was not able to figure out how many rows had been loaded into my table.
4) Finally I tried partitioning the table on a DATE column -- massive performance gain. If the user selects a date range that falls within one or two partitions, you get results very fast; even if it spans many partitions, the performance is still acceptable. Useful ref - http://www.slideshare.net/datacharmer/mysql-partitions-tutor... -- remember that it's mandatory to include that date column in the primary key if you are partitioning on it.
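To illustrate point 4, here's the general shape of the DDL (table and column names are made up; note how MySQL forces the partitioning column into the primary key):

```sql
-- Hypothetical range-partitioned table. MySQL requires every unique
-- key, including the primary key, to contain the partitioning column,
-- hence PRIMARY KEY (id, created).
CREATE TABLE events (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    created DATE NOT NULL,
    amount  DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (id, created)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION p2014q1 VALUES LESS THAN (TO_DAYS('2014-04-01')),
    PARTITION p2014q2 VALUES LESS THAN (TO_DAYS('2014-07-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- A WHERE clause on `created` lets the optimizer prune to the matching
-- partitions instead of scanning the whole table:
SELECT SUM(amount) FROM events
WHERE created BETWEEN '2014-04-01' AND '2014-06-30';
```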
My suggestion is to try TokuDB storage engine on that table. Our (admittedly text heavy) workload got order of magnitude scalability improvements and a dramatic fall in query times.
I tried TokuDB as well, and below is a comparison for a complex SUM and COUNT query.
Please note that the number of rows and the size are exactly the same, and there is a partition on a date column.
Table size 10GB and rows 12 million
Also note that I am using the default settings of TokuDB
With TokuDB - 1 min 41.38 sec
With InnoDB - 58.17 sec
Talking out of ignorance, but why is MySQL so dominant in big companies? Why do these companies choose MySQL over PostgreSQL? I see people bashing MySQL all the time because of Oracle, but technically, is MySQL less capable than Postgres? Would it be wrong to start a new business on MySQL?
MySQL used to suck a lot more some years ago (tip of the iceberg: no transaction support!); the Oracle business is small potatoes compared to the numerous earlier shortcomings that led to data loss left and right.
Inertia explains the current MySQL position reasonably well, a better question is how did it climb to its current position during its years of technical incompetence.
> Inertia explains the current MySQL position reasonably well, a better question is how did it climb to its current position during its years of technical incompetence.
Existing, and having a better install story and Windows support than PostgreSQL. When it established its dominance, it wasn't because it was the best open-source multi-user relational database in terms of spec-sheet features; it was the one that people trying to start something could easily set up and get running with. That quickly led to it being widely supported on shared hosts and to a large base of people with at least some experience, which then created a nice positive feedback loop to maintain its popularity.
Oh god, "no transaction support". I wish anyone mentioning that would check what the last version missing it was, and when it was released.
As for "how did it climb" the answer is simple: replication. Working replication was available in MySQL years ago too. Not perfect, but working.
Big companies usually do have competent people who make informed decisions, not ones based on "I've read on the internet that PG is the real DB and MySQL is just a toy".
> why do these companies choose MySQL over Postgresql?
There's been a perception for a long time that MySQL was "more lightweight" than traditional RDBMSs and therefore "faster" -- the same thinking that perpetuates NoSQL solutions today.
Originally Postgres didn't even support SQL; mSQL was developed as an SQL interface to Postgres in the mid 90s. When it turned out Postgres was dog slow on the old-ass hardware the devs were using, they just implemented their own lightweight DB, and mSQL became the top pick for new OSS-based systems. But mSQL was commercially licensed, so MySQL was created for personal use. Since it reused the same API as mSQL, everyone just adopted MySQL as a 'drop-in' replacement. So MySQL is lightweight and fast and free, and Postgres is a dog-slow incumbent.
I guess you could compare it to how many people feel Java is a humongous pig that can't scale and PHP is fast and lightweight. And obviously lots of sites use PHP. But some shops choose Java because they want something PHP can't offer. (Note: this is not a fair comparison to MySQL and Postgres in any way, but it shows the weird 'feelings' people get for different software)
But also: MySQL has more DBAs, a higher number of installations, more 3rd party support, bigger user/dev community, and in general is more popular.
> technically, is MySQL less capable than Postgres?
Each has individual technical benefits and drawbacks the other doesn't have.
> Would it be wrong to start a new business on MySQL?
What, like, ethically?
MySQL is just a tool. I could 'bash' a table knife by saying it's a dull, heavy piece of shit compared to some other knife, but guess what? Everyone uses table knives. They don't typically use them to debone fish, however. Look at your use case and pick the tool you feel comfortable with that fits it best.
Is there a document on how GitHub uses MySQL? Does it actually store git objects (either as binaries or in a logical equivalent with foreign keys, etc), or is the database just used for higher-level things, like users, organizations, etc?
No, you cannot shrink them, but more importantly they do not have to be cached.
The rest of ibdata1 has to be cached for performance.
It's impossible to isolate the caching needs unless undo is moved outside, and 99% of servers out there have probably not been set up with external undo logs, because most admins do not learn of this limitation until after everything has been configured.
The only way around this is to rebuild the entire database. Loads of fun.
If you are on call or scheduled for the deployment, you may be required from time to time to come online and deploy at off-peak times. This type of maintenance -- the kind that requires making your service entirely unavailable -- should be extremely rare; most deployments should be doable in a rolling manner without affecting ongoing service availability.
Also usually you are on call every so-often so if your team/company happens to do this often it's not always you performing the deployment.
In regards to "being in the office": most people will do this from home after sleeping most of the night, waking up to an alarm, and then coming in a little later the next day. There are a few hardcore ones out there who prefer to pull an all-nighter and do so from the office, although in my experience those are quite few and far between.
Experience tells me an all-nighter would be a poor idea, as sleep deprivation is just as harmful to cognitive abilities as being drunk -- especially between 5AM and 8AM, which I've found affects individuals the most.
If something goes wrong with your database, you'll be in a shitty condition to make fast and sound judgement calls.
One could say that not everybody's the same, but under those conditions I think that's not true. I've trained extensively under sleep-deprived conditions in the military, where we were continually asked to make quick decisions in harsh conditions. Everybody's bad when sleep deprived, and everybody's especially shitty at 5AM.
The last sentence is interesting to me - I wonder if there are many sys/devops types who are morning people? I'm a developer and function much better at 5am than 11pm, but it'd be an interesting correlation if server-peeps were almost exclusively nightowls.
> In regards to "being in the office": most people will do this from home after sleeping most of the night, waking up to an alarm, and then coming in a little later the next day. There are a few hardcore ones out there who prefer to pull an all-nighter and do so from the office, although in my experience those are quite few and far between.
DevOps/admin here! I prefer the all-nighter, but I've almost always (in 14 years) done it from home, unless physical hardware had to be moved (i.e. forklifted datacenter to datacenter).
Not at any decently functioning company, but during a huge operation like a data center migration or a core technology change, having all hands on deck during non-business hours is fairly normal.
I worked for a telco years ago. We had separate development and operations teams. Operations worked three 8-hour shifts, 24/7. They took care of every installation (pre-production and production), and they usually did them on the night shift. The staff rotated over the months, and there was no particular employee burnout.
By the way, one full-time equivalent (that is, a hypothetical person working 24 hours a day all year long) equalled five real people. That is, if they wanted ten people always available, they had to hire fifty. You can easily understand why these arrangements are not common at Internet companies. Furthermore, telcos have different requirements. DevOps wasn't there yet, and I wonder if it is accepted by management even now. My bets are against it.
I also wonder if companies like Google, Amazon and Facebook are organized in that way too.
Sounds like a sound way of managing it. As others have pointed out, occasional on-call duty or what have you is fine -- it's the idea that staying at the office until 3AM is "normal" and accepted as part of your equity payout that's insane.
If you read carefully, it seems that it was 5am Saturday, not through the night, and it looks like it was done by 7:15am. The beer was just labelled with the start of the process.
Migrations like this don't happen often, and being on-site would be common when you are running in a physical data center rather than the cloud. When we switched backend DB systems we did it from home.
No, but it's also not normal to migrate your production database servers to new datacenters. And when you do, you want to do this in a way that will minimally impact your users (i.e. after hours).
They mentioned that their new config has a delayed replica. Can anyone comment on how useful this actually is?
I do snapshots + binlogs so I can do a point-in-time recovery to any time in the last month. So obviously a delayed replica would be a faster way to recover from human error at the MySQL prompt. But it would still require human intervention which is slow and can't really be automated. On the other hand, presumably a process already exists to bring a replica up from scratch, and that could be done and paused at a certain point. So it seems like a lot of extra effort and hardware for a really narrow and constrained benefit.
Anyone running a delayed replica -- is this wrong? Has it been used ever? often? Worth it?
Delayed replication is a MySQL 5.6 feature (and can be emulated in previous versions).
You are correct in that the main use-case is fast recovery from human error. But as a DBA, I can tell you that accidents like this cover 90% of disasters :)
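For reference, configuring it on 5.6 is just a replication option on the replica (the delay value here is illustrative):

```sql
-- On the replica: stay a fixed interval behind the master,
-- here 4 hours (14400 seconds).
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 14400;
START SLAVE;

-- SHOW SLAVE STATUS then reports SQL_Delay and SQL_Remaining_Delay,
-- so you can see how far behind the replica is holding itself.
```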
We use a delayed replica for our off-site backups. It's a little bit of safety, plus it comes in handy for sanity checking previous state after making big global changes.
It would be interesting if someone from Github could discuss why they chose to do this migration by taking the whole site offline and doing the migration all at once. Did anyone investigate if this could be done without taking the site offline?
Doing this online would have been very tricky while maintaining 100% consistency. We perform major infrastructure changes often without ever having to take the site offline; in this case, and at this time, it was unavoidable.
I feel 13 minutes of maintenance at 5am PST was a good trade off for the benefits we gained.
Can you go into more detail regarding the prohibitive consistency issues? How do you maintain consistency in steady-state (ie. not during migrations?) Also, how do you make the call as to whether to bring your site down vs. attempting a live migration?
I think it's a smart decision, given the nature of the product. Fourteen minutes of downtime very early on a Saturday morning is a price they were willing to pay to make this a one-time operation with no (actually reduced) risk of losing data consistency, and without the other pitfalls that come with a live migration.
Obligatory http://xkcd.com/1205/ . Checklists are an insanely super cheap way to provide a repeatable process. They're not sexy, perhaps, but they're damn useful.
There are three factors here for me:
I automate things if it will save me time.
I automate things if it will provide necessary reliability to the process.
I automate some things that annoy me to do manually, even if I can't justify it on either of those bases.
The second one is tricky. When things go pear-shaped in new and interesting ways 10 steps into your automated scripts, do they correctly and automatically recover? Mine generally don't. I'm going to react better to things going pear-shaped than they will.
But my coworkers don't necessarily bring the same attention to detail to my checklist. That can be because they're not as familiar with the tools, or because they simply don't believe the rigor is justified or necessary.
Wait, you mean relational databases actually can scale? Say it ain't so! - end sarcasm
Personally, I've always been wowed at what youtube does with mysql. See the entire vitess[1] project for an idea. Thanks github for writing this up though, very neat.
You do realise that most people using MySQL at that scale aren't using it as a relational store? They are sharding in the application layer and using it as a dumb key-value store.
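i.e. something like this in the application (a hypothetical sketch -- shard names are made up, and real deployments often use consistent hashing or a lookup table so resharding doesn't remap every key):

```python
import hashlib

SHARDS = ["db01", "db02", "db03", "db04"]  # hypothetical MySQL shard hosts

def shard_for(key: str) -> str:
    """Application-layer sharding: hash the key and take it modulo the
    shard count, so each key deterministically maps to one server."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The app then opens a connection to shard_for("user:42") and does a
# simple primary-key lookup there -- no cross-shard joins.
print(shard_for("user:42"))
```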
No one would argue that MySQL's database engines can't scale. But you could argue that the relational model doesn't scale.
"But you could argue that the relational model doesn't scale."
How would one make such an argument? The relational model is simply a combination of logic and set theory used for manipulating data. It's orthogonal to scalability concerns.
Pretty sure he means the relations of a normalized model that fail to scale (either due to complexity (too many joins), or size (too much data for a single node)).
Normalization is also a logical concept (and merely a suggestion when it comes to the relational model, not a requirement) orthogonal to physical scalability concerns. Sometimes people use it loosely, assuming a one-to-one correspondence between a relation and a physical file.
These are important distinctions, because a misconception here leads to entirely the wrong solution.
One thing that does have inherent physical constraints is consistency. That's usually what people mean when they say that the relational model doesn't scale, but it would be much less confusing to just say that. Then there would be no reason to dismiss a relational language when designing scalable systems.
I was poking fun at people who say SQL doesn't scale when what they really mean (and don't realize) is that a normalized schema doesn't work at large scale. I agree that you pretty much have to shard data and "join at the app level" at sufficient size, but the definition of "big data" changes every day.
10 years ago 1TB would be "big data", whereas today 1PB is. I'm waiting for the time when you can get 1PB SSDs for your laptops :)
I'm tired of the "lol lets mock NoSQL fanbois" behavior on HN. You fail to realize you are acting exactly like the people you are mocking.
Generally the use case for "scale" with NoSQL isn't that MySQL isn't technically capable. It is a cost/benefit for a specific use case.
For instance, if you are storing counters that are purely tracked via key/value ... MySQL is a terrible choice from a server-cost-to-performance-perspective.