Netflix Billing Migration to AWS

partiallypro · on June 22, 2016

I find it to be a peculiar business decision to completely (as much as you can) migrate to one of your largest competitors' cloud service. It seems like Microsoft is the only, of the larger cloud providers, that doesn't -really- compete with Netflix. (Google has YouTube, and I guess even Microsoft has a much smaller Windows Store presence.) Even if Amazon can't access the raw data, they could see how you're utilizing it to improve their own video service, and they get the benefit of billing you and potentially using their pricing leverage to squeeze your margins.

I feel like Apple's approach of utilizing multiple providers makes more sense (though they do this for uptime and redundancy.) Maybe I'm being a pessimist.

tomphoolery · on June 22, 2016

> Even if Amazon can't access the raw data, they could see how you're utilizing it to improve their own video service, and they get the benefit of billing you and potentially using their pricing leverage to squeeze your margins.

So you're saying Amazon is going to risk millions of dollars so they can make a few more bucks on video streaming, which is like 3 levels down from their primary business?

Prove it. I had an argument with my last company about this very issue. If Amazon's primary business was video delivery, then that would make a lot of sense. But where does Amazon's primary revenue stream actually come from? That's right, it's AWS. Amazon may be an excellent retailer, but they spend just as much money as they make on the shipping and fulfillment side to get shit on your doorstep faster and cheaper than anyone else out there. Each person that spends money on Amazon can only really spend a few hundred dollars per year. But even a small company that's entirely hosted on AWS, like 70% of the companies I've worked for, pays Amazon thousands of dollars per month for hosting. There's definitely more people than companies, but shipping stuff to people costs a lot more money than Amazon needs to pay in order to have your stuff hosted by them. Plus they do a lot of R&D into making their own systems faster and more efficient, eventually passing savings down when they re-work their pricing tiers.

Basically, my argument is/was that the whole idea of Amazon stealing your IP to make a few extra bucks on whatever they happen to be doing is totally bunk. Amazon is really in the infrastructure business, and if you're a video startup...you're NOT. And take it from someone who watched a company try and fail to build a competing private cloud with no budget and a skeleton crew...it's stressful and not fun at all.

The main reason to use multiple providers is, as you said, uptime and redundancy...and "not putting all your eggs in one basket", so to speak. It's an engineering, not political, decision. My last company could have probably saved themselves by moving everything to AWS and shutting down their Level3 internet-backbone connectivity and direct fiber from the office to the datacenter (which is the same technology AWS is using anyway, except they have an actual cloud API and not just a pile of servers), but they were too busy conflating this engineering/performance decision with one that must be made for political reasons.

tmptmp · on June 22, 2016

>>Prove it. I had an argument with my last company about this very issue. If Amazon's primary business was video delivery, then that would make a lot of sense. But where does Amazon's primary revenue stream actually come from? That's right, it's AWS.

Let me try. Earlier Amazon's primary business was selling books, then it "became" selling almost every object that can be legally sold, and now you are saying that it "is" AWS. What about tomorrow? Tomorrow, it easily may "become" selling videos too. With Amazon it's very much possible.

So, it's not just technical issue, it's a political/business issue too. Of course, as you have said, they must take into consideration the trade-off. If the trade-off is more like "killing yourself under the technical burden of setting up a good network" vs "potentially allowing/helping Amazon to take advantage of your hosted service on their AWS and thus to become a future competitor" then they may go to AWS and/or other cloud provider(s).

edit: allowing/helping

frik · on June 22, 2016

Microsoft's business shifted too. Long gone are the days they were a software vendor for end user. Nowadays they produce services and hardware products for enterprise customer. And end user are the product, their new subscription based Office and Win10 collects a lot of private data like key presses (keylogger), audio from microphone, scans documents and uploads unspecified tracking data in many encrypted TLS (phone home). And Microsoft is known to suddently compete with you. There is a rule, never compete with Microsoft, they have more money than you, they will make their competiting product/service/hardware available for less money than you could offer.

corobo · on June 22, 2016

Oh come off it. You're going to have to back that one with sources.

thwarted · on June 22, 2016

But where does Amazon's primary revenue stream actually come from? That's right, it's AWS.

Looking at Amazon's Q1 2016 Financial Results [0], page 8, we see that net sales of non-AWS amounts to ~$26B, and AWS is ~$2.5B. From the Segment Highlights section (same page), AWS sales is 9% of total sales.

AWS beats non-AWS income only because there were losses in international; ignoring international, it's a $16m difference.

Looking at page 14, Media sales in North America and International sum to $5.6B. Media sales alone is double AWS's sales of $2.5B (page 13). Profit margins are way higher for AWS, so there's still a lot of room for a larger income difference between the two segments.

Based on this, someone doing media streaming as their primary business needs to be aware of who they are in bed with infrastructure-wise, but I agree that it doesn't mean that AWS isn't worth using just because Amazon is in the same market/area. Media sales are a big portion of Amazon's non-AWS sales, and being digital are most likely fewer headaches than physical goods; but not so big of a portion that one needs to worry about Amazon being the 800lb gorilla that would need to be contended with. It's more likely that Amazon is a threat to Netflix on licensing deals and content offerings (IMO, Netflix is slightly stronger, but neither are great, in content offering).

Is Amazon going to pull the rug out from under Netflix? Not if they want anti-trust attention. Is Amazon going to disallow the Netflix app to run on their devices? I suspect no, since they seem to want to create a platform rather than a walled garden when it comes to devices and media. Is Amazon going to undermine the trust in AWS by using AWS customer's data, or metadata about their customers, for their own gain? Probably not. If I'm going to put money on who's going to do "the right thing", for businesses and tech as a whole, it's going to be on the likes of Amazon rather than, say, Oracle.

Looking closer at (just) this (document), the growth numbers are such that it could go either way as to which revenue stream will eventually be the major contributor to Amazon revenue streams. That's a significant reduction in losses (-60%) over Q1 2015, so International turning around over the next year or so could keep AWS and non-AWS income neck and neck.

[0] http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9N...

runamok · on June 24, 2016

Hell, the prime video player is far more buggy and error prone than the Netflix one at least on my media PC. So they can't even reverse engineer that well.

e12e · on June 22, 2016

Good points. I'd just like to point out that, there are "political redundancy" reasons to diversify to multiple (cloud) providers: you could easily get in a billing dispute with a single vendor - if that means your entire business grinds to a halt, then that puts you in a very weak and vulnerable position.

I don't know if the extra effort of using eg. Google and AWS is worth in order to "stay up when AWS goes down" - but it might be worth it to stay up if/when AWS cuts you off for some reasons (quite possibly due to human error, billing, a take-down notice or other legal dispute etc).

None of that helps if you might accidentally find yourself on the wrong side of the "war on terror" by publishing news - in such a case all your US funds and assets might be frozen, and you would need a non-US presence in order to stay up while sorting out the potential error. But I suppose it's no worse than being subject to other kinds of arbitrary censorship...

L_Rahman · on June 22, 2016

The billing redundancy argument would be applicable generally but in the case Netflix on AWS is not something Netflix has to worry about.

Netflix is a marquee AWS customer. The PR damage of Netflix even making noise about leaving AWS would be terrible for the service as it fights for market share against Azure and Google. Netflix will get their way.

wastedhours · on June 22, 2016

Exactly, if Amazon wants to keep AWS as the backbone of large parts of the internet, they need to be able to prove other business considerations don't affect it - if they ever treated Netflix any differently than another marque client they'd torpedo the trust in them from any organisation who think they might compete in the future.

travmatt · on June 23, 2016

That was my first thought. Sabotaging one of the most conspicuous brands (and arguably their flagship customer) in the United States today sounds like an excellent way to commit business suicide - how many other tech companies would they cede to their competitors? Could a company ever defend using AWS again if they even tangentially competed with Amazon? Moreover, I bet that AWS is much more central to their long-term than Amazon video is. That just seems like a ton of risk for very little upshot.

strombofulous · on June 22, 2016

>Amazon may be an excellent retailer, but they spend just as much money as they make on the shipping and fulfillment side to get shit on your doorstep faster and cheaper than anyone else out there. Each person that spends money on Amazon can only really spend a few hundred dollars per year. But even a small company that's entirely hosted on AWS, like 70% of the companies I've worked for, pays Amazon thousands of dollars per month for hosting

But there are a lot more people than businesses, and an order of magnitude more people than businesses who both need custom hosting infrastructure and choose AWS.

adventured · on June 22, 2016

> But there are a lot more people than businesses, and an order of magnitude more people than businesses who both need custom hosting infrastructure and choose AWS.

The first part is irrelevant, because almost none of those people will ever buy a service from AWS. The second part is false.

An order of a magnitude more businesses are utilizing AWS than 'people.' AWS doesn't exist for consumers or average people, it's for businesses. That was the whole point of its existence: the primary customers of AWS are businesses, and always will be. It's where the radical majority of their sales come from, businesses with greater than 10 employees; not from solo developers spending $37 per month, and absolutely not from your average person that doesn't know anything about cloud computing.

JohnTHaller · on June 22, 2016

Amazon has spent years screwing with Amazon Prime video subscribers on Android to the detriment of the Prime brand and their brand as a whole in the eyes of many of those affected users. Amazon could very well screw with AWS to the benefit of Android Prime video for reasons only upper management can discern.

partiallypro · on June 22, 2016

For starters, your company is not the largest video provider in the world, like Netflix. As someone else mentioned here, diversification can be for "political" reasons as well. Especially if you're as large as Netflix; diversifying could keep your billing in check. Let's say you split between AWS, Azure and GCS. That gives you the ability to scale based on what is cheapest. You could also do public and private clouds in tandum. You can scale your private cloud/on-premise to the public cloud if need be with most large cloud providers (though I never suggested they use on-premise.)

As far as this not being Amazon's primary business, well, they are a lot bigger and diverse. But let's use a Wal-Mart analogy. Wal-Mart is so powerful (maybe not anymore, because of Amazon) that if it doesn't like your wholesale pricing to them, they can move your shelf space and practically destroy your business. They have the leverage in those relationships. Now, with AWS being the #1 cloud provider you have a similar lock-in, at the very least your switching cost would be quite high.

So, let's just for example sake say Amazon make a decision to give their own video streaming priority on their own cloud. That's not breaking net neutrality laws, because...it's their servers, this has nothing to do with telecom. So now Amazon Prime Video, streams 4K at a much better rate than Netflix (and Netflix would likely never know.) Then, perhaps another assumption, AWS decides to increase their pricing tier for their media servers for streaming. So suddenly they have squeezed you in two ways. Quality of Service and pricing.

Amazon could easily do a calculation to compare themselves to Azure and GCS to tell what is the proper amount they could get away with and still make it more expensive just to migrate to a competitor. You're locked-in, and just to make up for your switching, it would cost, let's say 1 years worth of AWS service. Hard to explain your sudden blip in earnings to short minded investors.

Anyhow, all I am trying to say is...it would make more sense, to me, from a business stand point to diversify across multiple providers instead of being all in on AWS. I never said you had to have an on-premise set-up. It would be much different if they were a small start-up, but Netflix is not. It can at times take up more internet traffic than torrenting.

I will point out though that services like Docker are making cloud lock-ins harder to do, but they don't solve reliance on APIs or just proprietary offerings. There's a reason why Google, Microsoft and Amazon all underline, push and constantly improve those offerings as a form of lock-in.

It is possible Netflix has some sort of pricing agreement with Amazon that locks in a rate for x amount of years. But either way, when you're as large as Netflix, I think diversification is the better long term strategy.

curun1r · on June 22, 2016

I don't see Netflix and Amazon being serious competitors...yet. For both of them, the main competitor is cable. The more that either/both of them are able to get people to "cut the cord," the better it will be for both of them.

I can't speak for others, but for me, the decision wasn't, "should I choose Netflix or Prime Video?" Instead, it was, "Can I find enough programming to satisfy me for less than I'm paying for cable?" The only way my answer was yes was with multiple streaming subscriptions.

justinmk · on June 22, 2016

> I find it to be a peculiar business decision to completely (as much as you can) migrate to one of your largest competitors' cloud service.

This was directly addressed in a 2011 talk[1] by a director at Netflix:

> Competition drives cost down

> [...]

> We would prefer to be an insignificant customer in a giant cloud.

[1] https://justinmk.github.io/2016/03/17/selfish-cooperation.ht...

hkmurakami · on June 22, 2016

If you think about it, Apple does something similar by having Samsung fab their processors.

kiwijamo · on June 22, 2016

Amazon could, if they wanted to, obtain a lot more useful data from the data Netflix hosts in AWS than Samsung could from just making chips for Apple. So I'm not sure if it's a good comparison.

limeyx · on June 22, 2016

I think for smaller companies it could make sense but Netflix is big enough and visible enough that if AWS cut them off, I think you'd see a huge exodus off of AWS of people fearing they would be next ... simply not worth it to Amazon to shoot itself in the foot like that IMO

truncate · on June 22, 2016

As per this article, Netflix only migrated their billing infrastructure not their video streaming service.

luhn · on June 22, 2016

They only migrated their billing infrastructure because the rest of Netflix is already in AWS. As far as I'm aware, the only non-AWS Netflix component remaining is their CDN. (And it will likely remain that way.)

niftich · on June 22, 2016

That's because they've already migrated their videostreaming infrastructure to AWS (and some in Google Cloud Storage as a failsafe).

This Ars Technica article from February summarizes the multi-year effort: http://arstechnica.com/information-technology/2016/02/netfli...

ihsw · on June 22, 2016

Just to be clear -- their video streams are still served up via Netflix's OpenConnect appliances. The surrounding infrastructure -- everything "outside" of the play button -- finished moving to AWS.

Their billing system has now joined its siblings in living on AWS.

niftich · on June 22, 2016

Those being the colocated servers they give to ISPs?

EDIT: yes. [1] [2]

[1] https://openconnect.netflix.com/ [2] http://www.datacenterdynamics.com/content-tracks/colo-cloud/...

tacticus · on June 24, 2016

Also the cdn edges they run themselves.

https://www.youtube.com/watch?v=tbqcsHg-Q_o

very cool video.

nitwit005 · on June 22, 2016

Microsoft also sells through the Xbox platform, which is in direct competition for the TV space.

hiuhsdf · on June 22, 2016

Yeah, this decision makes no sense, AWS is not even cost-effective past a certain, relatively small, amount of resource usage. Netflix could have easily built their own infrastructure for this, plenty of smaller companies do it and save tons of money. The only thing that would make sense here is if Amazon offered them their services at a huge discount, which could be a move looking to either acquire Netflix in the near future or cripple it.

GrinningFool · on June 22, 2016

Netflix started with their own infrastructure[1] and migrated it all to AWS. Presumably this was more cost-effective for them.

[1] https://media.netflix.com/en/company-blog/completing-the-net...

tomphoolery · on June 22, 2016

it is WAY more cost-effective when you get to that kind of scale to use someone else's battle-tested APIs. the best part is that when that stuff breaks down, it's not your job to fix it.

NetTechM · on June 22, 2016

Plus, no pricey salaries to have event response teams on the payroll.

bpicolo · on June 22, 2016

I mean, you still have a lot of people doing operational work at Netflix scale, regardless of where your infra is

toomuchtodo · on June 22, 2016

You still pay for an ops team, just with a different skillset.

kchoudhu · on June 22, 2016

At this point, I'd imagine Netflix's main expenses are not engineering/bandwidth/compute -- it's content. My guess is that there's better ROI in paying Amazon a premium to not have to worry about the CDN engineering while concentrating investment on expanding their library.

drewrv · on June 22, 2016

It's an easier task for a smaller company to pull off, and I'll bet Netflix is getting a better deal than most of us could negotiate.

antaviana · on June 22, 2016

How do you know Microsoft will not compete with you sometime next quarter? Did you anticipate the LinkedIn purchase with which Microsoft enters the job search market?

stoic · on June 22, 2016

I don't think the differentiation in that space is in technical implementation, but rather in content quality (produced or "acquired")

buro9 · on June 22, 2016

Thanks for writing this up, I'm just at the tail end of having re-architected the CloudFlare billing system, also a subscription system written in Java with a MySQL back-end, but fronted by a Go API that insulates the rest of the business from the internals of the billing system.

The blog post covers a lot of the high level stuff really well, but I'm interested to learn whether you experienced any issues along the way, what they were, how you dealt with them.

In CloudFlare's case our migration was made more complex by also adding PayPal, and changing our processing gateway. Both of which created risk that we've had to work hard to understand and mitigate, i.e. how different gateways may, with the same card, return different results, etc.

pamonrails · on June 22, 2016

We do a lot of migrations from custom systems or SaaS integrations to Kill Bill (open-source subscription billing and payments platform) [0] and we've summarized our strategy and usual pain points in our migration guide [1]. You might find it useful.

Happy to chat offline too if you want to go into specifics.

[0] http://killbill.io/ [1] http://docs.killbill.io/0.16/migration_guide.html

_lflx · on June 21, 2016

How many times is Netflix going to finalize its transition to AWS?

yeukhon · on June 21, 2016

They won't 100% be on AWS, but probably close to 95%. Netflix owns its own media CDN.

dexterdog · on June 22, 2016

So almost all of their bandwidth is handled outside of AWS. I would wager a guess that bandwidth is quite a bit more than 5% of their ops bill.

yeukhon · on June 22, 2016

I think that's where you have to draw line between owning your own infrastructure and using a public cloud in the architecture.

ww520 · on June 21, 2016

DRBD works very well for high availability, especially good to provide failover for master database, since that usually requires fast failover like under 10 seconds. Five or six years ago, did couple HA setup with DRBD along with Linux HA and virtual IP. The failover work great.

kbenson · on June 21, 2016

I was under the impression the accepted way to fail-over with mysql and DRBD was to fail out the old server, and then start repairing tables, because you couldn't be sure they were in a valid state. That's about a decade old info though.

I always just set up master/master replication with mysql. You can get free distributed reads that way if you architect your application right.

ww520 · on June 22, 2016

DRBD produces two mirror copies of data. When the primary failed, the standby has whatever data the primary has just before the crash. When the standby starts up, the RDBMS goes through the normal recovery and bootup. It's same as the primary crashed and being started up again.

MySQL's master/master has a number of complication and problems: 1. data loss due to async nature of replication, 2. update conflict on same data on multiple masters, 3. two masters mean two IP so all the clients need to know how to fail over to different IP, 4. complication in adding or removing master.

With DRBD, the disks are mirrored so that's no chance of data loss once a transaction is committed. There's only one master so no complicate conflict resolution. Linux HA's virtual IP means the standby will take over the primary IP so the clients don't need to know there's a server failover. Adding or removing standby is easy. DRBD will sync the disks automatically, no downtime on the primary.

kbenson · on June 22, 2016

> 1. data loss due to async nature of replication

That depends on what you mean by async. The replication itself is synchronous (statements cannot happen out of order), it's just not lockstep with disk writes and commits. I think it's more illustrative to say it's delayed.

> 2. update conflict on same data on multiple masters, 3. two masters mean two IP so all the clients need to know how to fail over to different IP

I'm not referring to multiple live masters, I'm referring to a set of servers where each is both master and slave to the other, and one live master server which gets the HA IP address. In that respect, there is no difference to a DRBD replicated setup. Clients just use the HA IP.

> 4. complication in adding or removing master.

I never found it that complicated, and I deployed it at least 5-6 times for multiple companies, and even with slightly different topologies (master/master where each master had an additional dedicated slave reserved for intensive read-only queries). What did you find complicated about it?

ww520 · on June 22, 2016

1. MySQL replication is async by default; see [1]. That means commit returns before replication to peer is complete. Committed data can be lost if master's disk is destroyed before a slave replicates it. NDB has synchronous replication but it has other limitation. 5.7 seems to have a semi-sync mode now.

[1] http://dev.mysql.com/doc/refman/5.7/en/replication.html

4. For complication, you need to find out and configure the binary log position. Also because replication is on the binary log, if you ever truncate the log, you can't simply add a brand new master. You have to do a backup on the primary and restore to the new master, and then set up the log position to just prior to backup. Just lots of extra complication.

kbenson · on June 22, 2016

> That means commit returns before replication to peer is complete.

That's what I was talking about. It's just a matter of what aspect of it you are talking about, but I'll give you that it's in their official documentation, so there's no point in me pressing the issue.

> Committed data can be lost if master's disk is destroyed before a slave replicates it.

That is true. It's a trade-off you can make for slightly different CAP assurances, or nuances in the failure states at least (mostly in what you might expect to do in a split brain scenario).

> For complication, you need to find out and configure the binary log position.

Your backups should be logging the binary log position as well (--dump-slave or --maser-data). If they aren't, you aren't doing yourself any favors.

> Also because replication is on the binary log, if you ever truncate the log, you can't simply add a brand new master. You have to do a backup on the primary and restore to the new master, and then set up the log position to just prior to backup. Just lots of extra complication.

If you have to do another backup because it was truncated recently, you aren't much worse off than doing backups with DRBD replication (which even if you have a slave configured and do backups off that, you can truncate logs and need to backup from the master then as well). The downside is that you may not want to immediately do a backup of the master due to reasons of load, which will leave you without a failover for a short while. Whether an extra queryable resource available is worth that is up the the architect.

I remember a Percona training I was at a few years back there were a few more clustering options available I hadn't played with (and still haven't). Percona XtraDB Cluster was one, and it's supposed to support synchronous master/master replication. That might be the best of both worlds, if it lives up to its billing.

Rapzid · on June 22, 2016

DRBD supports flushes and barriers. You just switch the primary and startup the database. InnoDB will recover as necessary and you are good to go. This is assuming you are using all InnoDB tables(you likely should be).

kbenson · on June 22, 2016

> This is assuming you are using all InnoDB tables(you likely should be).

Ah, that's probably it. ~2004/2005 InnoDB wasn't nearly as popular as it is today, IIRC (but it was there, and was used).

In any case, multiple master with failover setups always worked well in my experience, as long as you take care to track the replication state.

Annatar · on June 23, 2016

DRBD is failover - there is an interruption while the switch is made.

High availability is when you can afford to lose one or more nodes, and there is no service interruption. Moreover, the processing and I/O are symmetric and performance scales linearly by adding more nodes.

merb · on June 21, 2016

They used a DRBD replicated MySQL. Wonder why they used MySQL over PostgresSQL then.. Would be great to know if they looked that up aswell.

yeukhon · on June 21, 2016

The interesting thing is cross-region read replication has existed for MySQL since in 2013, whereas cross-region read replication for PG has been out a few days out officially.

https://aws.amazon.com/blogs/aws/cross-region-read-replicas-...

https://aws.amazon.com/about-aws/whats-new/2016/06/amazon-rd...

I am interested in non-read replica replication...

setheron · on June 21, 2016

What does DRBD have to do with MySQL vs PostgresSQL ? DRBD is just block device replication

gdulli · on June 21, 2016

I think it's just a reflex for (ex-MySQL) Postgres users to ask that anytime they see someone using MySQL. If you have significant experience with both you know the quality of life is different between the two.

bpicolo · on June 22, 2016

Really? Because Amazon, Netflix, Facebook, and Google all have stake in the MySQL ring (google least so). MySQL has a lot of good knowns at scale, too.

gdulli · on June 22, 2016

All companies old enough to have a lot of legacy infrastructure from before it was common knowledge that Postgres is better. It would be more curious why MySQL would be adopted today than 10-15 years ago.

bpicolo · on June 22, 2016

MySQL is insanely performant for simple primary key lookups, and has a lot of good knowns. It also has fewer tripups for things like vacuuming, (pretty much requiring) pgbouncer, pgbouncer pooling settings...

If you're doing things with replication logs, mysql replication logs are a heck of a lot simpler, and there's more tooling for them in the OSS world

That said, I do use postgres a lot :)

Annatar · on June 23, 2016

> MySQL is insanely performant for simple primary key lookups, and has a lot of good knowns.

MySQL is extremely buggy and silently corrupts data. It also does not enforce explicitly requested referential integrity, a core mandate of a relational database management system.

https://www.youtube.com/watch?v=emgJtr9tIME

MySQL lacks adequate authentication model (like OS authentication or SmartCard authentication in Oracle).

bpicolo · on June 27, 2016

Video neglects to mention MySQL strict mode (which one should be using) and solves all these behaviors. They're explicit about this in docs, including what the default is: http://dev.mysql.com/doc/refman/5.7/en/server-configuration-...

Auth is a pretty aside argument. Those sorts of auth are not a particularly common use case. All sorts of auth can go in front rather than be built in, and if those don't apply, sure, might affect your DB choice. Not inherently a reason to not use MySQL though.

merb · on June 21, 2016

Actually I never used MySQL really. But sometimes it's great to know why decisions are made. Actually I don't think they said, "well lets use MySQL over Oracle" especially since MySQL is a Oracle product, too. There would've been a way to use MariaDB aswell. And I guess anything with license costs fall out already (they explained why in the article).

Edit: My guess would be that they still keep galera in mind, but since they didn't shared they why, one could only guess. And Transaction Wraparound maybe.

rimantas · on June 21, 2016

MySQL had useable replication years beofre PG. Maybe that's why.

beachstartup · on June 21, 2016

it's probably because they still had oracle software and support contracts in place, and in-house knowledge of mysql.

merb · on June 21, 2016

Actually you could do DRBD with PostgreSQL, too. There is a Master-Master Replication with MySQL that PostgreSQL don't has, but since they didn't used that it would be great to hear we they've choosen MySQL over PostgreSQL. Would be great if they would share that, too.

gaius · on June 21, 2016

DRBD and what database you use are orthogonal surely?

ecnahc515 · on June 22, 2016

The usage of DRDB somewhat eliminates the need for built in master/master replication, since its handled at the block layer. The two disks are always in sync, thus you don't need mysql's master/master setup. One of the main features missing from Postgres is that feature. Since it's not being used in their Mysql setup many people might ask why they wouldn't just use Postgres.

frik · on June 22, 2016

In the last 15 years every news (on /. and HN) that mentions MySQL is interrupted with such questions from Postgres fanboys. How about: MySQL is the very best solution for such tasks and Postgres isn't the universal holy grail? Wouldn't it be better to try to convert people over from (Postgres direct competitors) MSSQL, OracleDB, DB2 to Postgres?

deathanatos · on June 22, 2016

Frankly, no; I've used all of Postgres, MySQL, MSSQL, and SQLite, and MySQL is the worst of the lot. No, PostgreSQL isn't a holy grain, but any user of MySQL would overall be better served by PostgreSQL; especially on AWS where the replication is so simple to set up, which IMO is the only good argument I've heard in favor of MySQL (but I hear even that is fraught with peril).

MySQL is so completely riddled with bugs and utterly baffling behavior — it is the PHP of relational DBs. What I remember off the top of my head includes attempting to subtract datetimes causing the punctuation to be removed from the datetimes and the resulting "integers" to be subtracted, FK integrity being violated in a number of easy-to-hit corner cases, the "utf8" encoding not being able to encode UTF-8 (and the default being latin1…), GROUP BY allowing obviously (i.e., catchable to the parser) broken queries, `SELECT * FROM table` on <10k row tables taking minutes in some cases, the SQL dialect swapping the words "key" and "index" inappropriately; Read [1] if you want more.

Thus far the only other tool I've seen come close to this level of insanity is MongoDB. The thing about a tool so willfully discards any sort of reasoned approach to its topic area is that the people who use it — who inevitably are not well versed in how a relational database works "in theory" — cannot derive from its behavior its rules, because its behavior is irrational, bordering on psychotic, and the people using it tweak query after query while having no understanding of why one query might work better until some abomination that someone usually works is crafted; `-- Don't touch`. A good tool — I believe, somewhere deep inside me — will teach the novice user. MySQL will not; it will drive you insane.

[1]: http://grimoire.ca/mysql/choose-something-else

Matt3o12_ · on June 22, 2016

It should be noted that most Database supported by cloud providers are actually MySQL (compatible) database. While Amazon supports a managed Postgres Database as well, I would bet that their own MySQL database – Amazon Aurora – works best because it integrated best into the AWS eco system. Same goes for Google, unlike Amazon, the only managed Database they offer is a MySQL database (called Cloud SQL).

I'm not disagreeing with your arguments (quite the opposite in fact), but that might be an important point to use MySQL over Postgres.

adrianggg · on June 21, 2016

I got really excited because I read the heading as Netflix migration costs paid for by AWS. I thought they worked out deal to get a free tier during migration. Wow...oh...nevermind... :-)

ryanmerket · on June 21, 2016

I work for AWS. I believe we do offer some migration assistance for bigger startups. Hit me up if you want to learn more: rmerket@amazon.com

hetfeld · on June 21, 2016

Dropped Oracle, using MySQL. Why not use PostgreSQL instead?

lapitopi · on June 22, 2016

I work on the Netflix Billing Team.

PostgreSQL was indeed a very attractive option, but we wanted to keep a path to Aurora open. When we were working on the migration, Aurora was still in beta, so instead of going to Aurora directly, we decided run our own MySQL instances on EC2.

Annatar · on June 23, 2016

Does "Aurora" fix the inherently broken SQL and integrity constraint mishandling associated with MySQL?

Which security models does "Aurora" offer, for instance, are you able to use OS authentication like in Oracle? What about PKI on SmartCards?

philliphaydon · on June 22, 2016

Why would you go to RDS with such large amounts of data when AWS do not provide ability to get data out easily. If you moved away from AWS in the future for what ever reason your data is more or less stuck in AWS.

jon-wood · on June 22, 2016

Amazon released their data migration service a while back, which allows you to transfer data in just about anyway you might want to, they'll migrate data between different RDS engines (MySQL to Postgres for example), and to databases outside AWS. They even support near real-time replication to database servers outside AWS, so you could hypothetically replicate your RDS instances to a fail over environment with another provider. There's very little risk of your data being locked in now.

[1] https://aws.amazon.com/dms/?nc2=h_mo

philliphaydon · on June 22, 2016

Unless you use SQL Server. The dms service is basically useless with sqlserver. We can't get our 200gb do out of AWS. And any method that works without dms takes about 40 hours.

toomuchtodo · on June 22, 2016

Can you replicate to a slave outside of RDS and then perform a replica promotion during scheduled maintenance?

lilbobbytables · on June 22, 2016

Ahhh, now it makes sense. Thanks for answering.

eterm · on June 21, 2016

Is there a version of poe's law for hacker news / slashdot?

A large company describes their very real efforts and shares their experience and knowledge, and the first comment is "Why not use <preferred thing> instead?".

Rodeoclash · on June 21, 2016

Because knowing how they worked through the decision to come to a conclusion on using a particular product is valuable.

Postgres is usually the first choice for applications like this so knowing why they chose something else over that could influence others that need to make a similar decision.

thinkingkong · on June 22, 2016

When you say usually you probably mean "currently, usually" or "in the last x years usually".

Netflix has super smart people. They arent just picking random tools off the shelf and implementing things without constraints for the sake of it.

angersock · on June 22, 2016

Right, which is why asking how they came to that decision (instead of using a DB dating back over three decades of development) is useful and reasonable.

otterley · on June 22, 2016

> Postgres is usually the first choice for applications like this

Do you have any data to substantiate this claim?

cpollard · on June 21, 2016

I know regarding HIPAA, Aurora isn't supported but mySQL is. Maybe there's a reason why it's not HIPAA compliant that made it unusable for Netflix? And cost?

singlow · on June 22, 2016

What aspects of HIPAA does Aurora cause problems for which wouldn't be a problem for other storage engines? The Aurora engine RDS product supports all of the normal MySQL access controls, SSL in transit and at-rest encryption for the data, snapshots and backups.

wmfiv · on June 22, 2016

I don't believe it's a technical problem. It just hasn't completed the internal AWS process for HIPAA certification to be added to their BAA. Postgres is in the same boat.

ams6110 · on June 22, 2016

Where is the intersection between HIPPA and a back-office customer billing system?

justinhj · on June 21, 2016

Assuming MySQL meets their needs there's no hugely compelling reason to choose PostgreSQL over MySQL.

crisopolis · on June 21, 2016

At least they ditched Oracle (licensed)...

emcrazyone · on June 21, 2016

I worked at a Fortune-5 that was heavily invested in Oracle.

Oracle has a nasty licensing model where they charge you per core regardless of if that core is a physical one or not (hyper-threading). While I was there, it suites told all the engineering managers that Oracle was out and the going forward solution was Microsoft SQL which, as I understand, has more relaxed licensing model.

Another thing I'm wondering about is I would figure Netflix to be big enough to have SAN storage. Just about every large company I worked at always used SAN replication technologies instead of open source stuff. And it's not a debate about open source solution vs. commercial. It's more about support. Large companies want a throat to grab when things break bad.

c17r · on June 21, 2016

MS SQL used to be per socket pricing. With 2012 they switched to per core.

That was a sad day.

e12e · on June 22, 2016

Does anyone happen to know how MS actually charges per core these days? I found: https://www.microsoft.com/en-us/Licensing/learn-more/brief-l... which points to http://go.microsoft.com/fwlink/?LinkID=229882 -- taken together, it looks like AMD hex-core+cpus count at .75 cores, single-core cpus count as 4, and dual-cores count as 2. Given that you need to buy 2-packs, it appears you can get a single 2-pack for a single dual-core (hyper-treading appears to be ignored), 2 packs for a single single-core cpu, or 3 two-packs for 8 AMD cpus?

Annatar · on June 23, 2016

> Oracle has a nasty licensing model where they charge you per core regardless of if that core is a physical one or not (hyper-threading).

That's why you then switch to their T5 or M-series with up to 3,072 processors, and then Oracle comparatively charges you peanuts, since those systems have relatively few physical sockets. And if you know SPARC, and you know Solaris, you can squeeze some serious savings. The problem, it seems to me, is that most system administrators and managers today are neither familiar with SPARC nor with Solaris, so they end up paying more in the long term in licensing and maintenance for running something else like MySQL on GNU/Linux.

But that's their problem, not the guy's who knows SPARC and Solaris and how to save money, isn't it?

MichaelRenor · on June 21, 2016

Netflix doesn't have SAN because it doesn't run anything outside of AWS. AWS provides the throat to hold in your analogy.

emcrazyone · on June 22, 2016

I was pondering about the netflix data center prior to the AWS migration. "Migration of Billing infrastructure from Netflix Data Center(DC) to AWS Cloud was part of a broader initiative."

Sorry, should have been more clear.

As a follow up. Many projects I was involved with did use Oracle replication but it was a rolling log file type that was purposely delayed to account for mistakes. Rolling replication happen across geographic locations while SAN replication dealt with the hot fail-over situations locally.

Annatar · on June 23, 2016

And why is that good? Oracle is worth every penny of those $47,000 per core for an enterprise license, and then some. Provided my company makes a commercial product and I can pay, I would gladly pay Oracle that kind of money (and more) for their database.

Large companies have an enterprise wide license of the Oracle database, so they have thousands of instances (worked at a place like that, that's how I know), and at that point, with that kind of scale, Oracle becomes dirt cheap, considering it delivers the only database I have seen that can withstand the kind of abuse a large institution can throw at a relational database management system.

cia48621793 · on June 22, 2016

What if the Netflix side on AWS was hacked say like their security credentials was leaked?

ForHackernews · on June 21, 2016

Does this mean Amazon can mine Netflix's data to improve their Prime Video services?

thramp · on June 22, 2016

Customer data is sacrosanct within Amazon. Cannot touch it without the customer’s consent.

source: I work for Amazon Web Services.

kevin_b_er · on June 23, 2016

Amazon already showed its hand at maliciousness when it blackballed all chromecasts from stores, including 3rd party sellers. I don't put suddenly blackballing netflix beyond amazon's consideration. Amazon is cutthroat with public customers, corporate customers, and with its own employees. I think netflix is stupid to put more eggs in the vulture's nest.

In fact, I can pretty much guarantee that at the first opportunity where the lawyers agree it is a usable hole, they'll try to kill netflix through denying it service. Taking out netflix for a week or two while the engineers rebuild the backends with a different provider would be excellent for amazon's video division.

star-trek-fleet · on June 21, 2016

There is explicit policy inside AWS to not access any user data.

mgrennan · on June 21, 2016

And Sodden is wrong. The government is not collecting your data.

moosingin3space · on June 22, 2016

Different scenarios. Netflix could sue Amazon for breach of contract if there were evidence of them mining their data.

ascendantlogic · on June 22, 2016

AWS is a gigantic cash cow for Amazon now. If they were caught with their hands in a competitor's cookie jar the fallout would cost them in the billions. They probably wouldn't risk it, but this is just me hypothesizing.

alttab · on June 22, 2016

No, but they can probably look at how Netflix is utilizing AWS and make that better.

the_arun · on June 22, 2016

I would safely assume this data will be encrypted and "anonymized" to minimize the risk of Amazon using this data.

jason46 · on June 22, 2016

Is this why my daughter can't watch Young Justice? I've noticed quite a few titles show unavailable. Curious if Netflix is positioning to sell to amazon.

mikikian · on June 21, 2016

Dropped Oracle, using MySQL. Why not use AWS Aurora instead?

nemothekid · on June 22, 2016

FTA: While our subscription processing was using data in our Cassandra datastore, our payment processor needed ACID capabilities of an RDBMS to process charge transactions. We still had a multi-terabyte database that would not fit in AWS RDS with TB limitations.

jon-wood · on June 22, 2016

Somewhere else in the comments here one off the engineers on this project said Aurora was still in beta at the time but they went with MySQL so they had a reasonably simple migration path later.

back_beyond · on June 22, 2016

This thread is missing Google's PR team and links to its SRE book

innocenat · on June 22, 2016

Am I the only one who initially thought that Netflix is passing the bill for migration to AWS?

Annatar · on June 22, 2016

Considering how much code and data was interacting with Oracle, one of our objectives was to disintegrate our giant Oracle based solution into a services based architecture. Some of our APIs needed to be multi-region and highly available. So we decided to split our data into multiple data stores. Subscriber data was migrated to Cassandra data store. Our payment processing integration needed ACID transaction. Hence all relevant data was migrated to MYSQL.

Considering that Cassandra is not ACID compliant with her "eventual consistency", and that MySQL is notorious for corrupting data and not functioning correctly, I am compelled to wonder just what kind of people work at Netflix. And who gets the idea to go to AWS and pay the full virtualization on Linux performance penalty?

Now, I've done Oracle engineering at some very large databases (hundreds of millions of rows, OLTP and DWH), and I know that Oracle is a smoking fast database when the right people develop on it. Also makes me wonder what kind of code they had running, and what kind of people selected it, when they managed to gum up what is essentially the Bugatti Veyron of databases.

Given this information from "Netflix" I won't be considering them as a potential employer any time soon. It has to be a mess over there.

tedivm · on June 22, 2016

The majority of the corruption issues with MySQL comes from using the innodb engine. AWS built their own MySQL engine called Aurora that I would be shocked if Netflix wasn't using. It's designed for distributed workloads and should be harder to corrupt.

I'll admit I'm confused about picking Cassandra as well, but not for the same reasons you are. They're only storing subscriber data (billing address, subscription type, etc). That data is going to remain static for months at a time. When changed the only potential problem that could occur is the billing process using old data, but I'm guessing their system is smart enough to try again in thirty minutes.

Oracle may be fast, but it's also expensive. This is billing, which means batched jobs running in the background- they don't care how long each individual transaction takes, and I'm positive it's going to be cheaper to roll up more servers to compensate than it is to pay Oracle's licensing fees.

tokensimian · on June 23, 2016

I'm not sure if you saw this other thread, but I thought it answered one of your questions so was worth sharing. Sounds like Aurora wasn't stable when this migration began, so they did the migration with an eye towards the next migration (to Aurora).

Sorry about the c/p but I don't see a permalink on lapitopi's comment.

lapitopi 1 day ago I work on the Netflix Billing Team. PostgreSQL was indeed a very attractive option, but we wanted to keep a path to Aurora open. When we were working on the migration, Aurora was still in beta, so instead of going to Aurora directly, we decided run our own MySQL instances on EC2.

edit: begun/began

ec109695 · on June 22, 2016

Amazon doesn't offer instances large enough on RDS, so they are managing their own.

Do you have any sources for the corruption comparison between Aurora vs InnoDB?

Annatar · on June 22, 2016

> Oracle may be fast, but it's also expensive.

So? That's capitalism: one gets what one pays for, and Oracle is not just fast, it can do a lot, and it can be configured to be paranoid about protecting data, and it has clustering technology meant for scaling, RAC.

Truth be told they could have picked PostreSQL and it would have still been a better solution than Cassandra and MySQL.