Netflix VP of IT on the Future of Infrastructure (amplifypartners.com)
78 points by dataisfun on March 13, 2014 | 36 comments



> The notion that something needs to remain on-premise is really an Old World way of thinking and feels more like someone wanting control as opposed to there being a valid argument.

No, it's the business continuity way of thinking. Outsourcing commodities -- such as servers, virtual or otherwise -- is one thing.

Outsourcing your core operational tools, software, and all your data is another matter entirely. Preferring SaaS at a company large enough to afford on-premise solutions is just nonsensical, and I expect it'll either blow up in his face, or just create a never-ending tax on end users who are constantly dealing with a mishmash of vendors, accounts, disappearing services, broken software, and instability.

At scale, stability and continuity are worth more than the opex/capex costs of internal IT.


> Preferring SaaS at a company large enough to afford on-premise solutions is just nonsensical

Yes and no. First off, I don't think Netflix can afford to build the kind of infrastructure they're using -- certainly not without changing the focus of their engineering resources. They famously do a lot of work to move data closer to the end users globally -- they're not "just" a US company. Essentially, they'd have to have an operation that would be "qualitatively" similar to AWS in order to be able to do what they do (and at a smaller scale overall, I think that would end up being quite expensive).

I'd argue Netflix is one of the few companies I can think of where this "all cloud all the time" idea for infrastructure might actually make sense. I agree it's a big risk though -- and probably not good advice for most companies.

It would also appear that Netflix is planning on actually selling a product (video rental) and making money off it, rather than having that merely as a vehicle to drive other, sometimes tangential, innovation. That remains to be seen, of course.


Netflix has been on AWS for a while, and I've never once had any stability or continuity problems with it. Whatever garbage is going on in the background, they've done a good job of preventing it from becoming a never-ending tax on end users.


They built Chaos Monkey - software agents that go through all their infrastructure on AWS and randomly take things down: network jitter, computer hangs, removing drives, taking down availability zones.

They do that to themselves so that when it actually happens, they are prepared and the end user (almost) never sees it.
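
A rough sketch of the idea, just to make it concrete (this is not Netflix's actual code -- their implementation is the open-sourced Simian Army -- and the region name and opt-in tag below are made up):

    # Pick one running instance at random and terminate it.
    import random
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Only consider instances that have opted in to chaos testing (hypothetical tag).
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:chaos-opt-in", "Values": ["true"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]
    instances = [i["InstanceId"] for r in reservations for i in r["Instances"]]

    if instances:
        victim = random.choice(instances)
        print("terminating", victim)
        ec2.terminate_instances(InstanceIds=[victim])

The point is that something like this runs routinely, on a schedule, so every team has to build services that survive a random instance disappearing.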


You have a point, but having services on-premise can still achieve the dysfunction, mishmash and overall good times you're describing with a SaaS service.

You might get better response times knowing it's a local problem vs., say, an office in NYC hosting your ticketing system. Regardless, I'm sure they have put a lot of thought into this setup, and to me it sounds amazing. No ties to a physical location = win.


Sort-of...

It's a cost-benefit analysis issue.

Netflix took a very early stance that they wanted a large portion of IT ops pushed out of the company.

One still needs to provide the means of supporting the actual tools required for core business continuity -- but the industry has matured a great deal in the last ten years, to the point where there are multi-billion-dollar enterprises whose whole core business is the singular department that Netflix has chosen to push out of its management purview by outsourcing that cost....

This is not a bad business decision - the AWSes, DOs, etc. have commoditized all the portions of that previously in-house IT department in a way from which everyone can benefit.

The risk is in the last mile.

The most needed disruption in the tech-world today is in the carriers in their current incarnation. THEY MUST DIE.


I can see both sides of the coin though - for a company that is as heavily cloud-invested as Netflix is - it might make less sense to maintain a cage for internal IT in a few datacenters. That said, I do get worried about sensitive data in the cloud.

I didn't get that he was talking about moving to cloud/SaaS vendors as a whole, but more the IaaS/PaaS space - the "Hey, this app runs RoR, can I just run it from a Heroku dyno rather than a VMWare box in a cage" type of move. Granted, there's still a cost associated with that, and a need to plan the move, but it's less of an issue. They are making heavy use of SAML SSO it appears, so the account/vendor thing probably wouldn't be as much of an issue.


It is related to focus. If you are focused internally, can you compete as well externally? I say no. HR, finance, and office automation are less important.


The jargon and acronyms in this interview are intense. It's pretty clearly an industry interview so it's my fault that I don't know the phrases, but I'm a little surprised by how impenetrable it is to me (a software engineer who has worked in large corp environments).

Anyway, care to expand on some of the less Googleable acronyms?

- MDM/MAM

- NAC

- EDW (synonymous with ETL?)


MDM/MAM = mobile device management/mobile application management (managing and provisioning mobile devices and their applications, generally automatically)

NAC = network access control

EDW = enterprise data warehouse (archiving old information while preserving access)


Thanks for the tip. These are now expanded in the interview text :)


It'd be great if Netflix (or some other company) manages to do some heavy lifting in creating a viable, modern, certificate-based authentication and authorization stack that's easier to deploy. Essentially an upgraded take on kerberos (move off shared secrets, perhaps) and AFS (I still don't know what a viable way forward for a secure, distributed, locally cacheable network filesystem is -- maybe DAV+TLS+regular caching?). I suppose LDAP might be fine as a user/principal/authorization database, but some distribution that uses an internal CA and demands TLS by default would be a good start.

The last "innovation" I'm aware of in this area, is skolelinux/edulinux work with packaging samba/ldap/kerberos/lts in a easy(ier) to manage package for Debian:

https://wiki.debian.org/DebianEdu/Documentation/Wheezy/Archi...


What you're looking for mostly already exists: Active Directory. It's one of the best products to come out of Microsoft ever. I haven't seen anything in the open source world to rival it.

Central auth, auto certificate deployment, encryption by default. The nice bits are Windows only unfortunately.


There's no good alternative for a locally caching network filesystem to go with that (although AFS isn't bad) -- and it's not running on the platforms I use (mostly Debian GNU/Linux -- but BSD would be good too).

Even if I was willing to introduce a w2k8 server -- it's hardly trivial to integrate across infrastructure. E.g.: set up client auth for ssh in such a way that online verification of certs against a list of revoked certs works -- and that there are no other ways to authenticate to ssh servers.

I absolutely agree that AD is one of the best things MS ever rolled out -- it's unfortunate that a) they broke (or bent) some standards when doing it, and b) just like .net and sql server, while great platforms, they're not for me (any more) -- I'd much rather play in an open environment, mostly so I'm not dependent on a single entity for the continuation of services and development.

I know RedHat have their directory server, and Samba4 has basically copied some of the architecture from AD (roll up LDAP, cifs, kerberos all in an integrated set of services) -- and that's great. I'd still like to see a single open design that actually works (and that last bit means it needs to be tested across heterogeneous environments).

I don't think such a system would actually be too hard to implement these days; we have a lot of great components that just need to be fit together and "blessed" with some rigorous packaging and documentation. Perhaps the "best" way would be to wrap kerberos principal key exchange in a public key transport of some sort (but at that point you'd really only be using kerberos for backwards compatibility -- you'd have moved the trust and authentication implicitly to your CA infrastructure (possibly with a low lifetime of service tickets) -- which could be good or bad depending on your point of view).

Basically what I want is a way to throw a (most likely private) CA cert on a new box, and then have that box request a cert via an on-line CSR to a gateway -- that gateway should then be able to forward the CSR to the CA (which for high-security setups should be air-gapped; for most settings it might be a daemon running on the same box). Then once machine certs are set up, probably use service certs for services (if this sounds a lot like kerberos, that's not an accident) -- or just assume one service per (virtual) machine.
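
Something like this for the enrollment step on the new box, as a very rough sketch (using the Python "cryptography" and "requests" packages; the hostname, gateway URL, and file path are all made up):

    from cryptography import x509
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import rsa
    from cryptography.x509.oid import NameOID
    import requests

    # The private key never leaves the box; only the CSR does.
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

    csr = (
        x509.CertificateSigningRequestBuilder()
        .subject_name(x509.Name([
            x509.NameAttribute(NameOID.COMMON_NAME, u"host01.internal.example"),
        ]))
        .sign(key, hashes.SHA256())
    )

    # Send the PEM-encoded CSR to the gateway, which forwards it to the CA.
    # The CA cert dropped on the box at install time pins this connection.
    requests.post(
        "https://ca-gateway.internal.example/enroll",
        data=csr.public_bytes(serialization.Encoding.PEM),
        verify="/etc/pki/internal-ca.pem",
    )

The hard part isn't this bit, of course -- it's the approval policy at the gateway and revocation afterwards.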

For users we'd need something similar, and we'd need a working online check for validity that defaults to disallow, probably with some caching for local login on laptops/workstations to be able to do *some* authentication even when offline (obviously configurable, depending on use-case).

After the years of attacks on kerberos (among others) I think many of the risks are well understood -- the challenge is just to build something that is simple enough, but yet works. Dictate a single format for certs, possibly a very limited set of algorithms (but history seems to indicate that some sort of versioning is needed -- maybe explicit "valid sets" rather than open negotiation?).

Anyway, sorry for the long post -- it probably should've been a blog post :-)


Kerberos is very good. Everyone reinvents kerberos every month. It doesn't have to be new to be "modern". Kerberos is still modern by today's standards, in fact.

The problem is having tools that communicate with each other, and an easy setup.

Yeah, SAML kinda sucks to use too... and works like kerberos anyway. OpenID, Hawk, etc. also, in fact, work exactly the same.
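
The shared skeleton really is the same everywhere: a trusted issuer hands the client a signed/MAC'd token, and the service verifies that token instead of ever seeing a password. A toy illustration (none of the details below match any real protocol):

    import hashlib
    import hmac
    import time

    ISSUER_KEY = b"secret shared between issuer and service"  # the trust anchor

    def issue_token(user):
        payload = f"{user}|{int(time.time()) + 300}"  # identity + 5 minute expiry
        sig = hmac.new(ISSUER_KEY, payload.encode(), hashlib.sha256).hexdigest()
        return f"{payload}|{sig}"

    def verify_token(token):
        payload, _, sig = token.rpartition("|")
        expected = hmac.new(ISSUER_KEY, payload.encode(), hashlib.sha256).hexdigest()
        _user, expiry = payload.split("|")
        return hmac.compare_digest(sig, expected) and time.time() < int(expiry)

    print(verify_token(issue_token("alice")))  # True -- no password ever reached the service

Everything else -- XML vs. ASN.1 vs. JSON, redirects vs. tickets -- is plumbing around that one idea.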


Seriously. At my day job we're investigating better ways to handle single sign on and authentication and every road leads back to Kerberos.


I'm not saying Kerberos isn't good, and obviously it's going to be better than any "new" system -- after all, a new system hasn't seen any real-world testing. A lot of the fluff around the various (http-centric) SSO solutions comes partly from wrapping them around SSL/X.509 -- just as, IMNHO, one of the problems with setting up a (secure and easily maintained) kerberos deployment isn't kerberos but LDAP.

As mentioned up-thread, MS AD does a great job of enabling an in-house CA and management -- and it's mostly that I want. I want to use certs for auth most places, and I want it easy! OpenSSH has shown that public key auth doesn't have to be hard -- but it also doesn't have a very compelling story around managing access. The new cert system might be an improvement -- but it absolutely needs some infrastructure around it to be easy to deploy (and verify).


In fact, openssh has support for full-blown certificates, but it's also a little more painful. What makes ssh easy to use is that it doesn't have any central trust authority by default: you get a fingerprint and you trust it.

If it changes, it warns you... but in most cases you're going to know why it changed, or just accept the change anyway (which is a problem when you admin 10000 servers, of course, as the warning might be a real issue).

Central trust/revocation is still an issue everywhere to this day, I think -- both technically ("my client trusts this, but do I?") and from the usability POV.


> openssh has support for full-blown certificates

Well, yes and no. Do you mean the new cert stuff that's in standard openssh? Which has stuff like:

    The marker is optional, but if it is present then it must be one of “@cert-authority”, to indicate that the
    line contains a certification authority (CA) key, or “@revoked”, to indicate that the key contained on the line
    is revoked and must not ever be accepted.  Only one marker should be used on a key line.
While certainly simple, it doesn't strike me as very manageable.

Or did you mean the x509 patch?

http://roumenpetrov.info/openssh/


IIRC there are a couple of things that might be tweaked in the protocol wrt sign before encrypt? Also, with non-encumbered public key cryptography available, and no longer prohibitively resource intensive, we can do better than NxN shared secrets.

I do indeed like kerberos - but I still want a straightforward and reasonably robust framework built on certificates.


I thought the 2014 Technology Roadmap [1] was an interesting read. For an organization as "young" as Netflix, I was surprised by the technology debt they've accumulated and the aggressive tone they've set for the transition.

I think it's amazing how decisions made during explosive growth/hiring end up on roadmaps that read similarly to those of organizations that have been around much longer.

There's no criticism here. I think Netflix is an amazing company, and it is this sort of strategic vision (and the openness of both it and the organization overall) that reminds me that we're all on this rocky ship together and it's amazing that any of it works sometimes.

[1] http://www.slideshare.net/mdkail/it-ops-2014-technology-road...


Forgive my ignorance. Is `IT` the same as `engineering` at other companies, or is this something else?


"IT" is too vaguely defined to mean much anymore, but based on this interview I suspect it's internal infrastructure at Netflix. Stuff like staff PCs, internal data warehouses, sales, finance, and marketing software support, WiFi APs, routers, keeping the backoffice servers and network up, ensuring reliable WAN and LAN connectivity so engineers can reach production securely, intrusion detection and analysis, and so on.

Generally in smaller software companies I hear R+D and consumer-facing applications referred to as "engineering" with external-facing infrastructure (like the production datacenter) referred to as "operations," with "IT" being reserved for this internal backoffice kind of stuff.

In other places, especially larger corporations, I've often heard everything having to do with a computer lumped in as "IT."


Yeah, I was surprised. When I saw the title I thought infrastructure meant network infrastructure, which seems like a white-hot area of innovation given how much internet traffic they account for. But no, this interview was on internal IT.


His actual title is "VP of IT Operations", which is usually part of an Engineering department, but very close to the executive team because of things like budget and business guarantees. Note the Unix / Networking background.


IT is usually the catchall infrastructure/support function. For some companies (like Netflix) you probably also have a product engineering group.


It's office automation, as opposed to NetOps, which operates their business, such as video streaming.


One thing that stuck out to me:

> We are implementing “certificate-based authentication” instead of the standard username/password auth against Active Directory.

I wish we were all doing this. How long is it going to take to get a usable certificate-based client/user authentication mechanism on the web?
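
To make "certificate-based authentication" a bit more concrete: on the server side it's essentially mutual TLS, where the connection itself carries the identity. A generic sketch (nothing Netflix-specific; the paths and port are made up):

    import socket
    import ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("/etc/pki/server-cert.pem", "/etc/pki/server-key.pem")
    ctx.load_verify_locations("/etc/pki/internal-ca.pem")  # only our internal CA is trusted
    ctx.verify_mode = ssl.CERT_REQUIRED  # no valid client cert, no connection

    with socket.create_server(("0.0.0.0", 8443)) as listener:
        with ctx.wrap_socket(listener, server_side=True) as tls_listener:
            conn, addr = tls_listener.accept()  # handshake verifies the client cert here
            # Identity comes from the verified certificate, not a password prompt.
            print(conn.getpeercert()["subject"])

The part that's still missing on the web is the client-side UX: issuing, storing, and selecting those certs in a browser.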

edit: Also see e12e's comment.


TLDR: The approach we take for IT works (for us at this point in time in the scope defined as IT by me and/or our internal customers).

Netflix talks can generally be fascinating and inspiring. However, when considering IT, it's also important to consider the charter and challenges of Netflix IT specifically.

I.e., it's no more valid or invalid than the talks about how IT is delivered in so-called build vs. broker models at other companies in other industries: http://dilbert.com/strips/comic/2013-07-05/


Reading the slides makes me think this is full of nothing :| How is 802.11ac speed making things "more cloud"? Because you get slightly more bandwidth -maybe- if you have a new laptop and not everyone else is using it? I don't get it.

Requiring VPN everywhere, how is that cloudy?

Finally, using stuff like AWS is nice, but unless they have a specific contract (which they may, since they advertise them a lot), it's a LOT more expensive once you're doing a lot of processing (i.e., big companies like Netflix).


Well, if anybody could get a good deal from AWS, it's Netflix.


I really wanted to share this article with my friends, but it was so filled with buzzwords that even my dev friends wouldn't ascertain much.

Keep in mind I'm an idiot, and I have idiot friends.


Haha. Well, I doubt you're an idiot. Buzzwords serve a purpose insofar as they're a good shorthand for defining categories of product offerings. E.g., ETL, Data Warehouse, Mobile Device Management, etc. all have pretty well-understood parameters within which the vendors, buyers, analysts, etc. operate.


Why was there no discussion of the ethics of the Comcast deal?


Because this was a conversation about internal infrastructure?


> zero-trust network architecture

This makes me think of http://meldium.com



