I had a very interesting discussion about this subject a couple of weeks ago with a CTO of a big company.
I've commented before that Kube is absolutely taking over the bigger, more complex cloud installations out there; you can see how many companies are betting their infrastructure on it.
The only thing that I don't see is standardization of the cloud, just like what Amazon did. You see too many companies doing too many of the same things and reinventing the wheel.
Personally, I would love to see a standard way for smaller installations to move into the cloud as a cluster. Imagine what Heroku did for deployments. You can't beat that ease of use. Deis and Convox are both trying but not really "hitting it".
As for Walmart, absolutely stoked to see this from them. This move, and what the White House did with its digital service, shows a lot of promise and hope. I wonder how much of this sits on top of "older" management and how much is a complete restructuring.
Joyent and Distelli make this about as painless as possible with any current workflow. Docker containers can be launched directly on Joyent in any number or size, and Distelli will app-ify and/or dockerize your app as part of the CI/CD workflow and watch the processes.
Any custom script is just bash, and I can deploy to any OS.
I have a cluster of 5 or 6 instances running 20-30 services in this setup and it's been a dream. I tried to do something similar before, but Cloud66 dropped Joyent. This setup makes perfect sense because apps can be anywhere from traditionally deployed to fully containerized and are still managed with the same interface and config.
Since it's all open-source my deploy process is portable. But I've evaluated other vendors and processes and all seem more manual and less transparent.
Some virtualization use cases can be solved with Kubernetes, but many things an enterprise runs do not work at all in the Kube paradigm. The majority of applications are pets (i.e. stateful), and many applications run on Windows or have some other kernel requirements that don't match the bare-metal kubelet.
If you think Kubernetes 'absolutely takes over the more complex cloud installations', you're living in an echo chamber of 12 factor apps that doesn't line up with the majority of what I've seen in big enterprise cloud workloads.
Case in point: Walmart has one of the largest openstack deployments in the world.
We (SAP's internal cloud platform) are running OpenStack on Kubernetes. In particular, I'm working on containerizing Swift, which presents its own unique set of challenges but is progressing well regardless.
For development environments we are running MSSQL (yes that MSSQL, unclustered though as there is no SAN emulation of course), Elasticsearch, Mongo, Kafka, Zookeeper, Redis, Memcached, all in Kubernetes (mostly as Pet Sets) and looking to possibly do this in prod once we are happy with it. The story for state in Kubernetes is improving rapidly.
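For anyone who hasn't seen one, a Pet Set manifest is mostly a normal replicated pod template plus stable naming and per-pet storage. A minimal sketch (the image, names, and sizes here are made up, and the alpha API details shifted between releases):

```yaml
apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: redis
spec:
  serviceName: redis     # headless service giving each pet stable DNS
  replicas: 3            # pets get ordinal names: redis-0, redis-1, redis-2
  template:
    metadata:
      labels:
        app: redis
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      containers:
      - name: redis
        image: redis:3.2
        ports:
        - containerPort: 6379
  volumeClaimTemplates:  # each pet keeps its own persistent volume across reschedules
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

The volumeClaimTemplates part is what makes the state story work: when redis-1 dies and is recreated, it comes back with the same name and the same disk.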
PetSets are simply ordinally named container collections instead of randomly named ones.
The usual concept in Kubernetes is that a container has a name like 'thing-randomsuffix4242'. It pushes you towards making your pods stateless and not precious, so the failure-handling logic stays simple. If a pod is blown away for whatever reason, it can easily be replaced by the scheduler, and 'thing-randomsuffix242b' is never far away.
With a predictable ordering you can actually make assumptions you otherwise couldn't. For example, perhaps you want the convention that 'thing-predictable1' is the master and 'thing-predictable{2,3}' are the slaves.
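Inside the container that convention is trivial to act on, since each pet can read its own hostname. A minimal sketch in Python (the name pattern and the master-by-ordinal rule are just the hypothetical convention above; real PetSet ordinals start at 0):

```python
import re

def role_for(hostname, master_ordinal=0):
    """Derive a replication role from a pet-set-style hostname.

    Assumes the convention above: the pet whose trailing ordinal
    equals master_ordinal is the master, everything else a slave.
    """
    match = re.search(r"(\d+)$", hostname)
    if match is None:
        # Randomly suffixed names like 'thing-randomsuffix242b' don't qualify.
        raise ValueError("no ordinal suffix in %r" % hostname)
    return "master" if int(match.group(1)) == master_ordinal else "slave"
```

With randomly suffixed pods you simply can't write this function, which is the whole point of the predictable naming.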
Closer to minikube, then? That was not at all clear to me from reading the site, where it seemed to be discussing a kubelet on an OS on bare metal or a hypervisor, but not managing containers.
Not sure when you tried Convox – maybe they have improved since then – but my experience with it last week has been amazing (migrated our application from Heroku).
It was almost as easy and quick to set up as Heroku. All the Docker and AWS config was done automatically by Convox. The only issue I had was with Docker: an old Docker Toolbox installation had left some environment variables set, preventing Docker for Mac from starting.
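For anyone who hits the same thing, the fix in my case was just clearing the leftover Toolbox variables in the shell before starting Docker for Mac:

```shell
# Docker Toolbox points the CLI at a VirtualBox VM via these variables;
# Docker for Mac needs them unset so the CLI talks to its own socket.
unset DOCKER_HOST DOCKER_TLS_VERIFY DOCKER_CERT_PATH DOCKER_MACHINE_NAME
```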
Granted, I have some previous experience managing deployments on AWS via the AWS CLI and dashboard, so maybe that's what made me understand the Convox concepts quickly.
My only regret is that I didn't set up with Convox from the start. :)
First, important to note: You are awesome and doing amazing work. (Link to what I am doing later in this message).
The problem with all of these projects, not just Deis, is that the end user needs to know too much.
I launch about 30 projects a year on top of Heroku and it never ceases to amaze me how simple it is. Yes, the applications are simple. Classic app->db. That being said, the setup is stupid simple.
When I started working on the-startup-stack[1], I was amazed at how much you need to do when you start from scratch. Things I forgot I ever did, since I've had a system running in production, changed only incrementally, for 5 years.
You need to configure networking, decide on DNS, decide on so many things. It's a lot of work.
Here's what I want (and what I am trying to do with the-startup-stack[1]).
0. AWS KeyPair
1. create-stack
2. launch-app
Lot of what's missing is best practices and production-ready environments.
Going back to that same lunch discussion I had with the CTO of a big company replatforming their infrastructure: generalization is the hardest thing here. Reasoning about who the customer is and what their stack looks like.
If you give Deis/OpenShift/Kubernetes to the typical YC founder (I am not trying to insult anyone here) who's just trying to get their app into the cloud, it's too much. It's too many things they don't care about right now. And by not caring about them right now, they are likely vendor-locking themselves for a very long time.
Agreed. That's the problem with running your own private Heroku; you've gotta know the entire stack to use it.
We had this dream that users could use Deis as a self-serve dev shop, but we've been seeing a lot of people interested in running their entire PaaS stack, so the docs definitely reflect the administrator more than the platform user (like Heroku).
> The only thing that I don't see is standardization of the cloud, just like what Amazon did ... Deis and Convox are trying but not really "hitting it"
My colleagues at Red Hat might disagree, they're working on OpenShift.
My more immediate colleagues at Pivotal, IBM, SAP, Microsoft, Google, Cisco, Dell-EMC, VMWare et al might disagree too. We're working on Cloud Foundry and BOSH.
OpenShift origin looks very promising for sure. That's the thing though, you still see too many people not using it.
Here are the things that should be standard (IMHO) and that you shouldn't reinvent:
1. Underlying infrastructure auto scaling (Google, AWS)
2. Service Discovery
3. DNS (internal and external for multiple sources)
4. Networking
The real issue that I feel no one has really answered is who this is for. If an SRE is the target audience, then there's a lot more we're missing as a community.
I think Red Hat's problem with OpenShift is getting out from under the brand-masking power of tech they chose. Too many engineers want to roll their own Docker+Kubernetes platform and underestimate the difficulty of doing so.
The thing is, everyone has a different don't-reinvent list, and they want those things handled in different ways. Then they discover all the things they didn't realise they'd need to reinvent.
> The real issue that I feel no one has really answered is who this is for.
I see PaaSes as serving three constituencies.
Operators, who wish to christ that Developers would stop making their lives hell by breaking stuff.
Developers, who wish to christ that Operators would stop making their lives hell by blocking stuff.
Business, who wonder why everything takes so long, costs so much and breaks so often.
Maybe (hopefully!) things have improved since I last looked at it, but OpenStack is (was?) a complicated, unorganized, over-engineered conglomeration of independent parts.
I sincerely hope that's not the case anymore but that lasting first impression has stopped me from looking into it since then.
Getting up and running with OpenShift is pretty easy. You can either use the all in one Vagrant VM[1] or you can download the CLI[2] and run `oc cluster up`[3] to install the docker container version.
Neither one gives you an "HA" install (you probably want a cluster for that). I regularly deliver OpenShift installs, and in most enterprise environments an HA install takes 3-4 weeks, most of which is spent communicating the requirements to all the silos (networking, storage, security, virtualization, etc).
It depends on what the customer has already. We (Red Hat) support installing OpenShift anywhere that RHEL x86-64 is supported, whether that is bare metal, vms, private/public cloud, etc. For example, our hosted "Heroku like" OpenShift Online is on AWS (https://www.openshift.com/devpreview/).
Yes and no. What we're finding is that you need far fewer operators. But you still need someone to keep an eye on things and run 'bosh deploy' to upgrade the cluster or add a new service. The latter could be automated, but it's one of those things people like to do under supervision.
At Pivotal we run Pivotal Web Services (PWS) -- a version of Cloud Foundry which is usually less than a week behind the current release -- with three shifts of 3-10 people each. Two to five pairs, is how we think of it.
PWS has thousands of VMs and tens of thousands of applications running. Pre-CF, pre-BOSH, an installation of this magnitude would need hundreds of sysadmins to stop it from immediately bursting into pretty but expensive flames.
But in general, you're right. The contract with engineers is "tell me what you want and I'll give it to you". Cloud Foundry does that well.
Standardization of the cloud is what http://www.ucxchange.com is working on out of Chicago. It's an interesting model that I was shown recently.
The basic premise is that with standardization of hosting environment platforms an exchange is able to offer their customers a multitude of vendors who compete on price and will have the same features within their environments.
The most interesting opportunity with this is for resellers of cloud computing who can sell compute resources to customers at full price but only pay the exchange based on what they use. It won't last long but if it works resellers will make a fortune migrating customers from big names like Amazon to other big names like IBM through UCX.
> Deis and Convox are both trying but not really "hitting it".
I'm heavily biased towards Deis as I've been using it for a while (and even wrote a UI for it[0]), but what do you feel it is missing? Feature-wise, it's almost at parity with Heroku, and stability-wise, it's not perfect but it doesn't require a PhD in DevOps to maintain either.
What's wrong with NFS? If you need a shared, POSIX-compliant drive between multiple hosts, there's not really an alternative. Even the fairly new AWS Elastic File System service is just an NFS4.1-based managed service.
While the grandparent comment might be a little too emphatic, surrendering the flexibility of file stores in favour of object stores does lend itself to permitting nice non-functional properties.
Some of these file systems come with native clients. I'd say for container environments that's a viable alternative to NFS. Done properly, the native client can deliver low latency, since it usually takes the minimum number of network hops, and can fail over without requiring things like virtual IPs.
A SAN gives you block storage - virtual hard disks. You need a cluster filesystem to use the same virtual disks on multiple machines concurrently, and those are not easy to operate and have their own bunch of performance problems.
Speaking as someone who has maintained some rather large NFS-utilizing systems/datacenters over the last decade, and found it perfectly suited to my needs, I'd be curious as to why you'd say that?
The intr mount option has been a no-op for 8 years.
/bin/umount does an fstat before the umount syscall, so if an NFS mount is broken, it will probably hang forever instead of unmounting it! You can invoke the syscall directly to get the behavior you need.
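For the curious, here's roughly what "invoke the syscall directly" looks like from Python via ctypes. This is a sketch for Linux/glibc only; the MNT_FORCE constant is copied from <sys/mount.h>, and in practice you might prefer a small C helper binary:

```python
import ctypes
import ctypes.util

MNT_FORCE = 1  # from <sys/mount.h>: force unmount even if the server is unreachable

# use_errno=True lets us read errno after a failed call
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

def force_umount(path):
    """Call umount2(2) directly, skipping /bin/umount's fstat of the
    mount point - the fstat is what hangs when an NFS server is gone.

    Returns True on success, False on failure (errno is then available
    via ctypes.get_errno()).
    """
    return libc.umount2(path.encode(), MNT_FORCE) == 0
```

Calling it on something that isn't a mount point (or without root) just fails cleanly with an errno instead of hanging, which is the whole appeal.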
That hasn't been dependable in my experience - the best workaround I've found is either a lazy unmount + remount if you can tolerate the hung process taking the full NFS retry timeout to unblock or downing the network interface to accelerate the process.
This is fair to a degree, but vendors have VIPs that make failures fairly transparent. And if you need low latency on a large data set, is there something better? Where low latency means <100ms reads.
Throwing away 30+ years of protocol development for the latest and greatest containerization paradigm is what the cool kids south of Market Street talk about at happy hour.
If you want volumes that can be dynamically attached to any machine, you're not buying a SAN, and you're not in a cloud provider with a portable block storage abstraction, what do you propose?
About the only thing I can think of is Ceph, and it stands to reason there is a lot more NFS than Ceph expertise on the job market.
Wal-Mart is a very odd company to work with from the vendor side. They have a purchase order tracking system named PULSE that is completely independent from their EDI system and their Retail Link supplier program (which is the slowest, most horribly designed website you could imagine).
You're required to SFTP up a CSV file of what you've shipped to them that day, along with information about it. They also have a creaky acknowledgement/acceptance procedure, and the technical folks (seemingly outsourced) aren't very impressive. You have to pass a 'visual inspection' with your test files, and it's the most ridiculous process you can imagine, with box-checkers flagging you for the dumbest reasons.
So from this side of the transaction, articles like this always make me wonder; all their good tech must be internal-only. To be fair to Wal-Mart, the situation with other big store chains usually isn't any better.
This is one of the better bespoke cloud efforts I've seen, but in their position -- given the decision to bet on Kubernetes -- I might've chosen OpenShift instead of rolling my own PaaS.
I've also seen the Jenkins-to-Nexus thing a few times, never particularly happily. That said, I don't have particularly deep experience in Java shops, so it's possible that it works really well in some places.
Disclosure: I work for Pivotal, we're the majority donors of engineering to Cloud Foundry, a PaaS competing with OpenShift.