Anyone working on K8s at Box, or I guess anywhere else that has partially deployed it, feel free to answer this, but:
How do you handle gatewaying traffic into Kubernetes from non-K8s services? I've been trying to get a basic cluster out the door with one of our most stateless services, but I'm having a hard time just getting the traffic into it.
The mechanism I'm using is having dedicated K8s nodes that don't run pods hold onto a floating IP and act as gateway routers into K8s. They run kube-proxy and flannel so they can reach the rest of the cluster, but ksoftirqd processes are maxing CPU cores on relatively recent CPUs trying to handle about 2Gbps of traffic (2Mpps), which is a bit below the traffic level the non-K8s version of the service is handling. netfilter runs in softirq context, so I figure that's where the problem is.
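For context, this is roughly how I've been confirming it's softirq-bound (eth0 stands in for the actual gateway interface, and mpstat is from sysstat):

    # NET_RX counts per CPU; if only a few columns move, RSS/IRQ affinity is
    # concentrating the receive load on those cores.
    watch -d -n1 "grep -E 'CPU|NET_RX' /proc/softirqs"

    # Per-core CPU breakdown; cores pinned near 100% in %soft are the ones
    # ksoftirqd is eating.
    mpstat -P ALL 1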
Are you using Calico+BGP to get routes out to the other hosts? What about kube-proxy?
Our network setup is constantly evolving due to a number of internal networking limitations related to nearly static IP addressing and network ACLs. I'll describe our current setup and then where we'd like to go. The big piece of context is that we already have a number of services managed via Puppet and a smaller number of new and transitioned services in Kubernetes, so we need to allow interop through a number of different mechanisms.
We are currently using Flannel for ip-per-pod addressability within our cluster. No services are communicating inside the cluster, so they aren't using kube-proxy yet. For services outside the cluster talking into the cluster we are using a heavily modified version of the service-loadbalancer (https://github.com/kubernetes/contrib/tree/master/service-lo...) which we haven't contributed back yet. It supports SNI and virtual hosts. We get HA and throughput for the individual load balancers by using anycast.
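Roughly, the anycast part is just binding a shared VIP to loopback on every load balancer host and having the local BGP daemon (bird, exabgp, whatever you already run) announce the /32 to the upstream routers; the 192.0.2.10 address here is made up:

    # On each load balancer host: accept traffic for the shared anycast VIP.
    # The BGP daemon on the host announces 192.0.2.10/32 upstream, so the
    # network routes each client to the nearest healthy LB.
    ip addr add 192.0.2.10/32 dev lo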
We have a number of internal services outside the cluster slowly moving to SmartStack, so I assume we will be figuring out interop with that and running it as a sidecar at some point. We would like to move to Calico, as we have some fairly high-throughput services running outside of the cluster that we need to avoid bottlenecking behind a load balancer. We have a separate project running internally to move our network ACLs from the network routers to every host via Calico.
Thank you for that answer, it's helpful. We've also been considering Calico but it seems like a fair bit of work and the project's pretty overdue as it is.
The K8s slack channel is pretty good for things like this.
You can either bind the container to a host port and register the IP of the node (or use the K8s DNS or API to find the IPs), or register a service with a NodePort so that all the nodes will accept traffic and load balance internally.
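For the NodePort route, a minimal sketch (deployment name and port are placeholders):

    # Expose an existing deployment via a NodePort service; kube-proxy on
    # every node then forwards <any-node-ip>:<allocated nodePort> to the pods.
    kubectl expose deployment my-app --port=11211 --type=NodePort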
You can get a list of IPs from the DNS (instead of just the service IP), and I think that interacts appropriately with host ports.
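Concretely that's a headless service (clusterIP set to None), which makes the cluster DNS return one A record per pod instead of a single virtual IP; a rough sketch, assuming kube-dns and the default namespace, with placeholder names:

    # Headless service: no cluster IP is allocated, DNS returns the pod IPs.
    kubectl expose deployment my-app --port=11211 --cluster-ip=None

    # From anywhere that can resolve against the cluster DNS, this lists the
    # individual pod IPs rather than one service IP.
    dig +short my-app.default.svc.cluster.local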
We ran into the same ksoftirqd issue in our own bare-metal deployment. It turns out there's a performance regression in the Linux kernel that manifests when the system is configured with more receive queues than there are physical cores in a single socket.
We dropped the receive queues down to 12, from 48, and hit line rate. More info here:
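If anyone else hits this, the knob is the NIC's channel count (interface name is a placeholder; some drivers split rx/tx instead of using combined channels):

    # Show current and maximum queue/channel counts.
    ethtool -l eth0

    # Drop the channels so they don't exceed the physical cores in one socket
    # (12 in our case, down from 48).
    sudo ethtool -L eth0 combined 12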
I don't work at Box. It has also been 6 months since I touched K8s, so a lot of the details I had in working memory are gone. I'm also interested in the answers to the question you raised.
Off the top of my head:
Have you thought about putting flanneld on the machines hosting the non-K8s services? Probably impractical, but it's something to consider.
The other option is to treat the services inside the cluster as if they were in a different datacenter and explicitly expose NodePorts for whatever the outside services need. If you're using HTTP as the transport, maybe run an HTTP proxy inside the cluster that proxies requests to the services within it. That's how I got an AWS ELB to talk to the services in the cluster I set up.
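If it helps: once the proxy service is exposed as a NodePort, you can pull the allocated port and point the external load balancer at <node-ip>:<that port>; a rough sketch with a made-up service name:

    # Look up the nodePort the API server assigned to the proxy service.
    kubectl get svc http-proxy -o jsonpath='{.spec.ports[0].nodePort}'

    # The external load balancer (an ELB in my case) then targets every
    # node's IP on that port; kube-proxy routes it to the proxy pods.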
The trick with flanneld on our other hosts is that, AFAICT, there's no way to run flanneld in a pure "grab routes and install them" mode without having it acquire a totally unnecessary (and completely unused) subnet lease.
I have considered just writing a quick daemon that does only the work of syncing routes without getting a lease (or trying to modify flanneld to do so).
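For reference, a rough sketch of what that sync would do, assuming the host-gw backend (where the lease's PublicIP is the next hop) and flannel's default /coreos.com/network etcd prefix:

    # One-shot sync: read every flannel subnet lease from etcd and install a
    # route for it, without ever registering a lease for this host.
    for key in $(etcdctl ls /coreos.com/network/subnets); do
      subnet=$(basename "$key" | tr '-' '/')          # 10.2.14.0-24 -> 10.2.14.0/24
      nexthop=$(etcdctl get "$key" | jq -r .PublicIP) # lease JSON carries PublicIP
      ip route replace "$subnet" via "$nexthop"
    done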
The service in this case is memcache with a bunch of mcrouter pods in front of it to handle failure and cold cache warming. I still need to get traffic to the mcrouter instances, and that's where I'm running into the bottleneck.