Anyone working on K8s at Box, or I guess anywhere else that has partially deployed it, feel free to answer this, but:
How do you handle gatewaying traffic into Kubernetes from non-K8s services? I've been trying to get a basic cluster out the door with one of our most stateless services, but I'm having a hard time just getting the traffic into it.
The mechanism I'm using is having dedicated K8s nodes that don't run pods hold onto a floating IP and act as gateway routers into K8s. They run kube-proxy and flannel so they can reach the rest of the cluster, but ksoftirqd processes are maxing CPU cores on relatively recent CPUs trying to handle about 2Gbps of traffic (2Mpps), which is a bit below the traffic level the non-K8s version of the service is handling. netfilter runs in softirq context, so I figure that's where the problem is.
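For context, this is roughly how I've been confirming it's softirq-bound (eth0 stands in for the actual gateway interface, and mpstat is from sysstat):

    # NET_RX counts per CPU; if only a few columns move, RSS/IRQ affinity is
    # concentrating the receive load on those cores.
    watch -d -n1 "grep -E 'CPU|NET_RX' /proc/softirqs"

    # Per-core CPU breakdown; cores pinned near 100% in %soft are the ones
    # ksoftirqd is eating.
    mpstat -P ALL 1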
Are you using Calico+BGP to get routes out to the other hosts? What about kube-proxy?
Our network setup is constantly evolving due to a number of internal networking limitations related to nearly static IP addressing and network ACLs. I'll describe our current setup and then where we'd like to go. The big piece of context is that we already have a number of services managed via Puppet and a smaller number of new and transitioned services in Kubernetes, so we need to allow interop through a number of different mechanisms.
We are currently using Flannel for ip-per-pod addressability within our cluster. No services are communicating inside the cluster, so they aren't using kube-proxy yet. For services outside the cluster talking into the cluster we are using a heavily modified version of the service-loadbalancer (https://github.com/kubernetes/contrib/tree/master/service-lo...) which we haven't contributed back yet. It supports SNI and virtual hosts. We get HA and throughput for the individual load balancers by using anycast.
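Roughly, the anycast part is just binding a shared VIP to loopback on every load balancer host and having the local BGP daemon (bird, exabgp, whatever you already run) announce the /32 to the upstream routers; the 192.0.2.10 address here is made up:

    # On each load balancer host: accept traffic for the shared anycast VIP.
    # The BGP daemon on the host announces 192.0.2.10/32 upstream, so the
    # network routes each client to the nearest healthy LB.
    ip addr add 192.0.2.10/32 dev lo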
We have a number of internal services outside the cluster slowly moving to SmartStack, so I assume we will be figuring out interop with that and running it as a sidecar at some point. We would like to move to Calico, as we have some fairly high-throughput services running outside of the cluster that we need to avoid bottlenecking behind a load balancer. We have a separate project running internally to move our network ACLs from the network routers to every host via Calico.
Thank you for that answer, it's helpful. We've also been considering Calico but it seems like a fair bit of work and the project's pretty overdue as it is.
The K8s slack channel is pretty good for things like this.
You can either bind the container to a host port and register the IP of the node (or use the K8s DNS or API to find the IPs), or register a service with a NodePort so that all the nodes will accept traffic and load balance internally.
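For the NodePort route, a minimal sketch (deployment name and port are placeholders):

    # Expose an existing deployment via a NodePort service; kube-proxy on
    # every node then forwards <any-node-ip>:<allocated nodePort> to the pods.
    kubectl expose deployment my-app --port=11211 --type=NodePort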
You can get a list of IPs from the DNS (instead of just the service IP), and I think that interacts appropriately with host ports.
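Concretely that's a headless service (clusterIP set to None), which makes the cluster DNS return one A record per pod instead of a single virtual IP; a rough sketch, assuming kube-dns and the default namespace, with placeholder names:

    # Headless service: no cluster IP is allocated, DNS returns the pod IPs.
    kubectl expose deployment my-app --port=11211 --cluster-ip=None

    # From anywhere that can resolve against the cluster DNS, this lists the
    # individual pod IPs rather than one service IP.
    dig +short my-app.default.svc.cluster.local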
We ran into the same ksoftirqd issue in our own bare-metal deployment. It turns out there's a performance regression in the Linux kernel that manifests when the system is configured with more receive queues than there are physical cores in a single socket.
We dropped the receive queues down to 12, from 48, and hit line rate. More info here:
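If anyone else hits this, the knob is the NIC's channel count (interface name is a placeholder; some drivers split rx/tx instead of using combined channels):

    # Show current and maximum queue/channel counts.
    ethtool -l eth0

    # Drop the channels so they don't exceed the physical cores in one socket
    # (12 in our case, down from 48).
    sudo ethtool -L eth0 combined 12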
I don't work at Box. It has also been 6 months since I touched K8s, so a lot of the details I had in working memory are gone. I'm also interested in the answers to the question you raised.
Off the top of my head:
Have you thought about putting flanneld on the machines hosting the non-K8s services? Probably impractical, but it's something to consider.
The other option is to treat the services inside the cluster as if they were in a different datacenter and explicitly expose NodePorts for whatever the outside services need. If you're using HTTP as the transport, maybe run an HTTP proxy inside the cluster that proxies requests to the services within it. That's how I got an AWS ELB to talk to the services in the cluster I set up.
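If it helps: once the proxy service is exposed as a NodePort, you can pull the allocated port and point the external load balancer at <node-ip>:<that port>; a rough sketch with a made-up service name:

    # Look up the nodePort the API server assigned to the proxy service.
    kubectl get svc http-proxy -o jsonpath='{.spec.ports[0].nodePort}'

    # The external load balancer (an ELB in my case) then targets every
    # node's IP on that port; kube-proxy routes it to the proxy pods.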
The trick with flanneld on our other hosts is that, AFAICT, there's no way to run flanneld in a pure "grab routes and install them" mode without having it acquire a totally unnecessary (and completely unused) subnet lease.
I have considered just writing a quick daemon that does only the work of syncing routes without getting a lease (or trying to modify flanneld to do so).
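For reference, a rough sketch of what that sync would do, assuming the host-gw backend (where the lease's PublicIP is the next hop) and flannel's default /coreos.com/network etcd prefix:

    # One-shot sync: read every flannel subnet lease from etcd and install a
    # route for it, without ever registering a lease for this host.
    for key in $(etcdctl ls /coreos.com/network/subnets); do
      subnet=$(basename "$key" | tr '-' '/')          # 10.2.14.0-24 -> 10.2.14.0/24
      nexthop=$(etcdctl get "$key" | jq -r .PublicIP) # lease JSON carries PublicIP
      ip route replace "$subnet" via "$nexthop"
    done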
The service in this case is memcache with a bunch of mcrouter pods in front of it to handle failure and cold cache warming. I still need to get traffic to the mcrouter instances, and that's where I'm running into the bottleneck.