Thanks for this insight. Skimming through the documentation [0], I find no ‘issues’ section that mentions this. IMHO Microsoft comes across as untrustworthy by hiding an issue like this.
> Early in the start-up process and at runtime, the silo will probe Kubernetes to find which silos do not have corresponding pods and mark those silos as dead [1]
This is an experimental fix in a PR [1] that comes 6 years after Microsoft announced support for Kubernetes [2].
> Orleans was created at Microsoft Research and designed for use in the cloud. [0]
I read: Microsoft Research developed the concepts of .NET Orleans 10 years ago without having orchestrators like Kubernetes in mind. Now, as an afterthought, they have to come up with patches so that it doesn't break down on short-lived Kubernetes nodes.
A proper Operator & Helm chart was requested in November 2017 [3] and the issue is still open. I get the impression that either it is not possible to write a proper Kubernetes Operator without a major rewrite of .NET Orleans, or Microsoft has a commercial interest in .NET Orleans not being a first-class citizen on Kubernetes.
Whichever it is, I’m not going near a Kubernetes cluster hosting .NET Orleans.
Interesting take. We run Orleans on Kubernetes in production at Microsoft on multiple services. Other services run Orleans on Service Fabric in production - SF is similar to Kubernetes in terms of lifecycle. The project is open source, so things like Helm charts and k8s operators would be welcome contributions.
> why it took up to 6 years for MS to release an experimental fix [1] to run on Kubernetes?
> where I can find the outstanding issue in the release notes that [1] tries to fix?
> if .NET Orleans runs fine on Kubernetes, why is there a need for an experimental fix?
There is no experimental fix, just improvements: things which make life easier for developers running on Kubernetes by automating some steps (setting addresses) and by taking information that's available in a Kubernetes cluster (whether or not a pod has been deleted) and feeding it into the cluster membership system.
The reason the latter is useful is that it addresses something which can occur during initial dev/test, but which does not come up in production cases: when an entire cluster is deleted and redeployed with the same identity, the new instances try to contact defunct instances for a few minutes as a safety measure. The enhancement is to query Kubernetes to determine if it's worth trying to contact those nodes, or whether they're almost certainly dead.
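To make that check concrete, here is a rough sketch in Python of the idea, not Orleans code. It assumes a membership table where each silo entry records the name of the pod it runs in, plus a list of pod names that Kubernetes currently reports. `SiloEntry`, `SiloStatus`, and `mark_orphaned_silos_dead` are hypothetical names made up for illustration; the real implementation lives in the PR [1] and may differ.

```python
from dataclasses import dataclass
from enum import Enum


class SiloStatus(Enum):
    ACTIVE = "active"
    DEAD = "dead"


@dataclass
class SiloEntry:
    # Hypothetical shape: assumes each silo records the pod it runs in.
    pod_name: str
    status: SiloStatus


def mark_orphaned_silos_dead(membership, live_pod_names):
    """Mark active silos whose pod no longer exists as dead.

    Returns the pod names of the entries that were changed.
    """
    live = set(live_pod_names)
    changed = []
    for entry in membership:
        # Only silos still believed active need checking; a missing pod
        # means the silo is almost certainly gone, so skip probing it.
        if entry.status is SiloStatus.ACTIVE and entry.pod_name not in live:
            entry.status = SiloStatus.DEAD
            changed.append(entry.pod_name)
    return changed
```

With a table holding active entries for `app-0` and `app-1` and Kubernetes reporting only `app-1`, the function would flip `app-0` to dead and leave `app-1` alone, so new instances do not spend minutes trying to reach a silo whose pod has been deleted.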
> why Microsoft doesn't write and release an Operator & Helm chart, instead asking the community?
Helm charts aren't something we see requested often. Perhaps because Orleans is a framework which is embedded into the developer's application and not a service which gets deployed and stands alone (compared to, for example, a database). There are no separate Orleans pods, just the user's application pods. Internal users have been building applications on Kubernetes with their own Helm charts. Microsoft is not asking anybody to create those things, or anything, unless they want them for themselves.
[0] http://dotnet.github.io/orleans/Documentation/index.html
[1] https://github.com/dotnet/orleans/pull/6707
[2] https://azure.microsoft.com/en-us/blog/azure-collaboration-w...
[3] https://github.com/dotnet/orleans/issues/3692