It is not. And I am hoping that someone will correct me if I am wrong about some of this (because I don't really want to learn Mesos for myself), but here is how it is not:
In Kubernetes, you have nodes. Your containers run directly on nodes. This is very much the same as running your kernel inside of a VM. It's there, on the computer, running. One kernel, one VM. One node, many containers, but the containers are still each running on exactly one node.
There are, however, five or more abstractions between your containers and the nodes they run on (an Ingress routes traffic to a Service, which points at a Deployment; under the Deployment is a ReplicaSet, which spawns Pods, which may run containers in them). Through these you can arrange to provide Highly Available service guarantees.
But if that node goes down, your container is down until the pod can be restarted on a different node, or possibly until that node comes back. So you ensure that your pods can scale out horizontally, which can be a painful process if your product is not new and you've made bad decisions, or if it's from a vendor and the vendor has made bad decisions that are not easy to undo. (If you chose the wrong vendor. It happens. Time to update your resume, perhaps.)
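To make that concrete, here is a toy sketch of the Kubernetes failure model as described above: each pod is pinned to exactly one node, so losing a node takes its pods down, and only a replica already scheduled on another node keeps the service up. All names here are illustrative, not the real Kubernetes API.

```python
# Toy model: pods run on exactly one node; a node failure takes its
# pods down until something reschedules them elsewhere.
class Cluster:
    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.pods = {}  # pod name -> the single node it runs on

    def schedule(self, pod, node):
        assert node in self.nodes
        self.pods[pod] = node

    def fail_node(self, node):
        self.nodes.discard(node)
        # Pods on the dead node are simply gone until rescheduled.
        down = [p for p, n in self.pods.items() if n == node]
        for p in down:
            del self.pods[p]
        return down

    def available_replicas(self, prefix):
        return sum(1 for p in self.pods if p.startswith(prefix))

cluster = Cluster(["node-a", "node-b"])
cluster.schedule("web-1", "node-a")
cluster.schedule("web-2", "node-b")  # replica on a second node

cluster.fail_node("node-a")
print(cluster.available_replicas("web"))  # the node-b replica survives
```

The point of the sketch: without that second replica on a different node, `available_replicas` would drop to zero until a reschedule happens, which is exactly the gap the abstractions above exist to paper over.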
In Mesos, you have agents and physical machines, and the agents take care of "work units" that are roughly a share of some task, in the same sense that, with VMware DRS[1], your RAM and CPU cycles are drawn from a pool and may come from any of the physical nodes in your cluster at any time.
I think this was called "non-locality" in vSAN. Do not think of your data as being on a particular disk: enough hot copies of it may be scattered across a fleet of disks that, at any given time, you might have a local copy and yet an agent decides you're going to read from and write to somewhere else across a 10-gigabit backplane instead.
That's where I'll draw the parallel, and that's where I'll stop drawing parallels, because I'm not trying to tell you about VMware, and I hope someone comes along to correct me if I'm actually telling lies about Mesos. I don't really know Mesos and could be talking right out of my ass, but from what I've read, I don't think I'm too far off.
(I had access to a full ESXi clustering and vSAN license for a year, and we turned on all of those features before I left that company. This is why I think I know something about VMware, at least.)
I'd assume that you can get the same kind of guarantees from Mesos that I could get from VMware's DRS and HA solutions. In other words, if a node goes down and I have enabled all the, err, checkboxes...
...then I won't need to wait for that node's work to be rescheduled; I won't even notice an interruption in the continuous operation of the container(s), unless the loss of a node represents enough missing CPU cycles to put my cluster "over the edge." I don't need pod replicas in the sense that Kubernetes needed them in order to provide that guarantee, because the tasks are never down: their shares of work are just sent to another worker from the pool, seamlessly.
(They might not even have been scheduled onto one physical machine during normal operation; the agent decides where to send the shares and may change its mind at any time based on new information about load. You're never supposed to notice, unless you're paying very close attention.)
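The "shares drawn from a pool" mental model described above can be sketched in a few lines of illustrative Python (none of this is a real Mesos API; it's only the model the paragraph describes, and the reply below clarifies how Mesos actually behaves):

```python
# Sketch of the "shares from a pool" model: work is divided into shares,
# and shares go to whichever workers are currently in the pool.
def assign_shares(shares, workers):
    """Spread shares round-robin across the currently live workers."""
    assignment = {}
    for i, share in enumerate(shares):
        assignment[share] = workers[i % len(workers)]
    return assignment

shares = ["task.0", "task.1", "task.2", "task.3"]
workers = ["w1", "w2", "w3"]
before = assign_shares(shares, workers)

# A worker drops out of the pool; the same shares are simply
# redistributed over the survivors on the next pass.
workers.remove("w2")
after = assign_shares(shares, workers)

print(sorted(set(after.values())))  # ['w1', 'w3']
```

In this model nothing is ever "down": a share that was headed for `w2` just lands on a surviving worker instead, which is the guarantee being contrasted with Kubernetes-style pod replicas.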
I'm extrapolating how I think Mesos must work from what I've heard about it, and from what I know you can get VMware to do if you're willing to spend millions of dollars scaling it out, or if your hardware is already big enough and you know a salesman willing to sell you a zero-support, no-maintenance-included contract at a 97% discount so you can use the HA and FT features when you need them. (I used ESXi for four years without any of these features and it was fine, but they are nice features, and you need to arrange things a bit differently when you don't have them. vSAN is not RAID 5.)
[1]: "What is DRS"
VMware DRS (Distributed Resource Scheduler) is a utility that balances computing workloads with available resources in a virtualized environment. The utility is part of a virtualization suite called VMware Infrastructure 3.
Mesos actually handles the lower-level part: determining which machines are up and which resources they have (CPU, memory, disks, network ports, GPUs), running tasks on machines, isolating each running task's resources on each machine (either with its own containerizer or using Docker), and so on. But the tasks themselves and their scheduling details (how many tasks to run, whether to try to run each task on a different machine, etc.) are determined by the framework you are using. There are some popular "meta-frameworks" (like Marathon and Aurora) that abstract away the Mesos interface and let you just say "run five instances of this Docker container, one per physical machine." They may also handle higher-level details, like service registration/discovery, rolling updates, and more.
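That split between Mesos (tracking machines and resources) and the framework (deciding what to launch where) is the essence of Mesos' two-level scheduling. A rough sketch, with purely illustrative class and method names: the master offers each framework the free resources of live agents, and the framework decides which offers to accept.

```python
# Two-level scheduling sketch: the master only knows about resources;
# the framework decides placement. Names are illustrative, not real APIs.
class Master:
    def __init__(self, agents):
        self.agents = dict(agents)  # agent name -> free CPUs

    def offers(self):
        # Offer only agents that currently have free capacity.
        return [(name, cpus) for name, cpus in self.agents.items() if cpus > 0]

class Framework:
    """A Marathon-like framework: run N copies, one per distinct agent."""
    def __init__(self, instances, cpus_per_task):
        self.wanted = instances
        self.cpus = cpus_per_task

    def accept(self, offers):
        launched = []
        for agent, free in offers:
            if len(launched) == self.wanted:
                break
            if free >= self.cpus:
                launched.append(agent)  # launch one task on this agent
        return launched

master = Master({"agent-1": 4, "agent-2": 2, "agent-3": 0})
app = Framework(instances=2, cpus_per_task=2)
print(app.accept(master.offers()))  # places one task each on agent-1, agent-2
```

The design point: Mesos never decides "one per physical machine"; it only hands out offers, and a framework like Marathon implements that placement policy on top.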
>Because they are not down. Their shares are just sent to another worker from the pool seamlessly.
Eh, that depends on what you mean by "seamlessly": if a node goes down, everything it was running will be rescheduled and run again on other nodes, but this can take minutes, and ensuring the failure is not perceived from the outside is entirely up to you. Disk management is also entirely up to you, so you have to roll your own SAN if you want something like that (at my company, Mesos disk space is completely ephemeral, and truly persistent state is always stored in S3 or external databases).
That helps. I won't try to compare it to an HA ESXi environment again :D
Can you tell me in a sentence, or maybe a paragraph, why people would get excited about running Kubernetes on Mesos? Is it simply so that they no longer have to run their Kubernetes alongside Mesos, as in this diagram: