It gives you a bare-bones, presumably very low-overhead environment in which to run your programs. Think of it as an embedded system where programs run without an OS. This is the environment a program running inside ZeroVM will see. All you have is libc and the ZeroVM-provided APIs. If you want more, you'll have to statically link your programs.
The thing is, you can run many, let's say thousands, of these little programs inside a single machine in such a way that each one can never see the others (as long as it's impossible to break out of the ZeroVM sandbox).
Such a technology would enable neat stuff, like renting a server for someone to run a single program for some period of time and have the results sent back. Nobody does this for unrestricted programs today, for many reasons, a very important one being that it would be very hard to do securely.
The "run a C program for some period of time" thing would work kind of like the AWS dashboard, but instead of having to spin up a machine with Linux on it and running your program inside that, you would only upload your binary and a manifest file. Kind of like what App Engine does, but with fewer restrictions (you'll probably be able to do anything, as long as you're able to compile a "safe" binary that does it).
Well, almost nobody. NearlyFreeSpeech[1] lets you compile and run unrestricted C and C++ programs on their servers, or any other binary if you compile it somewhere else. I've done some tests with a Go-based webservice.
Processes do have a fairly strict time limit before they're killed, but that's because their servers are designed for web applications, not data crunching.
Thanks for the info. Seriously I should have said "almost nobody" in the first place. In fact I thought I had said that :)
I didn't know nfshost was doing it. From what I see, they're probably using FreeBSD jails in this case, which is nice if they are.
Anyways, we have to agree that this space is largely unexplored. I never thought there was a need for this kind of service, but just after reading the ZeroVM pages I think it's a very good idea. With a "little" more initial effort, it would enable writing systems in a very interesting way: self-healing (when the other end has failed, make an API call to provision another copy of it), self-provisioning (when traffic is high, make an API call to provision another copy of a worker), etc. Of course we can already do this, it would just be more natural, and if you combine this with the idea of Mobile Agents, the cloud suddenly becomes much "cloudier".
Actually, as long as you set up your permissions correctly, you can run any untrusted user-mode C code (non-privileged code). Just deny access to most of the system except the places you let it access. Chroot actually lets you emulate the system directories for the process. Set up the firewall correctly to restrict its network access.
The multi-user environment in Unix is a very old idea for letting untrusted code and untrusted programmers run wild on the same machine.
I think in practice there have been a few historical problems with this approach. A major one is that while in theory it's possible to secure a process to a chroot jail (either the Linux or the BSD variety) in practice it's hard to defend against arbitrary, possibly malicious programs in that jail.
Kernel exploits that rely on local access are uncommon but not incredibly rare. You could easily have a hosting service like this that ran fine for a few months, or even a few years, and then a 0-day kernel exploit comes out (that relies on having local access to the machine) and then you're SOL.
Another issue is that in practice some system calls are very expensive. With a little knowledge you can DoS a machine hosting your process if you can run arbitrary system calls -- for instance, if you can create some sort of resource that is expensive for the kernel to track, and then create a lot of that resource. Somewhat related to this, at least on Linux there is a very large number of syscalls, new ones are added relatively frequently, and not all syscalls have had the same amount of security auditing.
Even if you're not worried about your own servers getting hacked, you need to convince your customers that other users aren't going to be able to intercept or modify their data or attack their VMs.
In my mind, one of the main innovations of a lot of the VM work in recent years, and in particular the NaCl approach of limiting what system calls can be invoked, is that you fundamentally reduce the interface by which malicious programs can attack the hosting machine. If you can securely limit system calls to a small subset, you can more easily audit the paths those system calls take to ensure that they're secure and can't be used in a DoS scenario. In a virtualization environment like Xen or KVM, the amount of code you have to audit is limited to the Xen/KVM code, which is much smaller than the rest of the code base.
Thanks for that; I feel the same way about process isolation. In theory it should be a solution, but in practice it doesn't work, for the historical reasons you describe.
Just one comment. ZEROVM IS NOT NACL. It uses NaCl; moreover, we explicitly refrained from touching the validator in order to remain under its proven security blanket (Google established hefty monetary prizes for each found exploit). Apart from the validator, however, it is heavily refactored and rewritten.
MAIN DIFFERENCE:
NaCl has a "syscall firewalling" feature called Pepper. ZeroVM forbids all host syscalls. In fact, ZeroVM is a new virtual hardware architecture (a subset of x86, a subset of ARM, and new ones in the future), so for code running inside there is no such concept as "host syscalls".
Thanks for the responses; however, most of the attacks on OS processes can also happen in ZeroVM.
- Kernel exploits can happen; so can exploits in ZeroVM.
- The OS can be DoS attacked, and so can ZeroVM. I don't see any quota system or resource management in ZeroVM to mitigate it. At least the OS has better tools to deal with it: priorities, the scheduler, resource managers, memory partitioning, etc.
- Same thing for the customer worry. They could worry about other users attacking ZeroVM or the host system.
If you prefer the security of a VM, there are plenty of mature VMs: the JVM, .NET, Xen, or KVM.
My point is that so much research, work, and experience has gone into OS security that I would trust OS isolation over a new research system in terms of security. It's easy to lock down an OS to restrict access to most subsystems.
ZeroVM is not a lightweight container, so none of the above holds. ZeroVM doesn't use any "syscall firewalling" techniques. ZeroVM efficiently emulates a new hardware platform, just as Xen/KVM do.
Now let me address your concerns one by one:
1. (Host) kernel exploits cannot happen, as no syscalls to the host OS are allowed. A ZeroVM app doesn't even have such a concept as a host OS.
2. DoS attacks on the host OS are impossible, as there is no access to the host OS. The interface between a ZeroVM application and ZeroVM itself is specifically designed to be impossible to DoS. It currently consists of 4 functions. Setup and exit are callable only once during the lifetime of an app. Message queue read/write can be called repeatedly, but it is intentionally synchronous, so a throttling mechanism can be implemented transparently.
3. We intentionally haven't hacked the NaCl validator, as it is the only component in the system that guarantees security. Google established 5-6 digit monetary prizes for any exploit in Chrome/NaCl and invests heavily in security. For many customers that is enough.
4. No one asserted that ZeroVM is mature, and right now it is not! So that piece of advice is correct. If you need security now, for production usage KVM/Xen is the only way to go.
5. OS security... So much work has been done to secure the OS from the outside, not from the inside. I would appreciate a description of how easy it is to lock down an OS process in a multi-tenant sense. Also, once you start "syscall firewalling" and draconian restrictions, it becomes very hard to program such a system. Just think for a moment: you take the whole syscall list and, for every syscall, decide on restrictions (I haven't found any document on the web about that). Now how do you work with such an API? How do you police yourself before issuing a syscall, and what do you do when third-party code causes violations? The syscall API is not built for such draconian capping.
1. So there is a 100% guarantee that ZeroVM does not and will not have exploits? The OP's point was that the kernel can have exploits, so the kernel is inferior to ZeroVM. My point is that the kernel can have exploits, and so can ZeroVM. You just don't know yet.
2. A DoS on ZeroVM is indirectly a DoS on the host. There are so many ways to DoS a system. How do you handle an app running in a tight loop accessing all of memory randomly? Queuing the max payload in a tight loop? Spawning new instances across the entire cluster in a tight loop? Claiming DoS can happen in the kernel but is not possible in ZVM is just naive.
1) There is a $100K bounty on each Chrome/NaCl exploit, and we allow only one ZeroVM 'syscall', with a lot of attention put into making it easy to secure. The situation is not the same on Linux. First of all, kernel exploits by a local process are not really considered severe in Linux, and they are certainly not a top priority for anyone. Linux is built to be secure from the outside, not from the inside.
2) All of these are impossible in ZeroVM, except accessing memory randomly and thrashing the caches and TLB tables. Hm... that could work, I guess. For the first time in this forum we're talking about a real vulnerability. However, I think the problem also exists in KVM/Xen (I'll do proper research now; Googling "EC2 TLB thrashing" doesn't yield anything interesting): there's no access to other tenants' data, just a temporary slowdown of a specific processor chip.
It is a layered approach: first you would have to find an exploit in the "VM" (which is really a sandbox), then exploit the underlying OS. The VM has a much smaller attack surface, as there is less you can do, so it is easier to audit. NaCl, which is used here, has had minor flaws (http://arstechnica.com/open-source/news/2009/07/google-nacl-...) but nothing like a straight kernel exploit. Sure, there are other approaches -- see e.g. http://sandboxing.org/ on using SELinux to constrain processes -- but none are easy. There is also some more recent work on directly limiting the syscalls available to processes, another approach in which the OS itself provides an isolation service.
1. Filtering all resource accesses, letting some pass and denying others.
2. Enforcing a different abstraction, so that unwanted resource accesses become impossible because they are not even addressable.
Filtering is by definition less secure. As filtering gets more complicated, there will be false negatives and false positives, and both are harmful.
Enforcing a different abstraction is usually less efficient, as there is a need to simulate hardware devices. However, some devices have hardware support for virtualization, as with Intel CPUs and MR-IOV devices, and then enforcing the abstraction is free.
True, even the JVM verifier was flawed, maybe neither approach can succeed. It just seems to me we now have good reason to believe it's a dead end to try to sandbox native code in legacy instruction sets.
Every system might have weaknesses. What matters is:
1. A small attack surface. With NaCl it is all concentrated in a single tiny validator module. The model is also simple and mathematically proven to be secure.
2. Prior testing. This is especially hard for a security product. Establishing motivating prizes is a good way to ensure it is not easily breakable.
3. The speed with which a patch is made available.
4. Defense-in-depth, ability to have multiple levels of defense cheaply.