It gives you a bare-bones, presumably very low-overhead environment in which to run your programs. Think of it as an embedded system where programs run without an OS. This is the environment a program running inside ZeroVM will see. All you have is libc and the ZeroVM-provided APIs. If you want more, you'll have to statically link your programs.
The thing is, you can run many, let's say thousands, of these little programs inside a single machine in such a way that each one can never see the others (as long as it's impossible to break out of the ZeroVM sandbox).
Such a technology would enable neat stuff, like renting a server for someone to run a single program for some period of time and have the results sent back. Nobody does this for unrestricted programs today, for many reasons, a very important one being that it would be very hard to do securely.
The "run a C program for some period of time" thing would work kind of like the AWS dashboard, but instead of having to spin up a machine with Linux on it and running your program inside that, you would only upload your binary and a manifest file. Kind of like what App Engine does, but with fewer restrictions (you'll probably be able to do anything, as long as you're able to compile a "safe" binary that does it).
Well, almost nobody. NearlyFreeSpeech[1] lets you compile and run unrestricted C and C++ programs on their servers, or any other binary if you compile it somewhere else. I've done some tests with a Go-based webservice.
Processes do have a fairly strict time limit before they're killed, but that's because their servers are designed for web applications, not data crunching.
Thanks for the info. Seriously I should have said "almost nobody" in the first place. In fact I thought I had said that :)
I didn't know nfshost was doing it. From what I see, they're probably using FreeBSD jails in this case, which is nice if they are.
Anyways, we have to agree that this space is largely unexplored. I never thought there was a need for this kind of service, but just after reading the ZeroVM pages I think it's a very good idea. With a "little" more initial effort, it would enable writing systems in a very interesting way: self-healing (when the other end has failed, make an API call to provision another copy of it), self-provisioning (when traffic is high, make an API call to provision another copy of a worker), etc. Of course we can already do this, it would just be more natural, and if you combine this with the idea of Mobile Agents, the cloud suddenly becomes much "cloudier".
Actually, as long as you set up your permissions correctly, you can run any untrusted user-mode C code (non-privileged code). Just deny access to most of the system except the places you let it access. Chroot actually lets you emulate the system directories for the process. Set up the firewall correctly to restrict its network access.
The multi-user environment in Unix is a very old idea for letting untrusted code and untrusted programmers run wild on the same machine.
I think in practice there have been a few historical problems with this approach. A major one is that while in theory it's possible to secure a process to a chroot jail (either the Linux or the BSD variety) in practice it's hard to defend against arbitrary, possibly malicious programs in that jail.
Kernel exploits that rely on local access are uncommon but not incredibly rare. You could easily have a hosting service like this that ran fine for a few months, or even a few years, and then a 0-day kernel exploit comes out (that relies on having local access to the machine) and then you're SOL.
Another issue is that in practice some system calls are very expensive. With a little knowledge you can DoS a machine hosting your process if you can run arbitrary system calls -- for instance, if you can create some sort of resource that is expensive for the kernel to track, and then create a lot of that resource. Somewhat related to this, at least on Linux there is a very large number of syscalls, new ones are added relatively frequently, and not all syscalls have had the same amount of security auditing.
Even if you're not worried about your own servers getting hacked, you need to convince your customers that other users aren't going to be able to intercept or modify their data or attack their VMs.
In my mind, one of the main innovations of a lot of the VM work in recent years, and in particular the NaCl approach of limiting what system calls can be invoked, is that you fundamentally reduce the interface by which malicious programs can attack the hosting machine. If you can securely limit system calls to a small subset, you can more easily audit the paths those system calls take to ensure that they're secure and can't be used in a DoS scenario. In a virtualization environment like Xen or KVM, the amount of code you have to audit is limited to the Xen/KVM code, which is much smaller than the rest of the code base.
Thanks for that; I feel the same way about process isolation. In theory it should be a solution, but in practice it doesn't work, for the historical reasons you describe.
Just one comment. ZEROVM IS NOT NACL. It uses NaCl; moreover, we explicitly refrained from touching the validator in order to remain under its proven security blanket (Google established hefty monetary prizes for each found exploit). Apart from the validator, however, it is heavily refactored and rewritten.
MAIN DIFFERENCE:
NaCl has a "syscall firewalling" feature called Pepper. ZeroVM forbids all host syscalls. In fact, ZeroVM is a new virtual hardware architecture (a subset of x86, a subset of ARM, and new ones in the future), so for code running inside there is no such concept as "host syscalls".
Thanks for the responses; however, most of the attacks on OS processes can also happen in ZeroVM.
- Kernel exploits can happen; so can exploits in ZeroVM.
- The OS can be DoS attacked, and so can ZeroVM. I don't see any quota system or resource management in ZeroVM to mitigate it. At least the OS has better tools to deal with it: priorities, the scheduler, resource managers, memory partitioning, etc.
- Same thing for the customer worry. They could worry about other users attacking ZeroVM or the host system.
If you prefer the security of a VM, there are plenty of mature VMs: the JVM, .NET, Xen, or KVM.
My point is that so much research, work, and experience has gone into OS security that I would trust OS isolation over a new research system in terms of security. It's easy to lock down an OS to restrict access to most subsystems.
ZeroVM is not a lightweight container, so none of the above holds. ZeroVM doesn't use any "syscall firewalling" techniques. ZeroVM efficiently emulates a new hardware platform, just as Xen/KVM do.
Now let me address your concerns one by one:
1. (Host) kernel exploits cannot happen, as no syscalls to the host OS are allowed. A ZeroVM app doesn't even have such a concept as a host OS.
2. DoS attacks on the host OS are impossible, as there is no access to the host OS. The interface between a ZeroVM application and ZeroVM itself is specifically designed to be impossible to DoS. It currently consists of 4 functions. Setup and exit are callable only once during the lifetime of an app. Message queue read/write can be called repeatedly, but it is intentionally synchronous, so a throttling mechanism can be implemented transparently.
3. We intentionally haven't hacked the NaCl validator, as it is the only component in the system that guarantees security. Google established 5-6 digit monetary prizes for any exploit in Chrome/NaCl and invests heavily in security. For many customers that is enough.
4. No one asserted that ZeroVM is mature, and right now it is not! So that piece of advice is correct. If you need security now, for production usage KVM/Xen is the only way to go.
5. OS security... So much work has been done to secure the OS from the outside, not from the inside. I would appreciate a description of how easy it is to lock down an OS process in a multi-tenant sense. Also, once you start "syscall firewalling" and draconian restrictions, it becomes very hard to program such a system. Just think for a moment: you take the whole syscall list and, for every syscall, decide on restrictions (I haven't found any document on the web about that). Now how do you work with such an API? How do you police yourself before issuing a syscall, and what do you do when third-party code causes violations? The syscall API is not built for such draconian capping.
1. So there is a 100% guarantee that ZeroVM does not and will not have exploits? The OP's point was that the kernel can have exploits, so the kernel is inferior to ZeroVM. My point is that the kernel can have exploits, and so can ZeroVM. You just don't know yet.
2. A DoS on ZeroVM is indirectly a DoS on the host. There are so many ways to DoS a system. How do you handle an app running in a tight loop accessing all of memory randomly? Queuing the max payload in a tight loop? Spawning new instances across the entire cluster in a tight loop? Claiming DoS can happen in the kernel but is not possible in ZVM is just naive.
1) There is a $100K bounty on each Chrome/NaCl exploit, and we allow only one ZeroVM 'syscall', with a lot of attention put into making it easy to secure. The situation is not the same on Linux. First of all, kernel exploits by a local process are not really considered severe in Linux, and they are certainly not a top priority for anyone. Linux is built to be secure from the outside, not from the inside.
2) All of these are impossible in ZeroVM, except accessing memory randomly and thrashing the caches and TLB tables. Hm... that could work, I guess. For the first time in this forum we're talking about a real vulnerability. However, I think the problem also exists in KVM/Xen (I'll do proper research now; Googling "EC2 TLB thrashing" doesn't yield anything interesting): there's no access to other tenants' data, just a temporary slowdown of a specific processor chip.
It is a layered approach: first you would have to find an exploit in the "VM" (which is really a sandbox), then exploit the underlying OS. The VM has a much smaller attack surface, as there is less you can do, so it is easier to audit. NaCl, which is used here, has had minor flaws (http://arstechnica.com/open-source/news/2009/07/google-nacl-...) but nothing like a straight kernel exploit. Sure, there are other approaches -- see e.g. http://sandboxing.org/ on using SELinux to constrain processes -- but none are easy. There is also some more recent work on directly limiting the syscalls available to processes, another approach in which the OS itself provides an isolation service.
1. Filtering all resource accesses, letting some pass and denying others.
2. Enforcing a different abstraction, so that unwanted resource accesses become impossible because they are not even addressable.
Filtering is by definition less secure. As filtering gets more complicated, there will be false negatives and false positives, and both are harmful.
Enforcing a different abstraction is usually less efficient, as there is a need to simulate hardware devices. However, some devices have hardware support for virtualization, as with Intel CPUs and MR-IOV devices, and then enforcing the abstraction is free.
True, even the JVM verifier was flawed, maybe neither approach can succeed. It just seems to me we now have good reason to believe it's a dead end to try to sandbox native code in legacy instruction sets.
Every system might have weaknesses. What matters is:
1. A small attack surface. With NaCl it is all concentrated in a single tiny validator module. The model is also simple and mathematically proven to be secure.
2. Prior testing. This is especially hard for a security product. Establishing motivating prizes is a good way to ensure it is not easily breakable.
3. The speed with which a patch is made available.
4. Defense-in-depth, ability to have multiple levels of defense cheaply.