A central goal of the new Mill CPU project is to make strong hardware fault isolation incredibly cheap and incredibly easy and so simple as to have no obvious deficiencies.
I have prepared a much more up-to-date whitepaper on this that is going through internal review right now. Afraid we're focusing 99% on sim work at the moment and that 1% for other business is meaning the paper has been 'in review' since last autumn... hmm, have to go push :)
But I'm happy to elaborate if anyone has any security questions.
(Apologies if you're all still suffering from Mill fatigue)
"Cappsule is a new kind of hypervisor developed by Quarkslab (to our knowledge, there’s no similar public project). Its goal is to virtualize any software on the fly (e.g. web browser, office suite, media player) into lightweight VMs called cappsules. Attacks are confined inside cappsules and therefore don’t have any impact on the host OS. Applications don’t need to be repackaged, and their usage remain the same for the end user: it’s completely transparent. Moreover, the OS doesn’t need to be reinstalled nor modified ... Cappsule uses hardware virtualization to launch applications into lightweight VMs, which run a copy of the host kernel and have no access to the hardware. If an attacker manages to break into the VM (through a bug in the application for instance), the attack is confined into the VM."
Don't know implementation details, but Quarkslab has been credited with finding several Xen hypervisor vulnerabilities. Their FAQ states, https://cappsule.github.io/faq/
"Traditional virtualization solutions (e.g.: VMware, Xen, KVM) virtualize whole operating systems (such as Windows or Linux), whereas Cappsule virtualizes a running system. VMs launched by Cappsule don’t go through the boot step; they start directly on a running kernel. This particular feature allows an instantaneous launch of VMs. One can think of VMs as forks of the host operating system. In fact, the VMs’ kernel is a copy of the running host kernel. Another particularity is that no VM disk image is required. There’s no need to setup, configure, install, manage and keep new VMs up-to-date. The host filesystem is accessible as Copy-on-Write (with respect to a whitelist of files and folders accessible in read-only or read-write mode).
Every software has security bugs, and Cappsule is no exception. But it is developed from scratch, with the main goal of being secure. With less than 15K lines of code, the attack surface is extremely narrowed in comparison to mainstream hypervisors. Moreover, anything that isn’t vital for the VMs isn’t implemented and certain classes of attacks simply don’t exist. For example, there’s no way to access hardware (through I/O memory, I/O ports, or DMA). Also, there’s no need of instruction emulation: vulnerabilities such as XSA-105 are thus impossible."
So it's similar to Docker, except that instead of using some base image from the cloud, they use the host itself.
The known security weakness of Docker (and containers in general) is that the host kernel is exposed. Cappsule somehow "copies" the host kernel in flight?
One difference from Docker is that Cappsule uses hardware virtualization, so the isolation is that of a VM rather than that of kernel namespaces. The VM gets a private, copy-on-write instance of the host kernel. From their FAQ:
"Docker isn’t meant for security. While a lot of improvements have been made recently in this area, it also relies on Linux namespaces and SECCOMP to ensure container isolation. A single kernel vulnerability allows an attacker to compromise the host. On this topic, the chapter 7 (Understanding Container Threats) of NCC Group Whitepaper Understanding and Hardening Linux Containers is a must-read," https://www.nccgroup.trust/us/our-research/understanding-and...
Thinstall did something similar; I've used it in the past and it worked pretty well. It captures system calls and writes changes to its own storage, and when a program reads a directory or file or registry key, it presents a combined view of the host OS + that program's changes, so programs think they are fully, normally installed.
This is an area where I wish there were more easy guides.
For example, I would love to understand how to take a program, limit the total CPU time and memory usage of it and all its children, and be able to clean up after it.
Bonus points if I can limit which directories it can write to.
I'm sure one of these things will do it, but I don't know which level I want.
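For the simplest level (no VM, no container), here is a minimal sketch in Python, assuming a POSIX system. The function name `run_limited` and its parameters are made up for illustration. It sets rlimits in the child before exec, enforces a wall-clock timeout, and kills the whole process group afterwards. Caveats: rlimits are inherited by children but each process gets its own budget, so this does not cap the *total* across the tree (cgroups are the usual tool for that), and it does nothing about which directories can be written to.

```python
import os
import resource
import signal
import subprocess
import sys


def _cap(limit, value):
    # Never ask for more than the hard limit we already run under.
    _soft, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def run_limited(cmd, cpu_seconds=5, memory_bytes=512 * 1024 * 1024, wall_seconds=10):
    """Run cmd with per-process rlimits and a wall-clock timeout (hypothetical helper)."""

    def apply_limits():
        # Runs in the child between fork and exec. Descendants inherit these
        # limits, but each process is counted separately.
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, memory_bytes)

    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        preexec_fn=apply_limits,   # not safe if the parent is multi-threaded
        start_new_session=True,    # new session, so a fresh process group to kill
    )
    timed_out = False
    try:
        out, err = proc.communicate(timeout=wall_seconds)
    except subprocess.TimeoutExpired:
        timed_out = True
    finally:
        # Clean up: kill anything still alive in the child's process group.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except (ProcessLookupError, PermissionError):
            pass
    if timed_out:
        # Reap the killed child and collect whatever output it produced.
        out, err = proc.communicate()
    return proc.returncode, out, err, timed_out


if __name__ == "__main__":
    print(run_limited([sys.executable, "-c", "print('hello')"]))
```

A child that calls `setsid()` itself escapes the process group, so treat this as a guard against buggy programs rather than hostile ones.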
I've been looking for such a language for some time now but couldn't find any either. The ability to run untrusted code, combined with the ability to suspend, serialize and reschedule running tasks, as in Stackless Python, would make it possible to have truly distributed applications. Imagine thousands of tasklets jumping through your filesystem, local network or the internet.
Another interesting property is determinism, which in turn would allow for remote execution and instruction metering - a computation market.
If the code is fully deterministic, verifying results becomes much simpler. One option is comparing Merkle roots over the entire program state.
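To make the Merkle root idea concrete, here is a rough sketch in Python. It assumes the program state can be flattened into a dict of simple values and serialized canonically; `state_chunks` and `merkle_root` are names invented for this illustration, and the serialization is deliberately naive. Two parties that ran the same deterministic program only need to compare the 32-byte root.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def state_chunks(state: dict) -> list:
    # Canonical serialization: sort keys so insertion order cannot matter.
    # Assumes keys are strings and values have a stable repr().
    return [f"{key}={state[key]!r}".encode() for key in sorted(state)]


def merkle_root(chunks: list) -> bytes:
    if not chunks:
        return _h(b"")
    # Prefix leaves and inner nodes differently so one cannot pose as the other.
    level = [_h(b"\x00" + chunk) for chunk in chunks]
    while len(level) > 1:
        nxt = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node out is promoted unchanged
        level = nxt
    return level[0]


if __name__ == "__main__":
    worker_a = {"pc": 42, "stack": [1, 2, 3], "heap": "abc"}
    worker_b = {"heap": "abc", "stack": [1, 2, 3], "pc": 42}
    same = merkle_root(state_chunks(worker_a)) == merkle_root(state_chunks(worker_b))
    print("states agree:", same)
```

The tree structure (rather than one flat hash) matters once the state is large: on a mismatch, the two sides can walk down the tree and find the differing chunk by exchanging a logarithmic number of hashes.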
It was inspired in part by just those agoric papers and meant to grow into supporting that kind of market. It was also designed to be fully deterministic, and included the suspending/serializing you mention (disabled because my first stab at resource accounting for that feature was wrong). It only had a few months of work put into it before I ran out of steam, though. Nowadays WebAssembly seems closest (and much further along), though it abandoned full determinism.
Kragen Sitaker (HN user kragen) posted some interesting ideas about taking advantage of determinism in a platform built for it from the start. He called it Kogluktualuk, but his post seems to have been lost to googling.
Whilst the script is limited in what it can do, it's actually not that hard to imagine some kind of next-gen CDN offering edge scripting to statelessly rewrite requests and responses.
Lua, in its vanilla form, is not a language that supports proper multi-tenancy. It works okay if only one (trusted) user writes the scripts, but it's not a good choice for running truly untrusted code from the internet. What happens when you call "string.rep" in Redis?
A better design would be to run one Lua VM per tenant, but then we face a totally different set of problems (preemptive multi-tasking between VMs?)
Sorry for the meta comment, but reading this on an iPad Pro 12.9, and can't seem to get this to render as it would on a desktop, despite hitting 'Request Desktop Site'. Is there anything one can do with a bookmarklet in this situation, or am I beholden to what the server decides?
Author here. The markup is trivial. The CSS is basic. I'm not sure what problem precisely you have, but I did spend quite some effort to make it look good on iphone. (for added kicks press "d" on your keyboard on desktop)
Sorry again for meta, and thanks for responding. I just mean on the iPad pro 12.9 it renders as if it were a huge iPhone, rather than as a desktop site. It just means the text is huge. Not a biggie - I've seen this on some other blogs, and wondered if there were a workaround.
Indeed, Google App Engine has sandboxed versions of Python, Java and Go. It's becoming the 'old' way to do things though. If I understand correctly, VM and container-based solutions are now the new way to do sandboxing on Google Cloud, e.g. Compute Engine and Container Engine.
There's an old talk on Security http://millcomputing.com/technology/docs/security/