Landlock merged in mainline for Linux 5.13 (landlock.io)
138 points by p4bl0 on May 19, 2021 | 70 comments



I'm completely lost with Linux sandboxing and security tools. AFAIK there are:

- cgroups

- cgroupsV2

- namespaces

- AppArmor

- SELinux

- LSM

- seccomp

- seccomp-bpf

- eBPF (I know it's not a security thing per se, but it's a building block for sandboxes)

- and now landlock.

Could anyone here give me an ELI5 about which one does what? (Also which ones you're supposed to use together, and which are competitors to each other.)

Or maybe give me some pointer to a good introduction to these?


You forgot firejail also...

I am completely lost too, and have the feeling that each one has pros/cons, and none is really a practical product to use in daily life :/

I used Firejail for many months and ran into several issues maintaining configurations for different programs, especially after software updates.

Also, those are tools on top of the kernel, I think, not facilities in the kernel directly, but I may be wrong about this.


I didn't mention Firejail because it's actually built on top of seccomp-bpf and namespaces. It's a user-space application leveraging existing kernel features, but unless I'm even more confused than I thought (which wouldn't surprise me, tbh), the others I mentioned live in the kernel (though AFAIK AppArmor and SELinux are security modules running in kernel space but not part of the kernel itself).


The slides [1] help distinguish landlock from the others, IMO.

Landlock uniquely has: fine-grained controls similar to SELinux, embedded policy (it's not up to the admin whether to deploy the security measure), and unprivileged use.

[1] https://landlock.io/talks/2019-09-12_landlock-summary.pdf


> - AppArmor
> - SELinux
> - LSM

Let's group these three up. LSM is how Apparmor and SELinux are implemented.

LSMs were the de-facto way to accomplish mandatory or role-based access control on Linux, as opposed to the default user/group "discretionary" access control (DAC).

SELinux was developed for building very very powerful policies, often to its detriment as it was not unusual for policies to be flawed due to that complexity. These days though I think it's doing pretty well on Android.

Apparmor is much simpler, though less powerful. It leverages a per-process sandboxing model, mostly focusing on read/write/execute permissions for the filesystem for a given process.

> - seccomp
> - seccomp-bpf

The OG seccomp gave a program access to just four system calls: 'exit', 'sigreturn', 'read', and 'write'.

Basically if you had, say, a parser, you could fork and exec it within a seccomp sandbox. All it could do was read/write to the inherited pipes - so it could read stdin and write to stdout.

Really cool, but limited.

seccomp-bpf was an extension of the idea behind seccomp - that is, policies on system calls.

The main idea is that system calls are the way that shit gets done, whether it's exploiting the kernel or actually just performing malicious actions. So, we want to limit them.

This works by loading a BPF filter program into the kernel, which then decides, syscall by syscall, what to allow or deny. It's very powerful and effective, but it's hard to maintain because you have to know your syscalls ahead of time.
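
To give a feel for it, here's a minimal allow-list sketch using libseccomp (a wrapper library around the raw prctl/BPF interface); the allowed syscall set is just a toy assumption, not something from a real program:

    /* Sketch of a seccomp-bpf allow-list via libseccomp (link with -lseccomp).
     * Anything not explicitly allowed kills the process. */
    #include <seccomp.h>
    #include <unistd.h>

    int main(void) {
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL_PROCESS);
        if (ctx == NULL)
            return 1;

        /* The bare minimum this toy program needs to finish. */
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);

        if (seccomp_load(ctx) != 0)   /* compiles the rules to BPF and installs them */
            return 1;
        seccomp_release(ctx);

        /* write(2) still works; openat(2), socket(2), etc. are now fatal. */
        write(STDOUT_FILENO, "sandboxed\n", 10);
        return 0;
    }

The maintenance pain shows up immediately: if a libc update starts using a different syscall under the hood (open vs. openat, say), the filter silently breaks, which is why you want that test coverage.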

> - cgroups
> - cgroupsV2

cgroups are less about sandboxing and more about restricting resource usage.

> - namespaces

Namespaces give a program its own view of some slice of your system. In a pid namespace it won't see other processes, unless they're in that same namespace. In a network namespace the process will think it's got its own network to itself.

This is obviously quite useful for containers.
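
As a rough sketch (unshare(2); CLONE_NEWUSER is what lets an unprivileged process do this, assuming the distro allows unprivileged user namespaces):

    /* Sketch: move into a fresh user + network namespace, then look around.
     * Inside, only a lone (down) loopback interface is visible. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        if (unshare(CLONE_NEWUSER | CLONE_NEWNET) != 0) {
            perror("unshare");   /* e.g. unprivileged user namespaces disabled */
            return 1;
        }
        /* This process and its children now think they have the network to themselves. */
        execlp("ip", "ip", "link", "show", (char *)NULL);
        perror("execlp");
        return 1;
    }

(The unshare(1) command-line tool wraps the same call.)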

There's even more than that though, way more.

There's Yama, another LSM. There's chroots, chroot jails. There's the entire DAC system with users and groups. Root "capabilities".

Windows isn't a ton better fwiw, it has jobs, integrity (including undocumented hidden integrity levels), virtual desktops, DAC, AppContainer (or something?), AppLocker, UAC virtualization, etc.

In terms of what to use, that depends. Apparmor and SELinux are distro-specific, but IMO Apparmor is pretty trivial to write and maintain so it's sort of a harmless thing to deliver alongside a service/application.

Seccomp is arguably the most powerful, particularly when combined with namespaces (you can't filter on pointer arguments to system calls, so filtering on file paths etc. is out, and namespaces pick up that slack). With a tight filter this is really damn powerful. But you'll want really good test coverage so that you can determine ahead of time whether you accidentally introduced a new system call.

The main thing imo is to take the dangerous part of your program and isolate it into a small component. That's just the most important thing to do, and from there sandboxing becomes (more or less) trivial. That component can run as an unprivileged user, or in a namespace with no access, or in a seccomp v1 jail with nothing but a pipe for reading and writing.


Thanks for the detailed explanation, this is very useful! This is a pretty open ended question, but what approach would you choose in order to get the most security for the buck, for a piece of software that runs containers for non-technical users, where the apps come from 3rd party developers? Containerd and runc expose some of these security levers but it's a bit daunting to build the right approach without burdening the user too much.


If you're running 3rd party code, that more or less moves you into the namespace/container world or the LSM world. You aren't going to want to maintain a seccomp filter for 3rd party software.

It's going to really depend on what you're going for, but probably something along the lines of "docker + best practices" is the most bang for your buck solution.


Ok, that's in line with what I expected. I was also thinking of mandating that developers embed some kind of manifest with the app which states what resources it consumes (public ports, Internet connectivity to specific hosts, etc.) so that the system could enforce some boundaries around that, perhaps using eBPF. For the syscall fencing, the Landlock project might be of use.

Appreciate the advice.


I've heard multiple times that docker wasn't suited for security isolation as it wasn't designed for it (even though the underlying mechanisms can be used for that purpose). Has something changed recently on that front?


It's just a matter of your threat model.


Pledge is like 100% easier to understand. FreeBSD has similar stuff and it is also super complicated and basically unused. If you can't make this idiot-proof, there is little hope anyone will ever use it.


Reading the documentation my guess is that landlock works similar to OpenBSD unveil, where the kernel pins and tags a dentry object. If a path walk doesn't traverse a tagged dentry object w/ an allow flag, or the most recently traversed dentry has a block flag, the open fails. That neatly avoids the need to compare paths. One caveat is that renames, particularly renames of directories, might not behave as expected from the perspective of a user specifying paths.

Linux thread semantics wrt credentials are likely to be an Achilles' heel, though, from an exploit mitigation perspective. Also, the fact that all subprocesses inherit these restrictions means it becomes very difficult to refactor tools to use minimal permissions when those tools need to invoke other tools. This is a lesson that OpenBSD learned the hard way, has explained very clearly, and the fact it has fallen on deaf ears is disconcerting. This is one reason for the lack of seccomp uptake, and landlock is recapitulating the same mistake.


> Also, the fact that all subprocesses inherit these restrictions means it becomes very difficult to refactor tools to use minimal permissions when those tools need to invoke other tools. This is a lesson that OpenBSD learned the hard way, has explained very clearly, and the fact it has fallen on deaf ears is disconcerting.

If subprocesses don’t inherit sandbox restrictions, can’t you trivially escape the sandbox by spawning a subprocess to do whatever work you want? What exactly are you proposing here?


This confuses two different problems. The problem most people think of is how to sandbox a specific series of functional operations they have in mind, like opening and reading a PDF file. And then they think to themselves, what kind of API would I need to restrict a PDF viewer to just the file I'm passing.

But the problem for ubiquitously minimizing privileges across myriad tools (almost all OpenBSD command-line tools use pledge and/or unveil, now) is that the calling program can't make assumptions about the permissions and files a child program might need. So for example, a PDF viewer (or for the sake of simplicity, let's say a PDF text extractor) might need to access /bin/gunzip. The calling program can't know this ahead of time. So it has no way of reliably including those necessary files in its permitted set. But even if it did, without major refactoring it would end up w/ a union of the privileges it needs itself and of all the privileges its child processes might need, which is the opposite of what you're trying to achieve with privilege minimization. Ultimately, the only program that can truly know--and the only one that should be tasked with knowing--what privileges are needed to perform an operation is the program performing that operation.

(And note that major refactoring, such as splitting a program into a series of interoperating processes w/ discrete privileges sets, is usually not a realistic option. If it were then solutions like Capsicum would be more than adequate. Certainly new software should try to use this approach if it can.)

At the end of the day, when it comes to security you can't really save vulnerable software from itself without touching that software. Attempting to build sandboxes outside that software doesn't work well in practice. Witness SELinux, which most people disable because the rules (invariably written by people other than the authors of the programs) are too brittle and prone to breaking functionality. But neither, of course, is it practical to rearchitect all our software.

The reason why pledge and unveil and their default semantics of non-inheritance work so well is that they provide an optimal middle-ground. Each program concerns itself w/ minimizing its own privileges, and relies on programs it invokes to do the same. And w/ a good API, like pledge and unveil, in a shockingly short period of time those expectations (that a child program does know how to restrict its own permissions) can be substantially met w/ minimal effort. The typical patch to OpenBSD utilities included just several lines of code spread around various critical areas. For example, open or (if you can't open immediately) unveil any file paths passed on the command-line. Ditto for extraneous resources like /bin/gzip or /etc/magic. Then drop as many other privileges as you can as soon as you can, like network access. The appropriate way to do this is always highly dependent on the function and architecture of the program.

Of course, if you're a leaf program then, yes, you want restrictions to be inherited by default in case some bug permits uncontrolled invocation of a utility. But in most of these cases this can be solved by, e.g., blocking exec. As a framework matures, sure, extensions can be added for the niche cases where you want to toggle inheritance and/or pre-specify restrictions as the invoker of another program. But these features should not be the emphasis, and they definitely shouldn't be provided in lieu of interfaces for non-inherited restrictions.
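
To make the typical patch concrete, here's a toy OpenBSD-only sketch of the "unveil the path you were given, then pledge down to stdio + rpath" idiom described above (the utility itself is made up):

    /* Toy cat-like utility: after these few lines, a compromise can at most
     * read the one file it was given. OpenBSD only. */
    #include <err.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        if (argc != 2)
            errx(1, "usage: %s file", argv[0]);

        /* Only this path remains visible to the process, read-only;
         * the second call locks out any further unveil(). */
        if (unveil(argv[1], "r") == -1 || unveil(NULL, NULL) == -1)
            err(1, "unveil");

        /* Keep only stdio and read-only filesystem promises.
         * NULL leaves execpromises untouched (exec isn't promised anyway). */
        if (pledge("stdio rpath", NULL) == -1)
            err(1, "pledge");

        FILE *f = fopen(argv[1], "r");
        if (f == NULL)
            err(1, "fopen %s", argv[1]);

        int c;
        while ((c = fgetc(f)) != EOF)
            fputc(c, stdout);
        fclose(f);
        return 0;
    }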


I was also confused, but I get you and now it seems obvious. I think that we have two problems. One is that we want to constrain everything as much as possible, and the second is to allow general-purpose computing.

For the first problem, we want to allow programs to drop privileges, like good citizens, and also to spawn programs with dropped privileges, which requires some sort of inheritance. Being able to modulate the inheritance part seems vital, as with the gunzip issue you mentioned. We could also imagine that if a program is allowed to run other programs without inheritance, it could just run tee to write files wherever it wants.

That's why on Android and other similar systems an app has to declare its limits. Then it can still do work on behalf of another app via intents - in a way, pipe/fork/exec. Limits are not inherited then. You can always limit what the application is allowed to run or communicate with.


Yeah. There's an implicit assumption that some restrictions and capabilities do need to be inherited or passed. /bin/cat can't read /etc/shadow unless it inherits the root UID or is passed a descriptor to the opened file. While the traditional Unix programming environment falls short of an ideal object capabilities model, it doesn't fall that short--the Capsicum API extension is quite simple. Other programming environments, like Android as you point out, have their own intrinsic security model. Discussions about the efficacy of interfaces like pledge, landlock, seccomp, etc, are set against these background models and the existing software ecosystems they support.

The argument for pledge and unveil isn't that it's conceptually superior to, e.g., a pure object capabilities system; it's that in practice it's the interface most likely to be used effectively by software in the Unix programming environment to augment the existing model.


> [SELinux rules] invariably written by people other than the authors of the programs

Isn't this the core of the solution? Why shouldn't the authors of the program themselves supply the expected boundaries of the program they've written?

Point is, SELinux is not bound by the same problem you explain: it uses exec() as a domain boundary, so a PDF viewer calling gunzip would transition to the gunzip enforcement domain (if so configured), allowing both gunzip and the PDF viewer to run with minimal permissions.


> Attempting to build sandboxes outside that software doesn't work well in practice.

I can't say I agree with this. If it were true, containers would not be so ubiquitous.


Except container permissions in practice are non-existent. They have essentially all the permissions by default.

There is a reason you have to run your containers as non-root: containers can leak root access (last I checked, though I doubt it's changed any).


> Witness SELinux, which most people disable because the rules (invariably written by people other than the authors of the programs) are too brittle and prone to breaking functionality.

This is a recurring criticism from BSD fans, but unfortunately I think it mostly reminds us that humans are lazy, and isn't a reflection of an actual problem. This machine, which does a bunch of stuff, turns out to have had SELinux in enforcing mode for many years. I wasn't sure, actually, so I had to go check when I began this rant, but yup, it was enabled when I installed whatever Fedora was current when my last PC's power supply gave up, maybe 2015 or so. I think if you could just disable pledge, you'd see "fixes" that do that too, for the same reason.

It's true that "disable SELinux" is a common "fix" proposed for problems, but it is also true that "disable certificate validation" is a common "fix" (even today people will copy-paste a "disable all validation" configuration into their app in hopes it'll solve some unrelated issue), and I'm quite sure if I poke around in Rust forums I'll find the same sort of person recommending just sticking everything inside an Unsafe block to "fix" a problem too... Just in case it feels like I'm sticking the boot into our industry here, this happens even in safety critical environments, it's just that they have more fail-safes than we do, so here's a story about a non-IT lazy "fix":

Railway signaller signs on for his night shift, the mechanical box he works in has had some work done that day, but is signed off as OK. Some time later that evening, he gets a call, driver is sat at a Danger signal for some time. He tries to release the signal to Clear, but it won't release. -Sigh- he figures those maintenance idiots have just fouled up the interlocks, but the rulebook is clear about how to override this, so he gives a rote speech, something along the lines of "Pass this signal at caution, prepared to stop at any obstruction, and obey all further signals". Next signal, the same happens. Same drill, "Pass this signal at caution... blah blah blah". Next signal the same. Wow, those clowns working on the box have really messed it up. And then two drivers call him. Because they are now face-to-face.

There had been a physical points failure (something actually broke, wore out, etc.), and the reason all those signals wouldn't release is that the interlock makes that release conditional on the "Points locked" status ahead, which if the signaller had looked was now reading "Failed". But he didn't look, because humans are lazy, and so instead of passing harmlessly on the opposite track the train being "cautioned through" was actually sent towards its sister instead. Fortunately the "fail safe" instruction to proceed only at caution averted catastrophe by giving the driver enough time to brake once he realised the problem.


The classic example is the shell: if it's in a chroot, say, there's not a lot for you to execute. However, the shell itself remains pledged.

Pledges must reflect the capabilities of a program. Yes, the "proc" and "exec" pledge promises are powerful, but they can be constrained by their environment, such as chroots or calls to unveil.

As an aside if you have very good knowledge of what a pledged program executes you can opt into setting up pledge for it ahead of time in the form of execpromises.


re invoking other tools:

I built this years ago (if I'd do it again, I'd implement it with an LSM, but I was coming from a lab with lots of stackable fs experience, so my thought process went there); it used exec() to transition rule sets.

https://www.usenix.net/legacy/events/lisa07/tech/full_papers...


> FreeBSD has similar stuff and it is also super complicated and basically unused..

Are you talking about Capsicum? It's definitely not unused. Unloved, maybe.

There was a Reddit thread where some people talked about the possibility of layering pledge/unveil on top of it.

https://www.reddit.com/r/freebsd/comments/jldsm2/do_freebsds...

Something along these lines has been done by Ryan Stone

https://papers.freebsd.org/2020/bsdcan/stone-oblivious_sandb...


Capsicum seems so useful, but some of the boundaries are so rigid. As far as I could tell, you can't capsicum anything like a TLS terminator or http proxy etc, because there's no way to allow opening new sockets after entering capabilities mode; you could have another process open up the sockets and pass the FD, but if I'm writing both the sandboxed process and the one that opens sockets, I'm not sure it makes enough difference. I ended up with jails instead, being stuck in a chroot with only a static executable, a config file, and a log file felt good enough to me.


It’s rigid, because it implements a security architecture (capability-based security) instead of providing a mechanism to implement random restrictions.


Capability-based security can be done flexibly: look at EROS and E:

http://www.erights.org/related.html


Yes, Capsicum, and libcasper and MAC and whatever else they are adding to it now to make it saner to use in practice (I don't really keep up).

I'm not fully up to speed, but apart from some commands in the base system, I know of nothing that even attempts to use it.

There has been talk of adopting pledge/unveil into Linux too, using eBPF if I remember right, though that doesn't seem to have gone anywhere either.

I'm not saying pledge/unveil are the best way to do it, and in theory Capsicum is AWESOME, but it's also super complicated and not for the faint of heart.

But pledge/unveil are mostly idiot-proof: in about 15 minutes you can figure out enough of how they work to feel comfortable trying them out in something like `cat`. I've now read the landlock docs and the Capsicum docs and I still don't feel comfortable playing with either one of them. I'd need a lot of time and the docs lying open beside me to feel comfortable even trying them in a small thing like `cat`. Reality is? Nobody will ever bother until forced.

And "security" in computer land is still trying to figure out how to fix giant gaping holes like memory leaks and overflows. Things like capabilities are still not even on the roadmap in most software. If you care at all about cross-platform, there is zero chance you can implement capabilities.


zie may be referring to the Mandatory Access Control implementation on FreeBSD[0], not Capsicum.

[0] https://docs.freebsd.org/en/books/handbook/mac/


"make something idiot proof, and then they come along and invent a better idiot"


This is cool, and lays the groundwork for per-directory, per-app access controls like on current macOS.

Is there a comparable thing for network access? Last I looked OpenSnitch seemed to be unmaintained, but looking just now it apparently has some commits on master recently again:

https://github.com/evilsocket/opensnitch



I would recommend against Firejail as it uses SUID, essentially turning any escalation bug into running as root, instead of just as the given user.


Amazing! Thanks!


There is a sort of OpenSnitch based on eBPF, but I'm not sure it's fully feature-complete in the way you intended:

https://github.com/harporoeder/ebpfsnitch


I use skuid-based firewall rules + locked-down permissions on the host so that processes can't elevate their privileges or change user.

Works well enough for untrusted non-GUI software, and for trusted GUI software (like Firefox) that I know won't try to hack my PC itself, but where apps running inside it may be able to access stuff on my network that I don't want them to.

cgroups may also work well for this without the need to use multiple UNIX users.


Linux has network namespaces for isolation and restriction, not sure if that's what you're looking for, though.


Namespaces + internal netns firewall + seccomp might do the trick :-)


Can't you already do that? Just run every app in its own container, and firewall each container individually (since they can each have their own networking stack).


Sure, but why would I want to run 'ls' in its own container?


I'm not sure of all the history, but it feels to me like there were two cute ideas of the classic shell metaphor that we never quite achieved, and anyway we've totally outgrown:

1. That the shell's user is "in" a location of the filesystem (cwd), and can move around it like you're an avatar navigating rooms.

2. That the shell can "only" do program invocation, such that even its basic primitives like 'cd' and '[' can be just themselves programs.

It's heretical, but I don't think ls ought to be its own program. It's part of the shell, in particular part of my view of the filesystem. It's not really an app, and I don't care if it happens to be implemented that way under the hood.

As a user, I want a shell that lets me navigate the system (possibly multiple sets of systems), with a consistent set of primitives. It can let me invoke applications, and I want an absolute and extremely tight set of controls over those applications, and very specific ways that they're allowed to talk to each other.

I'm running far off-topic but to your original point, I think "ls" isn't really the type of app we ought to be talking about here, just bake it into the shell...... ok I just want to abandon posix and work in new metaphors. That really is off-topic for discussions about a new Linux feature I guess.


> 2. That the shell can "only" do program invocation, such that even its basic primitives like 'cd' and '[' can be just themselves programs.

"cd" is one of a few commands that basically must be built-ins no matter what, since it has to modify the environment of the shell itself. Any external command can only modify its own environment, not the shell's environment.


Just have cd spawn you a new shell that's in a new working directory, solved! Call it the immutable shell.


Have it contain a hash of the previous shell and cryptographically verify its authenticity with all of the electricity in Argentina and you can even get the crypto guys to promote it for you for free!


Ha, great idea. Unfortunately for you I just minted a non-fungible token that describes my claim to its representation in digital form, I own that idea. (Of course you might want to mint a different NFT using a slightly different way of digitally representing it, but let's not worry about that. Mine.)


Now I'm imagining an analogy to continuation passing style: exec (tail call) 'cd', tell it to exec /bin/sh (the continuation) in the modified environment.


Because you want to prevent it from using network? You don't need a full "container", just a network namespace, but it's the same tech.


I agree. Containers at their core are really two things: unshare (namespaces) and cgroups, and the latter is optional.


Well, why do you want to run 'ls' in a sandbox?


A container is the only way for this to be airtight. Otherwise your malicious script can just call out to wget/curl (which is probably whitelisted) to bypass your firewall.


Or you can just use Qubes OS.


Can someone explain the use case for this? If you control the system, you can use SELinux etc to control the same things. In what cases are you running an app on a system, and you don't control the system, and you don't want a user to access parts of the system? If it's about preventing system compromise, it's better to secure the whole system than one application...


SELinux is for users who have a software package and want to restrict that software package. Systems like Landlock are meant for developers who want to voluntarily restrict what their software can do.

So SELinux is meant to say "I am going to run this program, and I want to limit it to these things." Landlock / pledge / unveil are meant to say "I am writing this software, and I want my software to never be able to do X". This way developers can reduce the impact of any vulnerabilities in their own software. It is defense in depth on the development side.

SELinux can be used if you don't trust a piece of software. But it is generally applied by people who do not know about the internals of the software. In some sense it is treating the software as a black-box, and limiting what the black-box can do.

Landlock / pledge / unveil are implemented by the developer. If you don't trust the developer's intent, then these are not going to help. However, this is a great way for developers with good intentions to limit the impact of any mistakes they made. Notably, since this system is implemented by the developers, the implementers generally have deep knowledge of how their software works. They can consider the internals of their own software.

I expect that, on balance, systems like these are probably more useful. Black-box approaches are inherently limited, and notably expensive to implement. Moreover, it seems like exploiting bugs is a bigger problem than software that is intentionally malicious. Both of these points mean it makes sense to help developers limit the impact of bugs. It seems cheaper, and it targets an apparently bigger problem.
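
For anyone curious what the developer-side code looks like, here's a rough sketch against the Landlock API that landed in 5.13 (no libc wrappers yet, so raw syscall(2); the particular paths and access flags are just an example):

    /* Sketch: handle read/write FS access, then re-allow reads under /usr only. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/landlock.h>
    #include <stdio.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void) {
        struct landlock_ruleset_attr ruleset_attr = {
            /* Access types listed here are denied by default once we restrict ourselves. */
            .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE |
                                 LANDLOCK_ACCESS_FS_READ_DIR |
                                 LANDLOCK_ACCESS_FS_WRITE_FILE,
        };
        int ruleset_fd = syscall(SYS_landlock_create_ruleset,
                                 &ruleset_attr, sizeof(ruleset_attr), 0);
        if (ruleset_fd < 0) { perror("landlock_create_ruleset"); return 1; }

        /* Rule: everything beneath /usr may still be read. */
        struct landlock_path_beneath_attr path_beneath = {
            .allowed_access = LANDLOCK_ACCESS_FS_READ_FILE | LANDLOCK_ACCESS_FS_READ_DIR,
            .parent_fd = open("/usr", O_PATH | O_CLOEXEC),
        };
        if (path_beneath.parent_fd < 0 ||
            syscall(SYS_landlock_add_rule, ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
                    &path_beneath, 0) != 0) {
            perror("landlock_add_rule"); return 1;
        }

        /* Required before self-restriction; then the sandbox is on for good. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            syscall(SYS_landlock_restrict_self, ruleset_fd, 0) != 0) {
            perror("landlock_restrict_self"); return 1;
        }
        close(path_beneath.parent_fd);
        close(ruleset_fd);

        /* Now fails with EACCES even though DAC would have allowed it. */
        if (open("/etc/hostname", O_RDONLY) < 0)
            perror("open /etc/hostname");
        return 0;
    }

The restrictions are inherited by children and can only be tightened, never removed, which ties into the inheritance trade-off discussed elsewhere in this thread.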


> the implementers generally have deep knowledge of how their software works. They can consider the internals of their own software

This is the weird part to me: if you're a developer, you already control how your app works. You can put any kind of restriction you want in place without landlock. You just program it. Don't want the user to access a character device? Test whether the file you're about to open is a character device and refuse to open it. Don't want your user to open something in /etc/? Check the file path and don't open that file.

My guess is that Landlock and its ilk are basically abstractions to allow programmers to be lazy? I'm still trying to understand the use case because it feels very much like a "webapp firewall", where the developer doesn't want to be bothered with understanding security, so they slap a thing on their app and tell themselves it's secure, when in reality it's still far from secure. Sandboxes get popped all the time.


The unstated assumption in conversations about computer security is that the software industry is bad at its job. Incredibly bad at its job. Suggesting that developers write correct code will get you (rightly) laughed out of conversations about security.

The problem that solutions like Landlock are trying to solve is "given that this piece of software is going to be compromised, what can we do to mitigate the damage?".

Solutions like Landlock (unlike SELinux/AppArmor/traditional UNIX) make the additional assumption that if the software gets compromised, the compromise will happen after some set point in its execution (such as after it begins processing untrusted data). Once the program is compromised, the attacker controls the program, not the original programmer. This allows them to bypass any checks the programmer put in, since those checks were only ever enforced by the program itself. Landlock solves this by moving enforcement outside of the program and into the kernel. Now, once a program has set up its restrictions, it is no longer possible for that program to bypass them (unless the attacker can also find a kernel exploit).


That makes sense now. It's sort of like an app bringing along its own SELinux policy. My ideal would be a formal specification that dictates to the system what it should allow the program to do, and also how other parts of the system should interact with the program. "I only want to be able to open a file, and the system/other programs should only send me file data from local disks." (I expect Android sort of does this?) It would also be handy to have 'taint mode' at the system level.


It's a second safety net. It probably can be used by lazy programmers, but that's not what it is intended for.

The idea is "make it impossible for my app to misbehave in certain ways". It is a lot easier to enforce this in the kernel (you don't need to check every usage of e.g. open()). Moreover, there isn't really a way to screw this up through a logic mistake.

But most of all, to say with absolute certainty your software does not allow unintended remote code execution is nigh impossible for anything complex. By adding this, you reduce the possible downsides of any such exploit. It also helps you detect such exploits earlier.

It's similar to the kernel using address space layout randomization, compiling with stack guards, having a non-executable stack, etc. It is defense in depth.


This is the first I am hearing about Landlock, but a major part of my job is writing SELinux policy.

First, it should be possible to use both this and SELinux as they are stackable.

The main benefit I see to Landlock over SELinux I see is that it is far more dynamic. With SELinux, you need to know what accesses a program will make when you write the policy. Since Landlock creates the rules at runtime, it can be used in cases where you do not know exactly what it needs to do ahead of time. Obviously, this can also be thought of as a drawback instead of a benefit.

For a toy use case, consider a hex editor. You want to be able to open and edit any file on the system; however, you are concerned that a maliciously crafted file could compromise the editor. With Landlock, you can let the user select the file they want to use, then dynamically restrict your permissions before you actually open the file. With SELinux, you would need to give the editor access to the entire system because you do not know ahead of time which file the user will want to access.

Granted, this toy example is simple enough that I could come up with a workable SELinux solution if I had to, but sometimes the application actually knows best what it needs to do.
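
Sketching that hex-editor flow with the same 5.13 API as elsewhere in the thread (a hypothetical helper, error handling kept minimal), the rule can target the single file the user picked:

    /* Hypothetical helper: after the user picks a file, confine the whole process
     * to reading/writing just that file, before it is actually opened for editing. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/landlock.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int restrict_to_chosen_file(const char *path) {
        struct landlock_ruleset_attr attr = {
            .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE |
                                 LANDLOCK_ACCESS_FS_WRITE_FILE,
        };
        int ruleset_fd = syscall(SYS_landlock_create_ruleset, &attr, sizeof(attr), 0);
        if (ruleset_fd < 0)
            return -1;

        struct landlock_path_beneath_attr rule = {
            .allowed_access = LANDLOCK_ACCESS_FS_READ_FILE |
                              LANDLOCK_ACCESS_FS_WRITE_FILE,
            .parent_fd = open(path, O_PATH | O_CLOEXEC),  /* rule targets the file itself */
        };
        int ok = rule.parent_fd >= 0 &&
                 syscall(SYS_landlock_add_rule, ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
                         &rule, 0) == 0 &&
                 prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == 0 &&
                 syscall(SYS_landlock_restrict_self, ruleset_fd, 0) == 0;

        if (rule.parent_fd >= 0)
            close(rule.parent_fd);
        close(ruleset_fd);
        return ok ? 0 : -1;   /* on success, every other file is now off-limits */
    }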


I think this and the sibling comment capture the difference quite well. For your hex editor, you could replace it with an image converter service - each image conversion needs to open a source file for reading and a destination file for writing. You could limit the converter's access so it can't read /etc/shadow in various static ways. But assuming uploads/malicious.jpg blows up, it's tricky to statically say: deny access to uploads/other_user_medical_bill.jpg.

So the example isn't all that contrived. Run-time dropping of privileges is very useful (and has been common for network daemons etc. for a long time - like Apache binding port 80, maybe opening log files, then dropping from root to nobody/www-data).

But being able to say that "after this point I should never have to open a file" and having that pledge enforced - is really nice.

This is a tool for good programs to behave in the face of bugs, while static measures are a way to enforce access on potentially bad programs.


It doesn't replace SELinux, but it's similar. There are cases where an app-controlled sandbox is easier to implement than SELinux rules, though. I'd summarise those as: app jails are for things apps know about themselves; SELinux/AppArmor are for things you know about the environment the app runs in.

(not sure if these are possible with landlock)

Example 1: You start `tar` to unpack something to /foo/bar. Ideally you'd tell tar that whatever happens, it cannot write outside of /foo/bar. This is not possible to implement at runtime with selinux, without relabeling the filesystem.

Example 2: You start an app which will open a socket and then never open another connection again (just accept). You could do it with selinux or apparmor by changing context/hat, but you'd have to actually rely on one of them being enabled and setup with appropriate rules. Instead with things like seccomp and other app jails you can say "from now on, deny any listen/connect syscalls".
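
For the second example, the seccomp version is only a few lines with libseccomp (a sketch; note a deny-list like this misses things like the legacy socketcall(2) multiplexer on 32-bit x86, so allow-lists are still preferable when you can manage them):

    /* Sketch: default-allow, but make any further listen(2)/connect(2) fail
     * with EPERM. Call once the daemon's listening socket is set up. */
    #include <errno.h>
    #include <seccomp.h>

    int deny_new_connections(void) {
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);  /* default: allow everything */
        if (ctx == NULL)
            return -1;

        seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(listen), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(connect), 0);

        int rc = seccomp_load(ctx);   /* accept(2) on already-open sockets still works */
        seccomp_release(ctx);
        return rc;
    }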


I don’t know if landlock does this, but what I want is safe application sandboxing. I have zoom installed on my Linux machine. I don’t trust the zoom developers, and zoom only really needs a few permissions. Right now there’s nothing stopping zoom from recording key presses, or uploading my sensitive files to their servers. I wouldn’t even notice. With stuff like this, ideally zoom would have limited access to other areas of my computer so it just wouldn’t be able to do anything nasty without my consent or knowledge.

I always find it weird - Linux’s security model obsesses over user accounts and user security. I don’t care about that - I’m the only user! But on the flip side there’s very little effort put into application- or developer-specific access control. All applications I run can read, exfiltrate, or ransomware all my stuff. That seems reckless and dangerous.

I’m a huge fan of efforts like this - security is never sexy, but it’s extremely important. Let’s make Linux as secure for end users as iOS.


You can run zoom, and skype for that matter, using firejail [1]. There can be glitches now and then, in my cases only with skype IIRC, but it's rare and usually quickly fixed (just update your local configuration from upstream).

[1] https://firejail.wordpress.com/


For the one occasion I had to use Zoom, I created a separate user account and after the meeting was done, I wiped the account.


This right here.

For programs that do not require access to an X11 display, you could also work around the problem of not really trusting the code to not do anything that isn't in your best interest by using `sudo` (or `su` or `setpriv` or ...) to change to an unprivileged account.

Or by having the (non-script) executable file have the SUID bit set, and owned by an unprivileged account on the system.


I think the point is that applications can sandbox (parts of) themselves in a way that works across distros and doesn’t require a specific system configuration. Think of BSD pledge(), or browser sandboxes. (Browser sandboxes currently use seccomp on Linux. seccomp is good for when you want to disallow almost all syscalls, which browser sandboxes can do due to their highly elaborate design, including routing all file and network access through another process. It’s not so good for when you would like to keep interacting with the kernel directly but just want to limit accessible file paths.)


I guess what I'm wondering is, are there cases where there's no other option except a sandbox? I get that you basically are shipping apps to random computers that aren't secure, so this is sort of a big band-aid for apps that can't tightly control for how their apps are used. But in terms of merging something into the kernel, wouldn't the kernel devs prefer to improve adoption of least-privilege/MAC LSMs?


SELinux is pretty inflexible and incompatible with multi-user systems that deploy stuff without privileges - it assumes you can have global policy that applies to everything, which doesn't work well with containers, or browsers, or even just running things out of your home directory.

Hence, kernel devs would like to increase adoption of something that's a little better designed - not SELinux.


> SELinux is pretty inflexible and incompatible with multi-user systems that deploy stuff without privileges

IMO that's not true -- Android has done a pretty good job at it.


I believe the usecase is similar to BSD's pledge. See https://man.openbsd.org/pledge.2

It allows you to drop privileges so that in the event there is a vulnerability in your application, it cannot be exploited to cause further damage outside the scope of the application's normal operations.


Not pledge, but unveil: https://man.openbsd.org/unveil.2 AFAICT, landlock even uses the same underlying technique in the VFS layer.


When I read "restrict themselves" I think of OpenBSD, is this the same?



