A clever trick with a funny example, but I’m also fine with curl|bash — as fine as I am with “pip3 install thing” or installing “thing.dmg”.
I almost always decide whether to execute someone else’s software based on their reputability. Is the project well known? Is it maintained, with lots of GitHub stars? My decision is never based on reading the source code. I don’t have nearly as much of a problem with curling shell scripts as the finger-wagging “actually you shouldn’t do that” crowd seems to say I should.
The one thing that really does give me the creeps is adding someone else’s apt repository to my system-wide one. I just can’t seem to square that off in my head against my more laissez-faire “well do I trust them or not?” attitude that I take for every other installation attack vector. Maybe because it means trusting the third party forever, instead of just once?
The problem with PyPI is the transitive dependencies. You may trust a high profile package but it could pull in any number of projects that are exploited by bad actors.
This is a real problem, and I believe it will increasingly be used by attackers to break into companies.
Unfortunately, the detection isn't easy to automate. There are some heuristics, but you're basically writing something like Norton Antivirus for Python packages lol.
A sufficiently skilled attacker can make their code look totally innocent[0]. :)
Source: Former security engineer at Uber/Snap that's now working on tooling in this space[1].
There are various degrees of trust. One may trust a package maintainer to write correct, well-designed code and to maintain it. But should one trust that package maintainer to have solid security standards in their code repo, their CI/CD, and their distribution mechanism?
I find that this is more of an npm problem - thanks to the lack of a standard library and a cultural affectation towards small packages with a narrow focus.
A high level python package is more likely to depend upon 3-30 well known packages.
I think the biggest factor is just a much larger overall number of packages in the ecosystem. If there are, on average, 20 popular packages for solving a given problem instead of 3, you end up with exponentially more transitive dependencies.
5x worse on each level, maybe. That composes quite quickly.
Npm packages have really crazy dependency graphs. They're usually thousands of times larger than what you get in most languages, hundreds of times larger than Python or Ruby.
I think there is a trend towards reducing dependency bloat, at least in some circles. For example, the main React package has had this many recursive dependencies throughout the past few major versions:
The latest version of vite (a webpack replacement that's been gaining in popularity) has 14 recursive dependencies. It could be better, but it's still a significant improvement over the 300+ package monstrosity you get by combining webpack with everything necessary to reimplement what vite is doing.
If you pick your packages carefully (for example, looking for alternatives like this one ↓ instead of the much more popular react-router), it's not as bad as it used to be.
I think a large chunk of this is the tendency in larger Rust projects to split themselves up into smaller library crates, rather than single crate monoliths: single libraries that you use may depend on 5 different crates all from the same project, that count as separate in the number you’ve quoted. On top of that, C bindings tend to live in their own crate, so while you might depend on the openssl crate, it in turn depends on openssl_sys which contains raw bindings (rather than a nice Rust wrapper). That all said, I think crates.io is still at the initial stage of “tiny crates each doing a single thing” that I think every registry goes through to begin with.
No, the problem is bad developers pulling in dependencies for trivial functionality. If there were a `for-loop` npm package, bad devs would be pulling it in instead of writing their own for loops. Padding on the left is something that, if it doesn't exist, you write yourself in a few lines of code. You don't add a package for such trivial functionality.
Nope, this is a bad take, parroted without understanding; if it got moved into the std lib, it was probably useful enough. You can even read why in the original proposal if you comb the archives enough (from https://github.com/tc39/proposal-string-pad-start-end):
> It is highly probable that the majority of current string padding implementations are inefficient. Bringing this into the platform will improve performance of the web, and developer productivity as they no longer have to implement these common functions.
You are neglecting the risk factor of pulling in libraries from unknown authors on npm vs the stdlib. The package-bloat problem is one of culture, where developers keep neglecting this risk, only seeing the 5 lines of code they save by importing something, without seeing the potential cost and tech debt of having to review, maintain, update and security-monitor this dependency forever after.
Nobody thinks leftPad was not a useful function. The question is, was it useful enough to counter all the risks of npm, probably not. In the stdlib there is no such risk.
Ah, and now we’re talking about the real issue, which was the security risk.
My point has been this whole time that left-pad was not a story of a trivial function needlessly pulled from an external source as the person I replied to had claimed, and it appears you agree. Good!
Here's my theory. Older programming languages force you to think about sub-dependencies: if you decided to use a third-party library, you would have to look at and manually add its requirements to your build system.
But, with npm, suddenly it was trivial to include other packages without having to worry about sub-dependencies. The whole tree just magically added itself to your build, and only if you paid attention to the build process would you discover just how many developers you were transitively trusting.
Well, while yes, automatic dependency management is a really relevant reason why things got that bad, it can't be the only reason.
Programs in other languages with the same kind of tooling tend to stop at hundreds or at most low thousands of dependencies. Javascript code often reaches tens or low hundreds of thousands of dependencies.
Dependency explosion is bad all around, but JS is exceptionally bad.
An interesting thing is that it should be the other way around.
Repository packages are usually signed with GPG. You can even use HTTP - it does not matter. And GPG can be used to sign packages on another machine, maybe even on an offline machine with an HSM.
Things like pip or curl rely only on HTTPS. There are many agents between the developer who built a deliverable artifact and your machine. There's the WWW server. There's some CDN with HTTPS termination. There's the NSA, which can issue a fake certificate and MITM you (mostly an imaginary threat for sure, but still).
And hacked WWW servers have already happened: the Transmission website was hacked and their binary was infected.
Now I don't claim it to be a big issue. A malevolent developer is much more dangerous and much more likely. But if we want to compare those two methods, I think that repositories with proper signatures are safer.
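For comparison, here is a minimal sketch of what offline-signature verification looks like from the user's side (the URLs and filenames are hypothetical); the important part is that the signing key never has to live on the web server:

    # a minimal sketch; URLs and filenames are hypothetical
    curl -fsSLO https://example.com/tool-1.0.tar.gz
    curl -fsSLO https://example.com/tool-1.0.tar.gz.asc
    # verification only succeeds if the detached signature matches a key
    # you have already imported and trust in your local keyring
    gpg --verify tool-1.0.tar.gz.asc tool-1.0.tar.gz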
At the risk of pointing out what I see as the obvious: you're focusing on different attacks. It's a lot more nuanced than "it should be the other way around". A very valid argument can be made that it's more likely that you're installing a dodgy package than being subject to some sort of MITM attack.
An apt repository has to be trusted forever: every apt upgrade brings a risk that something new and nefarious will slip in. With curl|bash it’s a one time risk.
The point of TFA is preventing piping curl to bash, not preventing you from running bash scripts.
The problem with piping is that bash will read line by line, keeping the connection open while it runs. If the connection fails for any reason, the script will stop, potentially breaking or corrupting things.
This can be prevented by first downloading the file, and then running it.
> If the connection fails for any reason, the script will stop, potentially breaking or corrupting things.
If the script is meant to be piped from curl, and is well written, it will be written so that it first defines everything as functions and then at the very end makes a single function call. This ensures that the script will only do anything if it has been completely downloaded.
For example, the script that rustup.rs tells you to pipe from curl is written in that way.
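A minimal sketch of that pattern (the names are made up for illustration; this is not the actual rustup script):

    # nothing below executes while being parsed; bash only sees function
    # definitions until the very last line
    download_release() {
        echo "downloading..."   # placeholder steps
    }
    install_binaries() {
        echo "installing..."    # placeholder steps
    }
    main() {
        download_release
        install_binaries
    }
    # if the stream is cut off anywhere above this line, no command has run
    main "$@"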
>it will be written so that it first defines everything as functions and then at the very end makes a single function call. This ensures that the script will only do anything if it has been completely downloaded.
This is not sufficient. More care is required.
    lsp_init() {
        ...
    }
    lsp_init
If the last line gets truncated between the `s` and the `p`, then `ls` gets executed. Of course `ls` is harmless, but I'm sure you can imagine how it could be worse.
In other words, not only do you have to wrap your script in functions, but you have to ensure that any top-level function / command invocations are named such that they do not become different commands if truncated.
This is unsolvable in general, because the user can have any arbitrary names in their $PATH , such as custom personal utils in ~/.local/bin which can have any unforeseeable name.
It's much easier to just wrap the script in `()` to make it run in a subshell. bash will not run anything until it sees the closing `)` so truncated scripts are not a problem, and it also doesn't have the name collision problem.
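As a sketch, that looks like this; if the stream is cut off before the closing paren, bash reports a syntax error instead of executing a prefix of the script:

    # a minimal sketch of the subshell approach
    (
        set -eu
        echo "installing..."
        # ... actual install steps would go here ...
        echo "done"
    )   # bash parses this whole compound command before running any of it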
> If the last line gets truncated between the `s` and the `p`, then `ls` gets executed.
My shell[0] dies with an error "source file did not end with a newline". This is easily solvable in general, regardless of what's in PATH, as long as the shell works correctly. Valid text files do not end with bytes other than 0x0A. Being truncated at the end of a line, on the other hand, is not manifestly obviously an error.
0: That I wrote, because of bugs other than this one in bash et al, such as $FILE_WITH_SPACES; not just "my shell that I use".
I've heard this theory before-- that a command could get cut off halfway and execute something really bad-- and tbh I'm skeptical that this has happened even one time in all the billions of invocations of curl|bash. It's just not worth worrying about, in the same way that a cosmic ray bitflip could cause a kernel bug that erases my entire drive but in reality I don't spend any time worrying about this.
For you, apparently. Other people do worry about it, which is why they do take care to wrap their script in functions. And my point is there's an even easier and more foolproof way than that.
>in the same way that a cosmic ray bitflip could cause a kernel bug that erases my entire drive but in reality I don't spend any time worrying about this.
Other people use ECC memory because they worry about this, and because it does measurably happen.
All the presence of the content-length header would do is change curl's exit code when the connection breaks. The part where curl output the truncated script and bash executed it would be unchanged.
My problem with curl|bash is that it tends to not take into account my preferences wrt where binaries, config files, etc go. If you’re lucky it’ll take an envar. If you’re really lucky they’ll tell you what it is in the docs. I recently had to install the latest curl from source and it was as easy as ./configure --prefix ….
The apt repo thing is actually why I switched to Arch a while back. In Arch you have most packages being up-to-date by default (because you're on a "rolling release").
There is also the "Arch User Repository" which includes a bunch of useful packages that are easy to install, but with full transparency about where the contents are coming from (you can inspect the script).
They have even released an easy installer for Arch now that you can use, or you can just use Manjaro Linux. Regardless, I really appreciate this distro for development after many years of using Ubuntu!
curl|bash doesn't have an obvious UNinstall mechanism. You can pip3 uninstall or apt-get remove, but what do you do for curl? It could shit all over your system, /usr, /etc, and you wouldn't know what to delete.
But I have the same problem building from source and using make install: there's often not an easy make uninstall command. And pip uninstall only uninstalls individual packages; it doesn't clean up auto-installed dependencies while leaving alone the dependencies I had at some point requested to be installed directly.
Reputability won't help you if someone intercepts your connection and replaces the script. TLS helps, insofar as the version of curl you're using and its configuration is correct, your machine has the right certs installed, and the remote domain and/or host and/or repo hasn't been hijacked.
If either you or the host have been compromised, it doesn't matter whether you curl into bash, run pip install or download an exe/dmg/deb file - you're hosed.
Packages are signed, unless you're using a niche, insecure distro or package manager. Of course, that signing key could also be compromised, but those are usually more tightly guarded than web servers, which are compromised rather more frequently (any reputable dev should have their key in a hardware enclave like yubikey or similar).
> Packages are signed, unless you're using a niche, insecure distro or package manager.
You still need to get that signature from somewhere, likely a web server. If your threat model is that you don't trust web servers but do trust the packages and repositories they give to you, then I guess this is true but that seems a little crazy given that the attacker can just change the website you're visiting to host a malicious package, and change the signature it verifies against.
> Of course, that signing key could also be compromised, but those are usually more tightly guarded than web servers,
Why is it reasonable to assume that? I don't think it is, at all. If their SSL certs are compromised, all bets are off.
> any reputable dev should have their key in a hardware enclave like yubikey or similar).
I suspect you will be very very very disappointed at the number of people who publish software that meet the criteria you've laid out here.
Fundamentally, your argument is flawed right at the beginning because all the attacker needs to do is compromise the web server and change the installation instructions.
> Why is it reasonable to assume that? I don't think it is, at all. If their SSL certs are compromised, all bets are off.
I have no statistics to offer, but it does seem likely to me that TLS private keys gets compromised more frequently than software/package signing keys.
Then maybe a solution is script signing. PowerShell scripts can contain a signature at the end, and the shell will check that signature. It can either be mandated all the time or only if the script originally comes from a remote source (or disabled completely if you don’t care). Since it’s just a comment block, it’s fully backwards compatible to add too
This solves the part of downloading partial scripts, but doesn't solve the problem of malicious scripts. If you download a script from a compromised host, they'll just re-sign the modified script with their own signature. The solution is that you have to trust the script source, and transmit over secure channels.
If they sign the script with their own signature then it’s no longer a script that you trust. I can’t remember if PowerShell automatically trusts a script that’s been signed by a cert issued by someone like VeriSign, but if the system were to be adopted for Bash scripts there’s no reason it couldn’t use a store that only trusts the distro maintainers by default.
If it only trusts the certs from distro maintainers then surely it will be distributed as part of the normal package sources. If you need to add a cert, then the problem is exactly the same as adding a repository to the package manager; if the delivery mechanism of the instructions is compromised you're hosed.
Powershell will accept codesigning certs that are signed by verisign, so the workaround for an attacker who has already compromised a web site is to modify and re-sign the script with a certificate that can be obtained for $60.
> they'll just re-sign the modified script with their own signature
Nope. Of course if someone stores a Code Signing key on the same server as the distribution server then all bets are off, but otherwise it's not possible.
PS CodeSign isn't perfect, but there are enough hoops to jump through that this is not an easy task for a malicious actor who has gained access to a distribution point.
If I take a script, modify it and re sign it with a valid key, it will still be signed, just not by the original author, and will run.
Even on windows now, packages are often signed by individual developers rather than an entity that you might recognise the name of.
> but there are enough hoops to jump through that this is not an easy task for a malicious actor who has gained access to a distribution point.
If this is an optional step (which it would have to be) then the easiest thing to do is to remove the signature from the script and modify the instructions on how to run the script. If an attacker has compromised the supply chain, unless the original host has done something incredibly dumb, chances are they're quite dedicated and jumping through a handful of extra steps aren't going to stop them.
> If I take a script, modify it and re sign it with a valid key, it will still be signed, just not by the original author, and will run.
> Even on windows now, package are often signed by individual developers rather than an entity that you might recognise the name of.
This implies that you, as an attacker:

- have access to the Enterprise PKI

- can issue code signing certificates

- somehow get your certificate into the Trusted Publishers[0] store on the endpoint, i.e. you have access either to the GPO/domain controllers (keys to the kingdom) or to the endpoint itself (making the whole dance with certs moot)
> If this is an optional step (which it would have to be) then the easiest thing to do is to remove the signature from the script
This would make the script fail to run under a proper configuration
> and modify the instructions on how to run the script.
And again this requires access to the endpoint, which makes your assumptions moot.
> If an attacker has compromised the supply chain, unless the original host has done something incredibly dumb, chances are they're quite dedicated and jumping through a handful of extra steps arent going to stop them.
Exactly, which is why you don't have packages "signed by individual developers", don't provide an easy way to access the PKI, and don't sign the scripts only to run them with -bypass anyway.
and the CA who signed the package isn’t a bad actor. But there are known bad actors in the ca-certificates list and worse yet they can sign any certificate.
You may trust them to produce legit software and to have good intentions. But that's not really enough: it's hard to use reputation to divine someone's proficiency and diligence in operating a web server without getting it pwned and the published artifacts trojaned. Several dimensions of risk / trust are relevant here, and the distribution mechanism can address some of the opsec risks in a way that is transparent to the end user (you).
A platform like PyPI also increases chances of something like that getting detected after it happens, giving you a chance to react in a timely way.
curl | bash doesn't bother me because I do it like twice a year from sites I trust. On the other hand, the node crowd uses "npx command" all the time, where npx will download and execute code from the net. Unlike "curl | bash", "npx command" is something you're expected to do 10s or 100s of times a day. Each one of those times is a chance for you to have a typo and execute code from some random source, like if you type "npm commmand" or "npx comman" or "npx coomand" or "npx comman dargument", etc...
Right, the point of the thought experiment is to inject malicious code only in cases where someone is piping directly to bash without reading it, and not in cases where they might have the opportunity to read it. So in that case you would not inject malicious code. That is correct for the exercise.
On an old web server I ran for someone, I renamed the original curl and wget binaries and replaced them with ones that just logged and did nothing else. After a year I had a nice list of URLs for naughty scripts, bots and exploits! (Server ran a load of Wordpress sites)
The cleverness and sneakiness of this technique make people laser-focus on just the Hollywood-worthy hack, and forget the larger picture: that it's about trusting the host not to pwn you. If you trust that the host won't screw you, it doesn't matter how many style points they could have scored on something they won't do. If you don't trust the host, don't use it, because reviewing 10 lines of bash won't save you.
• The sneakiness of this can be defeated with `tee` (or a variety of other ways like modified bash or a VM).
• dpkg runs arbitrary package scripts as root, so you still put trust in whoever signed the package.
• When you use `curl|sh` to install a binary blob built from millions of lines of code, it has countless other opportunities to include exploits with plausible deniability. If the host is hacked, they can put the payload in the binary, knowing you'll review the bash script like a hawk and then YOLO the rest.
• Targeted attacks can use your IP too.
• Your distro's overworked unpaid maintainer LGTMs packages with a glance, and won't review millions of lines of code for backdoors.
• There is a huge gap between what people imagine they could have theoretically done properly to keep things secure and gather evidence of any potential wrongdoing, and what they actually do.
> people [..] forget the larger picture that it's about trusting the host not to pwn you
It's not about that. It's about pushing back the trust boundary to be as close to the source as possible. Of course your upstream can compromise you if they turn malicious, but having your trust boundary needlessly extend all the way out to the web server is not only unnecessary but also introduces real dangers to users (unless you also consider Transmission's incident to be one of your "Hollywood-worthy" hacks).
I'm glad people are rightfully criticizing this stupid approach. I'm glad most reputable systems use off-line signatures to verify software before running it. Security is not about absolutes. It's about thresholds.
> dpkg runs arbitrary package scripts as root, so you still put trust in whoever signed the package.
This is correct, but in those cases the maintainer is likely a much more trusted individual where more eyes are on the script as the hierarchy of maintainers sign off on things until the point it makes it into a readily accessible repository by end-users.
> Your distro's overworked unpaid maintainer LGTMs packages with a glance, and won't review millions lines of code for backdoors.
The same argument could be made about the Linux kernel itself, yet the system is surprisingly robust and examples of abuse are few and far between.
I don't understand why people throw one of the strongest Linux features, package management, into the trash can.
Leaving aside the security implications of "curl | bash" installation: what will you do if the installation fails and you need to roll back partial changes to the system? How will you update this software? How will you uninstall it in the future? How will you track CVEs for it, if your package manager doesn't know about its existence and the periodic scripts of your distro cannot check for CVEs? How will you update the libraries used by this software, and are you sure that will not break it?
Linux doesn’t have “a package manager”, the distros each have a different one. Is there a tool that will generate a package for each distro from a single input? How about testing them? It’s a bit easier now with docker but it’s good that a number of things have forked off of Debian in that you have at least a prayer of getting that package right and covering a lot of people.
To be honest, if your site is static-generated (where "generated" includes "typed by you in a text editor") this approach doesn't have any drawbacks compared to more complex schemes.
Installing software via running random scripts has a ton of drawbacks compared to using the system package manager.
Change your distro. Or become a packager yourself. I don't know much about Linux packaging, to be honest, but I found that packaging software for FreeBSD is not much harder than installing it locally. If it builds OK, it is easy to package. If it has build problems, it will be hard for you anyway: you need to be a developer to fix the build for yourself, no matter whether you prepare a package or not.
Almost all package managers have a graphical front end. I'm not sure how that's not "normal user" oriented. Windows even has the "Windows Store", a bastardized version of a package manager.
A lot of the use cases of those curl | bash scripts are to support non-standard installation, like with unusual distros and user-only. And unusual distros are kind of an unsolvable thing, because people that want them won't want your package manager.
Typical "curl | bash" script requires root, as it want to write to your /usr/bin, /usr/lib and others. You could trick it with chroot, maybe. And maybe not. But in such case you should to discover all dependencies it needs hard way, as you need to put them in your chroot too.
Funny, I thought about this today when installing Bun^. I read the bash script, saw that it was indeed downloading a release binary from the official GitHub repo, then gave up. The binary it downloaded could do absolutely anything when I run it; am I supposed to then go and verify every line of the source, and all its dependencies? Then verify the release package is the same as the one I'd make building it myself? Then verify the compiler source is a reasonable interpretation of the Zig spec (which doesn't seem to exist)? Then verify the compiler binary I used is the same as one created by a trusted compiler? Then verify all those binaries....
End of the day: the package is from a somewhat reputable source, and you don't keep life changing secrets user-accessible on your dev machine, fuck it. Also, resist installing as root. Figure out what the real permissions the thing needs are and go from there.
Just verifying that the release binary from github is actually the release binary from github you think it is, should be good enough.
The problem with `curl | bash` isn't that the upstream author might be malicious, or have had their working copy of the repo/compiler compromised. If that's where your problem is, yeah, you're kind of stuffed.
The problem with `curl | bash` is that you're not verifying that the thing you've downloaded is the thing that the upstream author actually released, unmodified, and uncorrupted.
Checking a hash is normally sufficient to accomplish this, and isn't overly burdensome or time-consuming.
If you're particularly paranoid, you might want to check the hash against a copy you've obtained from somewhere other than the server you downloaded the release itself from. You should be able to find a separate copy of the hash from a release announcement either from the author's social media, or maybe on a mailing list archive.
(If they're not posting hashes as part of their release announcements, start badgering them to do so.)
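Something along these lines, as a sketch (the URL and the published hash are placeholders):

    # a minimal sketch; URL and expected hash are hypothetical
    expected="<sha256 from the release announcement>"
    curl -fsSLO https://example.com/tool-1.0.tar.gz
    echo "${expected}  tool-1.0.tar.gz" | sha256sum -c -
    # only unpack or run the artifact if the check above reports OK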
Unfortunately it takes a while to compile Bun from scratch. It doesn’t need root privileges though. To compile Bun completely from scratch, you’d need to compile:
- Zig
- mimalloc
- libtcc
- zlib
- picohttpparser
- sqlite3
- BoringSSL
- WebKit’s JSCOnly port
- libarchive
- lol-html (+ rust)
From there, you also need clang 13, which you might also want to compile from source (though that can take an hour or more). When compiling Bun in CI, many dependencies are compiled from source (excluding Rust, LLVM, and Zig)
To be clear, "then gave up" means I ended up just running the install script. Thanks for your work on Bun! Hoping to use it for a speedy websocket server some time soon.
The first break in this bootstrap chain is mrustc, which is 11 stable versions back. Hope you've got a couple days if you need the latest stable release of rustc.
If all you do is download the release binary, anything could compromise you. If you read the source, an inauthentic release binary could compromise you. If you read the source and compile it, a compromised compiler could compromise you. If you read the compiler source, bootstrap compiling it from hardware code (reading all the source along the way), read the app source, and finally compile it using your compiled compiler, then compromised hardware could compromise you.
Every step along the way you increase the severity of the statement "If I'm fucked, then so is everyone using X". You stop when the set of people using X grows so large that you are sufficiently worthless in comparison.
I agree with your explanation, but actually bootstrapping the compiler might not even be enough, as pointed out by Ken Thompson in his classic 1984 essay [1], "Reflections on Trusting Trust".
Bruce Schneier already said that in 2006 [2]:
> It’s interesting: the “trusting trust” attack has actually gotten easier over time, because compilers have gotten increasingly complex
Since 2006 compilers have become even more sophisticated, but also much more complex, thus even harder to validate.
Neat! A more foolproof method might be sending a bit of bash that triggers a callback request (i.e. a "liveness check", "latest version check", etc.) to the origin server, and pauses.
When the callback is received, continue with the malicious payload; otherwise, after some timeout, send a benign script.
Always safer to first curl url > file, then read through file, then only if all looks good, run the file (under an appropriate user and with permissions as limited as possible, etc.)
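As a rough sketch of that workflow (the URL is a placeholder):

    curl -fsSL https://example.com/install.sh -o install.sh
    less install.sh                 # actually read it first
    chmod u+x install.sh
    ./install.sh                    # ideally as an unprivileged user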
Given that an attacker would likely obfuscate shenanigans, I don’t think there is much benefit to glancing over source unless you’re really ready to do a little research project for a few hours.
Whether or not there's any security benefit, I've often benefited from pre-reading install scripts by deciding to install the thing somewhere else, or with a different setting, or noticing that the entire install script is basically just a lot of paranoid noise around simply downloading the binary and putting it in a binary directory, etc.
Some are so elaborate it's hard to poke through what they're actually doing, but a lot of them aren't.
Making an all-in-one install script that doesn't require any arguments involves making a lot of decisions for the user that you may not particularly like. Pre-reading gives a chance to decide whether or not you like those decisions.
The benefit for me is knowing what installs where so that I can later uninstall everything without having to rely on an uninstaller that leaves bits behind. Of course I'm talking about a different scenario where I'm already assuming the script and binaries it installs are safe.
I’m old enough to remember when you had a makefile or even better an rpm or deb.
It was so much better than the windows way of running a random program with little visibility, and so sad when so much software changed away from that method.
To be fair, the package must be provided by your distribution, otherwise it's just another random download, only in a different format.
Even if you are adding a third-party package repository, you just change the downloader. Although you usually get signature checking with that step for free.
And therein lies the issue: It's not the job of the software developers to provide packages for your distribution, but the job of your distribution maintainers.
So, if your distribution will probably not backport $newver of some tool to your release, you are only left with installing it via different means.
If you find yourself always installing software like this because your package repositories don't contain what you want, it may be a good idea to just check out different distros.
There's a big difference there in that even if you do install the odd program from source, the distro likely has you covered in terms of all the libraries needed to run it in the first place.
That's a very different scenario from having to download the dependencies as well.
Having to download dependencies is a very Linux desktop problem. On every other platform (Windows, macOS, iOS, Android) it is customary for installers to come with all dependencies. Most programs also statically link non-system libraries.
An unsigned deb is better than an unsigned script as it keeps track of what files are deployed and you have a version number in a central list from dpkg -l
Sure something bad can still do things, but it tends not to splat files everywhere with no trace.
In practice personally auditing the installation script for every program you're going to use and the installation script for every update is grossly impractical, for the same reason nobody reads EULAs. In the end it still boils down to trust.
It's insignificantly safer. The only time it will help is if an attacker makes absolutely zero effort to hide their attack. If they're clever enough to hack the Rustup website or Brew or whatever then they're going to know not to put their exploit in plain sight in the installation script.
I wasn’t emoting lol, I was recognizing the orthogonality I found strange. Recognizing that partially securing a thing is incomplete isn’t paranoid, it’s just acknowledging the progress is limited.
Spend minimal resources and compromise 80% of users.
Spend an extraordinary amount of resources and compromise 99% of users. Of note is that those extra 19% of users are more security conscious and will likely detect and purge the exploit immediately.
I get so annoyed at the amount of Linux tools suggesting to install something using this method rather than focusing on package management inclusion. Even more so when they try to request sudo access.
I'm with you! If you're serious about your software (read: you want users)... either make a Flatpak manifest, package spec, or something equivalent.
Point is, somebody made something better than this little install shell script. I'll accept pip, I'm not picky.
There is almost surely no reason for $thing to be writing to my system directories. Nobody should be coaching this, it can go wrong in every way.
Binaries and libraries can come from my home directory. You won't find /home mounted with noexec outside of strict compliance environments.
Build services like OBS and COPR make the actual equipment investment of building packages basically non-existent. Roaming repositories for any distribution you could want.
Leaving the one-time cost of writing out the specs... and I suppose maintaining dependency creep.
That maintenance isn't that expensive because you'd be maintaining them in the script anyway. You're just using common domain language
Do these things and you cover 95% of the DSL:
- PKGBUILD (basically the script, AUR means you'll get maintainers)
- RPM
- DEB
Say a fourth significant option comes around, I'll guarantee you the concepts are similar enough.
Macro creep is real, but this is the cost of maintenance work. Give us something to maintain.
Signed,
- A person maintaining several packages for projects with no upstream involvement
Pip/npm/gems are just as bad as are debs/rpms from an untrusted source. Piping curl to sh is no worse than any other way of installing unvetted software from an untrusted source and the only better alternative is either verifying the software installation instructions yourself or relying on a trusted third party to do this for you (e.g. the Debian maintainers)
Those services I mentioned -- COPR/OBS, they can be the trusted sources by your users.
They're signed the same way as your first party packages.
If you trust the developer/maintainer, you can trust these services. It's literally the same infrastructure as OpenSUSE/Fedora/Red Hat.
As a developer you don't have to provide the infrastructure or equipment, simply whatever is to be built.
I'm not suggesting people provide their software by pure RPM or DEB files. The repositories do the important part of actually distributing the software.
If you're on either OBS or COPR you're on the fast path to having the OS maintainers do the work for you
> Those services I mentioned -- COPR/OBS, they can be the trusted sources by your users.
But there's nothing trusted about them is the point, you can ship a deb or rpm with all sorts of scripts running at installation, this is no safer than curl | sh.
If anything it's worse, when you "curl | sh" and it requests sudo you can go "mmm no we're not doing this I will happily risk compromising all my user data but it stops at my system integrity".
rpm or deb you're already running as root.
> If you trust the developer/maintainer, you can trust these services.
And you can also trust their site which you're curl-ing from.
This was the main thing I’m reacting to: installing from something like pip is usually running a ton of unvetted code downloaded from the internet. If you trust such package managers, you might as well curl to a shell.
Apologies, that's fair -- there are levels to my rant. That's a very important part.
I'll take package managers over shell scripts in concept for one main reason: they reduce reinvention.
The supply chain is always of concern, of course.
A shell script benefits from coreutils -- cp, mv, things like that. You're on your own for everything else, I don't trust that.
They themselves are untrusted code -- for all we know there's no VCS behind it at all. A package manager at least offers some guardrails!
With packages on OBS/COPR, you can at least know/verify the sources you're installing were built in a clean (offline) environment from the upstream sources. It's a small, but notable, improvement.
Also consider you need to rebuild your system and you'd like it to look the same after.
Will you find it easier to run 'pip freeze' / 'dnf history userinstalled'... or crawl your shell history for curl | bash and resolve any drift?
Package management isn't realistic unless you only pick a few distros to support -- and then you get criticized for not supporting every distro, some of the distros end up way behind on the version, some of them modify your code to make it fit better in their system...
Lately I've been coming around to the idea that AppImage is the best middle ground we have so far. Basically, it lets you provide a static binary for people to download and use as-is if their repos don't have it available, but then bundles in special flags starting with `--appimage-` for stuff like extracting the bundled libraries/assets into a directory structure that's the same for every AppImage.

It seems like a step in the direction of being able to automate making packages for various distros. It would be a tough design problem, but really amazing, if the format were expanded to include source files so that each package manager could write their own plugins for converting the extracted AppImage into their own package format (appimage2deb, appimage2rpm, etc.).

Maybe AppImage isn't the right basis for this sort of thing, but instead of trying to drive adoption of distro-agnostic package managers, which will face resistance from distro maintainers, I feel like the right solution would be something that provides for distros what LSP provided for editors and what LLVM provided for compiler authors. We're not lacking in package managers as a community, but we really could use a standard protocol or intermediate layer that _doesn't_ try to replace existing package managers but instead works with and relies upon them in a way that benefits everyone.
It’s worth noting that some big companies have something like what you describe. I was building packages for one and it’s basically reverse engineering however something is packed (e.g an RPM) and making it build/bundle in the way big corp operates.
It’s a lot of busy work but we guaranteed that we had all dependencies in house and that the same package had the same layout of files across every OS
I guess my hope is that rather than requiring a bunch of hardcoded boilerplate for each package management solution, we could come up with a reasonable set of primitives that they'd all need and then find a way to express them agnostically. The power of LLVM and LSP is that it's not just one entity producing all of the implementations; each side of the communication can contribute the stuff that's specific to their use case so that no one else needs to worry about those internal details. If I write a new programming language, all I need to do is implement a language server, and then everyone can get plugins for pretty much any editor they want without anyone needing to do a huge amount of work.

It's possible something like this exists for package management right now, but my current impression is that the only products that try to provide something like it would not be easily extensible to new distro package formats that might be invented, and instead hardcode a few of the most common types (debian, rpm, etc.). The key part that seems to be missing is a standard for what the bits in the middle should look like: what's the package-level equivalent of LLVM IR or the LSP?
Author should not be packager. Distro users/maintainers should be packagers.
Look at FreeBSD ports: there are more than 25K packages, most of them with the latest upstream versions. How many of these packages are prepared by the authors? Maybe 1 in 1000 (I'm an optimist, I know). NetBSD's pkgsrc is the same. It works.
Authors should stick to a traditional UNIX-style build system, which allows dependencies to be picked up from the local system without problems, and all the distro-specific things will be done by the distro guys.
I've ported 10+ software packages to FreeBSD ports in the last 20 years, on the principle "I need this, it is not in the ports yet, make a new port for it". Typically it takes 2x-3x the time of a single source build, i.e. one day tops, if it is not something super-complex like KDE and it is possible at all (i.e. not so Linux-specific that FreeBSD doesn't have the required APIs).
Modern build systems like npm and maven, which want to download and build all dependencies by themselves, are a problem for this, I admit.
The alternative is something like AppImage. The benefit of curl | bash is that it's a command that can be run on any distro and just work. There is no need to worry about how the Linux ecosystem still hasn't agreed upon a single package format.
I've worked with people creating curl scripts to install software, and most of the time it should scare people. At least with package management there is some community validation and version history of what changes are made, as well as checksums. I'm not saying you should discount all installs using curl, but you should at least check the script first and see if you're comfortable proceeding.
One of the the things I love about Gentoo & Arch is that packaging for them is incredibly easy. At least, as long as upstream isn't doing anything insane. But this means that I as a user can wrap an upstream package, without too much of an issue. (E.g., if you have a binary tarball that can be unpacked to, e.g., /opt. Even if it has to be built, so long as your dependencies are clear, that's usually not too hard. Some languages fight package management a bit more than others, though.)
I mean, you're right -- but security and maintainability aside, doesn't it feel odd to advocate against the use of a universal method and FOR the use of one of the many hundreds of package managers for *nix operating systems that claims to have gotten it right?
Adding maintenance overhead to a FOSS project to support a package manager is one thing, adding support for every Flavor Of The Week package manager after that initial time investment is tougher, especially when the first one is no longer en vogue.
tl;dr: the thousands of ways to package data for *nix creates a situation which hurts maintainability unless the package maintainer lucks into picking the one that their crowd wants for any length of time. Piping data from curl works just about anywhere, even if it's a huge security faux pas waiting to happen.
semi-unrelated aside: it strikes me as humorous that people on that side of the OS aisle have cared so much about pipes being a security issue for years and years, whereas on the MS side of things people still distribute (sometimes unsigned) binaries all over the place, from all over the place, by any random mary/joe. (not to say that's not the case on the nix side, but it feels more commonplace in MS land, that's for sure.)
> If I give you an rpm package, that isn't really any better. It can run any script during installation and you'll already be running it as root.
But it's not about you giving an rpm package at will. It's about the distro including packages in its official distribution and many people installing the very exact same package. Instead of people randomly pulling install scripts from the Web which can, for all we know, at every curl, be fetching a different install script.
In addition to that Debian has an ever growing number of packages which are fully reproducible, bit for bit. All these executables, reproducible bit by bit.
When I install a package from Debian I know many people have already both scrutinized it and installed it. It's not a guaranteed nothing shady is going on but what's safer:
- installing a Debian package (moreover reproducible bit for bit) which many people already installed
- curl bash'ing some URL at random
Anyone who says the two offer the same guarantees is smoking something heavy.
It's all too easy to sneak in a backdoor for, say, one in every 100 downloads when you moreover detect, as in TFA, that curl bash'ing is ongoing. And it's hard to catch. And it's near impossible to reproduce.
When you install from a package that's fully reproducible: there's nowhere to run, nowhere to hide, for the backdoor once noticed. It shall eventually be caught.
Here's why it matters (FWIW there are tens of thousands of Debian packages which are fully reproducible):
consider the broader class of attack this article is demonstrating: stealthily delivering different payloads to different requests. i don’t know about rpm specifically, but most new-ish package managers do actually ensure this more strongly than any curl-based approach: a hash of the 3rd party content is provided through some more secure chain (e.g. directly in your OS’s or language’s package database, or signed by some key associated with one of those and which you’ve already trusted).
yeah, if the package is delivered through the same channel as the bash script, and not anchored by anything external, you lose those benefits. but even hosting the package contents through pip or cargo or AUR or just unaffiliated and manually synced mirrors is a (relatively easy) way to decrease that attack surface.
It's sometimes preferred because package repositories don't always include the updated version of the program, and it saves the tedious work of uploading to every package repository, like Arch Linux's or Debian's.
For self-contained stuff, there is AppImage that's just a filesystem subtree [1], or a Docker container. The latter doesn't need Docker, or even Podman; you likely already have systemd, and it knows how to run a container.
If you want to depend on system-wide stuff, or, worse yet, provide shared system-wide stuff, it's time to create a proper OS package (a bunch of them: Debian, Ubuntu, Fedora, Nix, etc).
The curl | bash approach has only one upside: you can store the script, inspect it, then run. I did that a couple times. Because otherwise it's a pretty scary operation, a very literal RCE.
> The curl | bash approach has only one upside: you can store the script, inspect it, then run.
Not having to waste countless hours on whatever distro's package format of choice is a pretty big upside.
And those are also RCEs if you're not upstreaming the package (which you likely are not because that increases the amount of time needed by an order of magnitude) as most package managers have multiple scripting points which run arbitrary code, and usually run as root already.
… you don't have to be part of the distro's official package set.
Ubuntu, Debian, Gentoo, Arch, etc., support third-party repositories and packages with bad licenses: the user adds the repo / installs the deb / etc. Pacman, in particular, even has direct support for such, and calls them out (to help ensure the user knows, and perhaps reads, the license).
Then I know I can gracefully uninstall the package by just asking the package manager to do that.
(You don't have to unvendor libs, either: if you install into something like /opt/$package_name, you can keep vendored libs in there. You should unvendor them, though.
Yeah, downloading stuff from the Internet in the middle of a package install is definitely harder with some package managers, but IMO that's a poor practice.)
I agree with your sentiment, but do any of those package managers prevent some random repository from adding Firefox version 190 which will install on next apt upgrade? Not that they need to - presumably any package I’ve installed already basically has root access.
Yes. Apt has preferences and pinning. Basically the official repositories by default have higher priority than other repositories. You can change the defaults, but you'll know when it happens (because you have to do it yourself).
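As a minimal sketch (the third-party origin is hypothetical), pinning everything from an added repo below the default priority of 500 means it can never silently replace an official package:

    # pin a hypothetical third-party origin below the official default of 500
    cat <<'EOF' | sudo tee /etc/apt/preferences.d/99-thirdparty
    Package: *
    Pin: origin "repo.example.com"
    Pin-Priority: 100
    EOF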
I have been getting tired of node dependencies and not knowing what I'm running or when it will break. I make simple things for a small number of users and started playing with just writing my own CGI in C++ or bash behind thttpd. This appears to be working very well so far and runs on a vanilla Linux install with deps just on system libs and gcc. With all the pain and unsafety of deps, this might actually make the most sense. What new vulnerabilities am I inviting? Why aren't we still doing things this way? It seems... much easier, perhaps not to make it but to secure and sustain it.
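For what it's worth, a CGI "hello world" in bash really is about this small (a sketch; it assumes thttpd is started with a cgi pattern that matches the script and that the file is executable):

    #!/bin/bash
    # CGI just wants a header block, a blank line, then the body
    echo "Content-Type: text/plain"
    echo
    echo "Hello from $(hostname) at $(date)"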
Along the lines of "curl | bash" detection, one can add "sudo -n" to see if the person has passwordless sudo. In my own testing I have found that many people have passwordless sudo.
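The check itself is tiny, something like:

    # a minimal sketch: succeeds only if sudo works without prompting for a password
    if sudo -n true 2>/dev/null; then
        echo "passwordless sudo is available"
    fi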
Even without sudo it turns out that people have bad posix permissions throughout their filesystem, and one can hijack many things including existing SSH tunnels because of ControlMaster. Even without sudo one can hop around a person's production network riding existing SSH channels. Many devops people have SSH multiplexing and ControlMaster enabled in their SSH client. Then it is just a matter of finding bad permissions in production environments, which is also often the case. One rarely needs root to download or even modify database contents, as the database credentials are often stored in an environment file that gets sourced by automation.
The biggest problem here I see is that companies rarely let their penetration testers a.k.a. red-team into production. Corporate leaders would be taken aback at the findings.
But why can’t I copy the URL into my browser? Why discriminate on user agent? Why couldn’t it be https://clickhouse.com/install.sh - so that it would work with browsers and curl alike?
> - the deb, rpm, tgz, and docker packages are also provided, but only as advanced usage;
How are deb and rpm advanced? How do I uninstall something installed with your custom script? Also, why are the RPM packages [0] still using /etc/init.d and not systemd?
I used to do this for a product... I was finding that some of it was apparently happening asynchronously. I had a separate shell file I grabbed from the first one, containing all the functions. I eventually put a variable at the bottom of it with a sentinel value, and from the first script, would block until that variable got populated before running the main function.
I think I'd always do it as an inline script. Admittedly I was slopping a lot of mud on the wall back then as a tactic.
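If I understand the pattern, a minimal sketch looks something like this (the names and URL are hypothetical, and this version aborts on a missing sentinel rather than polling for it):

    # functions.sh, fetched separately, ends with a sentinel assignment:
    #   install_thing() { ... }
    #   FUNCS_LOADED=1
    source <(curl -fsSL https://example.com/functions.sh)
    if [ -z "${FUNCS_LOADED:-}" ]; then
        echo "functions.sh arrived truncated, refusing to continue" >&2
        exit 1
    fi
    install_thing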
If dependency hell is pretty much unavoidable at this point, is the best solution for everyone to have something like Little Snitch or similar installed that monitors for unusual outgoing traffic so that the hackers can't get away with it for very long before getting shut down? And to do all high risk activity like coding in VMs?
wouldn't it be a whole lot easier for the script being curled to just detect it was running as part of a `curl | bash` pipeline and bail out with a message?
if [ ! -t 0 ]; then echo "script must be run interactively, etc."; exit 1; fi
of course that's equally useless because, as someone else said, the root problem with doing `curl | bash` in the first place is that you are putting your trust in someone else's server, so you can't fix that "server side". what the article is proposing is a whole lot of magic to detect `curl | bash` from his safe server, where it's already safe.
> Installing software by piping from curl to bash is obviously a bad idea and a knowledgable user will most likely check the content first.
I’ve yet to hear a coherent explanation why this is any worse than installing software from the internet in general. It’s not like you check the source of all software you install.
Before SSL certs were as ubiquitous as they are now, we used to warn about the danger of curl to bash for unsecured URLs, as it's a vector for a DNS spoofing attack.
Now that nearly all URLs are HTTPS with valid certificates, the remaining risk seems to be that the host could be intentionally or unintentionally doing something destructive.
Sure it would be great to review the code of every install script before you run it, but as you allude to, it isn't practical.
The main reason I will download and examine instead of blindly running is that often the install script will fail to work as advertised, or it will mess up my desired file system conventions. In both cases I'll need to check and perhaps modify the script, so it must be fully downloaded prior to executing. While looking for these problems I also check for obvious malicious instructions.
I guess it's a question of effort on the side of the person doing the trojan - if enough suckers just copy those snippets regularly you can be lazy and just do it like that; you don't need to go through the hassle of creating fake Ubuntu PPAs with trojans or distributing a big repo on GitHub and risking someone seeing the trojan part.
let’s say some software has 100,000 users. the maintainer goes bad and wants to steal some users’ bank passwords to profit.
if they ship a keylogger to every user, the odds of being noticed before they’re able to cleanly get away are substantially lower than if they ship that to a subset of users. so they may prefer to scam only 100 users, chosen by delivering a malicious payload to only every 1000th curl/https request for the source code. even if one of those users notices, said user will have a tough time confirming and attributing it.
now try doing that with a modern package manager. you can’t, because the package manager ensures every user gets the same code. you can still deliver different experiences at runtime — but you’re not likely to have the superuser privileges needed to run a keylogger or read ~/.ssh/id_rsa, etc, at that point.
it’s a safety in numbers game. i’m sure you play that game elsewhere in society. i won’t say it’s foolproof, but i’m not sure why it would seem incoherent to you when applied to the digital.
> you can still deliver different experiences at runtime — but you’re not likely to have the superuser privileges needed to run a keylogger or read ~/.ssh/id_rsa, etc, at that point.
Keyloggers are trivial to do in userspace Linux via LD_PRELOAD attacks[0], and typically your user account has permission to read ~/.ssh/id_rsa.
As a side note: the culture of installing stuff via curl is like a virus. If you check the npm docs, they say "We strongly recommend using a Node version manager like nvm to install Node.js and npm". Then if you check the nvm docs, the primary installation method is curl to bash, and not even from their official website - it pulls stuff from github...
Seems like the best solution is to not use this technique. Been touching linux servers for 10+ years, and have never had to use it. I definitely used it when learning linux though.
I curl to files for review when I need / want to run a remote script. A bonus is you still have to `chmod u+x some_remote.sh` before it can be run, so it would be extremely difficult to accidentally run.
Having said that, I write a lot of JS / Node. When you npm (a package manager for NodeJS) install something there could very well be some curl commands piping to bash or sh.
Makes me think of an idea -- and maybe it exists -- create an npm package whose only purpose is to run checks on the code of the other packages being installed to ensure they are "safe".