Hacker News
PyPI halted new users and projects while it fended off supply-chain attack (arstechnica.com)
37 points by consumer451 29 days ago | 46 comments



Am I the only one who is scared by the entire ecosystem of "drag random crap and dependencies off the Internet from who the hell knows"?

I've had a couple of minor incidents with NodeJS dependencies on this front over the last few years, which sort of opened my eyes to the risk of running untrusted code. Since then I tend to err on the side of distribution packages and accept the restrictions that imposes on what I do.


You're not the only one!

This is why I strongly prefer languages with a comprehensive standard library. I trust my Python/golang/dotnet/whatever install, so the number of third party packages I need to pull in is much smaller and more easily audited.


I build python servers. I've replaced most of the dependencies with just standard library code.

Removed Flask for http.server
Removed requests for urllib

Removed requirements.txt
Removed pip

Life is good
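For anyone curious what that swap looks like, here is a minimal sketch using only the standard library: `http.server` answering a GET with JSON, and `urllib.request` as the client in place of `requests`. The `HelloHandler` and `start_server` names, the `/status` path and the JSON payload are hypothetical illustrations, not the commenter's actual server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stands in for a Flask route: answers every GET with a small JSON body."""

    def do_GET(self):
        body = json.dumps({"path": self.path, "ok": True}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default one-line-per-request log on stderr


def fetch_json(url, timeout=5.0):
    """Client side: urllib.request in place of requests.get(url).json()."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def start_server(host="127.0.0.1", port=0):
    """Serve HelloHandler on a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address[:2]
    print(fetch_json(f"http://{host}:{port}/status"))
    server.shutdown()
    server.server_close()
```

`ThreadingHTTPServer` needs Python 3.7+. The Python docs warn that `http.server` only implements basic security checks and is not recommended for production, so putting it behind a reverse proxy is a sensible assumption.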


I use urllib.request to avoid having a dependency in little scripts, but I can't say I think that requests is a big supply chain risk.


It may not be the requests package itself but a sub-dependency. From looking at the repo it only has a few, but something like Flask can have a lot (especially with plugins), and that's a mainstream, well-supported library.


What is your audit schedule/process?


Well, it starts with a culture of taking responsibility when adding a dependency. We want people to look at child dependencies and think more critically about them before checking in code. This is mostly to prevent bloat, but sometimes we catch unmaintained packages, or someone accidentally pulling in a random fork rather than the community-supported variant, for example.

We also run SAST in our GitLab pipelines and include reviewing the output as part of the code review and release process, so that we catch CVEs that may not have been known when we first installed a package.
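Looking at child dependencies is easy to script for Python. A minimal sketch, assuming the packages are already installed in the current environment: it walks `importlib.metadata.requires()` breadth-first and records who pulls in what. It skips optional extras and does not evaluate other environment markers (e.g. `python_version`), so treat the output as an approximation; the function names are hypothetical.

```python
import re
import sys
from collections import deque
from importlib import metadata


def _normalize(name):
    """PEP 503-style normalization so 'Flask' and 'flask' become one node."""
    return re.sub(r"[-_.]+", "-", name).lower()


def _requirement_name(req):
    """Bare project name from a requirement string such as 'idna<4,>=2.5'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    return match.group(1) if match else req.strip()


def installed_requires(name):
    """Raw requirement strings of an installed distribution, minus optional extras."""
    try:
        reqs = metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []
    # Requirements guarded by an `extra == "..."` marker are only installed on request.
    return [r for r in reqs if not re.search(r"\bextra\s*==", r)]


def dependency_tree(root, requires_of=installed_requires):
    """Breadth-first walk of transitive dependencies: {package: [direct deps]}."""
    tree = {}
    queue = deque([root])
    while queue:
        raw = queue.popleft()
        key = _normalize(raw)
        if key in tree:
            continue  # already visited; this also protects against cycles
        names = [_requirement_name(r) for r in requires_of(raw)]
        tree[key] = [_normalize(n) for n in names]
        queue.extend(names)
    return tree


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "pip"
    for pkg, deps in dependency_tree(root).items():
        print(f"{pkg}: {', '.join(deps) or '(no dependencies)'}")
```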


Some packaging ecosystems are riskier than others, primarily because they allow running arbitrary code at some point during the install cycle. Node and Python are two notable ones, especially considering how commonly they are used[1]. Others do it more safely: at a minimum, no library code can run until the library is imported and run with application code.

Depending on how and where you deploy, you can mitigate some of that by isolating the installs and not keeping sensitive information there (e.g. in a docker image).

[1] - I don't follow node/npm closely anymore, so this may have changed.
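For Python, one way to keep install-time code from running is to refuse source distributions entirely: wheels are unpacked rather than built, so no `setup.py` or build backend executes during `pip install`. A sketch that builds such a command, assuming a `requirements.txt` in which every entry is pinned with `==` and a `--hash=sha256:...` (which `--require-hashes` insists on); the function name is hypothetical. This narrows the window rather than closing it: the installed code still runs once imported, and packages that only ship sdists will fail to install. npm's rough equivalent is `npm install --ignore-scripts`.

```python
import shlex
import sys


def wheel_only_install_cmd(requirements_file, python=sys.executable):
    """Build a pip command that installs only prebuilt wheels with pinned hashes."""
    return [
        python, "-m", "pip", "install",
        # Refuse sdists: building one runs the package's setup.py / build backend.
        "--only-binary", ":all:",
        # Every requirement must be pinned with == and carry --hash=sha256:... in the file.
        "--require-hashes",
        "-r", requirements_file,
    ]


if __name__ == "__main__":
    cmd = wheel_only_install_cmd("requirements.txt")
    # Printed rather than executed; pass `cmd` to subprocess.run(cmd, check=True) to install.
    print(shlex.join(cmd))
```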


I'm not convinced of the additional danger in letting packages run code during installation. You install them because you want to use them, so the code they ship will get run anyway. Are there really common environments where the final product only gets run with less permissions than the package manager?


Doesn't almost any deployment-based setup have a separation between the code executed on the developer's machine and the code run in the built application?

Yes, it is common for developers to have some unit/build testing setup available so that they can run the code locally, but even that should be done by a system that makes sure anything actually running during the test is declared as part of the project workspace.

More directly, it is common for many package managers to try and do a global install of some things. If not global for the computer, for the current user. Thankfully, this is changing a lot. (At least, I think it is?)


How does that add any danger? You're pulling in code because you want to use it. If the package is malicious and your package manager doesn't have post-install scripts, the malicious code is just going to run 5 seconds later when you import it and start working with it.

In the case of NPM with post-install scripts disabled, you'll simply get pwned when you `npm start` rather than `npm install`.
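The point that the payload simply runs at import instead is easy to demonstrate in Python: a module's top-level statements execute the moment it is imported, before you call anything in it. A self-contained sketch; the module name and its harmless "side effect" are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for a dependency. A real payload would sit in exactly the same place:
# top-level statements that run as soon as the module is imported.
MODULE_SOURCE = '''
import os

SIDE_EFFECTS = []
SIDE_EFFECTS.append("ran at import time, pid=%d" % os.getpid())


def useful_function():
    return "the API you actually wanted"
'''


def import_from_source(name, source):
    """Write `source` to a throwaway directory and import it as module `name`."""
    tmpdir = tempfile.mkdtemp()
    Path(tmpdir, f"{name}.py").write_text(source, encoding="utf-8")
    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()
    try:
        return importlib.import_module(name)
    finally:
        sys.path.remove(tmpdir)


if __name__ == "__main__":
    mod = import_from_source("innocent_looking_pkg", MODULE_SOURCE)
    # Nothing from the module has been called yet, but its top-level code already ran.
    print(mod.SIDE_EFFECTS)
```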


Honestly, I'm going off memory on python. In the olden days, it was not at all uncommon for devs to want the ability to "sudo pip install foo".


Deployments are irrelevant for this conversation; libraries get to run code there anyway. For code execution during installation to be an attack vector, you'd need an environment where npm install gets run with _more_ permissions than npm start (or the equivalent for other package managers). I can't really think of an environment where that is the case. Usually the build and package manager is more restricted than the application, not the other way around.


Right, my understanding is that this was not too uncommon for some older packages? Especially in early python, it was not too uncommon to accidentally install to the whole system, no?
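On the Python side, a cheap guard against accidentally installing into the system interpreter is to check for a virtual environment first. A minimal sketch (the function name is hypothetical): `venv` makes `sys.prefix` differ from `sys.base_prefix`, and older `virtualenv` releases set `sys.real_prefix` instead. pip can enforce the same thing itself with `--require-virtualenv` (or `PIP_REQUIRE_VIRTUALENV=1`), and distributions that adopt PEP 668 now refuse a plain `pip install` into the system Python.

```python
import sys


def in_virtualenv():
    """True when running inside a virtual environment rather than the base interpreter."""
    # venv (PEP 405): sys.prefix points at the environment, sys.base_prefix at the base install.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return True
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix")


if __name__ == "__main__":
    if in_virtualenv():
        print("OK: packages will be installed under", sys.prefix)
    else:
        print("Warning: this is the base interpreter; create a venv before installing.",
              file=sys.stderr)
```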


The issue isn't when you get what you wanted. The issue is when you accidentally get something you didn't want (such as typo-squatting, a not-too-distant issue on PyPI) or when a package is published maliciously (imagine bumping a patch version and it turning out to be compromised), which has happened a few times fairly recently on npm.

I agree that the happy path is ideal and hopefully the common case. Regardless, anything with access to production secrets for my team is run on the most minimal image possible (and none of those secrets are available during dependency installation and compilation).
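A crude first line of defence against typo-squatting is to compare requested names against the names your team has actually vetted and flag near misses. A sketch using `difflib.get_close_matches` from the standard library; `KNOWN_GOOD` and the 0.8 similarity cutoff are hypothetical choices. String similarity will miss squats that don't look like a typo (a plausible-sounding new name), and names that match nothing still need a human look.

```python
import difflib
import sys

# Hypothetical allowlist of names your team has vetted; not an official PyPI list.
KNOWN_GOOD = ["requests", "urllib3", "flask", "django", "numpy", "pandas", "setuptools"]


def flag_possible_typosquats(requested, known=KNOWN_GOOD, cutoff=0.8):
    """Return {requested_name: [similar vetted names]} for near misses of vetted names."""
    vetted = [k.lower() for k in known]
    suspicious = {}
    for name in requested:
        candidate = name.lower()
        if candidate in vetted:
            continue  # exact match with a vetted name: fine
        close = difflib.get_close_matches(candidate, vetted, n=3, cutoff=cutoff)
        if close:
            suspicious[name] = close
    return suspicious


if __name__ == "__main__":
    for name, matches in flag_possible_typosquats(sys.argv[1:]).items():
        print(f"{name!r} looks a lot like {matches}; double-check before installing")
```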


> Are there really common environments where the final product only gets run with less permissions than the package manager?

Yes, only running the final product in a VM/container is pretty common.


And this is a very sensible precaution where developer environments have SSH keys and other privileged credentials available and exposed in predictable locations, ready for exfiltration over the unfiltered internet connection that developers insist on having available.

Hopefully the VM/container run environment is also in a network-isolated environment too, so it can only be accessed and invoked through the expected routes, and it can't make arbitrary network calls to external hosts that haven't been manually reviewed and approved.


The types of secrets ought to be a bit different and less consequential on a developer's machine. If they're not, that's a pretty big red flag. It's one thing to gain access to clone some repositories (e.g. ~/.ssh) but an entirely different thing to get production aws credentials. Not to mention all the other protections that should be in place that mitigate the fallout (for example: no pushes to main/master/prod branches, requiring status checks and reviews before merges, etc).


this is still true of node/npm. It's also true of Cargo (Rust), NuGet (C#), and a handful of others. I'd say it's probably the _norm_ for most ecosystems to allow some form of pre/post-install execution.
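For npm specifically, you can at least see which installed packages declare install-time hooks before trusting them. A sketch (function name hypothetical) that walks `node_modules` and reports any `preinstall`, `install` or `postinstall` entries in each `package.json`. It does not catch other routes to build-time execution, such as `prepare` scripts for git dependencies or npm's implicit `node-gyp rebuild` for packages that ship a `binding.gyp`.

```python
import json
import sys
from pathlib import Path

# Lifecycle scripts npm runs automatically as part of `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def packages_with_install_scripts(node_modules):
    """Map package name -> {hook: command} for every package.json that declares an install hook."""
    root = Path(node_modules)
    if not root.is_dir():
        return {}
    found = {}
    for manifest in sorted(root.rglob("package.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or not valid JSON: skip rather than crash
        if not isinstance(data, dict):
            continue
        scripts = data.get("scripts")
        if not isinstance(scripts, dict):
            continue
        hooks = {h: scripts[h] for h in INSTALL_HOOKS if h in scripts}
        if hooks:
            found[data.get("name", str(manifest.parent))] = hooks
    return found


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "node_modules"
    for pkg, hooks in packages_with_install_scripts(target).items():
        for hook, command in hooks.items():
            print(f"{pkg}: {hook} -> {command}")
```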


For what it's worth, in Nix the code is built in a sandbox without network access after it is downloaded. So one does have a viable alternative for Rust.

And it is true that most package managers for popular languages allow arbitrary code execution during the install process. That is how husky adds git hooks to developers' machines.

For example, in Ruby I need to patch the Kafka gem karafka, because it downloads, builds and stores librdkafka.so in the gem's directory.

I understand that this, like the husky example, comes from a desire to make developers' lives easier, but I'd rather we erred on the side of caution: making sure that software builds without access to the network and without being able to modify your system (e.g. adding files to $HOME).


Hey, you have to patch nothing. Nix support was merged to karafka two months ago.


Looks absolutely insane to me too. Recently I had to deal with a NPM-based theme for a minor website. It downloaded 140 dependencies. Most of them were trivial like "string-width", which finds the visual width of a character (some Unicode characters may appear as double-width). The package consists of a rather simple 25-line function that tests a character against several if-s in a loop. A function that trivial does not justify adding +1 to the potential sources of supply-chain attacks.
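For a sense of scale, here is a rough Python approximation of the same idea using the standard `unicodedata` module. It is not a port of `string-width`, and the function name is hypothetical: it counts East Asian wide/fullwidth characters as two columns, combining marks and format characters as zero, and everything else as one. Emoji sequences, variation selectors and ambiguous-width characters are deliberately ignored, and those are exactly the cases where a maintained library does better.

```python
import unicodedata


def display_width(text):
    """Approximate number of terminal columns needed to display `text`."""
    width = 0
    for ch in text:
        # Combining marks (e.g. U+0301) and format chars (e.g. zero-width joiner) take no column.
        if unicodedata.combining(ch) or unicodedata.category(ch) == "Cf":
            continue
        # 'W' (wide) and 'F' (fullwidth) characters occupy two columns in most terminals.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


if __name__ == "__main__":
    for sample in ["abc", "日本語", "e\u0301", "Ａ"]:
        print(repr(sample), display_width(sample))
```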


https://www.npmjs.com/package/is-odd

It's stuff like this which makes me hate npm and the modern Javascript ecosystem in general.

You start a new project and pull in a bigger thing like Quasar and npm fetches hundreds of packages and the summary tells you that 5 of them have vulnerabilities in them before you even get started. What am I supposed to do with them?

But regarding the string-width problem, I'm not sure if you've come across "What every developer should know about Unicode" [0], because it appears to be a problem that does require an external package.

[0] https://news.ycombinator.com/item?id=37735801


this is hardly restricted to the javascript ecosystem. it's a problem with any language where third-party libraries are readily available.


Unfortunately JS people believe this is a feature not a bug.


The current system is crazy to me as well, I've come to refuse to use online dependencies in my projects. It's brittle and it's a security nightmare. If I need a library, I look for a self contained one and I vendor it into the project.

In other words I bypass the package manager but I still appreciate the ability to browse the online catalogue. :)
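If you vendor, it helps to record exactly what you vendored so that later edits, accidental or otherwise, show up in review. A minimal sketch, assuming a hypothetical `vendor/` directory: hash every file with SHA-256 when you vendor, then re-hash in CI and diff against the recorded manifest. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def hash_tree(vendor_dir):
    """Return {relative/posix/path: sha256 hex digest} for every vendored file."""
    root = Path(vendor_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Bytecode caches change whenever the code is imported; don't track them.
        if path.is_file() and "__pycache__" not in rel.parts:
            manifest[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def verify_tree(vendor_dir, manifest):
    """Sorted list of paths added, removed or modified since `manifest` was recorded."""
    current = hash_tree(vendor_dir)
    return sorted(p for p in current.keys() | manifest.keys()
                  if current.get(p) != manifest.get(p))
```

When vendoring, one option is to commit `json.dumps(hash_tree("vendor"), indent=2, sort_keys=True)` as a manifest file (say, a hypothetical `vendor.lock.json`); in CI, load it with `json.loads` and fail the build if `verify_tree` returns anything.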


This is how I used to do stuff when I was working on classic .NET years ago. Check the damn DLLs into SVN, which was all in-house.


You're not the only one. I have become hesitant to use applications developed in certain languages and ecosystems (like Python) because they encourage the prolific use of unvetted code not under the control of the developer, although there is no real safe space anymore. It's a cultural shift that has affected almost everything.


From your perspective, the application itself is untrusted un-vetted third-party code until you yourself have vetted it, no? The fact that you also have to vet dependencies is meaningless implementation detail. You’ve got a bucket of code, you want to run it on your machine, and you decide that you want to vet it first. How does a package boundary change that? You’re describing security theatre.


> The fact that you also have to vet dependencies is meaningless implementation detail.

I disagree. minimizing dependencies means reducing the risk exposure. That's not meaningless.

What I'm really talking about is a cultural change where package managers have made it so easy to just throw a package at a problem that devs tend to do this too much. People using packages to do simple things, people using packages without understanding what the packages do, etc.

Every time an application uses a library or package of any sort, that decreases the security of the application. So it's a tradeoff, and I think that too many devs ignore or forget that there's a tradeoff here and just go for "install a package/library to do it" as if it were cost-free.

Minimizing the use of external code is not security theater at all. It's good practice. I think avoiding applications that use languages and platforms where lots of external code is common and expected is a reasonable thing. It's absolutely not a complete security solution, but it does reduce the risk.


I'm horrified by the extensions I've got installed in VS Code. They are so good, but one day one will go rogue and install malware on my machine.


Please don't get me started on VS Code. It is basically a remote execution environment. You add an extension, open a file, and it has the cheek to ask whether what is on your filesystem is trusted, immediately after downloading several tens of megabytes of who knows what for the language server and parser.

I was using it for LaTeX stuff and programming but I shelved it. I'm on a Mac so it's system provided vim for me and the Apple command line dev tools, mostly for git, llvm and make. It works and I don't need or use any extensions for it or pull anything else onto my computer.


How could you think that you’re the only one? This has (thankfully) been a widely discussed attack vector for many years now, and even longer within the more switched-on circles.


I work in a bubble with people who are ignorant.


We need a neutral code signing vendor that does international identity verification. We can fix this tomorrow, but the big three tech corps all make money from charging $$$ for what can be done with an openssl command.


I'm one of the co-founders @ Phylum. We've been tracking this campaign [1] (along with several other unrelated ones). The collective group of security researchers (Shoutout to https://vipyrsec.com/) in our Discord have been reporting these packages to PyPI for removal. If this is something you'd be interested in helping with, please join our Discord (https://discord.gg/Fe6pr5eW6p)!

Besides the gigantic analytics platform we've constructed to monitor supply chain attacks targeting open source, we've also open sourced a few tools to better mitigate attacks targeting developers. For example, a sandbox to minimize the impacts of malicious packages during installation [2] (with a pre-check to our API for known malware), which allows you to do things like

    phylum npm install <pkgName>

Happy to answer any questions about this campaign or others we've uncovered!

1. https://blog.phylum.io/typosquatting-campaign-targets-python...

2. https://github.com/phylum-dev/birdcage


I think people should stop using PyPI altogether. It's full of abandoned garbage and malware because there's really no filter on who can upload what. I don't even use it to search for packages anymore.

If Linux distro packaging worked the same way, Linux would be a hellscape of malware and weird random broken apps. I'd rather use old software than constantly worry about fat fingering a package name and ending up with a crypto miner on a thousand machines. Thank goodness for that culture of vetting packages.


Fun fact: Unix used to work in slightly this way. You’d see something neat on comp.sources.unix, download it, and if it was useful, deploy it on your site for your local users. A bit later, huge FTP sites with everything ever written for Unix were routinely used as package repositories are used today. Modern Linux (and other Unices) distributions, with maintainers, strict inspection, limits on what programs can do, etc. came as a reaction to the obvious problems with that. It always seems to me that language-specific ecosystems like PyPI (RIP Cheese Shop), NPM, crates.io, etc. have not yet learned this lesson.


Isn't this the case for most programming languages' package indices? crates.io for Rust, the NPM registry for Javascript, etc. They are all public in the sense that anyone can just create an account and upload a package.


Maven Central is notoriously fiddly to get an account for: it requires a manual registration step and you have to GPG-sign all your packages. Seems like that barrier to entry may have been useful.


> I think people should stop using PyPI altogether

Sorry if this is a naive question, but what would the alternative be?


What an absolute joke. What’s your alternative?


Dependencies are liabilities, and any package manager that includes something like build.rs, setup.py, etc is vulnerable to RCE. None of this is news, but it's unnerving that a vast section of our industry seems to either be totally unaware, or totally apathetic.


There's a sort of "layer 0" problem above this which is: requiring 1000 different libraries from 900 different publishers to make a hello world program is going to end in tears. The modern languages need to consolidate into a few stdlib type libraries. Once that's done the problem of assuring the supply chain becomes more tractable.


Python, one of the two languages mentioned in the parent comment, has if anything a standard library that is TOO extensive, as made evident by the ‘dead batteries’ stdlib removal effort.


I’ve been building Packj [1] to detect such attacks. Packj can flag malicious, abandoned, typo-squatting, and other "risky" PyPI/NPM/Ruby/PHP dependencies. We use static, dynamic, & metadata analysis to scan for indicators of compromise (e.g., spawning of shell, use of SSH keys, network communication, use of decode+eval, etc.) OR presence of vulnerabilities.

1. https://github.com/ossillate-inc/packj
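To make indicators like these concrete, here is a toy static check for Python sources built on the standard `ast` module. It is not how Packj works, just a sketch: it flags direct calls such as `exec`, `base64.b64decode` or `subprocess.run`, and string literals mentioning SSH keys. The rule lists and function names are hypothetical and trivially evaded (aliases like `from os import system`, `getattr` tricks, obfuscated strings), which is why real scanners add dynamic and metadata analysis.

```python
import ast
import sys

# Hypothetical rule lists, loosely following the categories above; not Packj's rules.
SUSPICIOUS_CALLS = {
    "eval", "exec", "compile",
    "os.system", "os.popen",
    "subprocess.Popen", "subprocess.run", "subprocess.call", "subprocess.check_output",
    "base64.b64decode",
    "socket.socket", "urllib.request.urlopen",
}
SUSPICIOUS_STRINGS = (".ssh", "id_rsa", "AWS_SECRET_ACCESS_KEY")


def _dotted_name(node):
    """Turn Name/Attribute chains like os.path.join into 'os.path.join'; None otherwise."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def scan_source(source, filename="<unknown>"):
    """Return sorted (line, finding) tuples for suspicious calls and string literals."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, f"call to {name}"))
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            for marker in SUSPICIOUS_STRINGS:
                if marker in node.value:
                    findings.append((node.lineno, f"string mentions {marker!r}"))
    return sorted(findings)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for line, finding in scan_source(fh.read(), filename=path):
                print(f"{path}:{line}: {finding}")
```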



