For more context: this kind of work could fit nicely into the malware auditing system that was designed and implemented as part of the RFI that the blog author originally linked to[1].
Most of the infrastructure on the PyPI side is in place[2], but the current checks are mostly proofs of concept/exercises of the new APIs to ensure that they don't atrophy.
I've been in touch with folks from the Open Source Security Foundation [0], who are interested in making this a centralized service.
I'm a big believer that functions like this should be centralized under a foundation like that, with close ties to package-manager maintainers so we can work together on solving the problem.
Something that just occurred to me - has anyone checked registries for owners whose email addresses are on expired domains, or on public email providers where the address has since become available for registration again?
This seems like a ripe angle for package take-over.
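A minimal sketch of what such a check might look like - the function names are mine, and a real audit would consult WHOIS expiry data rather than just DNS:

```python
# Hypothetical expired-domain check: pull a maintainer email from
# package metadata, then see whether the email's domain still resolves.
# A domain with no DNS records at all is a candidate for re-registration.
import socket

def email_domain(address: str) -> str:
    """Return the domain part of an email address, lowercased."""
    return address.rsplit("@", 1)[-1].lower()

def domain_resolves(domain: str) -> bool:
    """True if the domain currently has any address record."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False

def takeover_candidate(address: str) -> bool:
    # If the domain no longer resolves, an attacker may be able to
    # register it and receive password-reset mail for the account.
    return not domain_resolves(email_domain(address))
```

DNS alone can't catch the second case (a released address on gmail.com etc. still resolves), so that angle would need provider-specific checks.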
> I still don’t like that it’s possible to run arbitrary commands on a user’s system just by them pip installing a package.
Is there a build system out there that doesn't have this feature? Pip is both a package manager and build system since many packages are compiled at install time.
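To make that concrete, here's a hypothetical illustration of what install-time code can do. In a real attack this would sit at module level in a source distribution's setup.py, next to the setup() call:

```python
# This code, placed at module level in setup.py, runs with the
# installing user's privileges during `pip install <package>` --
# before the user ever imports anything.
import getpass
import platform

def collect_host_info():
    # Install-time code can read files, spawn processes, or phone home;
    # here it just gathers basic host details as a stand-in.
    return {"user": getpass.getuser(), "os": platform.system()}

HOST_INFO = collect_host_info()  # executes as a side effect of installation
```

This is by design: setup.py is ordinary Python that the build system executes, which is exactly why compiled-at-install packages work at all.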
Looks like a great initiative and glad to hear this is getting attention.
The technique of observing syscalls has clear benefits. However, might there be ways of evading this simply by setting up some kind of delayed process, so the syscall doesn't happen during the observation window, or is only triggered rarely or under certain conditions that might not typically be tested? (It could still be caught in theory, but the chances would be much lower.)
1) The “observable window” is the entire installation time. If they make installs take forever, that’ll affect everyone, which should raise alarms pretty quickly.
2) Conditional execution is possible, but the installation is done in a vanilla Alpine container, which will match many legitimate hosts too. And any fingerprinting activity that involves syscalls would be detected in the process.
All this to say, there’s always room to continue raising the bar!
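As a rough sketch of the triage side, assuming strace-style output from the sandboxed install - the syscall list and parsing here are my illustrative assumptions, not the project's actual rules:

```python
# Toy triage of strace-style output from a sandboxed `pip install`.
# Network and process-execution syscalls are rare during a normal,
# pure-Python install, so they're worth flagging.
import re

SUSPICIOUS = {"connect", "execve", "fork", "clone", "sendto"}

# Matches lines like: connect(3, {sa_family=AF_INET, ...}, 16) = 0
# optionally prefixed with a [pid NNN] tag from strace -f.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s*)?(\w+)\(")

def flag_suspicious(trace_lines):
    """Return the set of suspicious syscall names seen in a trace."""
    seen = set()
    for line in trace_lines:
        m = SYSCALL_RE.match(line.strip())
        if m and m.group(1) in SUSPICIOUS:
            seen.add(m.group(1))
    return seen
```

Anything flagged this way would then go to a human for investigation, since (as noted below) such activity might be benign but is abnormal enough to warrant a look.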
Evil-me is thinking "So I need to check for the existence of super common but not core modules before I run my exploit code, so a vanilla environment never runs the code"...
That's the thing. If we're watching syscalls, we see these checks. These would be things like attempted file-reads. Would they be enough to set off alarms? Maybe, maybe not.
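The fingerprinting probe itself is easy to write - and easy to see. A sketch (the `requests` heuristic is a made-up example of a "super common but not core" module):

```python
# The "check for a common module before detonating" trick. Even this
# probe is visible to a syscall tracer: find_spec walks sys.path, and
# each candidate location shows up as stat()/openat() calls in a trace.
import importlib.util

def environment_looks_real() -> bool:
    # Hypothetical fingerprint: assume real hosts have `requests`
    # installed while a vanilla sandbox does not.
    return importlib.util.find_spec("requests") is not None

# A stdlib module is always findable, which demonstrates the same
# path-probing machinery:
spec = importlib.util.find_spec("json")
```

So the question becomes whether those extra file-read attempts stand out against the installer's own filesystem noise.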
This is the usual cat-and-mouse game of malware detonation. There are attempts to make sandboxes appear realistic, but I'd argue that our use case is even simpler, since running commands or making network connections during installation is not a normal thing. It might be benign, but it's abnormal enough to warrant investigation.
There will always be ways to try to get around the system, but I'm pretty firm that this will significantly raise the bar, which is a Good Thing.
Usually in a containerized environment all the requirements are installed at the same time, which may or may not make that kind of introspection difficult.
On a dev machine though... diabolical. That might show up in a syscall, but perhaps not obviously enough that it sets off alarms.
If you're looking for a tl;dr you can find one on Twitter (with pictures!) [0]
This research was a blast to do, and I learned a ton. Happy to answer questions!
[0] https://twitter.com/jw_sec/status/1326908628411047937