Dozens of malicious PyPI packages discovered targeting developers (phylum.io)
754 points by louislang on Nov 2, 2022 | 320 comments



I think a proper way to solve this issue, not specific to Python but to languages running in a VM in general, would be to have some sort of language support where you specifically define what access rights/system resources you allow for any given dependency.

Example of defining project dependencies:

  {
    "apollo-client": {
      "version": "...",
      "access": ["fetch"] // only fetch allowed
    },
    "stringutils": {
       "version": "...",
       "access": [] // no system resources allowed for this dependency, own or transitive
    },
    ...
  }
It would probably require the language to limit monkey-patching of core primitives (such as Object.prototype in JavaScript), and it would be more cumbersome for the developer to define the permissions they grant to each dependency. These required permissions could be listed on the package site (e.g. npm or PyPI) and the developer would just copy-paste them when adding the dependency. But if you upgrade a dependency version and it now requires a permission that seems suspicious (e.g. "stringutils" needing "filesystem"), that would prompt the developer to stop and investigate, or, if it seems justified, add the permission to the "access" list.
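Python doesn't support anything like this today, but as a rough sketch of the flavor of enforcement such a manifest could drive, you could imagine an audit hook that attributes sensitive operations to the third-party package whose code triggered them. The manifest format, package attribution, and "network" capability name below are all hypothetical, and frame inspection like this is advisory at best, since a hostile package can work around it:

    import sys, sysconfig

    # Hypothetical manifest: package name -> capabilities it is allowed to use.
    MANIFEST = {
        "apollo_client": {"network"},
        "stringutils": set(),
    }
    SITE = sysconfig.get_paths()["purelib"]  # where third-party packages get installed

    def owning_package(frame):
        """Walk up the call stack and return the first third-party package found."""
        while frame is not None:
            path = frame.f_code.co_filename
            if path.startswith(SITE):
                rel = path[len(SITE):].lstrip("/\\")
                return rel.replace("\\", "/").split("/")[0]
            frame = frame.f_back
        return None  # only application code on the stack: not restricted here

    def enforce(event, args):
        # socket.connect is one of CPython's built-in audit events (PEP 578)
        if event == "socket.connect":
            pkg = owning_package(sys._getframe(1))
            if pkg is not None and "network" not in MANIFEST.get(pkg, set()):
                raise PermissionError(f"{pkg} is not allowed to use the network")

    sys.addaudithook(enforce)

Real per-dependency enforcement would need runtime support (exactly the monkey-patching problem above), but it gives a feel for what the "access" list could map to at runtime.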


At the beginning, the permissions aspect of deno[0] was actually one of the major selling points for me. The approach used there was to begin at zero and offer granular permission control, e.g. `--allow-read=data.csv`, for filesystem, network, etc. I would love to have this for, e.g., Python or npm packages.

[0]: https://deno.land/manual@v1.27.0/getting_started/permissions


Doesn't this only apply to the entire process, not the individual dependencies? Just confirming; Deno was my first thought with this, since it requires the developer to deliberately enable the permissions needed.


Yes, it applies to the whole process. It's incredibly hard to sandbox dependencies individually since you don't know how your code or other dependencies interact with them. If you want, you can run dependencies in a worker process and sandbox that more tightly, but that is quite a bit of work.


This is exactly what I've done for Membrane[0]. It's capability-based; even to get the time (and thus introduce non-determinism) you need a capability. Dependencies run as separate processes and everything is orthogonally persistent. It's a TypeScript/JavaScript system for personal automation built entirely within VSCode. Stay tuned, I'll be posting a video this week.

[0] https://membrane.io


Hi there! I just wanted to let you know that I read the blog posts, and membrane sounds extremely cool. Ambitious, though. If you don’t mind a small bit of feedback: it would be encouraging to potential users or testers to see some semi-regular posts related to development. It would also be great to see how membrane might work to build a tool using current APIs. I know there is video forthcoming, and perhaps this will be addressed. I didn’t look at the GitHub, where I probably could glean some additional info. But from my perspective, a development blog builds trust and anticipation. It’s also a great way to check your assumptions (or have them checked, rather).

Good luck with the project. I hope it delivers, because I’d love to use it. Signed up for the mailing list.


That’s exactly why I use strace-based sandboxing for ALL dependencies: https://github.com/ossillate-inc/packj


Phylum's extension framework is built on Deno for this exact reason. The ability to provide granular permissions was something we were really interested in.

Deno is a really cool project, imo.


Interesting read, thanks.


Check out OpenBSD's pledge(2): https://man.openbsd.org/pledge.2

It does exactly that (although on a per-process basis).

I don't think this kind of permission system can be retrofitted into an existing language without direct OS support, and probably not at the library level (you'd need something like per-page permissions which would get hairy real fast).


I think @jart has been porting it to Linux: https://justine.lol/pledge/


Indeed! If your dependencies are able to be command line programs that are shell scripted together, then you can in fact have an access policy on a per-dependency basis, using the pledge.com program linked on my website. So shell scripters rejoice.

But it gets better. If you build Python in the Cosmopolitan Libc repository:

    git clone https://github.com/jart/cosmopolitan
    cd cosmopolitan
    build/bootstrap/make.com -j8 o//third_party/python/python.com
Then you can use cosmo.pledge() directly from Python.

    $ o//third_party/python/python.com
    Python 3.6.14+ (Actually Portable Python) [GCC 9.2.0] on cosmo
    Type "help", "copyright", "credits" or "license" for more information.
    >>: import cosmo, socket
    >>: cosmo.pledge('stdio rpath wpath tty', None)
    >>: print('hi')
    hi
    >>: socket.socket()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/zip/.python/socket.py", line 144, in __init__
        _socket.socket.__init__(self, family, type, proto, fileno)
    PermissionError: [Errno 1] EPERM/1/Operation not permitted
Since we didn't put "inet" in the pledge, you can now be certain your PyPI deps aren't spying on you or uploading your bitcoin wallet to the cloud. You can even use os.fork() to rapidly put each dependency in its own process, then call cosmo.pledge() afterwards to grant each component of your app its own maximally restrictive policy.

Cosmopolitan Python also ports OpenBSD's unveil() system call to Linux. For example, to disallow all file system access, just call cosmo.unveil(None, None). You need a very recent version of Linux though. For instance, I use unveil() in production on GCE, but I had to apt install linux-image-5.18.0-0.deb11.4-cloud-amd64 in order for the Landlock LSM to be available for unveil().
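For anyone wanting to try the fork-then-pledge pattern described above, here is a rough sketch, assuming the cosmo.pledge()/cosmo.unveil() bindings shown above, with unveil(path, perms) mirroring OpenBSD's semantics; the promise strings are pledge(2) promises:

    import os
    import cosmo  # only available in the Cosmopolitan python.com build

    def run_restricted(worker, promises, unveils=()):
        """Run worker() in a forked child under its own pledge/unveil policy."""
        pid = os.fork()
        if pid == 0:  # child
            for path, perms in unveils:
                cosmo.unveil(path, perms)    # e.g. ('data', 'r')
            if unveils:
                cosmo.unveil(None, None)     # lock in the unveil list
            cosmo.pledge(promises, None)
            try:
                worker()
            finally:
                os._exit(0)
        os.waitpid(pid, 0)  # parent keeps its full privileges

    # a component that only needs stdio and read access under ./data
    run_restricted(lambda: print(open('data/input.txt').read()),
                   'stdio rpath', unveils=[('data', 'r')])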


I thought the threat model of pledge/unveil was to restrict a program that you are writing, but that you couldn't wrap it around another program in a safe way.

That is, you can protect your own program from doing network stuff because of incorrect input, but you can't use it to sandbox another program.

See this thread: https://marc.info/?t=162367803300003&r=1&w=2 and this mail about sandboxing: https://marc.info/?l=openbsd-tech&m=162367954705721&w=2


You can, if you use pledge() and unveil() on Linux. SECCOMP and Landlock use a monotonically decreasing permissions model. It's inherited across exec(). This is a good thing. OpenBSD devs don't need it because they built their own hermetic system. They're more afraid of having their servers compromised remotely than they are of programs they've installed locally. The tradeoff is you can't use pledge() and unveil() to build your own SSH server on Linux, since SSH needs to shed restrictions when launching a shell. But the benefit is you can safely leverage more code written by strangers on the Internet, which is what Linux is all about.


I thought of OpenBSD too since it includes Perl modules for OpenBSD::Pledge and OpenBSD::Unveil. Now I'm wondering if I can get something to work where these are used before importing CPAN modules to reduce the damage of potentially hostile modules.


I like this. I'd try to keep the permission sets as small, limited, and simple as possible though.


> I'd try to keep the permission sets as small and simple as possible though.

You've described OpenBSD in general. I recommend a deeper dive - it's fantastically refreshing, how simple yet functional an OS can be.


Same for FreeBSD. Incredibly code-stable and well-documented by Linux standards.

I often use the FreeBSD Handbook as an example of first-party documentation done right, and that's only possible because of deliberately limited "churn for churn's sake". The kinds of regular code rot and attrition that Linux suffers just do not take place on BSD systems, because if you contribute something new you're expected to make your thing mesh with the ecosystem; they don't tolerate the "I'm gonna churn everyone's environments because my version is 10% better than the existing one" that tends to take place on Linux.

As an example, the number of init systems that Ubuntu has gone through over the last 20 years is completely insane by BSD standards. They've gone from sysvinit to upstart to systemd. You don't have people ripping out the graphics and audio subsystems and doing total rewrites, either. It's nuts that a lot of Linux people don't even realize that it doesn't have to be that way - that's not "just how maintenance is supposed to work", that's a bunch of incredibly bad behavior on the parts of distros and Poettering specifically that has become highly normalized and overlooked. Maintenance shouldn't be breaking things like that. "We never break userland" doesn't have to be a suicide pact either - BSD has actually maintained a much more stable ABI than the Linux kernel without crystallizing a bunch of shitty bug-compatibility stuff like Linux is doing. It's all insanely stable and competent and professional by Linux standards.

If I was developing a product for true long-term support, with the minimum possible engineering effort devoted to fighting churn, BSD would be at the top of the list.


What attracted me to Linux so long ago is what ultimately drove me away. I now apply it only as needed because it's no longer simple ("stable" over the timescales I'm interested in) as a project, and it's not moving towards any type of consistency.

If you read "rolling stones gather no moss" to mean "keep moving or risk becoming obsolete", choose Linux. If you understand it as "change for change's sake prevents the achievement of mastery", go with a BSD.


And with all that beauty, FreeBSD is not something I would [nowadays] look into as the base OS for hosting my services and products. It means something, probably something about humans.


Netflix uses FreeBSD for their CDN/edge caches - it was consistently more performant both in benchmarks and with real-world workloads.


Yeah, that's quite interesting reading from them, some sort of specialized appliance really.

I, and the average Joe around me, am far from Netflix's task of packing bytes from disk to network. A simple 2-vCPU VPS serving 4 Gbit without being saturated at the system resource level is quite often much more than enough. Extra note: it's not even using kTLS.

Moreover, even for Netflix, who know FreeBSD inside and out, do you think (or have info on whether) they use FreeBSD as a base OS beyond the distribution level, for running applications/services in particular?

I quickly checked some of their repos, like https://github.com/Netflix/conductor, and it smells like they use containers/Docker, which doesn't work on FreeBSD, so I very much doubt it's the OS of choice for them there.


A resource access model for dependencies doesn't make much sense to me; there are basically only two things you want to gate for libraries: filesystem and network. And it's all-or-nothing. A library that needs network access may be legit today and, after an update, start exfiltrating data to a different URL. It seems easier to grep for fs and network calls in the library code than any of that.


Restricting a process to only be able to access an opt-in list of directories underneath the project directory would be useful. Assuming one uses a venv, all the dependencies are contained there. Then one might want some data folder. And you'd have at least prevented deps from scraping user-wide secrets.


You're describing a chroot jail. The key word there is "process". Dealing with process permissions is the OS's job.

If a language wants to deal in library security, it should strive to make static analysis possible. E.g.: the language guarantees that network and filesystem calls can only be made through a single function, statically, so I can audit that leftpad indeed doesn't make network calls.


This is essentially the fine-grained control the Java Security Manager enabled. But hardly anyone used it and it was deprecated sadly.


And it didn't really work in practice. Every callback or thread hop is a security vulnerability by default.


This is one of the projects we're working on (and open sourcing)!

Currently allows you to specify allowed resources during the package installation in a way very similar to what you've outlined [1].

The sandbox itself lives here [2] and can be integrated into other projects.

1. https://github.com/phylum-dev/cli/blob/main/extensions/npm/P...

2. https://github.com/phylum-dev/birdcage


I broadly agree, but if I'm reading your suggestion correctly I think that "access" list is too coarse-grained still! It looks like you're suggesting a predefined list of permissions that can be granted to a dependency, but... why not go even further? If the list of things you're passing in are references rather than strings, then you could do...

    const apollo = require('apollo-client', {fetch});
    const stringutils = require('stringutils');
That way, you can pass in at a very fine-grained level any object you want to, including things like `fetch` and `fs`. But then, you could just as easily pass in a custom `loggedFetch` module like so, that wraps `fetch` and logs all network requests made:

    const apollo = require('apollo-client', {fetch: loggedFetch});
This is the object-capability security model, and as a sibling commenter pointed out, LavaMoat basically does this for JavaScript dependencies. (This is also basically just lexical scope, by the way, if your language is strict enough - though most unfortunately aren't.)


I love this idea. But what about peer deps? And their dependencies? What if multiple deps share peer deps? Is there a simple way permissions could be resolved in the web of modules?

Also- tangential but I really wish people would stop using require and use ESM whenever possible. I cringe when I see `require` just as much as when I see `var` being used in 2022, but I’m not certain there’s never a good reason not to use ESM.


If you want to override the dependency of your dependency, then it's your dependency too - so if you were using the javascript approach up there you might have

    var bar = require("bar", {baz: loggingBaz})
    var foo = require("foo", {bar})
This is true even if you never otherwise use bar.

This is certainly how peer dependencies work in nix flakes, for instance, where you can say something like

   bar = //....
   bar.inputs.baz.follows = loggingBaz;
   foo = //....
   foo.inputs.bar.follows = bar;
Since nix flakes is a package/dependency manager, it generates a lockfile that you could inspect to see what all the dependencies are, and make sure you got it right (oh, I still have two bars, I must have forgotten to override some other dep). I suppose any compiler that involves a linking stage would in principle be able to generate some comparable output at the language level.

(Apologies for using require and var, but I'm convinced the functional syntax is semantically clearer in this context.)


For JS, you are basically talking about LavaMoat. It provides tooling and policies for SES, which aims to make it into the standards.

https://github.com/LavaMoat/LavaMoat

https://github.com/endojs/endo/tree/master/packages/ses


You could remove the need to explicitly specify the permissions somewhat by enforcing semver for permission addition.

If you own a package at version 1.X.X and you want to add a permission requirement you have to bump the version to 2.0.0. If you also allow people to opt in to a less strict "auto allow all the currently required permissions for these dependencies" mode, they would at least know for sure nothing can touch anything new unless they explicitly bump the major version.

If you're extra concerned about security you can explicitly specify them so it's really obvious when a major version bump adds new ones, but it removes some of the friction.


I watched a video[0] about the Roc language recently, and they do something interesting to address this: they have a layer in their language called "platforms", and the idea behind these is that there are many different platforms you can choose between to run code with, and each one has different permissions. So one platform might be sandboxed and disallow the use of certain unsafe APIs whereas another might be less sandboxed.

[0] https://m.youtube.com/watch?v=cpQwtwVKAfU


Another thing I think might help is

(a) Discourage any future use of ">=" in version dependencies. Specify an exact version. That way a future compromised version doesn't get pulled

(b) Every build system needs better ways of having multiple versions of a same dependency coexist. I should be able to have one of my project's dependencies depend on "numpy==1.15" and another dependency depend on "numpy==1.16" and they should be able to coexist in the SAME environment and "see" exactly the numpy versions they requested.

For python we should think about how to support something like this in the future:

    import numpy==1.15
and have it just work.

That way if a hacker compromises PyPI and releases a malicious numpy 1.19 it won't get pulled in accidentally.

Here's a bit of a joke I made before that might be an interesting starting point, though since it uses virtualenv under the hood it doesn't have a way for multiple versions of one package to coexist. I don't think it's impossible to do though with some additional work.

https://github.com/dheera/magicimport.py

Sample code:

    from magicimport import magicimport
    tornado = magicimport("tornado", version = "4.5")


> I should be able to have one of my project's dependencies depend on "numpy==1.15" and another dependency depend on "numpy==1.16" and they should be able to coexist in the SAME environment

Now I see what people at one respectable, big project were thinking when they allowed 7 different versions of OpenSSL to be statically linked to the same executable…

Seriously, this idea may save you from some not-very-interesting work, but it will create the need for a much bigger amount of work which, while potentially interesting, is not very productive. You are toying with exponential growth here, like a chain reaction, like a bomb.


Would you run into dynamic linker problems in this case due to symbol conflicts? Or does symbol versioning magically resolve that somehow?


Prefix all symbols with versions?

foo() becomes v1_15_4_foo() automatically


There were some attempts made for Python 10 to 15 years ago, and the conclusion was that it's very hard to do it right. If I remember correctly, Zope was using some sandboxing and, because of it, it took a while to catch up with newer versions of Python. You had to compile your own Python because the one that came with the Linux distribution was too new. Also, if I'm not mistaken, I think that PyPy has some sandboxing.

Anyway I'll leave this from 2013 here:

> After having work during 3 years on a pysandbox project to sandbox untrusted code, I now reached a point where I am convinced that pysandbox is broken by design. Different developers tried to convinced me before that pysandbox design is unsafe, but I had to experience it myself to be convineced.

> It would also be nice to help developers looking for a sandbox for their application. Please tell me if you know sandbox projects for Python so I can redirect users of pysandbox to a safer solution. I already know PyPy sandbox.

-- https://mail.python.org/pipermail/python-dev/2013-November/1...


In a post-Spectre and post-rowhammer world, I have my doubts that any sub-process security boundary will prove durable in the long run.


I've been messing around with some ideas.

1. `autobox` (to be renamed lol) [0]. It's basically a Rust interpreter that performs taint and effect analysis, reporting on both, allowing you to use that information to generate sandboxes. ie: "autobox sees you used the string '~/.config' to read a file, and that is all the IO performed, so that is all the IO you get".

2. I'm working on a container based `cargo` with `riff` built in that aims to work for the vast majority of projects and sandbox your build with a defined threat model.

The goal is to be able to basically `alias cargo=cargo-sandboxed` and have the same experience but with a restricted container environment + better auditing of things happening in the container.

3. I previously built a POC of a `Sandbox.toml` and `Sandbox.lock` with a policy language that allowed you to specify a policy for a given build step. Unfortunately, I couldn't decide on how I wanted it to work in terms of "do I generate a single sandbox for the entire build, or do I run each build stage in its own sandbox" - there are tradeoffs for both.

Here's a lil snippet:

    [build-permissions.file-system]
    # All paths are relative to the project directory unless they start with `/`
    "../" = {permissions = ["read"]}
    # "$target" being a special path
    "$target" = {permissions = ["read", "write"]}
    # Source this path from the environment at build time, `optional` means it's
    # ok if it isn't available
    "$env::PROTOC_PATH" = {permissions = ["read", "execute"], optional=true}
    # Default protobuf installation paths, via regex
    "^(/usr)?/bin/protoc" = {permissions = ["read", "execute"], regex=true}

Once I'm done with (2) though I think I'll tackle (3).

`autobox` is fun but I think it may be impractical without more language level support and no matter what I'd end up having to implement it in the compiler at some point, which means it would be unusable without nightly or a fork.

I'm going to try to wrap up an autobox POC that handles branching and loops, publish it, and see if someone who does more compilery things is willing to pick it up. As for (2) and (3) I believe I can build practical implementations for both.

[0] https://github.com/insanitybit/autobox/


This is really cool work! Also a fan of Grapl.


:D Thanks!


I saw a similar proposal (I think with JavaScript/Node) not too long ago that described limiting packages to data in their own namespace. For instance, third-party-dep-a would only have access to data it created or was passed, versus indiscriminately accessing anything in the language VM. Even this would be a good step in the right direction, although you'd likely still need something like you've described for accessing shared system resources (aka the mobile phone security model).


Yep, a declarative mechanism would be nice like OAuth scopes.

Though, like scopes, I think many times packages would need broad access, but maybe not?


It's only a matter of time until someone with some cash plus a good connection to a package just does what is going to happen.


I thought that Java Applets (and maybe flash, I am less familiar) had an advanced security model, but it was exploit after exploit because of the huge attack surfaces?

I suspect you may run into similar sandbox escapes once things are complicated enough. So it seems like a good idea if they can be made bug free, but good luck with that?


Part of the problem with the Java sandbox is that it was enforced entirely by the VM + the VM is written in C++. The idea is not inherently bad.


It's been a while since I worked in this area, but my recollection is that most JVM security issues were bypasses of the Java Security Manager, often by confusing it about code origin. That's all Java code, not C++.


For both Java and .NET, there were actual verification bugs, as well - when bytecode that's not supposed to be valid gets past the verifier and results in e.g. mistyped references (which can then be used for all kinds of creative vtable abuse). Sometimes it can even be a bug in the VM spec itself, because implications of two different features interacting weren't fully considered.

But, yes, we've tried this idea many times now, and it never held up for long.


It's been so long I could be remembering incorrectly.


Maybe the issue was with how powerful and unrestricted reflection was in Java before the introduction of modules.


Java and .NET had it, and in both cases they eventually dropped it, because many security exploits were caused by developers not really understanding how to use it.

In the end they became yet another attack vector, and now everyone should use OS security services instead.


Node.js is building something very similar: Permission Model https://github.com/nodejs/security-wg/issues/791


Your proposed solution sounds an awful lot like a manifest file

https://en.wikipedia.org/wiki/Manifest_file


Sort of like BSD's pledge() and similar APIs[1]

[1]: https://man.openbsd.org/pledge.2



Wouldn't running your development environment inside docker provide the same safety levels?


Containers aren't great security boundaries. To get the safety you'd really need, you should absolutely use a VM.


Wasmtime / WASI does this extremely well.


In a previous HN discussion on the topic of rogue Python packages, readers had suggested bubblewrap and firejail for sandboxing. They limit the access a script and its packages have to your filesystem and network.

I think that's the better approach - just assume all packages are malicious by default. Can't rely on scanners because of the large number of packages and attacks.


That's not going to help much if code from the malicious attacker is still going to end up integrated into the software product being built.


I disagree. I need to write this up in more detail but it helps quite a lot.

1. An attacker who can access your development environment and your production environment is worse than one who can only access your production environment. You might say "but the end goal is prod", but it's not that simple because of (2).

2. We already have very good tooling for isolating services at runtime. Separating them onto different instances, firewall/security groups, limited API keys, docker/ containers, apparmor, selinux, etc. We have a lot of tooling for "a service in production is owned". What we lack is "a library in dev environment is owned".

3. Devs often have more privileges than your services. It's unfortunate but at a lot of companies, perhaps given some lateral movement around dev envs, you'll find SSH keys to production, browser session cookies that give you console access, source code, chat sessions, internal documents, git keys, gpg keys, etc.

So I'm actually fine with a tool that sandboxes the build process but leaves open the hole of "but the attacker can patch the binary and execute code in production". That's a huge win.


Based on my experience, tools like firejail, ebpf, and opensnitch help us keep security in the forefront, train us to verify behavior instead of trusting blindly, and even persuade end users towards that mindset through our installation steps.

If we can spot odd behavior during development and eliminate it from our stacks, the product will be more secure for end users too.

There was a time when convenience overrode any security doubts in my mind. But now I routinely use these tools to restrict access, monitor, and review runtime behavior.


I've contracted on a project where another, IMO better, solution was employed. This was Ruby.

The only allowed package (gem) server was one run by the project. This package server scanned, vetted and manually checked any version of a lib before "publishing" it.

If you wanted to e.g. upgrade a package, you'd have to do this on this server first. It would then go through some steps: automatic scanning, risk analysis, sometimes even needing the eyes of someone from a security team. After that the package was published on this server, and you could pull it onto your dev machine and use it in CI/staging/test/prod etc. Similar steps applied to get a new package listed.

IMO this is better, because it stops supply-chain attacks before they hit your code, not after they've (potentially) infected the system.

Edit: for clarity "only allowed package" wasn't enforced very strictly. A linter and CI would catch any changes to code that would want to fetch packages from elsewhere. It wasn't to protect against rogue developers, but against "stupid me, accidentally upgrading to a version that is infected" and such.


The issue is when your tools are supposed to be generally available on a machine, or when your application has access to secrets (like keystores, configuration files, log files, etc.), which is pretty much every application.


Another good option is to create a new user and run everything under the new UID. Running under a new UID has less chance of accidentally leaving something exposed that can allow for sandbox escape.

If you run everything from the new UID, it will mostly be contained to its own $HOME directory and be unable to modify your user's files or system files. Some distros do not protect home directories from being read, so it might be worth setting your actual user's $HOME to mode 0700 or whatever.

If you are using bwrap while running X11 and not running the sandbox with a new UID, the sandboxed processes may be able to escape via the X11 socket! This can happen even when you don't mount the X11 socket into the sandbox (see abstract sockets)! I think unsharing the network namespace fixes this specific issue (not 100% sure), but there are probably more subtle footguns like this.

I really suggest running Wayland with XWayland disabled, and the Wayland socket protected from the sandbox if you want to use bwrap for security purposes!


How do you escape the sandbox through a Wayland or X11 socket? Do you have specific code examples?

Is there no way to safely run graphical applications in a bwrap sandbox? I thought Wayland was supposed to be better about this.


I think Wayland is fairly safe, but any X11 client can take screenshots, listen to the keyboard, or emit keyboard events, without limitations.


I do not have a specific code example, but you can use the normal X11 client interfaces to interact with the X server, which allows a lot of dangerous things such as sending events to other clients. We can imagine a rogue X11 client spawning a terminal and entering text through a virtual input interface, to run an arbitrary command, for example.

On Wayland, assuming you don't have XWayland enabled and running, it depends on the specific compositor you are using and what Wayland protocols it supports.

Sandboxing GUI stuff on Wayland requires at the very least not having XWayland running, and also requires understanding what the compositor allows clients to do by default. Some compositors may have permission dialogues that prevent clients from doing stuff that you didn't expect.


Good insights, thank you!


Do both and more. When using an unfamiliar package, check its upload history. How far does it go back? How did I discover the package, do I trust that source? Etc.

Unless your code is never going to touch important data or resources, for example (but not limited to) being used commercially in any vein, you can’t keep it in a padded cell forever.


So that means you can never use any package in code that has to handle sensitive data or manipulate the host machine?


No, it means don't trust it blindly but instead learn techniques to monitor and verify what it does. I use firejail, ebpf, and opensnitch to restrict access, monitor, and verify runtime behavior.

Where possible, persuade end users, too, to be equally careful. The "Linux is safe" cliché blinds both us and end users to its obvious security problems, like running every script as the logged-in user with the same level of access. These malware developers know it and rely on it. That's why we need to move everybody towards a restrict-monitor-verify mindset by default.


Plug: I've been building Packj [1] to address exactly this problem. It offers “audit” as well as “sandboxing” of PyPI/NPM/Rubygems packages and flags hidden malware or "risky” code behavior such as spawning of shell, use of SSH keys, and mismatch of GitHub code vs packaged code (provenance).

1. https://github.com/ossillate-inc/packj


There is also this, although I haven't tested it yet. The approach is interesting though. https://github.com/avilum/secimport


I agree, "assume unknown, unaudited packages are malicious" is the ideal stance. However, I would say that a simple scanning approach could probably take you pretty far. For instance, if you're not using the requests module or the socket module, chances are pretty good there's no data exfiltration going on.

It's absolutely not a foolproof approach, but it is a lightweight layer that can be used in a "defense in depth" approach.
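For what it's worth, even a crude static pass over a package's source gets you a first cut of that. A sketch below; the module list is just an example, and it won't catch obfuscated tricks like the whitespace-hidden __import__ from the article unless you also flag dynamic-import primitives, as this does:

    import ast, pathlib, sys

    SUSPECT_MODULES = {"socket", "requests", "urllib", "http", "subprocess", "ctypes"}
    SUSPECT_CALLS = {"__import__", "eval", "exec"}

    def suspicious(path):
        """Yield (lineno, description) for imports/calls worth a closer look."""
        tree = ast.parse(path.read_text(errors="replace"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [a.name for a in node.names]
            elif isinstance(node, ast.ImportFrom):
                names = [node.module or ""]
            elif isinstance(node, ast.Name) and node.id in SUSPECT_CALLS:
                yield node.lineno, f"use of {node.id}"
                continue
            else:
                continue
            for name in names:
                if name.split(".")[0] in SUSPECT_MODULES:
                    yield node.lineno, f"import of {name}"

    if __name__ == "__main__":
        for py in pathlib.Path(sys.argv[1]).rglob("*.py"):
            for lineno, what in suspicious(py):
                print(f"{py}:{lineno}: {what}")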


In Python, dynamic imports exist, making this impossible


I don't see how having dynamic imports matters if all you want to do is detect if a specific file is imported. Run the install and see what gets imported. That's it.
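Python 3.8+ makes "see what gets imported" pretty cheap via audit hooks (PEP 578). A minimal sketch, assuming you drive the install from the same interpreter (e.g. `python -m pip install ...`) and inject this via something like sitecustomize.py; it only observes, it doesn't block:

    import sys

    def log_events(event, args):
        # "import" is a built-in audit event: args[0] is the module name,
        # args[1] the filename (if known). socket.connect is useful to watch too.
        if event == "import":
            print(f"[import] {args[0]} ({args[1]})", file=sys.stderr)
        elif event == "socket.connect":
            print(f"[socket.connect] {args[1]}", file=sys.stderr)

    sys.addaudithook(log_events)

It gets noisy (every stdlib import shows up too), but diffing the log against what you expected to be imported makes a setup.py that pulls in socket stand out.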


If you actually have to execute a program (but have no safe way of doing so) to see whether a complex routine that may return any filename imports a safe file or not, then you are up against https://en.wikipedia.org/wiki/Rice%27s_theorem


So? Any method of detecting a "malicious package" faces Rice's theorem, unless you want to claim that "malicious" is a trivial property.


Which is why any approaches relying on identity verification or scanning are bound to fail - sandboxing/capability security MUST become built into languages


Are you suggesting that you would rather wait for languages to provide robust sandboxing capabilities and not use available static/dynamic analysis tools (e.g., Packj [1]) to audit packages for malicious/risky indicators, particularly when we hear about new attacks on open-source package managers almost every week?

1. https://github.com/ossillate-inc/packj [Disclaimer: I built it]


No, I'm merely suggesting that they are not a solution to the problem, and that the fundamental issue has to be approached at the language level.

What you are building is mainly a smoke detector (and maybe a bit of a sprinkler, if it takes some decisions itself), not fireproof doors (and that only at install time, not test or runtime). Smoke detectors by themselves cannot prevent fires from spreading and are not completely reliable.

Analysis tools are still useful - even with perfect language-level access and resource control, packages which are given many required permissions may behave maliciously (e.g. through compromise of any component in the development or distribution pipeline), or return malicious data (which is out of scope/unsolvable at the language level). Both approaches complement each other nicely.


Another option is "nsjail". I landed on it having considered firejail, AppArmor, SELinux, and bubblewrap.


What are the advantages?

I'm very frustrated with firejail since I can't for example block execution in my home directory, with the exception of one subdirectory.

It just can't be done.


[flagged]


It's a problem with every open ecosystem where libraries can be downloaded and run. Rust, Golang, Node all have the same problem. That's why I think it's better to assume anything we download is malicious. Stuff like Bubblewrap and Qubes OS seem to be the better approach compared to relying on vulnerability hunters and scanning tools.




I started to develop only inside VMs, with a full Desktop, IDE, browser etc. inside the virtual machine.

There have been too many contaminations of major package repos lately. Only one typo in an import statement up the dependency chain and you'd be compromised.


Full disclosure, I am a co-founder at Phylum.

We are actively working on a solution that will fully sandbox package installations for npm, yarn, poetry and others.

It's rolled up as part of our core CLI [1], but is totally open source [2]:

[1] https://github.com/phylum-dev/cli [2] https://github.com/phylum-dev/birdcage


Sounds awesome.

Though I’m not sure of the solution really is / should be increased sandboxing.

The alternative may be a rethinking of the trend towards increasingly smaller packages. Maybe it's better to have a few large packages maintained by reputable organisations or personalities?


The problem is large, sprawling and complex. In an ideal case, we'd have high-quality packages maintained by reputable people/organizations. But today this just isn't true. Open source takes contributions from a large number of unknown authors/contributors with motivations that may or may not align with your own.

We really need a defense in depth approach here. Sandbox where it makes sense, perform analysis of code being published, consider author reputation, etc.


Why is there so much discussion about sandboxing installation? Why would I, as an attacker, limit myself to install time instead of putting the malicious code in the package itself?


A lot of the malware targeting developers is leveraging the installation hooks as the execution mechanism. So sandboxing the install helps stop this particular attack vector - which is why it gets talked about so much.

If you put code in the package itself, this would side step the "installation" sandbox. However we're also doing analysis of all packages introduced to the ecosystem to uncover things that are hiding in the packages themselves.

So you're right, we need a defense in depth approach here.


I think this is a very good idea. It should actually be built into the OS. I know of BSD jails, but not sure what else there is for Linux/Windows/MacOS.


Virtual is part of a solution but not the key: the key is to separate your dev env from your real life/business environment -- including all your personal and professional business data and web accounts that expose your financials and private data.

If you log into your email from the virtual machine, you are at risk.


That protects me (the software developer/maintainer) to some degree, but does nothing to protect the users of the software I am maintaining.


Development should be more exploratory and experimental than prod. For the past decade I've had a similar strategy: I freely install and demo new dependencies on separate dev hardware (or a VM when I'm on the road). Then I code review (incl. locked dependencies) and deploy from a trusted environment with reduced supply chain exposure.


As long as you are creating web applications, browsers are pretty good at limiting the blast radius of a single attacked website. Well, at least until the attacker discovers that they can inject some fancy phishing into a trusted site.

With a local development environment it is a bit different, because unless you are running builds/tests etc. in a container/VM/sandbox, the attacker has access to all of your files, especially web browser data.


I think that separation is the point of the VM. Do the dev work in the VM, don't give it sensitive info about yourself.


The only place I log into from the VM is Github, protected by 2FA in case any malware gets my password.


The malware will just take the session cookie. Some actions still require 2FA approval, but it’s not many, iirc.


So the malware can delete all your projects or inject malware into them, but thankfully it won't be able to log in again later?


This is a good approach, though presumably the VM still has access to your Github credentials (via the browser) and your SSH keys? It'll limit the fallout of getting owned to anything reachable from Github (is it against Github's TOS to have multiple accounts?), less if you have 2FA (does there exist 2FA for SSH keys (I don't mean passphrases)?), but I think it would be better for just my build/run/test cycles to be cordoned off into their own universe, with a way for just the source code itself to cross the boundary.


> though presumably the VM still has access to your Github credentials (via the browser) and your SSH keys?

Not in Qubes OS:

https://github.com/Qubes-Community/Contents/blob/master/docs...

https://www.qubes-os.org/doc/split-gpg/


It might be too cumbersome for most, and I might be more paranoid than average, but each project for me means a fresh VM, a new Keepass database and dedicated accounts. Then again I work mostly in ops, and I've seen first hand how badly things can go wrong so isolation and limiting blast radius takes precedence over daily convenience for me.


Why wouldn't you use disposable VMs [0] and secure inter-VM copy [1] on Qubes OS instead? It's much less cumbersome and more secure.

[0] https://www.qubes-os.org/doc/how-to-use-disposables/

[1] https://www.qubes-os.org/doc/how-to-copy-and-move-files/


Could you please share some resources/tactics for protecting your host machine from these development VMs? If I were to do this, I would want some assurances (never 100%) that my host is protected from the VM to the best of my ability.

(If it makes any difference, I would probably be using VMWare Workstation Pro)


I can't give you what you're looking for. You need to decide on the trade offs for yourself. There will always be a risk. Directed attacks can get out of VMs. You could slip up and log into a personal account inside the VM.


I tried to make it clear in my reply that I understood there are no guarantees. What I’m asking is if you have any guidance on reducing the likelihood of these attacks succeeding


That does sound incredibly cumbersome. I suppose that means you are an ace at provisioning machines.

How do you move data in/out of the guests? I always found that part of interacting with VMs to be annoyingly painful.


There are always trade offs. You do get better at things you do a lot. My mother won't use a password vault because copying and pasting is too much work for her. I'd just rather pay with my time and inconvenience than one day find out some python package I fiddled with for a late night project once means I need to call my bank.


SSH.

Doesn't even need to be command line, you can just open remote addresses in your favourite graphical file browser, at least under Linux.


> does there exist 2FA for SSH keys (I don't mean passphrases

Yes: YubiKey. An ecdsa-sk key requires you to tap the YubiKey to have a working key. It consists of two parts, one being a private key file, which is useless without the YubiKey. https://developers.yubico.com/SSH/

https://developers.yubico.com/SSH/Securing_SSH_with_FIDO2.ht...


GitHub offers fine-grained personal access tokens. https://docs.github.com/en/authentication/keeping-your-accou...

Azure DevOps does it too


As far as I know, in AzDO you can't even limit a PAT to a single project/repository. Not good for limiting access, because even a read-only token can see private stuff in other projects. You might create a specific account and assign it to only that project, but what a pain.


I've tried the same but the graphics performance was too slow (no GPU acceleration). The current setup is to use a virtual machine but connect to it via VS Code's Remote SSH extension from the host.


I hope you've turned off VS Code's "workspace trust" settings.

https://code.visualstudio.com/docs/editor/workspace-trust


Sometimes but I wonder to what degree it actually matters. Tasks, debuggers, extensions etc. run in the context of the VM, not the host. The Remote SSH extension turns VS Code into a "thin" client which presents pretty much just the UI.

https://code.visualstudio.com/docs/remote/ssh


Readme says: https://marketplace.visualstudio.com/items?itemName=ms-vscod...

> A compromised remote could use the VS Code Remote connection to execute code on your local machine.

So I would say that it might be a bit harder for an attacker to gain access to your local machine, but you should not rely on it, because it's more like security by obscurity.


Well damn. I was under the impression that the communication channel uses/accepts only well defined VSCode specific messages related to the UI...


Darn. Maybe the solution is to use vs-code client in the browser? Like vscode.dev or https://github.com/coder/code-server ? It limits what keyboard shortcuts and extensions are available, but at least it's in a secure sandbox on the client side.



This is a good defense-in-depth measure but doesn't solve one fundamental issue. You might be protected during development by the sandbox, but your users are not necessarily. I think we as developers should not ship any software we do not trust to our users.


Then you might be interested in Qubes OS: https://qubes-os.org.


That's why I chose it. A lot of peace of mind there.


Packj sandbox [1] offers "safe installation" of PyPI/NPM/Rubygems packages.

1. https://github.com/ossillate-inc/packj/blob/main/packj/sandb...

It DOES NOT require a VM/Container; uses strace. It shows you a preview of file system changes that installation will make and can also block arbitrary network communication during installation (uses an allow-list).

Disclaimer: I've been building Packj for over a year now.


Only secures installation, not runtime, but still helpful. I'm not a package maintainer, but I do wish that packages were not allowed to run any code at install-time.


> Only one typo in an import statement up the dependency chain and you’d be compromised.

Doesn’t even have to be a typo if the actual project is compromised. Like one of the 100s of NPM modules without 2FA for publishing.


I'm following the same workflow. I use a Linux host and then a Linux guest with OpenGL acceleration on virt-manager. I do all my development and browsing inside the VM. I do not trust any of the npm packages or PIP packages. Any personal stuff like banking, password manager, Nextcloud goes on the host.

With modern virtio interfaces for network, disk and graphics practically giving near metal performance, there's no reason to not utilize VMs for development.


Similar. I run text editor on the main OS but run language server in the requisite environment with all the requirements in a container.


This is the way.


This type of stuff is one reason I like vendoring all my deps in golang. You have to be very explicit about updating dependencies, which can be a big hassle, but you're required to do a git commit of all the changes, which gives you a good time to actually browse through the diffs. If you update dependencies incrementally, it's not even that big a job. Of course, this doesn't guarantee I won't miss any malicious code, but they'd have to go to much greater lengths to hide it since I'm actually browsing through all the code. I'm not sure the amount of code you'd have to read in python would be feasible, though. Definitely not for most nodejs projects, for example.

I think it's an interesting cultural phenomenon that different language communities have different levels of dependency fan-out in typical projects. There's no technical reason golang folks couldn't end up in this same situation, but for whatever reason they don't as much. And why is nodejs so much more dependency-happy than python? The languages themselves didn't cause that.


> And why is nodejs so much more dependency-happy than python?

Part of it—but I'm sure not all—is that the core language was really, really bad for decades. Between people importing (competing! So you could end up with several in the same project, via other imports! And then multiples of the same package at different versions!) packages to try to make the language tolerable and polyfills to try to make targeting the browser non-crazy-making, package counts were bound to bloat just from these factors.

Relatedly, there wasn't much of a stdlib. You couldn't have as pleasant a time using only 1st-party libraries as you can with something like Go. Even really fundamental stuff like dealing with time for very simple use cases is basically hell without a 3rd party library.

Javascript has also been, for whatever reason, a magnet for people who want to turn it into some other language entirely, so they'll import libraries to do things Javascript can already do just fine, but with different syntax. Underscore, rambda, that kind of thing. So projects often end up with a bunch of those kinds of libraries as transitive dependencies, even if they don't use them directly.


It’s worth mentioning that Underscore started before browsers widely implemented the same features in standard JavaScript. Underscore is much less necessary now that Internet Explorer EoL’d.


The problem is the tree of dependencies you might have to check. Sure, you can check the changes in a direct dependency, but when that dependency updates a few others and those update a few others, the number of lines you need to read grows very quickly.


Golang flattens the entire dependency tree into your vendor directory. It's still not that big. The current project I am working on has 3 direct external dependencies, which expands out into 22 total dependencies, 9 of which are golang.org/x packages (high level of scrutiny/trust). It's really quite manageable.


Indeed, gophers often make it a point of pride to have no dependencies in their packages.


> And why is nodejs so much more dependency-happy than python?

Could it be that nodejs has implemented package management more consistently and conveniently than other languages/platforms?


That's one thing, the other is the almost complete absence of a standard library.


Yeah, I think this is a big one. One of the things that I have always liked about Golang is that the standard library is quite complete and the implementations of things are (usually) not bare-bones implementations that you need to immediately replace with something "prod-ready" when you build a real project. There are exceptions, of course, but I think it's very telling that most of my teammates go so long without introducing new dependencies that they usually have to ask me how to do it. (I never said the ux was fantastic :) This also goes to GP's "consistent and convenient" argument.


Totally agree. It feels like there is a pretty strong inverse correlation between standard library size, and average depth of a dependency tree for projects in a given language. In our world, that is pretty close to attack surface.


Rust is another example of this. Just bringing in gRPC and protobuf pulls in about a hundred dependencies, some of them seemingly unrelated. For a language aimed at avoiding security bugs, I find this to be an issue. But a good dependency manager and a small (or optionally absent) stdlib have led to highly granular dependencies and to bringing in giant libs for tiny bits.


pip throws your dependencies in some lib directory either on your system (default if you use sudo), in your home directory (default if you don't use sudo), or inside your virtualenv's lib directory.

npm pulls dependencies into node_modules as a subdirectory of your own project as default.

Python really should consider doing something similar. Dependencies shouldn't live outside your project folder. We are no longer in an era of hard drive space scarcity.
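You can get most of the way there today with pip's --target flag plus a couple of lines at startup. A minimal sketch (the "vendor" directory name is arbitrary):

    # after: pip install --target ./vendor -r requirements.txt
    import sys
    from pathlib import Path

    VENDOR = Path(__file__).resolve().parent / "vendor"
    if VENDOR.is_dir():
        # prefer the project-local copies over anything installed system-wide
        sys.path.insert(0, str(VENDOR))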


Have you seen how much space a virtualenv uses? It can easily be >1 GB. For every project, this adds up. (Not to mention the bandwidth, which is not always plentiful).


Well, npm uses a cache so it won't re-download every package every time you install it.


4TB hard drives are $300 these days.


4 TB HDDs are closer to $80 now, but that reinforces your point :). Even SSDs are now close to $300 for 4 TB!


Yeah i meant 4TB SSDs, who uses magnetic HDDs anymore lol


As of Python 3, pip install into the system Python lib directory is strongly discouraged. ISTR that even using pip to update pip results in a warning.

That’s not to say that there’s not still some libs out there that haven’t updated docs to get with the times.


More distros should adopt the Debian practice of installing into dist-packages and leaving site-packages as a /usr/local equivalent for pip to use on its own.


It also blows up the size of your git checkouts pretty fast though.

I don't think you really gain much either; vendoring was useful before modules, but now we have modules and go.sum I don't really see the advantage. If you have "github.com/foo/bar" specified at version 1.0.4 the go.sum will ensure you have EXACTLY that version or it will issue an error in case of any tomfoolery.


Vendoring also means your builds don’t need an Internet connection.

Going on a trip somewhere without an Internet connection? Checkout the repo on your laptop and go. Without vendoring: oh shoot, I forgot to download the deps, I guess I’m going to be forced into a work-life balance. With vendoring: no additional step needed after checking out the repo. The repo has everything you need to work.

Another case: repo of your dependency is removed, or force-pushed to overwriting history. You’ve lost the ability to build your project, and need to either find another source for your dependency, or rewrite it. With vendoring: everything still works, you don’t even notice the dep repo went under.

Generally, with vendoring your code is in just one place instead of being a distributed being which crumbles when any part of it gets sick.

Moreover, relying on checksums to me seems a bit overcomplicated. It’s like going to a pub and giving each drink from a stranger to a chemist for verification to make sure they didn’t slip any pills, when you could just carry your own drink around and cover the top with your hand.


You should have the modules downloaded to the module cache for the occasional case when you don't have direct internet access.

> Another case: repo of your dependency is removed, or force-pushed to overwriting history. You’ve lost the ability to build your project, and need to either find another source for your dependency, or rewrite it.

The GOPROXY (https://proxy.golang.org/) still contains that removed repo, and since everything is summed people can't just force overwrite it. Plus, you still have it in the module cache locally.

You can of course always come up with "but what if...?" scenarios where any of the above fails, and all sort of things can happen, but they're also not especially likely to happen. So the question isn't "is it useful in some scenario?" but rather "is it worth the extra effort?"

> Moreover, relying on checksums to me seems a bit overcomplicated.

It's built-in, so no extra complications needed.


> You should have the modules downloaded to the module cache for the occasional case when you don't have direct internet access.

That’s assuming I’ve built the thing previously on that same computer. I’m talking about the common case of working on a normal desktop day-to-day and then switching to a laptop, when travelling to a place without internet (or internet of such a poor quality you might as well not bother). With vendoring I don’t need to think about any other steps than copy/checkout the repo. The repo is self-contained. Without it, I’m making the quantum leap to a checklist.


You need internet access to either checkout or update the repo; you can use "go mod download" (or just go build, test, etc.) to fetch the modules too. It's an extra step, but so is vendoring stuff all the time.

But like I said, it's not about "is it useful in some scenarios?" but "is it worth the extra effort?" I'm a big fan of having things be self-contained as possible but for this kind of thing modules "just work" without any effort. Very occasionally you might go "gosh, I wish I had vendored things!", but I think that's an acceptable trade-off.


I wonder why we can’t have pip packages be published by username or organization, like

    pip install google/tensorflow
It would significantly reduce the attack space


npm does something similar with their scoped packages. It fixes the problem for the top level packages, but you'd still have to contend with the transitive dependencies written by smaller organizations or individual contributors. In this case, you have to guarantee that no one involved in the dependency chain ever typos anything.


"you'd still have to contend with the transitive dependencies written by smaller organizations or individual contributors" - generally these are the higher risk dependencies anyway and should probably be used with extra caution anyway.


This is true, and wouldn’t remove the entire space of attack, but would still limit it to some extent.


Oh absolutely. Unless everyone wants to be cool and stop publishing malware, gotta take a defense in depth approach here.


Maven had this 20 years ago

Quite why Python refuses to learn from anything that came before it, I really don't know.

“Namespaces are one honking great idea — let's do more of those!”


It gives a false sense of security. What about google_official/tensorflow?


It would still be an improvement if companies make clear what their namespace is.


What if the package is signed by a key available at google.com/pypi/key ?

Actually, it should be not just a key but a whole TLS certificate, with references to a CA, validity dates, etc.


How would you know where you’d find that key?


Again Maven has the answer: if your namespace is a domain name you own, then the key needs to be available at a well-defined path on that domain (or as a TXT record in it).


Perhaps something like Docker hub where "official" images are like "/_/nginx"

So "_google/tensorflow" would be official.

"google/tensorflow" would not be (plus it would be reserved by default to avoid confusion).


google.com/tensorflow (and you'd have to prove you own google.com)

not perfect, but better.


I've always been a fan of how Java packages do it where the TLD is first.



One of the main issues I have with Java is how messy it is to import external modules. Python is a breath of fresh air comparatively. Introducing this kind of thing as mandatory is a step away from that.


> Upon first glance, nothing seems out of the ordinary here. However, if you widen up your code editor window (or just turn on word wrapping) you’ll see the __import__ way off in right field. For those counting at home, it was offset by 318 spaces

Haha, simple & effective...
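
For anyone curious how cheap the trick is, here's a harmless toy version (padding shortened; per the article the real samples pushed the `__import__` out by ~318 spaces, and the payload obviously isn't an echo):

    # looks like a one-liner unless you scroll right or turn on word wrap
    greeting = "hello from a perfectly normal module"                                                                      ; __import__("os").system("echo 'this could have been anything'")
    print(greeting)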


The guy who runs the C2 openly has the source code for the stealer on his GitHub. Why doesn't GitHub do anything about this shit?

I've personally been hacked by a supply chain attack via a GitHub wiki link. I contacted GitHub support and didn't hear back from them for 3 months. They are completely useless.


Does GitHub actually prohibit programs that are up front about the fact they do something questionable? Considering there have been active repos for those Steam pirating DLLs on the site for ages, I thought they only really go after hidden maliciousness.


Considering the entire open source pentesting community almost exclusively uses GitHub to host their projects: no. There is actual malware being hosted on GitHub, with the caveat that malware and pentest tools or proof of concept exploits are sometimes indistinguishable.

GitHub announced a few years ago that they would crack down on malware and were about to introduce some very strict T&C. After a huge backlash from the pentesters (justified in my opinion), they backpedaled a little bit. Hosting pentesting tools is fine; using GitHub as your C2 server or to deliver malware in actual attacks is not.


I completely agree, despite the wording of my comment. In this case, the user has a different GH account for hosting their malware and C2, but the fact that they're so flagrant about it is what bothers me.

I was a skid once, I get it, probably a lot of us were.


They are trying. The level of effort to release these things is so low, the effort required to catch it and remove it at scale is much harder, unfortunately.


Are they? I know I'm biased because this affected me and I'm still mad about it, but I just don't buy it.

I contacted them, showing the plainly obvious malicious account that was distributing malware. Two months later, they send me a generic message saying that they've "taken appropriate action", but the account and their payload was STILL THERE, they hadn't done anything. The attacker was rapidly changing their username, and honestly I'm not sure their support staff has a way of even dealing with that. I tried to explain the situation as best I could, but they were not helpful in the slightest.


I don't know what their standard for 'malicious' is, but they nuked Popcorn Time and Butter (the technological core without the actual piratey bits) from orbit until there was a huge amount of backlash.


I'm not even asking them to deal with the problem "systemically" or "at scale". I just want them to respond when I am trying to stop an active criminal campaign whose goal is to steal money and cryptocurrency from people.


Talk to the FBI or any authorities, then.

I despise the idea of GitHub removing any code just because YOU (anyone) think they are criminals.


Read mr_mitm's comment. I have no problem with potentially malicious code being hosted on GitHub, I think it's a good thing. Using GitHub's infrastructure for your theft campaign is clearly not okay.


We're not talking about some quirky money-strapped startup. We're talking about Microsoft.


The standard HN answer is freedom of speech: the problem is the one using the code, not the code itself.


These sorts of things are why D doesn't allow any system calls when running code at compile time, and such code also needs to be pure.

Of course, this doesn't protect against compiling malicious code and then running it. But at least I try to shut off the vector of simply compiling the code.


I've never understood this position. How often do you add a dependency to your project, compile your project, and then never run your project ever? I can't think of a single case where this would have protected me.


It means you don't need to run the compiler in a sandbox. People do not expect the compiler to be susceptible to malware attacks, and I do what I can to live up to that trust.

I haven't heard of anyone creating a malicious source file that would take advantage of a compiler bug to insert malware, but there have been a lot of such attacks on other unsuspecting programs, like those zip bomb files.


> People do not expect the compiler to be susceptible to malware attacks

I'm not familiar with D, so I'll use the example of Rust. My usual workflow looks something like this

1. Make some changes

2. Either use `cargo test` to run my tests or `cargo run` to run my binary.

In both those cases the code is first compiled and subsequently run. I care if running that command gives me malware. I don't care at what step it happens.


With Rust, quite often (e.g. if you are running rust_analyzer) it will run `cargo check` to produce errors. When `cargo check` is run, build.rs is compiled and run. So quite often, by step 1, just opening the file in your editor before even making any changes, code is compiled and run.

Walter's solution here allows the compiler to be used by the editor without the editor being susceptible. Which at the very least negates the need for a pop-up in your editor asking for permission.


> With Rust, quite often ... it will run `cargo check`

Yup. But making "cargo check" safe while "cargo run" stays vulnerable just reduces the number of times you run malicious code. And whether malicious code runs on my laptop every time I edit a file or every hour or every week makes absolutely no difference. One run and the malware can persist and run whenever it wants going forwards.

> Which at the very least negates the need for a pop-up in your editor asking for permission.

My argument is that the pop-up is security theater. I've disabled it, I don't think it should be enabled by default.

[1]: I'm handwaving slightly to get from "your code depends on a malicious library" to "malicious code is run". If I recall correctly there's linker tricks that could do that, or you could just have every entrypoint call some innocuous sounding setup function that runs the malicious code.


Only if you intend to run the program, though. If you want to just read the source code, perhaps to see if it contains malicious code, you really don't want your editor doing such things by default, so something has to give.


Some people try to inspect code outside of a project before including it into something.

Say there is a Github repo with a crate called "totally_safe_crate", and I want to use this crate in my project, but I am not sure whether I can trust it or not. What do I do?

What I would likely do is clone the repo, then open my editor and look through the source and whatnot and make my decision.

In this case I never intend to run "cargo run" at all, but I may want to run an LSP server to help inspect the code, or I may accidentally enable LSP in the editor out of habit or something.

In this case, it would be nice if I could be certain that simply inspecting and reading code was safe, but as it stands now, in Rust, this is not the case. We can't even inspect code to make sure it's safe unless we open the source files in a "dumb" editor.

In your example you are just adding a crate to cargo.toml and firing off cargo, so of course it's not going to be useful there. Some of us may want to be more cautious than that and actually read code before putting it in our projects.


Perhaps it’s about responsibility. It’s not the compiler's fault if you choose to compile and run malware. But you could blame the compiler if it ran malware during the compilation process.


All else equal I'd agree. But I'm perplexed why people spend a lot of effort on what seems to me like a purely philosophical benefit.


It's not philosophical. All people who write programs that consume untrusted data should be actively trying to prevent compromise by malware.


In general, I agree. I think developer tools are a special exception because there are so many gaping vulnerabilities inherent to them that it's meaningless.

I think of that kind of thing as the equivalent of "your laptop won't be vulnerable on odd-numbered days". That'd be a great plan if there were a pathway from there to no vulnerability. If that's the low-hanging fruit and you're stopping there, it's a complete waste of time.


It just addresses part of the problem, which of course is why it seems somewhat pointless. I need to:

1. Install packages/deps/libraries etc. safely

2. Run code that includes those libraries in a way that limits their capabilities centrally.


> It just addresses part of the problem, which of course is why it seems somewhat pointless

I cut my teeth in the aviation industry, where the idea is to address every part of the problem. No one part will fix everything. Every accident is a combination of multiple failures.


CI/CD servers, dev laptops etc could have more privileges than the production machines. For instance.


So you never run tests on your dev machine or CI/CD, and never run your code to manually test?

I'm not experienced but I thought it was normal to have some way to try out what you've written on your dev machine. Is everyone else stepping through code in their head only, and their code is run for the first time when it's deployed to production?


Your dev machine getting pwned is bad, but your CI server getting screwed up is worse.

This way you don't need to sandbox the compiler, and it can freely use system resources and access source trees. You only need to sandbox the execution.

(As some people point out in this thread, editors are starting to use compilers to get overall meta-information, too-- if you can't even -view the code- to tell if it's malicious without getting exploited, that's bad).


> This way you don't need to sandbox the compiler, and it can freely use system resources and access source trees. You only need to sandbox the execution.

If this is now only helping CI and not dev machines I don't see why it's worth the effort. Wouldn't it be much simpler and more reliable to just sandbox compilation of anything in your CI?

> if you can't even -view the code- to tell if it's malicious without getting exploited, that's bad

I guess? I can't think of a single time in my life where this would have practically helped me.

I skim dependencies on GitHub for obvious red flags and then trust them. I assume places with the resources to do actual in-depth review can disable advanced analysis in their IDEs for that.


You're digging in really hard trying to talk someone out of following good development practices because you don't personally think his effort is worth it. Personally, I don't think the effort being put into this argument is worth it.


> Wouldn't it be much simpler and more reliable to just sandbox compilation of anything in your CI?

Not really. The compiler has to be able to access large swaths of your code, and you want to e.g. keep that code safe. You would have to have very fine-grained sandboxing to prevent substantive disclosure, and even then you're likely leaking information.


Fancy IDEs perform code analysis, so if you feed them something malicious I guess it's feasible for it to run a shell command or similar. By definition, IDEs have to do that kind of thing to compile code, run linters, etc.


Build servers?


I'm honestly not sure the benefits of executing code during compilation/install outweigh the bad. Most attacks we have seen leverage this as the attack vector.


CTFE (Compile Time Function Execution) is a major feature of D, and has proven to be immensely useful and liked.

Note that CTFE runs as an interpreter, not native. Although there are calls to JIT it, an interpreter makes it pretty difficult to corrupt.


The problem comes when you need to do something bespoke and custom, like building a C dependency so you can link it into your Python (or whatever language) library. Sometimes your options are "run a makefile" or "reimplement an entire library from scratch". I'm not saying that this isn't a problem; it is. I think the better solution is transparent sandboxing for dev environments.


I'd love transparent sandboxing -- but the difference between me wanting to install the awscli and something that steals awscli's credentials is only a matter of intent, so it's a bit difficult.

I've basically converted to doing all node development in docker containers with volume mounts for source, now it looks like python is going to need to be there as well, at least for stuff that pulls in any remote dependencies.


> I think the better solution is transparent sandboxing for dev environments.

I don't disagree at all. We're building an open source sandbox for devs right now for this exact reason. Linked it in another comment.


PyPI should warn when the package and developer are new.


Even Firefox and Chrome's extension "stores" don't get this right. In either, a once trusted extension can be sold to a malicious company who then pushes new updates which automatically get downloaded by Firefox and Chrome by default, with no warning. Quite possibly without Mozilla and Google having any way of knowing it happened at all.

One way to address this is to move to a traditional "Debian" style system, where packagers are people affiliated with / known by Debian/Mozilla/Google, and specifically aren't the developers of the software themselves. The software is written by Developer X, but is then packaged and distributed by Packager Y, who ideally has no commercial affiliation with Developer X. If Developer X sells out to Malware Corp Z, end users can hope that Packager Y isn't part of that deal and prevents the malware from being packaged and distributed. This still isn't bullet-proof, but it's a lot better.


Yeah a time/activity based trust system like thepiratebay uses could be helpful.

Also, devs should get into the habit of providing sha256 hashes on official channels (e.g., the GitHub readme) so users can validate (if it's possible to validate a pkg before executing malicious code in the Python ecosystem; I'm not sure how that'd work).


Doesn't pip's hash-checking mode solve this issue? Freeze your requirements with hashes. PyPI already provides hashes for sdists and wheels. See https://pip.pypa.io/en/stable/topics/secure-installs/#hash-c...
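
For example, a hash-pinned requirements file looks roughly like this (the digest below is a placeholder, not the real one; use the sha256 shown on PyPI, `pip hash`, or pip-compile --generate-hashes), and installs are then done with --require-hashes:

    # requirements.txt -- digest is illustrative, copy the real 64-char sha256 from PyPI
    requests==2.28.1 \
        --hash=sha256:<64-hex-char digest copied from PyPI>

    pip install --require-hashes -r requirements.txt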

If we are talking typos or other human errors, I guess we could only warn people that there are other packages with similar names available. Can't predict what people have in mind when they make a typo.


It definitely does help. We've seen malicious actors introduce "bad things" into legitimate packages [1]. So hashes help identify what you got, but don't necessarily prevent you from getting something you didn't intend.

[1] https://www.cisa.gov/uscert/ncas/current-activity/2021/10/22...


> Also, devs should get into the habit of providing sha256 hashes on official channels (e.g., the GitHub readme) so users can validate (if it's possible to validate a pkg before executing malicious code in the Python ecosystem; I'm not sure how that'd work).

I would think the easy solution is to publish a public signing key per-person or per-project, and then sign individual files with that. So, GPG.


> Yeah a time/activity based trust system like thepiratebay uses could be helpful.

This is an excellent idea. Authors are something we are digging into heavily as part of an ongoing effort to improve trust in the open source ecosystem.


Yeah, an onboarding process built on trust and time-delay would be nice.


Totally agree. There should be a 30-day waiting period.


A good strategy would be to not allow typosquatting by just blocking names that are too similar (something as simple as Hamming distance would suffice here; rough sketch below).

Afaik there are two types of supply-chain attack strategy: you either compromise a legitimate package by somehow getting a PR approved with malicious code, which is very hard to do, or you "typosquat". The latter is way easier and probably the dominant strategy, so package repositories such as PyPI need to invest in preventing it.
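
A rough sketch of what such a name-similarity gate at publish time could look like (using difflib's ratio rather than a strict Hamming distance, since typosquats often add or drop characters; the allow-list and threshold are made up):

    import difflib

    # Hypothetical set of popular names the index wants to protect.
    POPULAR = {"requests", "urllib3", "numpy", "pandas", "cryptography"}

    def similar_names(candidate: str, existing: set[str], threshold: float = 0.85) -> list[str]:
        """Return the protected names that `candidate` is suspiciously close to."""
        hits = []
        for name in existing:
            if candidate == name:
                continue  # exact duplicates are already rejected by the index
            ratio = difflib.SequenceMatcher(None, candidate.lower(), name.lower()).ratio()
            if ratio >= threshold:
                hits.append(name)
        return hits

    print(similar_names("reqeusts", POPULAR))     # ['requests'] -> block or flag for review
    print(similar_names("flask-login", POPULAR))  # [] -> nothing suspicious

The hard part is tuning the threshold so the obvious typos get caught while legitimately similar names still get through.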

Edit: formatting, grammar


A third is to fill a niche. E.g. if there's no "storage library for DigitalOcean Block Storage"[1], one can make it and publish it[2]. It would work as advertised, but maybe do some additional stuff, like sending the API keys to malicious server too. Unknowing developers would be searching, seeing something that promises to solve their problem, and use that.

I haven't encountered this in the wild yet, but IMO it's reasonably easy. More work than typosquatting, for sure, but probably less work than getting a malicious PR accepted in an existing project.

[1] It's just an example, doesn't exist, and if (by now) it does, I never meant to hint it is malicious. DO is just an example, and I have nothing but good to say about their services (and libraries) where they exist.

[2] The example here would be to just copy the S3 library, because DO block storage is fully compatible with S3 API. Which would also be the reason a dedicated library doesn't need to exist.


Curation. Have a wide open repo and then have a repo of curated packages. There is no typo squatting in the curated repo because a human had to approve it.

Pip install ansible --from curated.pypi.org

Pip install anssible --from wildwest.pypi.org

Let them typosquat all they want in the wildwest. I don't worry about this nonsense with Apt. Heck, maybe curation becomes a revenue stream: pay to get in, to support such activities.
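
FWIW the client-side half already exists: pip's --index-url points installs at an alternative index (and, unlike --extra-index-url, it replaces PyPI entirely), so the missing piece is really someone running the curated index itself. With a hypothetical URL:

    pip install ansible --index-url https://curated.example.org/simple/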


A lot of people in the comments here aren't happy with people using tons of dependencies, but what else do you expect from ecosystems that (a) make it easy to download and publish packages and (b) don't have much of (b') a standard library or (b'') set of blessed solutions?

This isn't a case of 'oh, python (javascript, etc) programmers are dumb and lazy'. If C++ had a packaging system and it wasn't so horrible to add third party dependencies we'd be seeing the same thing there.


Open/free software is great when a great person writes some code and lets you use it, because they are kind and there's nearly no marginal cost.

But malicious actors can get value from polluting the sharing network, and that costs effort to defend against, which means someone(s) has to pay to secure the network, or be open to attack.


That’s not an either/or thing. Someone you pay can also be a malicious actor.


Yes, and it is common too. E.g., the vendor of my smart TV now has a probe in my home.


Or be compromised themselves and an unknowing vehicle for attacks (e.g. see SolarWinds).


Once a buddy and I reverse-engineered some JS on a site that did the same thing: it sent you down one rabbit hole, then more obfuscated code, and so on. We eventually got to the end of it and discovered a comment:

// help my name is ###

// i am being held at #### (address in china)

// please contact my family ###

(This was in Chinese; we had to translate it.)

Scary!


So did it seem like some kind of weird scam, or what?


Presumably it's to trick whitehats into tipping off the hackers that their code was being analysed and had been successfully deobfuscated, so the hackers knew they needed to move to a different attack.

It's actually quite devious, like a reverse honeypot that the bad guys use against the good guys, exploiting their empathy.


That would be clever!


No, maybe a joke? Or serious...


This is one reason I prefer Debian python packages.


There's also a disturbing new trend of publishing end-user software as pip packages instead of apt-get packages, just because the bar to join apt-get is too high.


This is definitely a benefit of using distro-provided packages.

You get some vetting, and in addition, standard practice for many distros is to build everything that goes into the repos in sandboxes or VMs which have restricted or no network access. Additionally, some package managers incorporate that kind of sandboxing into their builds categorically, like Nix and Guix. (For Nix, this may only be on Linux— there are issues with sandboxing on macOS.) So if you build your project's dependencies via Nix or Guix, you're also protected.

This only protects you from `setup.py`-type (build time) attacks, of course. If the distro packages get compromised in some other way so that malicious code ends up in your installed programs (this attack has elements of that, IIRC), you're still in trouble.


This really is the sweet spot when production is a specific Debian version: set up your dev environment to match that and it's pretty bombproof. Run CI with pip installs against later Python versions to see the shouty deprecations you'll be able to sidestep.


Programming languages have to become able to sandbox imported dependencies, to limit their side effects up to sandboxing them completely, ideally in a fine-grained way that allows developers to gradually reduce the attack surface of even their own code.

https://medium.com/agoric/pola-would-have-prevented-the-even...

https://github.com/void4/notes/issues/41


The article doesn't explain what exactly the "W4SP Stealer" does. Would someone be able to explain?


It downloads a script that, at least right now, will turn around and grab cookies and passwords from browsers and send the data off to a Discord webhook.


> discord webhook

Hah. Is this true? I find it funny since IRC has/had this reputation for being a means of communication with malware and it's often blocked on those grounds.

Nice to know that malware is moving with the times and is using Discord for that now.


Discord is great as a command-and-control server because the malware author doesn't need to expose their IP address or implement a complex web of proxies to secure their C&C server.


Couldn't you use someone else's IRC server, the same way you use Discord's server?


I suppose you could, but have you seen how popular new open-source projects are being run these days? Young devs really love Discord, to the point of hosting documentation there. I imagine young malware authors are no different.


Which, I don’t know if I’m getting old, but man that frustrates me. It’s a terrible platform for documentation. It’s barely a good text chat platform.


You are, yes and yes.


The source is actually hosted on GitHub, and there is a good readme explaining all that :)

https://github.com/loTus04/W4SP-Stealer


If I hosted malware, I would be in jail. It is against the law. I wonder why GitHub is allowed to host malware and continues to provide a platform for it?

https://sanctiontrace.com/malware-hosting-providers-sentence...


It's a slew of checks for passwords and other things on the developer's machine. The data is extracted and sent to a remote endpoint controlled by the attacker.


It makes me very sad that something as wonderful as code, the closest thing we have to actual magic, is tainted by this. You know how to code and you chose to spend your time doing this? What a shame.


Lots of “reputable” devs write code that’s every bit as shitty as this. Somehow it’s ok when all you’re doing is spying on your users and shoving ads into their eyeballs.


Frankly it's surprising this doesn't happen more often.


Another vector is just maintainers/packagers for popular distros like Arch.

I'd be very surprised if something shady isn't in there by now.


Is there something about Python or PyPI that makes it more attractive for malicious developers to add malware?

Is this also happening for repos for other languages (e.g. CPAN, RubyGems)?



I think this has more to do with the contexts/industries we typically see Python used in than with pure popularity. If popularity were the only factor, I'd expect to see a lot more news about these problems in the Java/PHP ecosystems, which are absolutely massive.


Java is special since it has mandatory version pinning of all dependencies, doesn't run code at install time, and uses a full URL for dependency names. That means dependencies don't auto-update, don't compromise CI/CD as easily, and don't get misspelled as easily (i.e. people copy-paste the whole name+version versus writing it from memory).

Many languages since then decided that causes too much overhead.


I mean, PHP is pretty much a domain specific language, and we're only about a year out from log4j.


Log4j wasn't malicious.


;)


It’s the most popular.

It can run code during the installation phase.

It’s very easy to obfuscate due to its dynamic nature. An import of urllib or os.system isn’t immediately visible at the top of a file as it would be in Java; it can be hidden in eval() or basically anywhere.

Finally, even legitimate packages have very, and I mean it, very bad names, usually with a trailing number for no good reason and lacking organizational namespaces. Together with a culture of using a lot of such dependencies and lacking a culture of freezing transitive versions, blending in among those is too easy. Just take a common name and suffix it with 3.
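
To make the "hidden in eval() or basically anywhere" point concrete, here's a toy example; neither "os" nor "system" ever appears as a literal, so a grep-level scan sees nothing:

    # Toy illustration only: builds "os.system" at runtime instead of importing it.
    parts = [chr(c) for c in (111, 115)]             # -> "o", "s"
    mod = __import__("".join(parts))                 # same as `import os`
    fn = getattr(mod, "".join(reversed("metsys")))   # -> mod.system
    fn("echo 'this never showed up in a static import list'")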


For what it's worth, this happens in pretty much all the ecosystems. We have seen similar behavior in NPM, rubygems, and others. PyPI is just really popular.


Yes, it's very very popular. And getting more popular every year.


Couldn't forcing publishers to sign a hash of the module be a solution?

The certificate could contain information about the owner, and the consumer could check whether they want to deal with that owner or not. Developers could add a desired whitelist to pip (or use a curated one) to continue using automation.
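
Leaving the CA/certificate part aside, the core sign-and-verify step is already cheap. A minimal sketch using the `cryptography` package (key distribution, whitelisting and revocation are the actual hard parts and are hand-waved here):

    import hashlib
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Publisher side: sign the sha256 of the artifact being uploaded.
    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()          # this is what would be published/whitelisted

    artifact = b"...bytes of the wheel/sdist..."   # placeholder for the real file contents
    digest = hashlib.sha256(artifact).digest()
    signature = private_key.sign(digest)

    # Consumer side (e.g. pip with a whitelist of trusted publisher keys):
    try:
        public_key.verify(signature, digest)
        print("signature ok: artifact came from a whitelisted publisher")
    except InvalidSignature:
        print("refusing to install: signature does not match")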


Hashing solves one side of the coin. Namely, whether or not you got the thing you expected (or perhaps, got the thing you expected from the individual you expected it to come from).

On the other side, we have to contend with the fact that malware can be slipped into otherwise legitimate packages. This has happened numerous times over the years. In this case, the hash would serve as a way to say "yup, you definitely got malware". Useful for incident response, but I think we can do better and try and prevent these attacks from being viable in the first place.


Would it be possible to make a more trusted package mirror?

Somehow validating packages before inclusion?

IIRC mirrors for npm, Packagist and others are not impossible; can this be done for PyPI and others too?

Maybe it's a stop-gap before all the fancy permissions features get built out (which seems hard).


This is what systems like deb and rpm do - they curate a list of packages that can be installed to the system. But most people (in my experience, including myself) don’t use them because they get out of date really quickly and don’t lend themselves to things like virtual environments very well.


Debian unstable is usually years behind the times.


It is possible to set your registry in NPM via the "npmrc" file. That will let you hit the specified HTTP server whenever you run commands like "npm install".

I know this is also possible for Python because we did it at Uber. I don't remember the specific details anymore though.

In either case though, a lot of people have written proxies for this use case (I helped write one for NPM at Uber). Companies like Bytesafe and Artifactory also exist in this space.

We're working on something similar that's on GitHub here: https://github.com/lunasec-io/lunasec

Proxy support isn't built out yet but the data is all there already.


A lot of people in this thread are asking for a reputation/"verified user" solution for this, but really I think pulling a gazillion dependencies for applications is just all-around bad. I actually think having a reputation system would be even worse, because people would see it and assume that reputation is a guarantee of safety. Trust without verification is where issues can become even worse.


Based on my experience with shady plug-ins for e.g. Photoshop back in the late 90s/early 00s, all that a reputation/"verified user" solution is going to achieve is a very lucrative black market of high-reputation/verified user profiles and credentials.


So is .NET finally going to make a comeback? Yes, you need some dependencies for projects, but in general Microsoft does a good job providing a lot of tooling and libraries.


It's using base64-encoded strings to deliver the initial stage. Could this be avoided/flagged more easily by adding a scan for statements featuring base64 or import?
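
A naive version of such a scan is trivial to write, something like the sketch below (patterns and thresholds made up); the obvious catch is that plenty of legitimate packages import base64 too, so a human still has to review the hits:

    import re

    # Made-up heuristics: flag base64 usage plus very long base64-looking literals.
    SUSPICIOUS = [
        re.compile(r"\bimport\s+base64\b"),
        re.compile(r"\bb64decode\s*\("),
        re.compile(r"__import__\s*\("),
        re.compile(r"['\"][A-Za-z0-9+/=]{200,}['\"]"),   # long base64-ish string literal
    ]

    def scan_source(source: str) -> list[str]:
        """Return the patterns that matched in a package's source."""
        return [p.pattern for p in SUSPICIOUS if p.search(source)]

    sample = 'import base64\npayload = base64.b64decode("aGVsbG8=")'
    print(scan_source(sample))   # flags the base64 import and the b64decode call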


We tried doing this on PyPI a couple of years ago, and it produced a large number of false positives (too many to manually review).

You can see the rules we tried here[1].

[1]: https://github.com/pypi/warehouse/blob/main/warehouse/malwar...


The way I'd go about this is probably starting a VM, installing the package, and seeing what in the filesystem is affected by it, rather than trying to do static analysis (which becomes a cat-and-mouse game: as detection heuristics improve, so do the stealth heuristics).

The attack surface is too big when arbitrary Python code is executed, which is the case for `setup.py`, but even if code weren't executed there, as soon as you import the package and use it you'd have the same issue.


Unless you can hide the fact that it's running in a VM, I don't see why the code couldn't act normally if it thought it was being analysed like this. Or what about some kind of payload that executes after a long delay, and would become visible in long-running programs but not in short tests? And so on.


Yes, this works really well. But as soon as you deploy it, the actors change tactics. We've had to build a defense in depth approach to discovering malicious packages as they are introduced into the system.


It is probably very easy to bypass, by creating a sub-package that does the decoding via proxy functions (which is not evil at all) and having the evil package depend on that one. It won't trigger the alarm, as it only depends on base64 indirectly :)


Exactly, I don't think you can rely on a naïve import to determine maliciousness. We've basically had to build out heuristics that are capable of walking function calls for this exact reason. Otherwise things are just too noisy.


This is exactly what Packj [1] scans packages for (30+ such risky attributes). Many packages will use base64 for benign reasons, which is why no fully automated tool could be 100% accurate.

Manual auditing is impractical, but Packj can quickly point out if a package accesses sensitive files (e.g., SSH keys), spawns a shell, exfiltrates data, is abandoned, lacks 2FA, etc. Alerts can be commented out if they don't apply.

1. https://github.com/ossillate-inc/packj

Disclaimer: I developed this.


That'd catch a ton of valid packages. Right now on my random collection of packages in site-packages I have ~60 packages that have 'import base64' in them.


Yeah, it would probably create more manual work if you have too many false positives. I have maybe six base64 strings in the code I'm working on, so it might be worthwhile looking into provided my legitimate imports don't have any.


So malicious packages for JS, now python, but still no W4SP stealers for libc :(

Feeling a bit left out, guys! How will my code get compromised randomly?


That's the main reason we should start using WebAssembly for distributing and using packages.

Shameless plug: Wasmer [1] and WAPM [2] could help a lot on this quest!

[1]: https://wasmer.io/

[2]: https://wapm.io/


I’ve been building out a PyPI proxy to try to protect against these use cases: https://artifiction.io

Explicit allow-lists and policies such as requiring "greater than X downloads per week" go a pretty long way toward filtering out malicious packages.


I am really surprised that there haven't been even more malicious packages distributed in the past couple of years considering the rise of cryptocurrency. Seems like a determined and malicious actor could score big by targeting the more popular wallets.


It's totally happening. We've seen packages targeting a lot of the big exchanges. Most of the packages are targeting developers directly though, attempting to exfiltrate the users' wallets/keys.


Sonatype found a whole bunch of those and blogged about it in August. https://blog.sonatype.com/more-than-200-cryptominers-flood-n...

Disclaimer: I currently work for Sonatype, but in a different area of the company.


Thanks for sharing this, I had no idea it was already this prevalent.


W4SP is a python module that harvests passwords from your computer/network?


Yep. You can read the source code for it here: https://github.com/loTus04/W4SP-Stealer


That's correct, it's exfiltrating data from the developer's machine.


In a perfect world, outgoing network/file operations should be explicitly whitelisted in code using a decorator. And containers should limit which process/thread can access which port/path.
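
A toy sketch of what the decorator half could look like in Python (purely illustrative and trivially bypassable from inside the process; real enforcement has to live in the container/OS layer mentioned in the second sentence):

    import builtins
    import functools

    _allowed: set = set()   # capabilities granted to the call currently running

    def allows(*capabilities):
        """Declare which side effects the decorated function may perform."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                global _allowed
                previous, _allowed = _allowed, set(capabilities)
                try:
                    return fn(*args, **kwargs)
                finally:
                    _allowed = previous
            return wrapper
        return decorator

    # Guard one primitive as a demo: file access via open().
    _real_open = builtins.open
    def _guarded_open(*args, **kwargs):
        if "filesystem" not in _allowed:
            raise PermissionError("filesystem access was not declared for this call")
        return _real_open(*args, **kwargs)
    builtins.open = _guarded_open

    @allows("filesystem")
    def read_config(path):
        with open(path) as f:
            return f.read()

    @allows()                      # declares nothing, so any open() inside it raises
    def innocent_string_helper(s):
        return s.strip()

    print(innocent_string_helper("  hi  "))   # fine: no undeclared side effects

At least anything undeclared blows up with a PermissionError, which turns silent exfiltration into a loud failure.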


Is there a way to check if you've been compromised by these PyPI packages? Does PyPI have a mechanism to let people know that they've downloaded a compromised package?


Malicious packages are yanked as and when they are found or reported by the community.


That bit with the semicolon way off to the right side of the screen is kind of sloppy. It's a dead giveaway of "I'm doing something I shouldn't be doing".


This is a fundamental flaw in most langs. There should be a smarter way to track changes in what is specifically used in dependencies.


Are there any tools to detect such malicious modules, or is PyPI reliant on proprietary services from third parties to do that?


I built Packj to flag such "risky" packages: https://github.com/ossillate-inc/packj


"collectively the packages listed above account for over 5700 downloads"

Or about 190 downloads per package, and who knows how many of those are actually real downloads by victims.

This is basically a non-issue; a few fools downloaded random code from the internet and ran it on their computer and, hopefully, learned a lesson. Shock and horror!


It's okay people: I have the solution. We just need to make Python as hard to use and ugly as all other languages and then the skids won't be able to use it. Thank me later.


One of the fake packages is called `felpesviadinho`, which looks like calling someone named "felpes" with a homophobic slur in Brazilian Portuguese.


From a web page so crammed with JavaScript that it's pointless to even try to take a look at the article.


I hope the age of a thousand dependencies automatically pulled and upgraded on a basis of trust is coming to a close. It was obvious from the start this would eventually become a problem. Trust-based systems like this only work for as long as scoundrels remain out of the loop.


Can there be a "blue checkmark" system for PyPI authors? I'm sure that's been brought up and rejected for reasons.


It's not going to be a "blue checkmark" per se, but we're currently working on integrating Sigstore signatures into PyPI. The idea there will be that you'll be able to verify that a package's distributions are signed with an identity that you trust (for example, your public email address, GitHub repository name, or GitHub handle).


I don't think it makes much sense to verify pypi authors. I mean you could verify corporations and universities and that would get you far, but most of the packages you use are maintained by random people who signed up with a random email address.

I think it makes more sense to verify individual releases. There are tools in that space like crev [1], vouch [2], and cargo-vet [3] that facilitate this, allowing you to trust your colleagues or specific people rather than the package authors. This seems like a much more viable solution to scale trust.

[1]: https://github.com/crev-dev/crev

[2]: https://github.com/vouch-dev/vouch

[3]: https://github.com/mozilla/cargo-vet


We've found a lot of open-source packages that are authored by (well, released by authors identified by) disposable email addresses. We were shocked to find companies doing this, too.

Package Dependency land is a crazy place


The reason is obvious: people crawl pypi.org/github.com/npmjs.com and email their job posts or product launches. Every platform that requires an email and shows it publicly will necessarily get a lot of disposable ones.


Identity verification will never be enough: if their account or anything in their development or distribution pipeline is compromised, so is their code. Sandboxing mechanisms are fundamentally required - not only to ward off malicious attacks there, but to prevent accidental side effects and compromise at runtime too.


Yes, I knew someone would say "that won't solve the problem!" It would certainly make things way better. It is much more difficult to hack someone's account and bypass their two-factor authentication than to literally upload any number of randomly named packages at will.

The same argument applies to "when you park on the street in Manhattan, close your windows and lock your doors." Well, that won't guarantee your car isn't broken into, but it will certainly make it way less likely that someone steals something out of your car.


Yes, this lines up with the "Critical Project" concept that has been floating around in the past year. It is... contentious, to say the least. Previous HN discussion: https://news.ycombinator.com/item?id=32111738


This gives a checkmark based on number of downloads, so there is absolutely no guarantee that the package doesn't do anything malicious or won't in the future.


I think the issue isn't so much malicious authors, it's compromised repositories and compromised repositories as dependencies.

Blue check would gatekeep a lot of noble, new developers.


These aren't compromised repositories on PyPI; these are fully legitimate repos started by bad actors that intentionally use similar-sounding names and readmes. A system in which all of the major packages and dependencies (which, to be clear, is the whole pile of stuff that people aren't usually looking at) come from a set of trusted authors, and where someone who is being careful can whitelist specific repos for newer projects they want to use, would reduce the problem to almost nothing.


This is the double-edged sword of open-source. It's awesome because anyone can contribute. It can be dangerous for the same reason, unfortunately.


SQLAlchemy can get one for $20/mo :-D


We have more than enough donations to pay for that, and since SQLAlchemy is a "critical project" they'd likely comp us anyway.

But IMO they'd never charge for such a thing; that's not at all in the spirit of OSS / Python dev.



