>both the pypi devs and the python security team decided to ignore it. Without t...

danjoc · on Sept 15, 2017

When I read things like,

https://caremad.io/posts/2013/07/packaging-signing-not-holy-...

I get the impression they don't care about security at all. They seem like children plugging their ears and shouting "nah nah nah" while putting repo users at risk. They've obviously done nothing since that post was made four years ago.

In contrast, Maven central requires signing. Unsurprisingly, Maven central doesn't have typosquatting problems. That's not a coincidence. That's also a strong reason why Java still dominates the enterprise.

If you (the reader) are a PyPi/NPM user, I challenge you to watch this,

https://www.youtube.com/watch?v=pBJafU0p_Nk

and tell me why you shouldn't use a repository manager like Sonatype Nexus which validates package signatures, checks licenses, and does vulnerability scanning.

By itself, package signatures aren't the holy grail? That's not the point. Security is achieved in layers. Too bad they aren't mature enough to understand that.

tclancy · on Sept 15, 2017

>I get the impression they don't care about security at all. They seem like children

I had the absolute pleasure of working with dstufft for 6 or 7 months years ago and learned a ton from his breadth of knowledge and willingness to teach (and I had more than a decade of experience at the time). I can appreciate being unhappy with the situation and not agreeing with the approach, but personal attacks and assuming ill intent don't help. It is, of course, possible the developers actually do pay attention to the details and just see the issue better than you do.

I read that link at the time and I do wonder how you would address the issues he raises. The concern seems not so much not wanting to apply security but actually thinking about the underlying issues. Which is worse: an open but dangerous Pypi or one with a bunch of security theater around keys and hashes and other stuff everyone pays lip service to but never actually checks?

cookiecaper · on Sept 15, 2017

Typosquatting and package signatures are separate issues. Package signing only prevents typosquatting insofar as either the user or some intermediate layer resolves the typo to the intended package. If someone was going to this effort, they'd probably go to the effort of double-checking the package name before installation anyway.

PyPi needs moderators to sit in the middle and remove anything that is obviously malicious, whether the packages are signed or not. Bad guys can sign packages just as easily as good guys.

Software should also be used to correct likely typos, perhaps including checking against a blacklist of known-bad package hashes, before the package is installed.
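
A rough sketch of the hash-blacklist part of that idea, using only the standard library. The denylist contents and the function names are made up for illustration; a real list would have to come from some curated feed that the thread does not describe.

```python
import hashlib

# Hypothetical denylist of SHA-256 digests of known-malicious package files.
# Built here from a sample payload purely so the example is self-contained.
KNOWN_BAD_SHA256 = {
    hashlib.sha256(b"malicious payload").hexdigest(),
}

def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a downloaded package file's bytes."""
    return hashlib.sha256(data).hexdigest()

def is_known_bad(data: bytes) -> bool:
    """True if this exact file content appears on the denylist."""
    return sha256_of(data) in KNOWN_BAD_SHA256

flagged = is_known_bad(b"malicious payload")
clean = is_known_bad(b"benign payload")
```

Note the limit of this layer: changing a single byte of the file changes its digest, so a hash denylist only catches exact re-uploads of something already identified.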

Yes, these approaches are imperfect, but they are better than doing nothing. "Perfect is the enemy of good".

xapata · on Sept 15, 2017

> needs moderators

Like Anaconda and Enthought? And countless internal departments? Or are you suggesting folks donate to the PSF and they hire a team?

cookiecaper · on Sept 16, 2017

I'm suggesting that whoever currently has admin rights on the PyPi packaging servers, and it's someone, take responsibility for this and physically remove the typosquatting libs from the lookup mechanism. "We need donations before we can do that" doesn't pass muster as far as I'm concerned; leaving this unaddressed is an existential issue for PyPI.

There are already privately-maintained repositories and that's great, but IMO it's not an excuse for PyPi to leave this vulnerability open.

xapata · on Sept 16, 2017

Ah, so you're going to refuse to use PyPI until someone volunteers. I'm not certain that's an existential issue for PyPI. I plan to continue using it.

zzzeek · on Sept 15, 2017

> In contrast, Maven central requires signing.

forgive my ignorance and my lack of 56 minutes to watch the entire youtube video, but who are the identities behind these signatures? The blog post you reference discusses the problem both of users signing their own packages (anyone can make a signature and any malicious package author can point people at a maliciously-owned signature as well) as well as having a central key (used by organizations with employees and known contributors, does not scale to pypi's model).

I'm also ignorant of a vulnerability scanner for Python (haven't looked). Does such a tool exist and have you proposed it as part of pypi's infrastructure? I am sure they'd be interested in that.

I'm not sure how the license file of a product impacts the issue of it being malware or not.

Took a look at https://www.sonatype.com/ and it appears to be a closed-source, commercial product - it appears to have a database of vulnerabilities in some format, but it appears to use hashes of some kind. I'm not sure how that would work against arbitrary Python source code, but again, I am ignorant. I would encourage you to write a comprehensive rebuttal to the blog post you refer to.

danjoc · on Sept 16, 2017

>who are the identities behind these signatures?

https://maven.apache.org/guides/mini/guide-central-repositor...

"we require you to provide PGP signatures for all your artifacts (all files except checksums), and distribute your public key to a key server like http://pgp.mit.edu."

>anyone can make a signature

The article flip flops on this.

any hacker can do it

it's too much burden for developers

>any malicious package author can point people at a maliciously-owned signature as well

Anyone can also verify ownership of the key before accepting packages signed by it. This is something professionals do. This is something institutions do. This is something three letter agencies do.

>Does such a tool exist

There are CVEs for python. One could even scan a repository using those. Java has prebuilt tools for this. OWASP has the dependency-check plugin for Maven. Nexus uses the same information in their repository health checks.

>have you proposed it as part of pypi's infrastructure? I am sure they'd be interested in that.

Why would I? Given their response to signed packages, I would expect a response along the lines of "Too much burden. Too hard. Not perfect. Not worth it. Security theater. Go away. Ur dumb."

>I'm not sure how the license file of a product impacts the issue of it being malware or not.

It's one of those nice features of good repository management. Do python packages even list licenses? I mean, I assume they would, but then, they actively resist implementing other basic things which I would just assume they could do.

Licenses change over time. Some enterprises treat GPL like a virus. Knowing ReactJS changes from Apache to BSD + Patents in a new version is as important to someone in the business as knowing if a package is compromised.

>it appears to be a closed-source, commercial product

Nexus OSS is open source, Nexus Professional is commercially licensed. The latter has a few nice features the former does not. Both can manage PyPi, NPM, Ruby, Docker, Maven, and Nuget repos to name a few.

https://www.sonatype.com/nexus-repository-oss

>I would encourage you to write a comprehensive rebuttal to the blog post you refer to

It's easier to fool people than to convince them that they have been fooled. -- Mark Twain

zzzeek · on Sept 16, 2017

> Do python packages even list licenses?

Of course they do and it goes into the package classifiers.

Ajedi32 · on Sept 18, 2017

> https://caremad.io/posts/2013/07/packaging-signing-not-holy-...

Thanks, that's actually a great article that explains very well why you can't just throw signatures at the problem and claim that fixes everything.

As other commenters have pointed out, the reason Maven central doesn't have this problem has nothing to do with signatures, and everything to do with the fact that all new packages must undergo manual review, which is unfortunately a solution that doesn't scale. (See the "Linux Has Packaging Signing, Let’s Steal Theirs" section from the article you linked.)

donaldstufft · on Sept 15, 2017

Package signing doesn't achieve anything without a trust model behind it, which is exactly what that post states. Too many people go "we need to add some crypto to this thing!" without developing a threat model and that ends up making the crypto pointless wankery to act as a security blanket without actually solving any problems.

Maven Central, to my knowledge, does not have typo squatting problems because Sonatype has a manual review process for all new projects. It has absolutely nothing to do with the fact that they allow projects to upload PGP signatures and it could not have anything to do with that, because PGP does not provide any mechanism to prevent that.

For example, there may be `urllib3` which is a valid project that must be signed by key X. We'll ignore how a tool like pip would find out that key X is the right key (although this is actually the most important part of a package signing solution) and just grant that we've solved that problem. Someone then comes and registers another project, `urlib3` which must be signed by key Y. The attack that is being described here is that a user would erroneously say ``pip install urlib3`` when they meant to type ``pip install urllib3`` and pip would then fetch that and download the package and install it. I think it is pretty obvious that signing doesn't help here, because pip doesn't know that the user really wanted urllib3 and not urlib3, so it can only determine that urlib3 is supposed to be signed by key Y (which of course, the hypothetical malicious person controlling urlib3 would have), fetch the package and verify its signature.
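
The scenario above can be sketched in a few lines. This is a toy model, not how pip or PyPI work: HMAC with a per-project secret stands in for a real public-key signature, and `PROJECT_KEYS`, `sign`, and `verify` are hypothetical names invented for the illustration.

```python
import hashlib
import hmac

# Hypothetical registry: each project name maps to the key its uploads must
# be signed with. HMAC is only a stand-in for a real signature scheme.
PROJECT_KEYS = {
    "urllib3": b"key-X",  # the legitimate project
    "urlib3": b"key-Y",   # the typosquatter's project, with their own key
}

def sign(key: bytes, payload: bytes) -> str:
    """Produce a toy 'signature' over the package payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(project: str, payload: bytes, signature: str) -> bool:
    """Check the payload against the key registered for *this* project name."""
    key = PROJECT_KEYS.get(project)
    if key is None:
        return False
    return hmac.compare_digest(sign(key, payload), signature)

# The squatter signs their own package with their own, validly registered key.
evil_payload = b"print('pwned')"
evil_sig = sign(PROJECT_KEYS["urlib3"], evil_payload)

# The user typed the wrong name, so the installer checks against key Y.
typo_install_ok = verify("urlib3", evil_payload, evil_sig)

# Signing does stop the squatter from passing the package off as urllib3 itself.
cross_project_ok = verify("urllib3", evil_payload, evil_sig)
```

Here `typo_install_ok` is true and `cross_project_ok` is false: the signature check protects the name `urllib3`, but it has no way of knowing that the user who asked for `urlib3` meant something else.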

There is only one tried and true method for preventing across the board this kind of human introduced error collision (aka typo squatting), and that is manual review of all new projects. The problem with manual review then becomes one of scale. There are as of this time of writing 117,226 unique projects on PyPI with an average growth of around 100 new projects a day. In addition there are zero full time developers or operations or support people working on PyPI. There is one part time paid person (me), plus my unpaid time, plus one other part time unpaid developer/ops person who do the vast bulk of the work. There is simply not enough available bandwidth to process 100 new projects every day and to validate them for typo squatting/confusion possibilities.

Beyond that, there are a number of possible heuristic based approaches that can try to reduce the chance of this happening such as using Levenshtein distance, Unicode confusables, attempting to develop "reputation", etc. Most of these are either so broad as to catch a lot of projects which are not typo squatting but are real, actual different things or are so narrow as to be trivially defeated. That's not to say they aren't worthwhile or there isn't an idea that would make sense but focusing on that has not been a priority for a largely volunteer based organization because there are lower hanging fruit that are more impactful, and because at the end of the day without a manual review system individual end users are still ultimately responsible for ensuring they're asking for the correct thing (and even beyond that, they're responsible for ensuring that the thing they're asking be installed is something that satisfies their own security constraints).
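
For a sense of what the edit-distance heuristic looks like, here is a minimal sketch. The `near_collisions` helper, its threshold, and the sample name list are assumptions made for the example, not anything PyPI is known to run.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep) a character
            ))
        prev = curr
    return prev[-1]

def near_collisions(new_name, existing, max_distance=1):
    """Existing names within max_distance edits of a newly registered name."""
    return sorted(
        name for name in existing
        if name != new_name and levenshtein(new_name, name) <= max_distance
    )

# Sample inputs only; a real check would run against the full project index.
existing_projects = {"urllib3", "requests", "numpy"}
suspects = near_collisions("urlib3", existing_projects)
```

With these inputs `suspects` contains only `urllib3`. The threshold is where the trade-off described above shows up: a distance of 1 misses anything slightly more creative, while a larger distance starts flagging short, legitimately distinct names.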

Security is achieved by layering multiple secure systems on top of each other, not by randomly rubbing crypto on things because it makes you feel good to have crypto involved.

danjoc · on Sept 15, 2017

>For example, there may be `urllib3` which is a valid project that must be signed by key X. Someone then comes and registers another project, `urlib3` which must be signed by key Y.

Key X is on the company approved key list, key y is not. Your argument just fell apart.
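
A minimal sketch of the approved-key check being described, assuming the signature itself has already been verified and each project can be mapped to the fingerprint of its signing key. `SIGNED_BY`, `APPROVED_KEYS`, and the fingerprint strings are placeholders.

```python
# Hypothetical mapping from project name to the fingerprint of the key that
# produced its (already verified) signature.
SIGNED_BY = {
    "urllib3": "KEY-X",
    "urlib3": "KEY-Y",
}

# Hypothetical company-maintained allowlist of trusted key fingerprints.
APPROVED_KEYS = {"KEY-X"}

def may_install(project: str) -> bool:
    """Allow installation only if the project's signing key is on the allowlist."""
    return SIGNED_BY.get(project) in APPROVED_KEYS

allowed = may_install("urllib3")
blocked = may_install("urlib3")
```

The check is only as good as the process that maintains `APPROVED_KEYS`; someone still has to decide which keys belong on it.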

>The problem with manual review then becomes one of scale. There are as of this time of writing 117,226 unique projects on PyPI with an average growth of around 100 new projects a day.

You're not dealing with projects. You're dealing with keys. It's not one key per project. It's one key per contributor. This has the added bonus that if a contributor goes rogue, you can revoke the one key and all the suspect projects are invalidated at once.

>There is one part time paid person (me), plus my unpaid time, plus one other part time unpaid developer/ops person who do the vast bulk of the work.

Sonatype has turned this into a rather nice business. It's not a volunteer project for them. You expect me to believe it's impossible despite solid examples to the contrary?

>at the end of the day without a manual review system individual end users are still ultimately responsible for ensuring they're asking for the correct thing

Blaming the victims.

>Security is achieved by layering multiple secure systems on top of each other, not by randomly rubbing crypto on things because it makes you feel good to have crypto involved.

It's also not achieved by doing absolutely nothing at all.

geofft · on Sept 15, 2017

> You're not dealing with projects. You're dealing with keys. It's not one key per project. It's one key per contributor.

My rough guess is that for the Python community, these are roughly proportional; there are a lot of different people maintaining approximately one library each, not a small number of people (or companies) maintaining large parts of the ecosystem. There's nothing directly like org.apache for Python.

danjoc · on Sept 15, 2017

And yet in every typo squatting case, there's multiple projects leading back to a single contributor. It's almost like noticing a little known nobody sneaking to the front of the Pareto distribution would be a huge red flag.

geofft · on Sept 15, 2017

By "every typo squatting case" do you just mean researchers demonstrating the viability of the attack against various systems? A system that successfully defends against researchers but not against actual genuine attackers would be worse than useless. If I actually wanted to pull off an attack without anyone noticing for as long as possible, I'd just target a single package whose maintainer is on vacation.

I think the only way your key-signing mechanism would actually solve the problem is if we made it actively hard for new developers to upload projects to PyPI without a long vetting process. Some projects work this way (Debian, notably; I've had upload rights for a few Debian packages for years and still don't feel ready to apply for full access), but I think it's a poor match for PyPI's actual goal.

danjoc · on Sept 15, 2017

Your arguments and exaggerated claims are silly. I can point to Maven Central all day long. They're doing it right. They don't have these problems.

You know who does this sort of thing? Politicians. They can't just look at a working system, single payer for instance, and copy it. No, they have to make silly arguments about why it will never work, despite a concrete, working example, right in front of their own eyes.

takluyver · on Sept 15, 2017

Donald already pointed out that the key difference in Maven Central is a manual review process, not package signing.

If Python introduced manual review of new packages, it would either need a massive amount of resources that no-one is offering to provide, or it would immediately be a huge bottleneck on people making new packages, which the community doesn't want to do.

danjoc · on Sept 15, 2017

>the key difference in Maven Central is a manual review process

Lipstick on the pig, still covered in mud.

The key difference is the regular occurrence of malware finding its way into PyPi and NPM due to the lack of multilayered security on those repos.

You guys keep trying to prop up the straw man that ONLY package signing is needed. It's not. It's a start. Nobody is making that argument but you. You not only repeatedly beat that dead horse, but you carry it to the illogical extreme that package signing is somehow harmful. Not only do you see no value in that layer of security, but you actively resist any talk or attempts at implementing it.

Meanwhile, your repo is infested with hackers and malware. Big surprise.

xapata · on Sept 16, 2017

Hyperbole and insults. Now you're just trolling. If I see a cockroach in the kitchen, I kill it and spray. I don't rip out the walls or move house.

takluyver · on Sept 15, 2017

That sounds like you're relying on blacklisting the 'bad guy' key that's uploading all the malicious packages. Any half-way competent bad guy will generate a new key for each package (or try to steal keys with some good reputation), so it won't work.

zzzeek · on Sept 15, 2017

> You're not dealing with projects. You're dealing with keys. It's not one key per project. It's one key per contributor. This has the added bonus that if a contributor goes rogue, you can revoke the one key and all the suspect projects are invalidated at once.

you cannot locate said rogue contributor without regularly manually reviewing 117,226 packages.

danjoc · on Sept 15, 2017

>you cannot locate said rogue contributor without regularly manually reviewing 117,226 packages.

Herd immunity. Someone is out there reviewing it. Most users won't need to lift a finger beyond verifying signatures.

donaldstufft · on Sept 15, 2017

> Herd immunity. Someone is out there reviewing it.

More likely everyone assumes someone else is reviewing it, and nobody actually does.

danjoc · on Sept 16, 2017

We definitely do where I work. I'm really, really surprised to hear they do not where you work. A multibillion dollar company like Amazon, who runs half the internet with AWS, does not verify dependencies? Wow. That's breathtaking.

zzzeek · on Sept 16, 2017

If 100k packages are already security audited by the community.. Then what's the issue? They send them to dstufft, he takes them down (which of course does not actually happen because nobody is auditing most packages). As mentioned elsewhere, most contributors to pypi have only one package so the notion of "find one rogue package == dozens of untrustworthy packages removed in one swoop" doesn't really exist (esp because a rogue agent would be making one account / key per package just to avoid this kind of detection!)

donaldstufft · on Sept 15, 2017

> Key X is on the company approved key list, key y is not. Your argument just fell apart.

A minuscule amount of people are going to bother to do something like approve keys. Security for the minority can already be achieved by those companies mandating their developers use DevPI and mirroring trusted projects from PyPI to DevPI (or similar system).

Complicating the system further for something that, for practical purposes, does not improve the security of the vast bulk of people is not a trade off we're willing to make. Package signing will come to PyPI, likely in the form of TUF which is strictly superior to the trust model provided by PGP for package signing. It hasn't done so because nobody has had the time to do it yet.

What you seem to be missing about my statement both in the blog post and here is not that package signing is not worthwhile, but that a lot of people like yourself seem to think that all you need to do is add signatures to a system and suddenly poof it's secure! That view point is common among inexperienced developers or people who don't commonly think too hard about how secure systems are designed/made.

The reality of the situation is that adding signatures is painfully easy, but that without a coherent trust model backing those signatures you've achieved nothing but adding more complexity. Determining a trust model (particularly one that works for the majority) is the hard part, and you can't just wave your hand and wish it better.

> Sonatype has turned this into a rather nice business. It's not a volunteer project for them. You expect me to believe it's impossible despite solid examples to the contrary?

Is it impossible to turn PyPI into a business? I don't suspect it is, no. However I don't want to do that because my personal risk tolerance doesn't have room for giving up a stable job with health benefits for something that may or may not fail. Others are free to try that if they want of course, but given the lack of people stepping forward to do that, it doesn't seem like anyone else is interested either.

> Blaming the victims.

Stating reality. PyPI is not a curated repository and end users are responsible for their own security while using it. If they wish to outsource that responsibility there are a number of Linux distributions that are happy to do that for them as well as companies like Enthought and Continuum Analytics who provide curated repositories.

> It's also not achieved by doing absolutely nothing at all.

Good thing we're not doing nothing at all then. Luckily for the Python community we have actual experts and not arm chair cryptographers who fail to understand even the basic fundamentals of developing secure software.

danjoc · on Sept 15, 2017

>Complicating the system further for something that, for practical purposes, does not improve the security of the vast bulk of people is not a trade off we're willing to make.

This is the weakest argument. Are Python devs somehow dumber than Java devs? Are they dumber than Android devs? Are they dumber than iOS devs? Everyone knows how to sign a dependency/app/project except python devs? I don't believe that. I honestly think that's the most insulting aspect of this argument.

The rest of this post seems to have turned to hand waving and personal attacks, so I won't bother responding to that. I'm just glad I got to share this perspective with you. Once you cool down, I hope you look harder at the problem. All I care about is improved security. I'm not here for the imaginary internet points.

donaldstufft · on Sept 15, 2017

> This is the weakest argument. Are Python devs somehow dumber than Java devs? Are they dumber than Android devs? Are they dumber than iOS devs? Everyone knows how to sign a dependency/app/project except python devs? I don't believe that. I honestly think that's the most insulting aspect of this argument.

Nope, I think they're perfectly capable of signing things. I also think it's silly to ask them to do that when the proposed system hasn't been designed to provide any benefit. Properly designing that system is hard, and 99% of people who go "just use PGP!" or "just use X" have spent exactly zero amount of time doing that. Particularly when the proposed solution doesn't actually solve the problem at hand (though it does solve other problems if it's correctly designed).

Ultimately your "suggestions" are nothing new, they're the same generic, cargo culting, suggestions that folks who haven't looked really hard at the problem tend to make.