Hacker News
Removing PGP from PyPI (pypi.org)
187 points by dlor on May 23, 2023 | hide | past | favorite | 187 comments



> In the last 3 years, about 50k signatures had been uploaded to PyPI by 1069 unique keys. Of those 1069 unique keys, about 30% of them were not discoverable on major public keyservers, making it difficult or impossible to meaningfully verify those signatures. Of the remaining 71%, nearly half of them were unable to be meaningfully verified at the time of the audit (2023-05-19)

Why not include the public key in the package?

99% of the time what I want out of package signing is to know that the new version of the package I'm downloading is from the same people as the old version. I don't actually need to know who those people are...just that they are the same people as before.


> Why not include the public key in the package?

Because PyPI (or an attacker) could always substitute a new key. There's very little value in the signature and key coming from the same source: the key (and its justified identity) always need to come from a source of trust, not the source that's being verified.
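To make that concrete, here's a toy sketch of why a key bundled with the artifact proves nothing. It uses deliberately tiny textbook RSA (not real crypto, never use this for actual signing): an attacker who can swap the package can swap the key and signature too, and verification still succeeds.

```python
import hashlib

# Toy textbook RSA with small, well-known primes -- illustration only.
def make_keypair(p, q, e=65537):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # private exponent
    return (n, e), (n, d)              # (public key, private key)

def sign(priv, data):
    n, d = priv
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % n
    return pow(h, d, n)

def verify(pub, data, sig):
    n, e = pub
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % n
    return pow(sig, e, n) == h

maintainer_pub, maintainer_priv = make_keypair(104729, 1299709)
package = b"legitimate release"
assert verify(maintainer_pub, package, sign(maintainer_priv, package))

# If the key travels with the package, an attacker who controls the
# channel just ships their own key alongside a tampered artifact --
# and verification against the bundled key still "passes".
attacker_pub, attacker_priv = make_keypair(15485863, 32452843)
tampered = b"malicious release"
print(verify(attacker_pub, tampered, sign(attacker_priv, tampered)))  # True
```

The check only becomes meaningful when the *expected* public key is obtained from somewhere the attacker doesn't control.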

> 99% of the time what I want out of package signing is to know that the new version of the package I'm downloading is from the same people as the old version. I don't actually need to know who those people are...just that they are the same people as before.

This might be a misunderstanding, but I don't think you actually want this: lots of large packages have multiple release managers (and contributors who come and go); you don't want to manually resolve each new human identity that appears for a package distribution.

What most people actually want is a strong cryptographic attestation that the package distribution came from the same source as the thing hosting the source code, since both that service and the owner of the repository are presumed trusted.

Notably, PGP is incapable of providing either of these: you only get key IDs, which are neither strong human identities nor a strong binding to a service. Key IDs might correspond to keys with email (or other identities) in them, but that's (1) not guaranteed, and (2) not a strong proof of identity (since anybody can claim any identity in a PGP key).


>> but I don't think you actually want this: lots of large packages have multiple release managers (and contributors who come and go); you don't want to manually resolve each new human identity that appears for a package distribution.

Nope, you assume wrong. That's exactly what I (also) want, that is, knowing that the *authors* remained the same, whoever they are.

>> What most people actually want is a strong cryptographic attestation that the package distribution came from the same source as the thing hosting the source code

Nope, nobody really needs more of that, since that's what your HTTPS certificate is for.

People *really* want to mitigate the risk of pypi infrastructure getting fully compromised, which is very likely, given how many eggs you keep in the same basket there.

PGP signatures were the last ditch: not very convenient, but also not as bad as they are painted. But from now on there won't be even that very little.


> Nope, you assume wrong. That's exactly what I (also) want, that is, knowing that the authors remained the same, whoever they are.

The point is that they don't remain the same. Assuming that they do is an operational error.

> Nope, nobody really needs more of that, since that's what your HTTPS certificate is for.

HTTPS provides transport security, i.e. an authenticity relationship between you and GitHub's servers. It doesn't provide artifact authenticity for the source on that server, and cannot. That's what the comment above is referring to.


> The point is that they don't remain the same. Assuming that they do is an operational error.

How many projects are signing each release with a different PGP key each time? And what are the odds that such projects will actually correct their practices as soon as pip implements key verification and makes the problems more visible? A lot, I guess?

It seems a lot of assumptions are being made...

But it is a self-fulfilling prophecy: the more you hide, hamper, and cripple the signature metadata, the more people will misuse it (without knowing it), which leads to these articles that argue for more crippling because people are misusing it.

The elephant in the room remains that pypi is a big target, and even though I highly appreciate the work done by maintainers (mostly volunteers?) I have a hard time believing they will always be able to keep skilled attackers away from its infra.


Nobody at PyPI is opposed to package signing, and removing or minimizing the damage that compromised infrastructure can do.

However, GPG is not a good tool to build those features on top of, and the vestigial support for GPG signing that PyPI had in no way aided the long term efforts to get proper, secure package signing into PyPI.


maybe your blog post could use a little extra line at the end that says one of these three things:

1. "Nobody at Pypi is opposed to package signing, so long term here is the technology we want to use for this: XYZ..."

2. "Nobody at Pypi is opposed to package signing, however after years of discussion there seem to be no feasible ways of doing this, so going forward there are no plans to actually add package signing" (refer to @tptacek's post at https://news.ycombinator.com/item?id=36048373 which seems to claim there are many, IIRC)

3. "Nobody at Pypi is opposed to package signing, however we simply don't have the resources to implement any new approaches. We would require a grant of $X million to hire people to do this (which would be using technology XYZ)"

is there a choice 4?


The current expectation is it will be a combination of sigstore and TUF, but if someone proposes something better then we're open to that.

Implementing those things takes time though.


And why exactly should pypi implement it?

Pypi should just be the organized repository of packages, with only some limited assurance over their authenticity. That is, pypi should just let authors upload signatures and metadata.

Something else, something to plug into pip (taking into account also the bootstrap problem), should be responsible for validating the signatures and providing assurance over identities.

People who just want TOFU will use that one plugin. People who trust Microsoft will use the github plugin. People who also trust pypi for identity will use whatever you write when you have time for it. Maybe a popular plugin will let people choose among many IdPs. Over time, the community will converge and that may be adopted as a de facto standard.
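A sketch of what such a pluggable verifier interface might look like on the pip side. All names here are hypothetical; nothing like this exists in pip today:

```python
from typing import Optional, Protocol

class Verifier(Protocol):
    """Hypothetical verification-plugin interface for an installer."""
    def verify(self, name: str, artifact: bytes,
               signature: Optional[bytes]) -> bool: ...

class AcceptAll:
    # The status quo: trust whatever the index serves.
    def verify(self, name, artifact, signature):
        return True

class RequireSignature:
    # A stricter policy: refuse unsigned artifacts outright; real
    # signature checking would be delegated to a concrete backend.
    def verify(self, name, artifact, signature):
        return signature is not None

def install(verifier: Verifier, name: str, artifact: bytes, signature=None):
    if not verifier.verify(name, artifact, signature):
        raise RuntimeError(f"verification failed for {name}")
    return f"installed {name}"
```

The point of the Protocol is exactly the one made above: the index stays a dumb store of artifacts and metadata, while the trust policy is chosen by the installer's user.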

But your choice of removing PGP signatures (as one type of signature) is now making that impossible, and you intend to decide in the future for others what the only blessed verification mechanism is (also, with no indication of when that will happen).


Well, PGP signatures have been part of PyPI for 18 years now; if someone was going to build a secure system on top of them, they would have by now.

PyPI should implement it though, because fundamentally the question of who is authorized to release for "requests" on PyPI is a question of who PyPI authorizes to release for that.


The problem is that package signing is being removed without providing an alternative. I am not volunteering for this project, so I will quietly sit in the corner.


> That's exactly what I (also) want, that is, knowing that the authors remained the same, whoever they are

The authors are often many people. You can have one person signing on behalf of all the others. PGP isn't going to tell you that the authors remained the same, only that the signer did (or that many people have access to the same private key and hopefully every one of them is completely trustworthy).

PGP doesn't let you verify that the authors remained the same. Only the key. If you wanted to actually verify authors, you'd have to have all of them sign their own commits, and you'd have to validate every commit, not just the release, otherwise you're just back to trusting whoever holds the key. Many projects very regularly get new committers, too, so you'd have to validate many new signatures with every single update.

> Nope, nobody really needs more of that, since that's what your HTTPS certificate is for.

No it's not. Your HTTPS certificate will not tell you "this PyPi package release is actually built and uploaded by the same person who controls the GitHub repository linked on the package page". PyPi hosts distributions. It frequently has source distributions, but it doesn't necessarily host "source code", which would usually mean the source repository. Even with that, it's Transport Layer Security, or a Secure Socket Layer. It does not authenticate anything other than the Socket/Transport itself.
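What HTTPS can't give you about the artifact itself is exactly what pip's hash-checking mode pins down: the digest of the file. A minimal requirements fragment (the digest below is a placeholder; the real value comes from PyPI's file listing for the release):

```text
# requirements.txt -- pip's hash-checking mode; the sha256 value is a
# placeholder, take the real digest from PyPI's published hashes:
requests==2.31.0 \
    --hash=sha256:<digest-from-pypi>

# Install with:
#   pip install --require-hashes -r requirements.txt
```

This gives integrity (the bytes you got are the bytes that were pinned), not authenticity (who produced those bytes), which is the distinction being drawn above.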

I'm fine with PGP, but most people don't really know how to use it. They add a key and think they're safe when it validates, but that only protects you if you already trust the key. PGP signing doesn't tell you "this is safe", just "this was signed by the person who has the private key for this public key", which isn't as useful without a lot of personal footwork or a trusted authority.

PGP key signing parties were a thing for a reason. Using PGP properly requires either an initial leap of trust (importing your distro's keys and trusting what they trust), a lot of diligence (personally verifying identities), or a small amount of diligence with a good web of trust (you sign keys that you know are good, and so does everybody you know, so a lot of what you find online you can validate through your links).


The people in charge of doing a release, with the permissions to do so, are a much smaller subset than the authors.

And PGP does support web of trust, so if the previous release guy trusts the new release guy… perhaps we could accept it as well.


> And PGP does support web of trust, so if the previous release guy trusts the new release guy… perhaps we could accept it as well.

PGP's web of trust has been broken since at least 2019[1]. GPG removed support for it years ago.

(This is a recurring problem with PGP: if you search these things, you're given the false impression that it's all still humming along.)

[1]: https://inversegravity.net/2019/web-of-trust-dead/


Web of trust based on signatures on keyservers is dead. That is not what is being suggested here.


> that's what your HTTPS certificate is for.

Not really... That certificate doesn't go back in time. If a domain expires, an attacker could reregister it under their name and get a valid certificate.

You'd be downloading from the right domain name with a valid HTTPS certificate, but you're not downloading from the same place as before.


> That certificate doesn't go back in time.

It does, kind of, if it's pinned.


HPKP doesn't have a ton of adoption and only works in browsers, so this does nothing for curl, wget, or pip.


>> knowing that the authors remained the same

The problem is that "authors" is not a well-defined concept, and especially larger projects will have very regular author changes. Is the author the person who made the last commit? The person who uploaded it to PyPI? The person who is currently managing the project? What if it isn't a person but a company?

>> that's what your HTTPS certificate is for

A lot of open source projects rely on untrusted third-party mirrors. The main server will just randomly redirect you to a mirror near you, so HTTPS certificates are pretty much useless because you are connecting to a third-party domain. They use signatures to prevent the mirror from doing weird stuff, and they guarantee that the mirror is serving the upstream content as-is.


The author is the person holding the release signing key


"The"? Multiple people may hold the key.

"Person"? The release could be part of an automated process.


> and especially larger projects will have very regular author changes

We're not checking the signature of every commit, just of the release. It is usually 1 or 2 people who do releases.


But in this case we lose one way of defining authors.


There is some security even if they provide the public key. Bootstrapping is a problem, but clients can keep track of a mapping from package names to public keys and issue a warning if that ever changes. That's how SSH and RDP work, and while I've never had an actual security hole plugged by this, I've had a case where my remote machine went down and the DNS hadn't updated yet when the IP was reassigned, so the warning about mismatching keys was actually helpful.
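A TOFU store along those lines is only a few lines of client-side state. A sketch (the file format is made up; this is not anything pip actually implements):

```python
import json
from pathlib import Path

# Minimal trust-on-first-use store, in the spirit of ssh's known_hosts:
# pin the key fingerprint seen on first install, and complain loudly if
# it ever changes. Real tooling would also need rotation and revocation.
class KnownKeys:
    def __init__(self, path):
        self.path = Path(path)
        self.pins = json.loads(self.path.read_text()) if self.path.exists() else {}

    def check(self, package: str, fingerprint: str) -> str:
        pinned = self.pins.get(package)
        if pinned is None:
            self.pins[package] = fingerprint          # trust on first use
            self.path.write_text(json.dumps(self.pins))
            return "pinned"
        return "ok" if pinned == fingerprint else "KEY CHANGED"
```

As the sibling comments note, the hard part isn't this code; it's what users do when "KEY CHANGED" fires for a legitimate rotation.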


> There is some security even if they provide the public key.

That security is integrity, which PyPI already provides through strong cryptographic digests of each package distribution. Codesigning schemes need to provide authenticity, not just integrity; a codesigning scheme that's downgradeable to arbitrary key trust is a needlessly complicated hashing scheme.
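The integrity half of that is just a digest comparison; a minimal sketch of the check PyPI's published hashes already enable:

```python
import hashlib

# Integrity: the downloaded bytes match the digest the index published.
# This proves the artifact wasn't corrupted or swapped in transit -- it
# says nothing about *who* produced it, which is what authenticity
# (codesigning) has to add on top.
def integrity_ok(artifact: bytes, expected_sha256: str) -> bool:
    return hashlib.sha256(artifact).hexdigest() == expected_sha256

blob = b"example-1.0.tar.gz contents"
digest = hashlib.sha256(blob).hexdigest()   # what the index would publish
print(integrity_ok(blob, digest))           # True
print(integrity_ok(b"tampered", digest))    # False
```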


The problem with TOFU is that it assumes long lived keys (itself a bad practice) OR it assumes that the end user will be fine with regular notices that the keys that have signed their packages have changed, and will be able to correctly differentiate false positives from real positives.


> Because PyPI ... could always substitute a new key.

Isn't that what public key servers are for?

For publishing my FOSS to sonatype, I had to first publish my public key, e.g. to keyserver.ubuntu.com.

I don't know PyPI, but from this OC, it sounds like PyPI does not have the same prerequisite.


Yep. Unfortunately, PGP's keyserver network has been dead for years[1]. There are two big (non-synchronizing) ones left, and they're the two I used to do the analysis that's linked in this announcement (meaning they're the ones that are largely missing well-formed keys for the signatures on PyPI).

This was discussed a bit on Sunday's thread[2], and my understanding is that Maven's ability to use PGP in this way is effectively due to Sonatype assuming a large amount of operational and maintenance burden. PyPI doesn't have those kind of resources available to it. Even assuming that the service was gifted that kind of support, it would still cause a lot of heartburn with existing signatures and carry forwards all of the legacy baggage of PGP that we're trying to eliminate entirely.

[1]: https://gist.github.com/rjhansen/67ab921ffb4084c865b3618d695...

[2]: https://news.ycombinator.com/item?id=36021172


It seems pypi should launch their own new keyserver, rather than removing PGP.

In any event, they will ask for a photo of the ID in the future. Google has already written on their security blog that this is where they're going, and from the whole google titan keys event, we know who decides on behalf of pypi.


Pypi doesn't have the resources it needs to do its own job; they're not going to waste more resources they don't have on a dead-end technology they have no use for.


> Notably, PGP is incapable of providing either of these: you only get key IDs, which are neither strong human identities nor a strong binding to a service. Key IDs might correspond to keys with email (or other identities) in them, but that's (1) not guaranteed, and (2) not a strong proof of identity (since anybody can claim any identity in a PGP key).

Depends. If the distributor maintains a repository of trusted public keys (for example, as repositories of Linux distributions do), it gives you a guarantee. As was said, most of the time you just want to know that the key used to sign a package has not changed. That is the same level of security that SSH offers (the first time you connect to a server it saves the public key, then gives an error if that public key changes). That is really enough for a package on PyPI, or for signing git commits and the like.

We should ask ourselves if the complexity of PGP is needed. Probably not, just as the complexity of x509 certificates isn't needed: a simple RSA signature of the package, with the public key hosted on a server, would be sufficient. But PGP is practical, has good tooling built around it, and is pretty universal, so why not?


What happens if the developer loses their key? Or if it expires?

pypi could show a warning that the key has changed. Which is not an actionable or helpful warning. And then everyone gets used to seeing these warnings every now and then, and you've gained nothing.

Getting signatures to do something useful is hard.


> What happens if the developer loses their key? Or if it expires?

What happens if a developer loses their google titan key that is required to login into pypi?


They either have their backup codes, or there's probably a manual process by which the pypi team can get them their account back if they can sufficiently show they are the real developers. If you have any form of automated signature verification, you basically need a concept for how to handle recovery. But if this concept comes down to "trust pypi", then you really can just skip the whole thing and rely on pypi giving you the right packages and https to secure the connection.


Did I just witness the invention of a kind of "software package blockchain"?

It would btw be a proper but sustainable proof-of-work blockchain, as you would in most cases need to pay developers to "mint new blocks".

OK, maybe let's forget about the blockchain. It's a loaded term. But the idea of software-signature TOFU does sound good!


Anyone who dismisses something based on the name alone isn't really worth listening to. There can be great use cases for blockchain just like this, wherein the proof of work is less taxing, or optional. Of course, HN has a rabid response to the term alone, but these technologies actually can provide some great solutions for a more robust form of git lfs, dockerhub, or huggingface model centralization, which will inevitably fail at some point.


> There can be great use cases for Blockchain just like this, wherein the proof of work is less taxing, or optional.

The proof-of-work part was more of a joke, I admit. Developers "mining" "software package blocks" is not really "proof of work" (even if it is in some sense, of course). :-)

> but these technologies actually can provide some great solutions to a more robust form of git lfs, dockerhub, or huggingface model centralization

Well, that's the Merkle trees part of the tech. You don't need any "blockchain" for that. Something like IPFS ( https://ipfs.tech/ ) is for example a nice demonstration of that.
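For illustration, the Merkle-tree part really is tiny; a sketch of a minimal root computation of the kind content-addressed systems like IPFS build on:

```python
import hashlib

def sha(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

# Minimal Merkle root: hash the leaves, then repeatedly hash adjacent
# pairs upward. Any change to any chunk changes the root, which is the
# property content-addressed stores rely on -- no blockchain required.
def merkle_root(chunks):
    level = [sha(c) for c in chunks]
    while len(level) > 1:
        if len(level) % 2:            # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

chunks = [b"chunk-0", b"chunk-1", b"chunk-2"]
root = merkle_root(chunks)
print(root != merkle_root([b"chunk-0", b"tampered", b"chunk-2"]))  # True
```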

But yes, there are useful applications of blockchains. Like https://www.namecoin.org/

But those are really rare.


You're right that Merkle trees are useful here, but adding proof of work, e.g. running the code to produce some lfs data (database, NN weights, physical parameters, etc.), which is checked by a checksum, could be a fantastic addition for ensuring reproducibility in science. IPFS could be great for reproducibility if the commit hash was linked directly to the data checksum. Blockchain may not be necessary, but it could be an easier implementation of this feature.


>Of those 1069 unique keys, about 30% of them were not discoverable on major public keyservers, making it difficult or impossible to meaningfully verify those signatures.

I don't know if it applies to any of those 1069 keys, but note that there is a way of hosting PGP keys that does not depend on key servers: WKD https://datatracker.ietf.org/doc/draft-koch-openpgp-webkey-s... . You host the key at a .well-known URI under the domain of the email address. It's a draft as you can see, but I've seen a few people using it (including myself), and GnuPG supports it.
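As an aside, the draft's "direct method" lookup URL can be derived with just the standard library: SHA-1 of the lowercased local part, z-base-32 encoded. A sketch based on my reading of the draft:

```python
import hashlib

ZB32 = "ybndrfg8ejkmcpqxot1uwisza345h769"  # z-base-32 alphabet

def zbase32(data: bytes) -> str:
    # MSB-first 5-bit groups; a 20-byte SHA-1 digest yields exactly 32 chars.
    bits = int.from_bytes(data, "big")
    return "".join(ZB32[(bits >> s) & 31]
                   for s in range(len(data) * 8 - 5, -1, -5))

def wkd_direct_url(email: str) -> str:
    local, domain = email.rsplit("@", 1)
    digest = hashlib.sha1(local.lower().encode()).digest()
    return (f"https://{domain.lower()}/.well-known/openpgpkey/hu/"
            f"{zbase32(digest)}?l={local}")

print(wkd_direct_url("Joe.Doe@Example.ORG"))
```

GnuPG does this lookup for you (`gpg --locate-keys`), so the snippet is only to show there's no magic in the addressing scheme.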


This is interesting, but it doesn't really solve the key distribution problem: with well-known hosting you now have a (weak) binding to some DNS path, but you're not given any indication of how to discover that path. It's also not clear that a DNS identity is beneficial to PyPI in particular (since PyPI doesn't associate or namespace packages at all w/r/t DNS).

More generally, these kinds of improvements are not a sufficient reason to retain PGP: even with a sensible identity binding and key distribution, PGP is still a mess under the hood. The security of a codesigning scheme is always the weakest signature and/or key, and PGP's flexibility more or less ensures that that weakest link will always be extremely weak.


Right. I've never used PyPI, but TFA makes it sound like the existing support for signing is "We allow the uploader to upload a signature, and the downloader can look up the key indicated in the signature to do the verification." Is that correct? If so, then yes there is a key ID involved but no email address, so a generic downloader would have no choice but to look it up from a key server.


That's correct!

PyPI's support for PGP is very old -- it's hard to get an exact date, but I think it's been around since the very earliest versions of the index (well before it was a storing index like it is now). If I had to guess (speculate wildly), my guess would be that the original implementation was done with a healthy SKS network and strong set in mind -- without those things, PGP's already weak identity primitives are more or less nonexistent with just signatures.


GPG ASC upload support was quietly added later IIRC. EWDurbin might recall



Well, I think there should be broader discussion of this inadequacy.

"Implement "hook" support for package signature verification." (2013) https://github.com/pypi/warehouse/issues/1638#issuecomment-2...

"GPG signing - how does that really work with PyPI?" https://github.com/pypa/twine/issues/157#issuecomment-101460...

"Better integration with conda/conda-forge for building packages" https://github.com/pyodide/pyodide/issues/795#issuecomment-1...

Conda now has its own cryptographic package-signature software supply chain security control. Unfortunately, conda's isn't yet W3C DIDs with Verifiable Credentials and sigstore either.

Also, [CycloneDX] SBOMs don't have any package archive or package file signatures, so when you try to audit what software you have on all the containers in your infrastructure there's no way to check the cryptographic signatures of the package authors and maintainers against what's installed on disk.

  docker help sbom
  # check_signatures python conda zipapps apt/dnf/brew/chocolatey_nuget git /usr/local
And without clients signing before uploading, we can only verify Data Integrity (1) at the package archive level; (2) with pypi's package signature key.


You can't do WKD with just signatures, there are no identities associated with the signature to just look up.


Isn't that throwing out the baby with the bathwater? There seem to be non-negligible risks of installing malware from PyPI, according to various headlines recently. But instead of improving security measures that don't work well, they just remove them?


Removing security features that don't work is a separate concern from making security features that do work. Nobody who has done any serious work on PyPI security in the past 15 years thinks that GPG will play a part in the future of PyPI security. Its support was entirely vestigial, served no practical purpose, and never would.


[flagged]


Please make your substantive points without swipes, in keeping with the site guidelines: https://news.ycombinator.com/newsguidelines.html.

If someone else is wrong, it suffices to neutrally explain to them, and the rest of us, how they are wrong. Adding putdowns not only doesn't help, it harms the community and discredits your point, so please don't do that.

Edit: please see https://news.ycombinator.com/item?id=36092913 also. You broke the site guidelines badly in this thread.


Note that the person you're replying to is the PyPI maintainer responsible for removing GPG from PyPI.

The rationale is expanded here: https://news.ycombinator.com/item?id=36050190


[flagged]


Attacking other users will get you banned here. No more of this, please.

https://news.ycombinator.com/newsguidelines.html


Most supply chain attacks rely on dependency confusion or typo-squatting, which PGP signing doesn't solve. An attacker can PGP sign their typosquatted package, and the package manager won't know to alert you because as far as it can tell, you intended to install that package. (This is before even considering whether the packages are signed with strong keys, or users are actually verifying them against any public trust store.) That's one reason supply chain issues are so pernicious - they're more of a human problem than a technical one.
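A toy illustration of why signing doesn't help here: a typosquat check is a similarity heuristic over names, entirely orthogonal to signatures (the "popular" list below is made up):

```python
import difflib

# Toy typosquat heuristic: flag install requests that are suspiciously
# close to (but not exactly) a well-known name. Illustrative list only.
POPULAR = ["requests", "numpy", "urllib3", "django", "cryptography"]

def typosquat_suspect(name: str):
    if name in POPULAR:
        return None                       # exact match: not a squat
    close = difflib.get_close_matches(name, POPULAR, n=1, cutoff=0.85)
    return close[0] if close else None    # which name it may be squatting

print(typosquat_suspect("requets"))   # "requests" -- probably a typo
print(typosquat_suspect("flask"))     # None -- just not in our toy list
```

An attacker's valid signature on "requets" passes any verification scheme; only name-level heuristics like this (or human review) catch the confusion.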

That said, I do agree with your premise that the limited usefulness of PGP signing doesn't necessitate removing the feature entirely.


> Isn't that throwing out the baby with the bathwater?

That assumes there’s a baby in the bath water.

> But instead of improving security measures that don't work well they just remove them?

Well yes, “security measures” which don’t work are usually worse than nothing.


There are many cases where it's better to know you don't have something than to think, incorrectly, that you do. Security is certainly one.


So they examined everything uploaded to PyPi with a signature over three years, including old versions, and classified those packages whose signing key is expired today, possibly years later, as "impossible to meaningfully verify." Never mind that a package may have been verifiable with a valid key for a full year or two before the key expired, and in the meantime may have been superseded by a newer version.

They also say they can't "meaningfully verify" packages if the key does not have "binding identity information," by which they presumably mean automatically verifiable binding identity information, which usually means someone verified an email via keys.openpgp.org. This is a really narrow way to establish "binding identity information." For example, someone who is a PyPi author and publicly links their PGP key from an (https) website on the same domain as the email on the key would not count. A well-known longtime PyPi author with a well-known key would not count.

The ad hoc, out of band nature of how PGP keys are trusted is not remotely new - PyPi would have known from the very start of adopting PGP that many keys would not be automatically verifiable. It makes little sense to turn around now and act like this is some surprising thing.

This has the smell of "we didn't want to bother supporting PGP any more because it's hard so we came up with an excuse."

No need for an excuse, though: Just be honest about it and let the chips fall where they may, if you really don't want to support PGP. God knows there are valid reasons for not having the energy to deal with PGP. (FWIW I think it's a good solution for packages, for those who can navigate the tooling, but on the other hand I'm not volunteering my time to run PyPi.)

P.S. There is a link in their post saying PGP has "documented issues." The specific issue described in the linked document is "packaging signing is not the holy grail" and a list of known things about PGP, like that verification of keys is ad hoc. It also concludes that there is no known better alternative.


> The ad hoc, out of band nature of how PGP keys are trusted is not remotely new - PyPi would have known from the very start of adopting PGP that many keys would not be automatically verifiable. It makes little sense to turn around now and act like this is some surprising thing.

This is revisionist: in 2005, PGP was reasonably modern and represented an acceptable tradeoff between usability, legal and patent constraints, and arms laws. It was also accompanied by a network of synchronizing keyservers and a "strong set" within the Web of Trust that, in principle, gave you transitive authenticity for artifacts. That never really worked as expected, but it's all code and infrastructure that was actually running in 2005, when PyPI chose to allow PGP signatures.

None of that is the case in 2023: PGP is 20 years behind cryptographic best practices, and has 30 years of unresolved technical debt. There is no web of trust, and the synchronizing keyserver network has been broken for years.

The argument for PGP in 2005 was that it was, to a first approximation, the best that could be done. The argument against PGP in 2023 is that, to a first approximation, it's worse than useless by virtue of providing false security guarantees.


There wasn't any sign or promise in 2005 (that I'm aware of, and I had a key back then) that PGP would soon get meaningfully better, for example that it would suddenly, magically solve the issue of verifying keys in a decentralized system. It was a pain in the arse then as now. There has been some improvement in the meantime, in that keys.openpgp.org started unilaterally doing its own email verification.

And you say that it's been 18 years and PGP is behind best practices, but you don't describe how those best practices would better solve the package verification challenge that PyPi faces. So in the absence of an actual alternative system why not keep using PGP? Perfect is the enemy of good (IMO!).

But again, I'm not even really feeling strongly that PyPi should use PGP or not. I mostly posted just to say that they should be honest about why they are leaving it, and these seem like bad/misleading stats that for some reason they are hiding behind instead of coming out and saying they changed their mind about PGP (or new people are now running things and don't like dealing with PGP - many people would sympathize).


> And you say that it's been 18 years and PGP is behind best practices, but you don't describe how those best practices would better solve the package verification challenge that PyPi faces. So in the absence of an actual alternative system why not keep using PGP? Perfect is the enemy of good (IMO!).

This may or may not be satisfying to you, but there is discussion around this, in this thread, in other threads on the internet, and on PyPI's own issue tracker. The current plan is to integrate Sigstore[1] into PyPI as a more complete and modern codesigning solution. That work is progressing, and is not yet in a state that's meant to "replace" PGP. But that's intentional because, as the post states, nobody (to a first approximation) was using these PGP signatures anyway.

Perfect is indeed the enemy of the good; the other enemy of the good is bad things. PGP is bad; the reason I titled the original post "worse than useless" is because it takes a useless security feature (signatures that nobody verifies) and makes them actively dangerous by providing cryptographic margins that weren't even safe 25 years ago.

> But again, I'm not even really feeling strongly that PyPi should use PGP or not. I mostly posted just to say that they should be honest about why they are leaving it, and these seem like bad/misleading stats that for some reason they are hiding behind

Two things should be separated here: there's the PyPI blog post, which is written by the PyPI admins, and there's the "worse than useless" blog post, which was written by me. I am not an admin of PyPI, and it's my independent technical opinion that PGP is bad. I stand by the stats that I've included in my own post, but I do welcome specific critiques of how they're bad or misleading.

The PyPI admins can provide their own rationale, but this is my best understanding: they have known for years that PGP is bad, and have more or less tolerated it as a legacy feature because removing it was a low priority. The post I wrote two days ago was just a "final nudge" towards removing it, since the post's statistics (particularly large numbers of expired keys) refute one of the last defenses for PGP on PyPI.

[1]: https://www.sigstore.dev/


> The PyPI admins can provide their own rationale, but this is my best understanding: they have known for years that PGP is bad, and have more or less tolerated it as a legacy feature because removing it was a low priority. The post I wrote two days ago was just a "final nudge" towards removing it, since the post's statistics (particularly large numbers of expired keys) refute one of the last defenses for PGP on PyPI.

PyPI Administrator here, and the person who removed GPG from PyPI.

All the way back in 2013 I had written blog posts that talked about how GPG was not sufficient for a secure package signing scheme in a repository.

I first proposed removing GPG back in May of 2016 (turns out May is a bad month for GPG in my world). At that time we were knee-deep in rewriting PyPI into its modern incarnation, trying to quickly identify which features were actually important enough to keep in the new implementation and which were not.

Even back in 2016 I did not think that the level of use of GPG and the relative uselessness of the signatures made it sensible to keep as a feature. However, when I proposed removing it we got a small amount of pushback, primarily from Linux distributors, and since the feature had already been implemented we just removed it from the UI and left it in. This wasn't an endorsement of the feature, but rather a tactical choice: it wasn't worth spending more time on removing GPG while we were focused on the rewrite.

In the intervening years it had periodically come up; everyone agreed that it wasn't part of our long-term plans, but nobody had the time to dig into whether the signatures being uploaded were actually useful. Without that, there was a vague concern that somewhere out there some system might be relying on them, and nobody wanted to "pick a fight" over it at the time.

Then woodruffw did the work to investigate how useful the existing signatures actually were, and quite frankly the numbers were worse than I expected. I honestly expected most of the existing signatures to be meaningfully verifiable, because from my perspective, the only people left signing were likely going to be people who were invested in GPG, and thus more likely to spend the time to make sure that everything was working.

Given that new information, along with a long desire (over 7-10+ years now!) to remove this small bit of security theater, I went ahead and threw together a pull request to actually do it now. Like a lot of things in OSS, it was a perfect storm of someone pointing out a problem to someone who had enough time and motivation at that point in time to fix it that made that particular task bubble up to the top of my long TODO list.


Why would you expect regular GPG users to have signed things years ago with keys that never expire? People who regularly use GPG tend to set expiration dates on their keys. Non expiring keys are a really bad idea.


I agree that long-lived GPG keys are a bad idea, which is yet another reason the feature was a bad one: it could only ever work with non-expiring GPG keys.


>providing cryptographic margins that weren't even safe 25 years ago.

Such an exceptional statement requires some sort of argument...



You posted a link to a whole document. I am not going to try to figure out what you mean.


>PGP is 20 years behind cryptographic best practices...

In what sense? If someone signs a package with, say, a RSA key, how is that behind in some way?

>30 years of unresolved technical debt.

How can a standard for a file/message format have technical debt? PGP is dead simple. Where is this debt hidden?


> In what sense? If someone signs a package with, say, a RSA key, how is that behind in some way?

OpenPGP specifies PKCS#1 v1.5 for RSA padding. Attacks on PKCS#1 v1.5 have been well understood for over 20 years[1]; every few years, someone finds a new one.

RSA itself is well-known for having weird number-theoretic problems that implementations have failed to respect, to catastrophic effect. Best practice for algorithm selection is to pick algorithms where users can't compromise the integrity of the scheme through poor public parameter selection; RSA forces the user to pick a public modulus and exponent, leading to all kinds of silly things that actually happen[2].

Edit: Correcting myself: most attacks on v1.5 padding concern encryption, not signatures. The general fragility argument remains, however.

[1]: https://en.wikipedia.org/wiki/PKCS_1#Attacks

[2]: https://news.ycombinator.com/item?id=5993959
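For the curious, the padding construction under discussion is simple enough to sketch in a few lines. This is a rough illustration of EMSA-PKCS1-v1_5 encoding (per RFC 8017, section 9.2) with SHA-256; the DigestInfo prefix is the standard ASN.1 constant, and the message and modulus size are made up for the example:

```python
import hashlib

# Standard ASN.1 DigestInfo prefix for SHA-256 (RFC 8017, section 9.2)
SHA256_DIGEST_INFO = bytes.fromhex("3031300d060960864801650304020105000420")

def emsa_pkcs1_v1_5(message: bytes, em_len: int) -> bytes:
    """Deterministic PKCS#1 v1.5 signature padding: 00 01 FF..FF 00 || DigestInfo || H(m)."""
    t = SHA256_DIGEST_INFO + hashlib.sha256(message).digest()
    if em_len < len(t) + 11:
        raise ValueError("intended encoded message length too short")
    ps = b"\xff" * (em_len - len(t) - 3)  # at least 8 bytes of 0xFF padding
    return b"\x00\x01" + ps + b"\x00" + t

# Illustrative: a 2048-bit modulus gives a 256-byte encoded message
em = emsa_pkcs1_v1_5(b"hello", 256)
```

The rigidity of this 1990s-era construction (and implementations' historical failures to check it strictly) is what the attack history cited above is about.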


Exactly how is PKCS#1 v1.5 vulnerable in PGP usage? You might be confusing PGP usage with encrypted pipe applications like TLS.

>RSA forces the user to pick a public modulus and exponent, leading to all kinds of silly things that actually happen[

What PGP implementation forces the user to pick the public modulus and exponent?


> Exactly how is PKCS#1 v1.5 vulnerable in PGP usage?

Something that is often misused in a way that compromises security really needs to be proven secure rather than the opposite.


PGP's extremely poor UX would suggest there's code that doesn't exist that should.


There are much, much better solutions for packaging!


Like signify (developed and adopted by OpenBSD) and minisign.

AFAIK Debian has been working on abandoning GPG in favor of something very similar to those two. Not sure when it's going to be shipped, though.

https://wiki.debian.org/Teams/Apt/Spec/AptSign


Which you do not describe, but set that aside: a post that honestly said "we do not like PGP, but here is our alternate plan" would be great. As for an actual better solution, I don't think anyone has proposed a good one. Here is the closest I've seen from PyPI (or at least linked from this post as describing their thinking), from 10 years ago:

"Everything is Terrible So What Do We Do?

Bluntly put, I don’t know for sure. This isn’t an already solved problem nor is it an easy to solve one."

https://caremad.io/posts/2013/07/packaging-signing-not-holy-...

What I'll say on PGP is the perfect is the enemy of the good. It's not a tech anyone has much fun using, but in a group setting, used regularly, I have found it can fade into the background at least. I don't want to go any further down the "is PGP good or bad" rabbit hole than that.

But if you have a better solution for package security, please do describe it here.


The current documented plans revolve around TUF (https://peps.python.org/pep-0458/, https://peps.python.org/pep-0480/). Those links have probably bit-rotted a bit by now; progress has been slow on implementing them for a number of reasons (mostly OSS reasons: volunteers, etc.).

There's also a general consensus (not documented) that sigstore will play some kind of role here. Possibly in-toto as well?

In the 10 years since my post that you referenced, we've laid some decent plans I believe, and have just slowly been working on them, to the extent that we've been able to given our own time constraints.


It's not really up to you or me, it's up to PyPI. For my part: their logic seems pretty sound.


And which of those are in use/available at the Python Package Index?


I don't know, but PGP isn't either, so the comparison holds.


So they're removing PGP signatures, which certainly have some issues, and replacing them with ... nothing?


The research article cited in the announcement is titled "PGP signatures on PyPI: worse than useless."

That's the issue. Pretending there is a security solution in place is worse than being upfront that there is none. If you look down and notice that your seatbelt is actually made out of angel hair pasta, you might drive more carefully. Hopefully you'll also get a better car.


But they're not "worse than useless", that article was wrong. PGP/GPG are without doubt problematic, they have weak points (like use of SHA-1, some keys that could not be located, and terrible UI) but they are not worse than having no traceability of the package at all between the author and PyPI.


The system guaranteed that a key signed a package. That was its entire utility.

At best, it defeated plausible deniability for package maintainers who had avowed public keys, but then somehow signed a bad package. This wouldn't have stopped the malware from getting onto your system. It only would have led you to the hapless (but honest) package maintainer.

It didn't stop someone who is not you from generating a PGP key for Richard WM Jones, signing malware, uploading it to PyPI, and then disappearing back under the rock where they live. And if you believe this system is not useless, then you also believe that at least one person out there was not dissuaded from installing that malware because "Hey, someone named Richard WM Jones went through the trouble of signing it!"

As is often the case, the value of this system depends on your threat model. I'm not too worried about someone going rogue from the tiny population of people who were using PGP correctly. But I am worried about using a platform that claimed to have signing infrastructure, when that infrastructure had no meaningful checks on who was signing.


> they are not worse than having no traceability of the package at all between the author and PyPI.

Except that they are: PGP does not give you this kind of identity relationship. The most it can give you is an association to a key ID, which is (1) brute-forceable, and (2) not strongly bound to any actual user or machine identity.

The only thing worse than an unsecured scheme is an insecure scheme that lulls users into a false sense of security and authenticity. PGP signatures on PyPI are the latter.


>The most it can give you is an association to a key ID, which is (1) brute-forceable

This is false. That would mean brute forcing a 160 bit SHA-1 hash. That is not possible.


This is wrong on every conceivable level:

1. Key IDs are 32-bit truncations of the SHA-1 hash. That means 32 bits of state, not 160. Fake key IDs are observable in the wild[1].

2. You're confusing pre-image resistance with collision resistance. SHA-1 has been publicly vulnerable to practical collision attacks since 2017[2].

[1]: https://www.phoronix.com/news/Short-PGP-Collision-Attacks

[2]: https://shattered.io/
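To make the key ID point concrete, here is a sketch of how a v4 OpenPGP fingerprint and its truncated key IDs are derived (per RFC 4880, section 12.2); the key body below is a dummy placeholder, not a real public key packet:

```python
import hashlib

def v4_fingerprint(public_key_body: bytes) -> bytes:
    # RFC 4880, section 12.2: SHA-1 over 0x99, a two-octet big-endian
    # length, then the public key packet body
    prefix = b"\x99" + len(public_key_body).to_bytes(2, "big")
    return hashlib.sha1(prefix + public_key_body).digest()

fpr = v4_fingerprint(b"\x04" + b"\x00" * 100)  # dummy v4 key body
long_id = fpr[-8:]    # 64-bit "long" key ID
short_id = fpr[-4:]   # 32-bit "short" key ID: only 2**32 possibilities
```

A 32-bit short ID is trivially collidable by generating keys until the truncation matches; the full 160-bit fingerprint is the only identifier with a meaningful security margin, and even that rests on SHA-1.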


OK, you are talking about the 64 bit keyID. The 32 bit keyIDs are no longer used for the reason you state.

Shattered is about SHA-1 use in key signatures. It has nothing to do with keyIDs.


It's like the larger holy war against self-signed certificates in TLS. They are strictly better than plaintext but there is software that will prefer a plaintext connection to self-signed TLS.


Unverifiable security is not "strictly better" than plaintext.


It is. In both cases you may be MiTMed, but with plaintext that MiTM session may also be eavesdropped upon.

This isn't really debatable. An unverified sig simply is strictly better.


In both cases you can be eavesdropped upon. It really isn't debatable indeed, unverifiable security is not security.


Nope. An unverified TLS session still cannot be examined by a third party. You know you are communicating with exactly one party, even if you don't know who that is.

Your attacker may share the data with a third party, but that's true of verified connections too.


It is true that an unverified TLS session prevents passive attacks, but it does not prevent "active" attacks. The general consensus is that it's not useful to differentiate passive from active here, since every passive attack can be upgraded to an active attack, on top of the fact that explaining the subtle difference to people is extremely difficult (and since they can be upgraded to active attacks, not worth it).


Unless you have the other party's self-signed cert pinned, you may very well be experiencing a MITM and not know it.


Exactly. You may be being MITMed, but that attack cannot also be eavesdropped or altered by another attacker.


Assuming of course that the other attacker doesn't just MITM the first attacker.

But this is a very silly threat model, "I want exactly one person to be able to attack me at a time".


Not sure what's silly about that. If I'm on the hotel wifi I'd rather my neighbor not be able to eavesdrop on the person phishing me.

I don't see what's so hard about this:

Plaintext is worst (active and passive attacks possible)

Unverified TLS is better (active attacks possible)

Verified TLS is best (neither active nor passive attacks possible)
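The three rungs above map directly onto client-side TLS settings. A minimal sketch with Python's ssl module (the "unverified" context is what self-signed-accepting software effectively runs with):

```python
import ssl

# "Unverified TLS": encrypted, but accepts any certificate, including a
# MITM's, so only passive eavesdropping is ruled out.
unverified = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
unverified.check_hostname = False
unverified.verify_mode = ssl.CERT_NONE

# "Verified TLS": system CA bundle plus hostname checking, which is what
# rules out the active attacker as well.
verified = ssl.create_default_context()
```

Plaintext, of course, needs no context at all, and gets neither property.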


It's been a couple days since this thread quieted down, but I've continued to think about the logic behind the discussion. I believe the fallacy here is akin to arguing about numbers without units attached to them.

For most of the people in this thread, the units are all something like "number of times my house burns down." I guess I'd rather my house burned down once rather than twice, so to that extent your position is not irrational. But the second time is not meaningfully different; the only further loss is maybe a magazine or newspaper that the postal service delivered the day after the first fire and placed on the ashes that used to be my mailbox. It's certainly sad that I won't get to read the paper after the second fire in as many days, but I'm still mostly concerned that my house burned down.

Your units, on the other hand, are inconsistent and surprisingly ordered. Either you really enjoyed the unburned article you read, apparently enough to forget that the rest of your worldly possessions are gone (this is the "at least nobody eavesdropped on my conversation with the MITM" position), which implies that the units are large, or that avoidance of eavesdropping outweighs undetected MITM. Or else you wear an asbestos suit 24/7 because you already assumed most of the world is on fire and don't care if it engulfs your home (this is the part about how you believe HN could someday serve malicious JS, so that origin authenticity wasn't a big deal in the first place), which suggests that the units were small.

Your values are your own, and only you can decide to change them. But the discussion might have been shorter and smoother if you'd acknowledged that others have been using a single, consistent unit called "catastrophes," and that the only numbers we care about are zero and any.


Good analogy. I stopped responding after I understood that some easily avoidable risks were totally acceptable to bandrami. That's not how my risk model works, especially when it's usually very easy to not accept such risks at all and the alternative would be a potential disaster.

I personally like to have my life/work set up so that I know which catastrophes _can't_ happen (the probabilities can be compared to the effort required to boil oceans or to waiting until the heat death of the universe).


Again, not "totally acceptable", but better to be limited to a single attack channel than multiple ones. You're just being willfully obtuse to ignore that.


+1. High-school curriculums should include something like applied Bayesian reasoning. Understanding dependent probabilities is an underappreciated superpower.


What I learned from this thread is that a lot of y'all seem to trust counterparties a lot more than you should just because one of the 172 CAs in your OS's chain will claim they are who they say they are. Remember that those 172 CAs include the Chinese and Turkish governments.


Technically the MITMer could send your data to the intended server in plaintext, or upload your traffic as a torrent, or something else fun like that.


The counterparty of verified TLS can also share your communication with a 3rd party; that isn't the problem verification solves.


"Private chat with the devil" is not a useful security model for most web sessions, and it's certainly not suitable for codesigning. Authenticity is the property we're aiming for; if it was just integrity, we'd do nothing at all other than provide digests.


"Private chat with the devil" is the perfect security model for most web sessions. I trust very few websites I visit in any real sense.


> "Private chat with the devil" is the perfect security model for most web sessions. I trust very few websites I visit in any real sense.

I'm sorry, but this is either incorrect or a gross misunderstanding of your own threat model.

Most people treat their online self as an extension of their physical self: that means banking information, private personal details, intimate communications, and everything else that's normally private by virtue of physical ownership needs to go through an authenticated channel.

You might not care that someone can't MITM your Wikipedia traffic, but you almost certainly care that someone can't MITM your tax returns or your medical records.


> You might not care that someone can't MITM your Wikipedia traffic, but you almost certainly care that someone can't MITM your tax returns or your medical records.

So presumably, you'd demand a cert from a trustworthy authority in those cases. But you still don't want your ISP to be able to inject ads into the recipe blog you're reading.


With self signed certificates your ISP can just serve their own self signed certificate, and then inject ads into the recipe blog.


And I definitely want 3rd party verification for my tax preparation website.

But I don't trust news.ycombinator.com any more than I trust somebody pretending to be news.ycombinator.com; validating that cert does nothing useful for me.


> But I don't trust news.ycombinator.com any more than I trust somebody pretending to be news.ycombinator.com; validating that cert does nothing useful for me.

Yes, you absolutely do. You don't expect news.ycombinator.com to serve you malicious JavaScript, or to redirect you to porn, or to do anything other than serve churlish content from the Internet commentariat.


> You don't expect news.ycombinator.com to serve you malicious JavaScript

I absolutely do not trust this site (or most sites) enough to run arbitrary javascript it serves.


> even if you don't know who that is

Thus it can be the eavesdropper, easy.


Exactly, the eavesdropper.

Even if you're being MiTMd by a criminal organization, you aren't also being listened to by the NSA.


I think another thing with PGP is that it's in this awkward place where it's bad enough that few people use it, but good enough that it prevents someone from making an alternative.


Nobody's making a PGP alternative because a major part of what makes PGP bad is that it tries to be a generic solution to every problem, when in practice signing and encryption workflows are incredibly domain-specific.

People are continuously creating better tools for domains that historically saw PGP usage. To name a few: Signal for short-form messaging, age for file encryption, signify/minisign for artifact signing.


The article is closer to a blog post than a multi-author, peer-reviewed research paper published in a high-impact journal.


That's because it is a blog post. It isn't advertised as anything else.


I wonder if the domain being `blog.pypi.org` tipped them off?


Inasmuch as PGP signatures are rarely used and even more rarely useful, I don't think it's a problem to remove them and replace them with nothing. If it is a problem, it's been a problem for a long time and it's not really making the situation meaningfully worse to remove them.

That said, if PGP signatures are to be replaced then there's no reason why they can't be removed now and replaced with something later.


Sooner or later they will ask a photo of a passport… google's idea, from their security blog.


PGP signatures' purpose is to remove the dependency on trusting PyPI, i.e. to protect against PyPI getting hacked.

(Note: PyPI protects against MITM with HTTPS.)

Removing this is predicated on the idea that that is a low-priority threat vector.


> Of those 1069 unique keys, about 30% of them were not discoverable on major public keyservers, making it difficult or impossible to meaningfully verify those signatures. Of the remaining 71%, nearly half of them were unable to be meaningfully verified at the time of the audit (2023-05-19) 2.

so...*reject those packages*. If a package uses a PGP key that isn't properly available or verifiable, reject it. That way every package with a PGP key will have a 100% "key is properly discoverable" rate.

it's not really reasonable to just drop this feature because most packages don't use it. Packages with tens of millions of downloads (like mine) make up a small percentage of total packages, but this small number of packages makes up a huge proportion of actual downloads, and package signing is most useful for these kinds of packages.

if the adoption of "proper PGP keys" were weighted by downloads rather than by package count alone, these rates would look much different.


I don't believe they would.

Looking at the top 20 packages in the last month by download (packages with hundreds of millions of downloads), only 1 of them shipped a GPG signature with their most recent release. I haven't asked the author of that one, but I do know them and I suspect they agree with the idea that it's not a valuable thing and they do it largely because it exists.


> they do it largely because it exists.

That’s me. I used to upload signatures to PyPI only because it’s a thing that exists and it’s not much trouble. I’d be counted among the valid 36%, but I doubt anyone ever verified even one of the hundreds of sigs I uploaded over the years. I eventually stopped due to the pointlessness.


That quote doesn't make any sense even if we stopped at the first part. I PGP-sign my packages and my key is not on any public keyserver: it's on my website. This reasoning lacks rigor and seems to serve only as an excuse to remove a feature that some PyPI devs didn't like, without offering an alternative for the security guarantees it provided.


I don't understand how Java can get this right with Maven Central and co but newer languages can't.

Having a slight barrier to entry, which is essentially "you must learn why signing is important for users of your library, and this is how to do it": a) really isn't that bad, b) doesn't result in fewer quality packages being uploaded, and c) if it acts as any sort of filter, that seems to be a good thing.

Maven Central isn't short of high quality packages and no high quality OSS Java libraries are missing so the filter aspect isn't culling anything important.

Java, Apt, RPM, etc all have this and have absolutely gigantic numbers of packages so the argument that it's too hard really just doesn't hold water.

Doing so requires reading/understanding these ~3 pages of docs: https://central.sonatype.org/publish/requirements/gpg/


> newer languages can't.

Python (1991) is older than Java (1995)

(irrelevant factoid, but still ...)


I don't believe that Maven Central's use of GPG is providing a meaningful security control here, so I would dispute the idea that they're doing it "right".


At the very least there are a) more active keys, b) those keys are available on keyservers, and c) it's being used correctly by the major packages in the ecosystem, e.g. Spring, Jackson, Quarkus, Logback, the Apache sphere, the Google sphere, etc.

So while it might not be providing meaningful security for lower-tier packages, it's definitely doing its job for top-tier packages like these that are relied on by hundreds of thousands of projects.


> I don't understand how Java can get this right with Maven Central and co but newer languages can't.

it's the magic combination of pushing their own agenda (vs. that of their users), mixed with ineptitude


Now that you have removed GPG ASC signature upload support, is there any way for publishers to add cryptographic signatures to packages that they upload to pypi? FWIU only "the server signs uploads" part of TUF was ever implemented?

Why do we use GPG ASC signatures instead of just a checksum over the same channel?


> Why do we use GPG ASC signatures instead of just a checksum over the same channel?

Could you elaborate on what you mean by this? PyPI computes and supplies a digest for every uploaded distribution, so you can already cross-check integrity for any hosted distribution.

GPG was nominally meant to provide authenticity for distributions, but it never really served this purpose. That's why it's being removed.
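A minimal sketch of that integrity cross-check; in real use the expected digest would come from PyPI's JSON API (the "digests" field for each file) rather than being computed locally as it is here for illustration:

```python
import hashlib

def verify_digest(data: bytes, expected_sha256: str) -> bool:
    """Compare a downloaded file's SHA-256 against the repository-published digest."""
    return hashlib.sha256(data).hexdigest() == expected_sha256

# Stand-in for a downloaded sdist/wheel; the "expected" value plays the
# role of the digest PyPI publishes alongside the file.
blob = b"pretend this is a downloaded sdist"
expected = hashlib.sha256(blob).hexdigest()

assert verify_digest(blob, expected)
assert not verify_digest(blob + b"tampered", expected)
```

Note that this gives integrity relative to what PyPI serves; it says nothing about authenticity, which is the property the signing discussion is about.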


> Why do we use GPG ASC signatures instead of just a checksum over the same channel?

You can include an md5sum or a sha512sum string next to the URL that the package is downloaded from (for users to optionally check after downloading a package); but if that checksum string is uploaded over the same channel (HTTPS/TLS w/ a CA cert bundle) as the package, the checksum string could have been MITM'd/tampered with, too. A cryptographically-signed checksum can be verified once the pubkey is retrieved over a different channel (GPG: HKP is HTTPS/TLS with cert pinning IIRC), and a MITM would have to spend a lot of money to forge that digital publisher signature.

Twine COULD/SHOULD download uploads to check the PyPI TUF signature, which could/should be shipped as a const in twine?

And then Twine should check publisher signatures against which trusted map of package names to trusted keys?


1) The server signs what's uploaded, using one or more TUF keys shared to RAM on every PyPI upload server.

2) The client uploads a cryptographic signature (made with their own key) along with the package; the corresponding public key is trusted for uploads to that package name, and clients retrieve said public key and verify the downloaded package's signature before installing.

FWIU, 1 (PyPI signs uploads with TUF) was implemented, but 2 (users sign their own packages before uploading the signed package and signature, and then 1) was never implemented?


Your understanding is a little off: we worked on integrating TUF into PyPI for a while, but ran into some scalability/distributability issues with the reference implementation. It's been a few years, but my recollection was that the reference implementation assumed a lot of local filesystem state, which wasn't compatible with Warehouse's deployment (no local state other than tempfiles, everything in object storage).

To the best of my knowledge, the current state of TUF for PyPI is that we performed a trusted setup ceremony for the TUF roots[1], but that no signatures were ever produced from those roots.

For the time being, we're looking at solutions that have less operational overhead: Sigstore[2] is the main one, and it uses TUF under the hood to provide the root of trust.

[1]: https://www.youtube.com/watch?v=jjAq7S49eow&t=1s

[2]: https://www.sigstore.dev/


python-tuf [1] back then assumed that everything was manipulated locally, yes, but a lot has changed since then: you can now read/write metadata entirely in memory, and integrate with different key management backend systems such as GCP.

More importantly, I should point out that while Sigstore's Fulcio will help with key management (think of it as a managed GPG, if you will), it will not help with securely mapping software projects to their respective OIDC identities. Without this, how will verifiers know in a secure yet scalable way which Fulcio keys _should_ be used? Otherwise, we would then be back to the GPG PKI problem with its web of trust.

This is where PEP 480 [2] can help: you can use TUF (especially after TAP 18 [3]) to do this secure mapping. Marina Moore has also written a proposal called Transparent TUF [4] for having Sigstore manage such a TUF repository for registries like PyPI. This is not to mention the other benefits that TUF can give you (e.g., protection from freeze, rollback, and mix-and-match attacks). We should definitely continue discussing this sometime.

[1] https://github.com/theupdateframework/python-tuf

[2] https://peps.python.org/pep-0480/

[3] https://github.com/theupdateframework/taps/blob/master/tap18...

[4] https://docs.google.com/document/d/1WPOXLMV1ASQryTRZJbdg3wWR...


> [4] [TUFT: Transparent TUFT] : https://docs.google.com/document/d/1WPOXLMV1ASQryTRZJbdg3wWR...

W3C ReSpec: https://github.com/w3c/respec/wiki

blockcerts-verifier (JS): https://github.com/blockchain-certificates/blockcerts-verifi...

blockchain-certificates/cert-verifier (Python): https://github.com/blockchain-certificates/cert-verifier

https://news.ycombinator.com/item?id=35896445 :

> Can SubtleCrypto accelerate any of the W3C Verifiable Credential Data Integrity 1.0 APIs? vc-data-integrity: https://w3c.github.io/vc-data-integrity/ ctrl-f "signature suite"

>> ISSUE: Avoid signature format proliferation by using text-based suite value The pattern that Data Integrity Signatures use presently leads to a proliferation in signature types and JSON-LD Contexts. This proliferation can be avoided without any loss of the security characteristics of tightly binding a cryptography suite version to one or more acceptable public keys. The following signature suites are currently being contemplated: eddsa-2022, nist-ecdsa-2022, koblitz-ecdsa-2022, rsa-2022, pgp-2022, bbs-2022, eascdsa-2022, ibsa-2022, and jws-2022.


https://github.com/theupdateframework/taps/blob/master/tap18... :

> TUF "targets" roles may delegate to Fulcio identities instead of private keys, and these identities (and the corresponding certificates) may be used for verification.

s/fulcio/W3C DID/g may have advantages, or is there already a way to use W3C DID Decentralized Identifiers to keep track of key material in RDFS properties of a DID class?


What command(s) do I pass to pip/twine/build_pyproject.toml to build, upload, and install a package with a key/cert that users should trust for e.g. psf/requests?


Where does the user specify the cryptographic key to sign a package before uploading?

Serverside TUF keys are implemented FWICS, but clientside digital publisher signatures (like the ones MS Windows .exes have had for many years now, visible when you view the file's "Properties") are not yet implemented.

Hopefully I'm just out of date.


> Where does the user specify the cryptographic key to sign a package before uploading?

With Sigstore, they perform an OIDC flow against an identity provider: that results in a verifiable identity credential, which is then bound to a short-lived (~15m) signing key that produces the signatures. That signing key is simultaneously attested through a traditional X.509 PKI (it gets a certificate, that certificate is uploaded to an append-only transparency log, etc.).

So: in the normal flow, the user never directly specifies the cryptographic key -- the scheme ensures that they have a strong, ephemeral one generated for them on the spot (and only on their client device, in RAM). That key gets bound to their long-lived identity, so verifiers don't need to know which key they're verifying; they only need to determine whether they trust the identity itself (which can be an email address, a GitHub repository, etc.).


What command(s) do I pass to pip/twine/build_pyproject.toml to build, upload, and install a package with a key/cert that users should trust for e.g. psf/requests?


A signature tells you who signed it.

Of course, if you haven't put any effort into a system that verifies end-to-end whether it's the right signature, it doesn't matter.


pip checks that a given package was signed with the PyPI key, but does not check for a signature from the publisher. And now there's no way to host any type of cryptographic signature on PyPI.

There is no e2e: pypi signs what's uploaded.

(Noting also that packages don't have to be encrypted in order to have cryptographic signatures; signing computes a signature over the package, it doesn't encrypt the package itself.)


Yeah, the whole thing looks like throwing the baby out with the bathwater; the package should:

* get a signature from the author ("the actual author published it"), plus some metadata with a list of valid signing keys (in case the project has more authors, or just for key rotation)

* get a signature from the hosting provider confirming "yes, that actual user logged in and uploaded the package"

* (the hardest part) have key management on the client side, so the user has to do the least amount of work possible when downloading/updating a valid package

If the user doesn't want to go to the effort of validating whether the author's public key is valid, so be it; but at the very least the system should alert on tampering by the provider (by checking the hosting signature) or on the author's key changing (compromised credentials to the hosting provider).

It still doesn't prevent "the attacker steals the key off the author's machine", but that is by FAR the rarest case, and could be pretty reasonably prevented by just using hardware tokens. Hell, fund them for key contributors.
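The "alert on the author's key changing" behavior described above is essentially trust-on-first-use. A minimal stdlib-only sketch of the client side, where the pin store and function names are invented for illustration:

```python
# Trust-on-first-use sketch: pin the author's key fingerprint the first
# time a package is installed, and alert if a later release is signed by
# a different key. The pin-store shape is hypothetical.
import hashlib

def fingerprint(public_key_bytes: bytes) -> str:
    # A key fingerprint is just a hash of the key material.
    return hashlib.sha256(public_key_bytes).hexdigest()

def check_key(pins: dict, package: str, public_key_bytes: bytes) -> str:
    fp = fingerprint(public_key_bytes)
    pinned = pins.get(package)
    if pinned is None:
        pins[package] = fp  # first use: trust and pin
        return "pinned"
    if pinned != fp:
        return "ALERT: author key changed"  # rotation, or a compromised account
    return "ok"

pins = {}
print(check_key(pins, "somepkg", b"author-key-1"))  # pinned
print(check_key(pins, "somepkg", b"author-key-1"))  # ok
print(check_key(pins, "somepkg", b"author-key-2"))  # ALERT: author key changed
```

Legitimate key rotations would still trip the alert, which is why the metadata listing valid signing keys (the first bullet) has to accompany a scheme like this.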


Two days ago: https://news.ycombinator.com/item?id=36021172 ("PGP signatures on PyPI: worse than useless", >150 comments)


When many developers didn't use 2FA, PyPI pushed for them to enable 2FA within a deadline. It sounds like the same approach could have been used for PGP on PyPI, i.e. an attempt to make the feature useful before declaring it dead forever.


This has very little to do with 2FA: PGP signing has been de facto dead for years on PyPI, and this change has no effect on publishing workflows: PyPI will still accept uploads that contain signatures, and just ignores them now.

It's also not accurate to say that PyPI failed to make 2FA useful: it was deployed for over two years before the 2FA mandate for critical projects went into effect. That mandate also came with free hardware keys for everyone affected.


No. 2FA is a feature for pypi, and developers. The entire purpose of pgp sigs was external, it was for distributions to use.

Distributions don’t use it, therefore it’s worthless, just just overhead and technical debt.


Debian checks PGP signatures of releases.


For Python packages served by PyPI?


Sometimes? There's no global policy of doing it in Debian, it's up to individual package maintainers inside of Debian to enable it (it defaults to off AFAIK) and to hardcode the key that they expect the package to be signed by.

In the cases that it is used, AFAIK it is only used by Debian's uscan program, which is sort of like the Debian version of Dependabot, it tells them when there is a new version of something to package. As far as I know, the process of packaging that new version is still manual, and relies on the maintainer downloading the package and packaging it, so they may or may not use the signature in that case.

How useful this is, is up for debate. Many years ago, when I first took over releasing pip, the pip GPG key changed, and the reaction of the Debian maintainer at the time was to just comment out the signature bit and fall back to no signature.


I came here thinking they were removing the PGP package from PyPI, but they're just removing a barely-used signature system? I don't know why they have to remove it though. I doubt it requires much maintenance now that it's already in place.

Even if only 37% of keys are verifiable, that's infinitely more than will be verifiable if they remove the PGP support.


They address your comment directly in their post:

> While it doesn't represent a massive operational burden to continue to support it, it does require any new features that touch the storage of files to be made aware of and capable of handling these PGP signatures, which is a non zero cost on the maintainers and contributors of PyPI.


> Even if only 37% of keys are verifiable, that's infinitely more than will be verifiable if they remove the PGP support.

Discoverable. That does not really verify anything about the key, its identities or the supposed signer.

It boils down almost entirely to just an overcomplicated hashing system.


At least you can't blame PyPI for ignoring the report, and tbh I find this response time remarkably quick. It wouldn't have been far-fetched to imagine someone in their position just trying to ignore/downplay/dispute this sort of report.


As the author of the post noted above, the PyPI maintainers have been wanting to get rid of PGP for a while.

The post gave them excellent additional justification to do so.


I don't understand the argument. Isn't the whole point of PGP establishing some kind of chain of trust? If pypi.org has its public key, it could sign a few major distributors' keys, and for smaller/individual packages I could either choose to always trust the same public key or not use the package. It's not a centralized system to begin with. It's not pypi.org's responsibility to identify and verify that all the keys belong to who they say they belong to. pypi.org's inability to verify individual identities shouldn't impact the overall usefulness of PGP for package distribution and verification.


Interesting timeline. The Yossarian article that TFA cites and that I assumed was the impetus here was published two days ago on 5/21. But the audit was two days earlier on 5/19.


I originally ran the audit on 3/27 (IIRC), and then ran it a few additional times as I fixed data quality issues in my scripts (the ones linked in the post). The last time I ran it was on 5/19-5/20, when I was finalizing the post. You can also see that I did a new release of `pgpkeydump` at around the same time, to add some more extracted datapoints.

PyPI's admins have been wanting to remove PGP support for years; all I did was provide the final nudge.


I have been thinking about this in the context of Java libraries (really using Scala, but bear with me).

If the repo requires a GPG signature, they could also ask for the public key of the developer making the releases (e.g. when they make the account), and they could sign it with their key at that point.

Then make available the package, the signature, and the signed public key. Then I only need to trust the repo's key (in this case PyPI).

Does this make any sense?


> Does this make any sense?

It makes sense in terms of trusting the package index, but it's inverted from the original design goal: the point of end-user signatures on package indices is to eliminate unnecessary package index trust, not reinforce it.

If you already trust the package index, then mandating HTTPS and strong cryptographic digests is going to be far more effective (and secure) than some kind of PGP key attestation scheme.


The package index only hosts the packages, but doesn't release them. The dev releasing the package is who signs it.

Without an easy way to verify the keys, the signatures are useless. Which is why PyPI is removing the GPG keys altogether.


> The package index only hosts the packages, but doesn't release them. The dev releasing the package is who signs it.

I know that; the GP is describing a countersigning scheme, where the package index (qua trusted entity) countersigns for the signing key, which the dev then uses to sign for their package.

> Without an easy way to verify the keys, the signatures are useless. Which is why PyPI is removing the GPG keys altogether.

Agreed entirely; I'm the one who wrote the analysis in the linked announcement :-)


What are we switching to? Does PyPI support ECDSA?


Just for disambiguation: ECDSA is a signing algorithm, not a protocol or toolkit like PGP. PGP can produce ECDSA signatures through an extension (RFC 6637), but it's not a core part of OpenPGP.

There is no immediate replacement, because the overwhelming majority of packages never bothered to sign with PGP (and all evidence points to the overwhelming majority of signatures never being verified). In other words, this is much closer to removing "dead" code than to killing an active feature.

Longer term, the plan is to integrate Sigstore[1]-based signatures.

[1]: https://www.sigstore.dev/


sigstore I hope.


I’m not sure if I understand this correctly, but this basically seems to be a CA, with SSO-type proof of identity, short-lived certificates, and transparency logs?

How an OIDC identity is obtained and secured is not treated. It brings useful organization to PKI, but the problem remains. You have to delegate trust to identity providers: Google, GitHub, etc.

Keybase was interesting, but the project seems semi-dead.


> How an OIDC identity is obtained and secured is not treated. It brings useful organization to PKI, but the problem remains. You have to delegate trust to identity providers: Google, GitHub, etc.

Yes, this is a fundamental (and, IMO, reasonable) assumption in Sigstore. The trust argument for large IdPs is that they (1) have the institutional ability and resources (like incident response) to maintain their service, (2) have strong incentives to maintain and improve the overall security of their providers (billions of accounts on the Internet are bound to SSO via Google, etc.), and (3) that any failures in those providers are already catastrophic, so reducing the number of moving and potentially failing parts is a net win in terms of security.


How would someone publish a package without having to rely on institutional OAuth providers?


You can still publish a package without signing it; none of this is mandatory (or even implemented yet).

IMO, one of the things that Sigstore will need to do to become a "serious" codesigning solution for OSS ecosystems is support one or more vendor-neutral IdPs: Sigstore itself is a Linux Foundation project and thus might be able to serve as the right venue for that, or it could be a CA/B-style affair where individual neutral IdPs can qualify for inclusion.


I'd love to make this happen, but there aren't really any IdPs that meet these criteria yet.

Happy to help get one going though!


You will run into this problem when developers from "fun" places like Russia and Iran try to sign packages since they are sanctioned by GitHub and other services. I am not sure about Python but quite a few high profile JavaScript libraries are by Russian engineers.


> they are sanctioned by GitHub and other services

They're sanctioned by various governments around the world, not GitHub.

I think this is an important difference, because GitHub doesn't really have much choice in this.


Ah yes. Let's give all of our personal information to USA… They are so trustworthy!


Eventually? You don't.

The goal of the big companies financing PyPI and the other repositories is to identify users with a name, so they can ban Russians/Koreans/Iranians/tomorrow's undesirables with ease.


With my PyPI administrator hat on, we have absolutely zero desire to ban anyone from PyPI for anything other than their actions on PyPI and in the Python ecosystem (uploading malware, etc).

If some class of users cannot use whatever signing solution we come up with, then we'll figure out an option for them or we'll scrap the solution completely.


Nice to know that I was wrong!


Couldn't you say those same things about the current providers for code signing certificates?

It almost sounds like a trade of one subpar solution for another. What do I get as a consumer trying to verify an identity? The way I see it:

    Current:  This code was signed by someone that gives Sectigo $500 / year.
    Sigstore: This code was signed by someone that also has a Google account.
Neither does much in terms of identity or trust. Maybe that's not the point and the idea for Sigstore is to provide the equivalent of a long term (ie: lifetime) key by tying the identity to a well known account, but that doesn't feel like an awesome solution for me as a developer.

The saying "not your keys, not your crypto" seems to apply here, but it's more like "not your keys, not your identity". Google or Microsoft (via GitHub) own my identity and they can take it away, right?

I don't know a ton about Sigstore, so maybe these are all unjustified concerns, but, to me, it seems like it's convenient for key management, but I don't understand how it adds much value. The way I see it, when I consume a package the only guarantee I get is something along the lines of "this package was published by example456@gmail.com" and it's guaranteed by OIDC, signing, etc..

That OIDC identity can still be compromised and, even if the numbers get reduced, we'll still have to deal with fake accounts and bad actors. Look at the volume of spam accounts on YouTube. For me, that doesn't inspire confidence in letting companies like Google become the arbiters of identity for the entire development community.

What if I get banned for no reason? Can I bootstrap back into the system using an alternate provider? How do I communicate that change to everyone consuming my packages?

I feel like there's a chance a system like Sigstore could devolve into the same kind of verification system we have with the current code signing vendors, but even worse because of the massive scale and minimal revenue. We could end up sending government ID to Google and Microsoft just to have accounts unlocked. Even though I think they're all terrible, the current code signing vendors at least have humans you can deal with.

I think a domain is a better way of bootstrapping identity. It's a globally unique ID, so it improves discoverability and reduces confusion / impersonation. I know it's not the same as code signing in terms of protecting the supply chain, and agree that code signing is more important, but I'd rather have a code signing system that treats my domain as my identity compared to one that uses an account from one of the big tech companies.

Would you rather see "this code was signed by tailscale.com" or "this code was signed by tailscale@gmail.com"? I know the domain is owned by the organization I'm expecting. I have no idea who owns the GMail handle and I don't see how being guaranteed they signed some code gains me anything, regardless of how fancy the whole system is. I guess once I trust them I know it's the same person signing packages, as long as their Google account hasn't been compromised.

Like I said, I don't know enough about Sigstore, so hopefully someone can explain to me how it isn't just a more convoluted way of owning a long-lived signing key, but with the downside of having it gated / controlled by someone else via OIDC.


I'm not a huge expert on Sigstore, but I believe it's better to think of sigstore as similar to Certificate Transparency than similar to GPG signatures. The idea being that signatures on sigstore are on a public log, so you can't be given a binary that doesn't have some publicly available signature.

However, Sigstore does not solve the question of ensuring that a package is coming from the person(s) you expect it to.


I've read parts of the docs, but I guess it'll take a bit to see how it works in practice. The docs [1] list some of my concerns as "What Sigstore Doesn't Guarantee".

That same doc also says:

> Fulcio was designed to avoid revocation by issuing short-lived certificates instead. When signing, the user only needs to know that the artifact was signed while the certificate was valid.

The part I don't get is what happens when someone's OIDC account gets compromised and used to sign artifacts? How do the compromised signatures get un-trusted? Even better, what happens if the artifact repository is using OIDC and developers use the same "log in with GitHub" account for the package repo and Sigstore? Isn't that about the same as not even having signatures at that point?

As for the Fulcio stuff, I don't really get the point of some of it TBH.

> Fulcio assumes that a valid OIDC token from a trusted provider is sufficient “proof of ownership” of the associated identity.

Most OIDC providers are going to have an email or SMS based account recovery process, so, at that point, you're more or less trusting those as proof of identity. I feel like history has proven that's not adequate for protecting anything of value.

Then there's the post-failure, auditability of the logs.

> As a result, users can detect any mis-issued certificates, either due to the CA acting maliciously or a compromised OIDC identity provider. Combined with Rekor's signature transparency, artifacts signed with compromised accounts can be identified (auditability).

I don't want to be notified about failures that I can audit after the fact. I want to be notified before a cert is issued for my identity.

From what I've read the Fulcio certs can have validity measured in minutes. To me, it would make more sense to publish a CSR on the CTLog for X days, allow an identity owner to request rejection within that time period (by proving identity), and finally issue the cert if no one objects. It would require longer lived certificates, but it would let me monitor the log to make sure no one is requesting certs for any of my identities. If I see something unexpected it probably means I have a compromised account and at least I get X days to remediate the damage and request the CSR get rejected (assuming I regain control of my account).

If you have something you can audit, you get warned about damage in progress. If you have something you can preempt, you can prevent the damage altogether. I'd pick the latter if given the option.
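The monitoring half of that idea can be sketched in a few lines. Note the log-entry shape and field names below are invented for illustration; real Rekor/CT entries look nothing like this:

```python
# Hypothetical sketch: scan transparency-log entries and flag any
# certificate issuance for one of my identities that I didn't initiate.
MY_IDENTITIES = {"me@example.com"}

def unexpected_issuances(log_entries: list, my_request_ids: set) -> list:
    """Return entries for my identities whose request I don't recognize."""
    return [
        e for e in log_entries
        if e["identity"] in MY_IDENTITIES and e["request_id"] not in my_request_ids
    ]

log = [
    {"identity": "me@example.com", "request_id": "r1"},     # one I made
    {"identity": "other@example.com", "request_id": "r2"},  # not my identity
    {"identity": "me@example.com", "request_id": "r9"},     # not mine: alert!
]

for entry in unexpected_issuances(log, {"r1"}):
    print("unexpected issuance:", entry)
```

The difference from the proposal above is timing: today this can only run after issuance (detection), whereas a publish-then-issue window would let the same check run before the certificate becomes usable (prevention).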

I also want to get a better understanding of the OIDC tokens and how they're issued.

> When running something like cosign sign, users will complete an OIDC flow and authenticate via an identity provider (GitHub, Google, etc.) to prove they are the owner of their account.

I bet that doesn't happen every time an artifact needs to be signed. People are going to want to automate signing via CI and, I assume (maybe badly), that's going to mean long(er)-lived OIDC auth tokens authorizing short-lived signing certs. If that's right, I worry about token hijacking.

But there's a solution for that!

> Similarly, automated systems (like GitHub Actions) can use Workload Identity or SPIFFE Verifiable Identity Documents (SVIDs) to authenticate themselves via OIDC.

That sounds like another extra party that can act on my behalf and adds even more complexity. I need to read more about how it works, but common sense would say that if artifacts can be signed without me taking any action, they can be signed without my consent, right?

It seems like a lot of complexity to me, but I've never used it, so I guess I'll have to give it a try. Usually a lot of things that seem illogical on the surface start to make more sense with a better understanding of how everything works, so hopefully I'm just lacking perspective for now and it ends up being an amazing system that topples the current code signing industry (which I really dislike).

1. https://docs.sigstore.dev/security/#what-sigstore-doesnt-gua...


So how are Python packages signed? Are they just shipping rando code without any sort of E2E assurance?

FWIW, Ruby also did a piss-poor job of handling gem signing by making it both difficult and optional.

How fucking hard is it to get to the level of code release assurance of Debian or Fedora? Manage GPG keys, hold keysigning fests, and enforce a policy.


PGP is a solution in search of a problem. We have given it decades to be useful, and it turns out that it is an enormous security failure. It needed to go.

Sigstore [0], on the other hand, makes more sense to use instead.

[0] https://www.sigstore.dev


This reads like an advertisement. I routinely use GPG, and it is useful for me. It's not perfect (far from perfect, really), but it's a solution for multiple of my problems.

I don't know much about the solution you promote, but as usual with many "PGP killers" it replaces one very specific application of PGP and ignores all the others. Which is ok! Doing one thing and doing it well is the Unix philosophy after all. But it's not something I have use for, and it's not a viable replacement for GPG.


If doing one thing and doing it well is the core of the Unix philosophy, PGP is (cryptographically) the antithesis of that. It's a Swiss Army Knife that does none of its tasks well by modern standards.


I'll let my boss know we must stop signing our releases and having our software automatically check if the new version is legit then.

We will instead switch to something with a fluffy corporate website that tells you absolutely nothing.


Trust on first use is absolutely a valid use of PGP signatures, and it is being used in many real-world systems (ask me how I know). Your finding that PGP isn't being used the way you think it should be does not justify removing it without providing a replacement.

Why on earth wasn't the community asked before you implemented this change?

> Given all of this, the continued support of uploading PGP signatures to PyPI is no longer defensible. While it doesn't represent a massive operational burden to continue to support it, it does require any new features that touch the storage of files to be made aware of and capable of handling these PGP signatures, which is a non zero cost on the maintainers and contributors of PyPI.

This uninformed reasoning is what's indefensible.


What an amazing opportunity for someone to add a new way of integrating PGP authentication by writing two short scripts:

One to compile a list of file hashes and PGP-sign them.

One to validate these hashes against the provided signatures.
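A stdlib-only sketch of those two scripts, with the actual gpg invocations left as comments and in-memory bytes standing in for the release files:

```python
# Sketch of the two scripts described above. The PGP step itself would be
# e.g. `gpg --armor --detach-sign SHA256SUMS` (script 1) and
# `gpg --verify SHA256SUMS.asc SHA256SUMS` (script 2), omitted here.
import hashlib

def make_manifest(files: dict) -> str:
    """Script 1: compile a list of file hashes (this text gets PGP-signed)."""
    return "\n".join(
        f"{hashlib.sha256(data).hexdigest()}  {name}"
        for name, data in sorted(files.items())
    )

def check_manifest(files: dict, manifest: str) -> bool:
    """Script 2: after the signature on the manifest verifies, validate
    the downloaded files against it."""
    return make_manifest(files) == manifest

dist = {
    "pkg-1.0.tar.gz": b"sdist bytes",
    "pkg-1.0-py3-none-any.whl": b"wheel bytes",
}
manifest = make_manifest(dist)

print(check_manifest(dist, manifest))                                 # True
print(check_manifest({**dist, "pkg-1.0.tar.gz": b"evil"}, manifest))  # False
```

This is essentially the SHA256SUMS-plus-detached-signature pattern many projects already use outside PyPI; the part PyPI's removal affects is only where the `.asc` files get hosted.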


Replace a 31% effective solution with no solution? Very impressive.



