Now that you have removed GPG ASC signature upload support, is there any way for publishers to add cryptographic signatures to packages that they upload to PyPI? FWIU, only the "server signs uploads" part of TUF was ever implemented?
Why do we use GPG ASC signatures instead of just a checksum over the same channel?
> Why do we use GPG ASC signatures instead of just a checksum over the same channel?
Could you elaborate on what you mean by this? PyPI computes and supplies a digest for every uploaded distribution, so you can already cross-check integrity for any hosted distribution.
GPG was nominally meant to provide authenticity for distributions, but it never really served this purpose. That's why it's being removed.
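For example — a minimal sketch, assuming the current JSON API shape at https://pypi.org/pypi/<project>/json — you can compare a downloaded file against the digest PyPI publishes. Note that this digest travels over the same channel as the package, so it proves transfer integrity, not publisher authenticity:

```python
# Sketch: cross-check a downloaded file against the digest PyPI publishes.
# Assumes the JSON API shape at https://pypi.org/pypi/<project>/json.
import hashlib
import json
import urllib.request

def expected_sha256(project: str, filename: str) -> str:
    with urllib.request.urlopen(f"https://pypi.org/pypi/{project}/json") as resp:
        meta = json.load(resp)
    for release_file in meta["urls"]:
        if release_file["filename"] == filename:
            return release_file["digests"]["sha256"]
    raise KeyError(filename)

def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# e.g.: sha256_of("pkg.whl") == expected_sha256("requests", "pkg.whl")
```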
> Why do we use GPG ASC signatures instead of just a checksum over the same channel?
You can publish an md5sum or sha512sum string next to the URL that the package is downloaded from, for users to optionally check after downloading. But if that checksum string travels over the same channel as the package itself (HTTPS/TLS with a CA cert bundle), the checksum string could have been MITM'd/tampered with, too. A cryptographically signed checksum, on the other hand, can be verified once the pubkey is retrieved over a different channel (with GPG, keyservers speak HKPS, i.e. HKP over TLS, IIRC with cert pinning), and a MITM would have to spend a lot of money to forge that digital publisher signature.
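A minimal sketch of that signed-checksum idea, using Ed25519 via the `cryptography` library rather than GPG; the publisher key is assumed to have been fetched over a different channel than the package, and all names here are hypothetical:

```python
# Sketch of the signed-checksum idea (hypothetical scheme, not a real tool).
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_signed_checksum(pubkey_bytes: bytes, checksum_doc: bytes,
                           signature: bytes, package_bytes: bytes) -> None:
    # 1) Verify the publisher's signature over the checksum document;
    #    raises InvalidSignature if the document was tampered with in transit.
    Ed25519PublicKey.from_public_bytes(pubkey_bytes).verify(signature, checksum_doc)
    # 2) Only then trust the digest inside it (format: "<hex>  <filename>").
    expected_hex, _filename = checksum_doc.decode().split()
    if hashlib.sha512(package_bytes).hexdigest() != expected_hex:
        raise ValueError("package does not match the signed checksum")
```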
Twine COULD/SHOULD download uploads back and check the PyPI TUF signature against a root key, which could/should be shipped as a const in Twine?
And then Twine should check publisher signatures against which trusted map of package names to trusted keys?
There are two possible designs here:
1) the server signs what's uploaded, using one or more TUF keys shared to RAM on every PyPI upload server.
2) the client uploads a cryptographic signature (made with their own key) along with the package; the corresponding public key is trusted to upload for that package name; and the client retrieves said public key and verifies the downloaded package's signature before installing.
FWIU, 1 (PyPI signs uploads with TUF) was implemented, but 2 (users sign their own packages and upload the package plus the signature, followed by 1) was never implemented?
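A sketch of what option 2's client side — and one possible answer to the "which trusted map" question above — might look like; entirely hypothetical, not an existing pip/twine feature:

```python
# Hypothetical sketch of option 2's verify-before-install step. The "trusted
# map" is a client-side pin file of project names -> publisher keys that would
# have to be distributed out of band; nothing like this exists in pip today.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

TRUSTED_PUBLISHERS = {
    "requests": "00" * 32,  # placeholder hex for a pinned Ed25519 public key
}

def verify_before_install(project: str, package: bytes, signature: bytes) -> None:
    key_hex = TRUSTED_PUBLISHERS.get(project)
    if key_hex is None:
        raise RuntimeError(f"no pinned publisher key for {project!r}")
    pubkey = Ed25519PublicKey.from_public_bytes(bytes.fromhex(key_hex))
    pubkey.verify(signature, package)  # raises InvalidSignature on mismatch
```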
Your understanding is a little off: we worked on integrating TUF into PyPI for a while, but ran into some scalability/distributability issues with the reference implementation. It's been a few years, but my recollection was that the reference implementation assumed a lot of local filesystem state, which wasn't compatible with Warehouse's deployment (no local state other than tempfiles, everything in object storage).
To the best of my knowledge, the current state of TUF for PyPI is that we performed a trusted setup ceremony for the TUF roots[1], but that no signatures were ever produced from those roots.
For the time being, we're looking at solutions that have less operational overhead: Sigstore[2] is the main one, and it uses TUF under the hood to provide the root of trust.
python-tuf [1] back then assumed that everything was manipulated locally, yes, but a lot has changed since then: you can now read/write metadata entirely in memory, and integrate with different key management backend systems such as GCP.
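For instance — a sketch against recent python-tuf releases, assuming `raw` holds the bytes of a root.json pulled from object storage — metadata can now round-trip without ever touching disk:

```python
# Sketch using python-tuf's newer Metadata API (no local filesystem assumed).
from tuf.api.metadata import Metadata, Root

def inspect_root(raw: bytes) -> bytes:
    md = Metadata[Root].from_bytes(raw)          # deserialize entirely in memory
    print(md.signed.version, md.signed.expires)  # e.g. for sanity checks
    return md.to_bytes()                         # reserialize, still no disk I/O
```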
More importantly, I should point out that while Sigstore's Fulcio will help with key management (think of it as a managed GPG, if you will), it will not help with securely mapping software projects to their respective OIDC identities. Without this mapping, how will verifiers know, in a secure yet scalable way, which Fulcio keys _should_ be used? Otherwise we are back to the GPG PKI problem and its web of trust.
This is where PEP 480 [2] can help: you can use TUF (especially after TAP 18 [3]) to do this secure mapping. Marina Moore has also written a proposal called Transparent TUF [4] for having Sigstore manage such a TUF repository for registries like PyPI. This is not to mention the other benefits that TUF can give you (e.g., protection from freeze, rollback, and mix-and-match attacks). We should definitely continue discussing this sometime.
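Purely to illustrate the shape of such a mapping — this is not PEP 480's or TAP 18's actual metadata format — a delegation might point at an OIDC identity rather than a raw key:

```python
# Illustrative only -- not the actual PEP 480 / TAP 18 metadata format.
delegation = {
    "paths": ["requests/*"],                      # projects this role signs for
    "identities": [{
        "issuer": "https://accounts.google.com",  # OIDC identity provider
        "identity": "maintainer@example.com",     # identity Fulcio certifies
    }],
    "threshold": 1,                               # signatures required
}
```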
> Can SubtleCrypto accelerate any of the W3C Verifiable Credential Data Integrity 1.0 APIs? vc-data-integrity: https://w3c.github.io/vc-data-integrity/ ctrl-f "signature suite"
>> ISSUE: Avoid signature format proliferation by using text-based suite value The pattern that Data Integrity Signatures use presently leads to a proliferation in signature types and JSON-LD Contexts. This proliferation can be avoided without any loss of the security characteristics of tightly binding a cryptography suite version to one or more acceptable public keys. The following signature suites are currently being contemplated: eddsa-2022, nist-ecdsa-2022, koblitz-ecdsa-2022, rsa-2022, pgp-2022, bbs-2022, eascdsa-2022, ibsa-2022, and jws-2022.
> TUF "targets" roles may delegate to Fulcio identities instead of private keys, and these identities (and the corresponding certificates) may be used for verification.
s/Fulcio/W3C DID/g may have advantages; or is there already a way to use W3C DIDs (Decentralized Identifiers) to keep track of key material in RDFS properties of a DID class?
What command(s) do I pass to pip/twine/build_pyproject.toml to build, upload, and install a package with a key/cert that users should trust for e.g. psf/requests?
Where does the user specify the cryptographic key to sign a package before uploading?
Server-side TUF keys are implemented FWICS, but client-side digital publisher signatures (like the ones MS Windows .exe files have displayed under the file's "Properties" dialog for many years now) are not yet implemented.
> Where does the user specify the cryptographic key to sign a package before uploading?
With Sigstore, they perform an OIDC flow against an identity provider: that results in a verifiable identity credential, which is then bound to a short-lived (~15m) signing key that produces the signatures. That signing key is simultaneously attested through a traditional X.509 PKI (it gets a certificate, that certificate is uploaded to an append-only transparency log, etc.).
So: in the normal flow, the user never directly specifies the cryptographic key -- the scheme ensures that they have a strong, ephemeral one generated for them on the spot (and only on their client device, in RAM). That key gets bound to their long-lived identity, so verifiers don't need to know which key they're verifying; they only need to determine whether they trust the identity itself (which can be an email address, a GitHub repository, etc.).
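To make the ephemeral-key part concrete, here is a conceptual sketch using the `cryptography` library; this is not the sigstore client API, and the Fulcio certificate issuance and transparency-log upload are elided entirely:

```python
# Conceptual sketch of the ephemeral signing key only -- NOT the sigstore API.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

def sign_once(artifact: bytes) -> bytes:
    key = ec.generate_private_key(ec.SECP256R1())  # generated in RAM, client-side
    signature = key.sign(artifact, ec.ECDSA(hashes.SHA256()))
    # In Sigstore, the public half would now be bound to the user's OIDC identity
    # via a short-lived Fulcio certificate; the private key is simply discarded.
    return signature
```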
What command(s) do I pass to pip/twine/build_pyproject.toml to build, upload, and install a package with a key/cert that users should trust for e.g. psf/requests?
pip checks that a given package was signed with the PyPI key but does not check for a signature from the publisher. And now there's no way to host any type of cryptographic signature on PyPI.
There is no e2e signing: PyPI signs whatever is uploaded.
(Noting also that packages don't have to be encrypted in order to carry cryptographic signatures; a signature is just an encrypted digest of the package, not the whole package.)
Yeah, the whole thing looks like throwing the baby out with the bathwater; the package should:
* get a signature from the author ("the actual author published it"), plus some metadata listing the valid signing keys (in case the project has multiple authors, or just for key rotation)
* get a signature from the hosting provider that confirms "yes, that actual user logged in and uploaded the package"
* (the hardest part) get client-side key management, so the user has to do the least amount of work possible when downloading/updating a valid package
If the user doesn't want to go to the effort of validating whether the author's public key is valid, so be it; but at the very least the system should alert on tampering at the provider (by checking the hosting signature) or on the author key changing (compromised credentials to the hosting provider).
It still doesn't prevent "the attacker steals the key off the author's machine", but that is by FAR the rarest case, and it could be pretty reasonably prevented by just using hardware tokens. Hell, fund them for key contributors.
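A rough sketch of the verifying side of that two-signature scheme (hypothetical, not an existing tool):

```python
# Hypothetical verifier for the two-signature scheme sketched above.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_package(pkg: bytes, author_sig: bytes, host_sig: bytes,
                   author_key: Ed25519PublicKey, host_key: Ed25519PublicKey) -> None:
    host_key.verify(host_sig, pkg)  # "that actual user logged in and uploaded it"
    try:
        author_key.verify(author_sig, pkg)  # "the actual author published it"
    except InvalidSignature:
        # A valid host signature with a bad author signature is the "compromised
        # hosting credentials / changed author key" case to alert on.
        raise RuntimeError("author signature failed; alert the user") from None
```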