I guess I can copy over a comment I made when this previously made rounds:
I have a few problems with this. The short summary of these claims is “APT checks signatures, therefore downloads for APT don’t need to be HTTPS”.
The whole argument relies on the idea that APT is the only client that will ever download content from these hosts. This is however not true. Packages can be manually downloaded from packages.debian.org and they reference the same insecure mirrors. At the very least Debian should make sure that there are a few HTTPS mirrors that they use for the direct download links.
Furthermore Debian also provides ISO downloads over the same HTTP mirrors, which are also not automatically checked. While they can theoretically be checked with PGP signatures it is wishful thinking to assume everyone will do that.
Finally the chapter about CAs and TLS is - sorry - baseless fearmongering. Yeah, there are problems with CAs, but deducing from that that “HTTPS provides little-to-no protection against a targeted attack on your distribution’s mirror network” is, to put it mildly, nonsense. Compromising a CA is not trivial and due to CT it’s almost certain that such an attempt will be uncovered later. The CA ecosystem has improved a lot in recent years, please update your views accordingly.
The other big problem is that people can see what you're downloading. Might not be a big deal but consider:
1. You're in China and you download some VPN software over APT. A seemingly innocuous call to a package server is now a clear violation of Chinese law.
2. Even in the US, it can leak all kinds of information about your work habits, what you're working on, etc.
3. If it's running on a server, it could leak what vulnerable software you have installed or what versions of various packages you're running to make exploiting known vulnerabilities easier
I mean, deducing the package from the download size is way harder than just seeing the name in the open. Security is rarely perfect and is more like an arms race; making things more difficult is a big deal. This kind of "you can hack Y too" argument doesn't make any sense if it's way harder to hack Y than X.
Is that even true? In practice rather than downloading a single package you'd download/update a bunch of packages over the same connection, and an attacker would only see the accumulated size, right?
No they aren't. HTTPS fingerprinting is easy. It's been done by lcamtuf years ago and it's available as a layer 7 filter in Linux... TLS adds more information because it prevents proxies and has specific server implementations.
This is addressed in the "privacy" section. Basically, your premise is wrong. TLS does not provide additional privacy for this use case. The short summary is: the size of the transmitted data makes inferring what you've downloaded from a public file mirror trivial for a passive observer, even with TLS.
How many bits of privacy are you willing to give up here? Debian only has about 48,000 packages. That's almost 16 bits, total, with perfect privacy — all packages enlarged to the same size.
You can select a list of sizes trading off collisions (2 packages with same size => 1 bit of privacy; 4 => 2 bits, etc). But the most you ever get is (nearly) 16. The amount of padding you need to even get 2 bits of privacy (giving up almost 14) on the long tail of large packages is going to be "a lot" and it grows as you want more bits.
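As a back-of-the-envelope check on that arithmetic (a sketch; the 48,000 figure is just the number quoted above, not a measurement):

    import math

    # Figures from the thread, not measurements: ~48,000 packages total.
    n_packages = 48_000

    # With perfect padding (every package padded to one uniform size) an
    # observer learns only that "some package was fetched", i.e. about
    # log2(N) bits of identity are hidden.
    max_bits = math.log2(n_packages)              # ~15.6 bits

    # If padding only groups packages into collision sets of size k, you keep
    # log2(k) bits of privacy and give up the rest.
    for k in (2, 4, 16, 256):
        kept = math.log2(k)
        print(f"collision sets of {k:>3}: keep {kept:4.1f} bits, give up {max_bits - kept:4.1f}")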
Some further pros and cons of padding are discussed elsewhere in the thread.
Providing privacy for the packages that, including dependencies, are less than 100MB in size is something that's probably worth doing. The cost of padding an apt-get process to the nearest say, 100MB, is not necessarily infeasible as far as bandwidth goes.
Instead of padding individual files, how about a means to arbitrarily download some number of bytes from an infinite stream? That would appear to be sufficient to prevent file size analysis (but probably not timing attacks).
Exposing something like /dev/random via a symbolic link and allowing the apt-get client to close the stream after the total transfer reaches 100MB would appear to make it harder to infer packages based on the transferred bytes, without being very difficult to roll out.
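A rough sketch of that idea, assuming a hypothetical /padding endpoint on the mirror that streams filler bytes (both the endpoint name and the 100MB bucket are made up for illustration):

    import requests  # third-party, but ubiquitous

    BUCKET = 100 * 1024 * 1024  # pad every apt run up to the next 100 MB

    def fetch_with_padding(package_urls,
                           padding_url="https://mirror.example.org/padding"):  # hypothetical endpoint
        total = 0
        for url in package_urls:
            resp = requests.get(url)
            resp.raise_for_status()
            total += len(resp.content)          # real package bytes we keep
        # Round the observable transfer up to the next bucket boundary by
        # streaming filler bytes and throwing them away.
        remaining = (-total) % BUCKET
        if remaining:
            with requests.get(padding_url, stream=True) as pad:
                for chunk in pad.iter_content(chunk_size=65536):
                    remaining -= len(chunk)     # discarded filler
                    if remaining <= 0:
                        break
        return total

Of course this only buys anything if the connection is also encrypted; over plain HTTP the requests name the packages anyway.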
Also, their claim that HTTPS doesn't hide the hosts that you are visiting is about to stop being true. Encrypted SNI is now part of the TLS 1.3 RFC, so HTTPS will actually hide one's internet browsing habits quite well. The only holes in privacy left on the web's stack are in DNS.
Can you point out where encrypted SNI is in the RFC? I've read the RFC, and I don't recall it being in there. I do see that there is an extension published, which I haven't reviewed in depth.
From a brief review, I see two potential issues:
a) the encrypted sni record contains a digest of the public key structure. This digest is transmitted in the clear (as it must be at this phase of the protocol), so a determined attacker could create a database of values for the top N mirror sites.
b) in order to be useful, the private key for the public keys would need to be shared across all servers supporting that hostname. That's not a big deal for a normal deployment, but it's not great for a volunteer mirrors system -- lots of diverse organizations own and operate the individual mirrors and we need to count on all of those to keep it secure. Also, it adds an extra layer of key management, which is an organizational and operational burden.
Yeah, your parent is wrong about it being in the RFC. ESNI is something that they decided wasn't possible and ruled out of scope for TLS 1.3 but then somebody had a brainwave and Rescorla plus some people at Cloudflare wrote IDs and did live fire testing. The drafts are maybe at the "this is the rough shape of a thing" stage, more than ambitions but not a basis on which to announce specific plans.
It's also pointless without DPRIVE. If people can see all your DNS lookups they can guess exactly what you're up to. That's why that Firefox build did both eSNI and DoH
Doesn't really matter either way - there's a bunch of crawlers and scanners out there such that you can pretty much Google any IP and find a list of sites that are hosted on it.
Not quite the ones I was referring to - many services just look at DNS and get the A record for every domain, then offer reverse lookup - complete lists of domains are purchasable for all major TLDs. The only defense against this would be to host your content on a subdomain.
DNSlytics, DomainTools, W3Advisor and others offer this.
Yes but if I want to specifically look for traffic to the Debian mirrors, I can use DNS to build a list of the IPs and then see if you're connecting to one of them.
Most services report what they are, and even the server often, when you connect. If you connect to an IP and it's serving a website then I don't know why you'd care that reverse-lookup isn't configured correctly, you're not hiding anything?
Yeah, but not being implemented now doesn't mean that they never will be. Migrating/implementing now will mean that the mirrors will automatically support it once the features are in place and the clients and servers can agree on the improved privacy feature sets.
> When I was a 19 year old idiot, I was responsible for a mirror server.
Me too! It was even ftp.kr.debian.org! It still is!
Seriously, people, who do you think has root on the official Debian mirror servers hosted by universities? University students. Who are 19 years old. This is literally true.
In my country, the ccTLD registry is run by a university. While the professors have done an excellent job, the NIC itself was hacked a few times in the past, there is no admin UI (call a 19-year-old kid and set your nameservers using NATO phonetics), and they still have some non-functional root nameservers.
>Packages can be manually downloaded from packages.debian.org and they reference the same insecure mirrors.
That's a reasonable complaint. I think it would make sense for the individual packages to be signed as well (and checked during install). This way you'd get a warning if you install an untrusted package regardless of the source. I'm not sure why it doesn't work that way.
>Furthermore Debian also provides ISO downloads over the same HTTP mirrors, which are also not automatically checked. While they can theoretically be checked with PGP signatures it is wishful thinking to assume everyone will do that.
I do do that. If you care about such an attack vector, why wouldn't you? And if you don't, why should Debian care for you? There are plenty of mirrors for Debian installers, I often get mine from BitTorrent, and trusting a PGP sig makes much more sense than relying on HTTPS for that IMO.
>Finally the chapter about CAs and TLS is - sorry - baseless fearmongering.
I don't think it's baseless (we have a long list of shady CAs and I'm sure many government agencies can easily generate forged certificates) but it's rather off-topic. Their main argument is that the trust model of HTTPS doesn't make sense for APT; if that's true, then whether or not HTTPS is potentially hackable is irrelevant.
But the point the page makes is that HTTPS wouldn't be good enough anyway. As such it's not a replacement for checking the PGP signature. I think it's consistent.
If HTTPS could be used to replace PGP signature checks then I'd agree with you but it's not the case. So I go back to my initial point, if you worry about your image being tampered with HTTPS is not enough. If you don't care then you don't care either way.
In a way, not using HTTPS is kind of an implicit disclaimer on Debian's part: "Don't trust what you get from this website". If they feel like they can't guarantee the security of whatever server is hosting the CD images, adding HTTPS might actually be a bad thing, because people who might otherwise have checked the signature may think "well, it's over HTTPS, it's good enough".
>It should be assumed that any step that requires manual intervention will be skipped by most people.
Indeed.
1. If you don't care about security, it still doesn't hurt to have HTTPS. Think of it as "extra" that you get for free.
2. If you care about security, you might still not have the know-how to make sure everything is secure, and not have the time to get into it as you're trying to get things done.
3. Even if you care about security AND have the know-how, you might still forget. Nobody's perfect. So it's good that the HTTPS is there.
To be fair to the apt developers/maintainers - the security _is_ automatic when using their tool to talk to their repos.
It's not their responsibility to automate security for people using their repos via different tools.
If the solution were just "install certbot on the server and use a free HTTPS cert" then perhaps you could argue they should just do it. But when the problem space includes aggressively using a global (largely volunteer) mirror network and supporting local caching proxies, I can completely understand why they'd say "Nope. Not our problem, not our responsibility to provide a solution. We've got other more productive ways to spend our and our mirror volunteers' time and effort".
> I think it would make sense for the individual packages to be signed as well (and checked during install). This way you'd get a warning if you install an untrusted package regardless of the source. I'm not sure why it doesn't work that way.
Actually, debs are not signed in Debian or Ubuntu. The accepted practice in Debian is that only repository metadata is signed.
The argument is that the design of Debian packages (as in, the package format) makes it difficult to reproducibly validate a deb, add a signature, strip the signature, and validate it again.
Personally, I'm not sure I buy it, as we don't have problems signing both RPMs and RPM repository metadata. Technically, yes, the RPM file format is structured differently to make this easier ('rpm' is a type of 'cpio' archive), but Debian packages are fundamentally 'ar' archives with tarballs inside, and it isn't hard to do similar things with those. For reproducible builds, Koji (Fedora's build system) and OBS (Open Build Service, openSUSE's build system) are able to strip and add signatures in a binary-predictable way for RPMs.
Fedora goes the extra step of shipping checksums via metalink for the metadata files before they are fetched, to ensure they weren't tampered with before processing. But even with all that, RPMs _are_ signed so that they can be independently verified purely with 'rpm(8)'.
> And if you don't, why should Debian care for you?
Those who don’t go out of their way to defend themselves don’t deserve security — seriously, that’s your attitude? Then you don’t deserve to be in any position to make security decisions for other people.
>Compromising a CA is not trivial and due to CT it’s almost certain that such an attempt will be uncovered later. The CA ecosystem has improved a lot in recent years, please update your views accordingly.
This is simply not true. Governments can simply compel your CA to do as they want. Not to mention that "uncovered later" is pretty damn worthless.
That said, I do agree that there should be HTTPS mirrors.
> Not to mention that "uncovered later" is pretty damn worthless.
It is not worthless as a deterrent to the CA. Proof of a fraudulently issued certificate is grounds to permanently distrust the CA. So, yes, they can do it, but hopefully only once.
The justifications for why APT does not use HTTPS (by default; it is possible to add the HTTPS transport) are just mind-blowing. It is however not at all surprising considering how broken Debian's secure package distribution methodology is -- I'm saying this as someone who had to implement workarounds for it for a company that was willing to spend a significant amount of resources on making it work.
Here are some low-level gems:
1. I have installed package X. I want to validate that the files that are listed in a manifest for package X have not changed on a host.
APT answer: Handwave! This is not a valid question. If you are asking this question you already lost.
2. I want to have more than one version of a package in a distribution.
APT answer: Handwave! You don't need it. You can just have multiple distributions! It is because of how we sign things - we sign collections!
3. I want to have a complicated policy where some packages are signed, some are not signed, and some are signed with specific keys.
APT answer: Handwave! You should have an all-or-nothing policy! A nothing policy, actually, because we mostly just sign collections rather than the individual packages.
> 2. I want to have more than one version of a package in a distribution.
> APT answer: Handwave! You don't need it. You can just have multiple distributions! It is because of how we sign things - we sign collections!
This is not true at all. If you need to distribute multiple versions of the same package then all you need to do is provide multiple versions of the same package. Just get your act straight, learn how to package software, create your packages so that you can deploy them simultaneously without breaking downstream software, and you're set.
> AFAIK, dpkg doesn't allow to have multiple versions of the same package installed at the same time.
That's true, but multiple versions of the same package is not the same as multiple versions of the same software. Debian and Ubuntu have access to multiple minor version releases of GCC and they can all coexist side by side.
Correct. Multi-version packages are not supported by dpkg. They are supported by rpm in limited circumstances (no file conflicts allowed!). Fedora and openSUSE kernel packages work this way, as an example.
The way Debian works around this is by doing "<name>-<version>" as the package name. This is a valid approach, though it makes package name discovery a bit more difficult at times...
As for multiple versions of packages in the repo, "createrepo"/"createrepo_c" (for RPM repositories) does not care.
This is somewhat at your peril, as I've observed APT getting confused when it parses metadata from repositories produced by dpkg-scanpackages that allows multiple versions in the repository.
However, reprepro does not support this at all, so most deployments with semi-large Debian repositories will not have this option available to them anyway.
> No, those are <somepackage>-<someversion> where <somepackage> is different.
It seems there is a significant semantics gap between how deb packages work and what's supposed to be a package version.
In deb packages, package versions are lexicographically ordered descriptions of a version ID that is used to guide auto-upgrades.
If a packager wishes for multiple minor version releases to be present on a system, then he should build his packages to reflect that, which is exactly what Python packages, and especially some libraries, do. For example, Python packages are independent at the major version level, but GCC packages are independent at the minor version level.
> If a packager wishes for multiple minor version releases to be present on a system, then he should build his packages to reflect that, which is exactly what Python packages, and especially some libraries, do. For example, Python packages are independent at the major version level, but GCC packages are independent at the minor version level.
Not in a system. In a repo. Standard debian tools ( not hacked, not outside the tree, not outside the debian main tree ) do not support this.
If you have a repo with <packagename>-<packageversion> and add a package <packagename>-<packageversion1> where packageversion1 is higher than the package version, the previous version gets deleted from the repo.
Can it be worked around? Sure, you can:
a) have multiple repos. If you want to keep up to a hundred versions, you can just create a hundred repos.
b) you can redefine the meaning of a package name and incorporate the version into the name of the package.
(b) sounds like a good solution, except that an org that decided to use something like .deb for distributing artifacts probably uses other software too. Let's say it uses Puppet, which supports Debian package management out of the box - except that now you need to change how Puppet handles version numbers, because once we start embedding the version into the package name, "nginx-1.99.22" and "nginx-1.99.23" become two different packages, not two different versions of the same package.
> Not in a system. In a repo. Standard debian tools ( not hacked, not outside the tree, not outside the debian main tree ) do not support this.
That's the semantic gap I've mentioned.
Your statement is patently wrong, as deb packages do enable distributions such as Debian and other Debian-based distros such as Ubuntu to provide multiple major, minor, and even point releases through their official repos to be installed and run side by side.
Take, for example, GCC. Debian provides official deb packages for multiple major and minor release versions of GCC, and they are quite able to coexist in the very same system.
All it takes for someone to build deb packages for multiple versions of a software package is to get to know deb, build their packages so that they can coexist, and set their package and version names accordingly.
> Your statement is patently wrong, as deb packages do enable distributions such as Debian and other Debian-based distros such as Ubuntu to provide multiple major, minor, and even point releases through their official repos to be installed and run side by side.
Through multiple repos. Not through one repo. That's why testing packages live in a separate repo. That's why security packages live in a separate repo. That's why updates live in a separate repo.
If you work around this with careful versioning in the package name, using <packagename>-<version> as the convention, you are breaking other tools, including tools distributed with Debian. Puppet's package { "nginx": ensure => installed } will not work if you decide to let nginx have multiple versions by using "nginx-1.99" as a special package name.
Wrong. The official debian package repository hosts projects which provide multiple major and even minor versions of the same software package to be installed independently and to coexist side-by-side.
Seriously, you should get to know debian and its packaging system before making any assertion about them. You can simply browse debian's package list and search for software packages such as GCC to quickly acknowledge that your assertion is simply wrong.
It is not. You are talking about a hack/workaround/different methodology. That hack cannot be natively integrated with other software that uses the regular idea of what a package name and a package version are (for example, Puppet, which ships with Debian). It is a garbage approach, just like the APT approach of using a repo manifest for signatures is garbage, and just like the approach of not storing crypto hashes of every file in a .deb inside the .deb is garbage.
I have provided the challenge in another reply:
I have in a repo:
package: nginx
version: 1.99.77
I need to add to the repo:
package: nginx
version: 1.99.78
Both of the versions must remain in the repo. Packages must be signed. The repo must be signed. The name of the package cannot change and neither can the version number. Both of the packages must be installable using a flag that specifies a version number, passed to one of the standard Debian package install tools (the tool must be in the "main" collection).
What is a tool that can be used that is listed at https://wiki.debian.org/DebianRepository/Setup as existing in the "main" part of Debian? For half a point, you can use any tool listed in the wiki.
Unless I'm missing something, this is 100% possible. Now, if you want to carefully control the version constraints of these, welcome to pinning hell, but that's a different problem.
I need to add to the repo <thispackage>-<this-version>-<that-patchlevel>.architecture
Packages are signed and the repo will be signed. At the end I should be able to install the package using the flag that selects a specific version, without adding another repo.
A listing of all of these packages side by side, along with information on how it is built that includes the very scripts used to do so (including constructing that listing), is one of the value-added bonuses in the GOPHER version of the repository:
2) To access the "methods" i need to install a gopher client
That's pretty much a definition of the "handwave! Workaround!"
P.S. I have implemented a workaround. It works. It is just not a solution. I should not need to assign people to re-sign the packages with our own keys and create utilities that duplicate Debian tools, just to provide nominal access to functionality regularly required by a large organization that has packages with hundreds of dependencies and could have 5-20 versions of its apps in different production/test/validation/qa environments. Joe Random Engineer expects his knowledge of how apt-get works, how dpkg works, and how puppet works to be portable.
You clearly do not understand, especially given that gibberish about not being listed in some wiki, and utter irrelevancies about dpkg and puppet, which have no bearing upon the matter of publishing one Debian repository with multiple versions of packages at all.
I'm just not going to laboriously re-type it all into Hacker News when you can just go to the published repository and see it all explicitly laid out right there in the repository itself, with the exact scripts and commands that get run to produce the very repository that you are seeing -- quite the opposite of either handwaving or workarounds.
You could get to it with an FTP or an HTTP client, too. But that would just leave you with the raw files in no particular order, rather than the annotated and organized GOPHER listings. A value-added bonus for the GOPHER version, as I said.
You don't actually have any justification at all for your claim that this is somehow impossible with Debian repositories, given that people like me are doing it with a few simple scripts and even publishing them for the world to see; and clearly neither handwaving nor workarounds for anything are required.
First of all, it is the only thing that is relevant. Repo needs to be functional using tools provided by the distribution that uses that repo format. No one cares that one can download tarballs and compile them -- this is not 1997. In 2019 it is "Enter this command and get this result". Feed this result into the orchestration/management framework. Find a bug? Fix the bug. The rest of the system will continue to function.
Second of all, you still have not provided a link to the doc that someone can read without installing a gopher client. Come on, you said you already have it!
Also, as the article points out - it's not exactly trivial to deploy HTTPS across their global mirror network or to make it work with local caching proxies. That's an easy thing if you've got a handful of servers or a few load balancers, but not so easy or practical for their use case.
(Also, remember most of the apt development had already happened way before free SSL certs became a thing. While "Why don't they just use certbot/Let's Encrypt?" is an easy criticism, give them credit for having actually built a GPG-sig-secured distributed software delivery system years before Let's Encrypt existed...)
And ultimately none of those ways to detect this are useful against a sufficiently targeted attack with direct access to the signing key.
It's one thing if you sign a bad key for Google.com, publish it in CT logs and then put it up on the public internet - it's quite another if you sign a bad key for midsizecompany.com, keep it out of the CT system and use it only in a targeted attack against non-technical individuals who are unlikely to examine a certificate or use things like Certificate Watch.
With that said, I still believe serving it over HTTPS would be a substantial improvement. Perhaps pin the cert or at least the CA out of the box to prevent such attacks.
CT monitors are primarily for site operators, not end users. When site operators spot a rogue cert issuance they can flag it and ultimately get the cert revoked.
Precisely my point - it's not something an end user would notice. And if it doesn't even appear in CT logs or for the end user it would likely go completely unnoticed.
There really aren't "so many ways to detect this" - there's about 3: the user examines the certificate, CT logs catch it later and detection in browsers of major changes on the most high profile sites. Anything falling outside of those will almost certainly go unnoticed.
When you (a CA, but also just anybody who finds a new one) log a new certificate or a "pre-certificate" (which is essentially a signed document equivalent to the final certificate but not usable as a certificate) the log gives you a receipt, a Signed Certificate Timestamp, saying it commits to publish a consistent log containing this certificate within a period of time (today 24 hours).
The SCT proves that this particular log saw a signed document with these specific contents, at this specific moment.
Chrome (for a long while now), Safari (announced for early 2019) and Firefox (announced but a bit vague on when) check SCTs for publicly trusted certificates.
The browser can look at the SCT and verify that:
* It was signed by a log this browser trusts
* It matches the contents of the leaf certificate (DNS names, dates, keys, etcetera: Distinguished Encoding means there is only one correct way to write any certificate so there can't be any ambiguity)
* It has an acceptable timestamp (not too old, in some cases not too new)
It can also contemplate the set of SCTs and decide if they meet further criteria e.g. Google requires at least one Google log and at least one non-Google log.
If any of these is wrong, the site doesn't work and an appropriate error message occurs, no user effort is needed or useful here.
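Schematically, those checks amount to something like this (a sketch with hypothetical parsed objects, not any real browser or TLS library API):

    from datetime import datetime, timedelta

    def sct_acceptable(sct, leaf_cert, trusted_logs, max_age=timedelta(days=365)):
        """Schematic version of the checks above; sct, leaf_cert and
        trusted_logs are hypothetical parsed objects, not a real TLS API."""
        log = trusted_logs.get(sct.log_id)          # signed by a log this client trusts?
        if log is None:
            return False
        # The signature covers the (pre)certificate contents, so any mismatch
        # with the presented leaf certificate fails here.
        if not log.public_key.verify(sct.signature, sct.signed_data(leaf_cert)):
            return False
        now = datetime.utcnow()
        # Timestamp must be plausible; the exact bounds are policy-dependent.
        if sct.timestamp > now or now - sct.timestamp > max_age:
            return False
        return True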
We know this works because Google managed to do it to themselves by accident once already, blocking Chrome access to a new Google site for - I think it was several hours - because their dedicated in-house certificate group screwed up and didn't log a new certificate.
There is more to do to defend the system completely:
1. You could compel a log operator to emit an SCT, but then not actually log your certificate. Certificate Transparency can detect this, a browser would need to remember SCTs it has seen and periodically ask a log for a proof which allows it to verify that the log really has included this certificate.
2. You could go further and compel the log to bifurcate, showing some clients a history with your certificate in, and others a parallel history without that certificate. This can only be detected using what is called "gossip" in which observers of the log have a way to discuss their knowledge of the log state and find out systematically if there are inconsistencies.
Once both these things are in place, there's basically no way around just admitting what you did. Which of course doesn't mean any negative consequences for you, but it does make _deniability_ (much desired by outfits like the NSA and Mossad) hard to achieve, if that's something you care about.
The arguments are correct. APT does not need HTTPS to be secure. That said, if APT was designed today I'm sure it would use HTTPS. It's now the default thing to do, and Let's Encrypt makes it free and easy.
However Debian, where APT is from, relies on the goodwill of various universities and companies to host their packages for free. I can see that they don't want to make demands on a service they get for free, when HTTPS isn't even necessary for the use case.
Also, since APT and Debian were created in the pre-universal-HTTPS days, it does things like map something.debian.org to various mirrors owned by different parties. That makes certificate handling complicated.
As explained on the website, HTTPS would not add meaningful privacy at all, because without significant other changes to the architecture, what you're doing is still downloading files from a very limited set. The size of the files is in most cases unique, so an onlooker can tell what you downloaded, encrypted or not.
I find this argument not very convincing. Suppose an attacker wants to track people downloading stuff over APT. This is what they would need to do:
In case of HTTP - Step 1: Read the HTTP request payload. Step 2: There is no step 2.
In case of HTTPS - Step 1: Build an index of all possible packages and their sizes. Step 2: Reassemble HTTPS response traffic into individual HTTP responses. Step 3: Look up the response length to find the corresponding package. Step 4: In case of identical file sizes, build some sort of model to work out which package it is likely to be, based on the other packages downloaded (?).
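For concreteness, steps 1 and 3 might look something like this (a sketch; the mirror URL and the overhead/slack constants are assumptions you'd calibrate with test requests):

    import gzip
    import requests

    # Step 1: build a size -> package index from the archive's own metadata;
    # each stanza in a Packages file carries a "Size:" field for the .deb.
    # The URL is an assumption - any mirror's Packages.gz will do.
    PACKAGES_GZ = "https://deb.debian.org/debian/dists/stable/main/binary-amd64/Packages.gz"

    def build_size_index():
        raw = requests.get(PACKAGES_GZ).content
        text = gzip.decompress(raw).decode("utf-8", "replace")
        index, name = {}, None
        for line in text.splitlines():
            if line.startswith("Package: "):
                name = line.split(": ", 1)[1]
            elif line.startswith("Size: ") and name:
                index.setdefault(int(line.split(": ", 1)[1]), []).append(name)
        return index

    # Step 3: map an observed response length back to candidate packages.
    # The per-response overhead is a guess to be calibrated with test requests.
    def candidates(index, observed_bytes, overhead=1200, slack=64):
        body = observed_bytes - overhead
        return [pkg for size, pkgs in index.items()
                if abs(size - body) <= slack for pkg in pkgs]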
Yes, it's still possible to track people's packages all the same. But you need a way more determined and prepared attacker - it cannot be done as easily through casual eavesdropping. It's a false equivalence to say it would not add meaningful privacy, as your attacker model changes from casual eavesdroppers to more determined attackers.
You might not care for that particular distinction, and I agree people should have the choice to use HTTP or FTP for APT when selecting mirrors. Unencrypted APT is plenty secure, but encrypted APT is really a little better. In my opinion, there should not be so much resistance to HTTPS in default configurations (e.g. the Debian project could easily require this for their official mirrors around the world). Let's Encrypt makes this so easy, there's no argument anymore in my opinion.
That's because you are glossing over a lot of packet inspection deeper than the TCP level with a glib "Read the HTTP request".
You might want to think about existing surveillance systems. Analysis of telephone traffic is often done purely on the CDR (the caller, callee, and length of call, in simple terms) without the equivalent of deep packet inspection to read the HTTP request, which would be analysis of the actual audio data themselves. The HTTPS case would likewise need just the total octets transferred over the TCP connection for fingerprinting.
There's a lot of glib handwaving in this discussion about identical sizes, not based upon actual measurements of the Debian archive. I quickly looked at the package cache on one of my Debian machines:
    jdebp% ls -l|awk 'x[$5]++'
    -rw-r--r-- 1 root root 3314 Feb 16 2018 nosh-run-freedesktop-system-bus_1.37_amd64.deb
    -rw-r--r-- 1 root root 35190 Dec 14 2016 redo_1.3_amd64.deb
    -rw-r--r-- 1 root root 1114546 Feb 25 2018 udev_232-25+deb9u2_amd64.deb
    jdebp%
It turns out that in practice, size alone almost uniquely identifies a package in this sample. The other file that is 35190 bytes is version 1.2 of the same package, leaving just 2 possible ambiguities out of 847 packages. It seems likely that this holds after encryption as well.
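For anyone who wants to repeat the measurement without the awk, a quick equivalent sketch:

    import os
    from collections import Counter

    # Python equivalent of the awk one-liner above: how many cached .debs
    # share an exact file size with another cached .deb?
    cache = "/var/cache/apt/archives"
    sizes = Counter(os.path.getsize(os.path.join(cache, f))
                    for f in os.listdir(cache) if f.endswith(".deb"))
    shared = sum(n for n in sizes.values() if n > 1)
    print(f"{sum(sizes.values())} packages, {shared} share a size with another")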
So the remaining question is how much HTTP pipelining ameliorates this, which no-one here has yet actually analysed.
I'm not sure I understand your reply. Are you replying to me as if I said that determined attackers cannot trace back HTTPS traffic to individual APT packages? Because I said no such thing.
I just made the distinction between casual eavesdroppers and determined attackers. Those determined attackers exist and are quite capable, I'm sure. I said as much in my post.
You might also want to look into your use of the word 'glib' here. I find it an uncharitable interpretation of my post to call it 'glib' or 'glib handwaving', to be honest. Makes it seem to me as if I should be defending something I said, but I'm not sure what.
There's really no difference between a "casual" eavesdropper and a "serious" one. In what world does the former camp even exist? No one is casually spying on your apt updates, and anyone who is "seriously" spying on your apt updates can trivially identify them by size. HTTPS really doesn't add anything here.
Of course no-one is spying casually on HTTP APT traffic specifically. Nobody is arguing that strawman - nobody here is "living in that world", give me some credit please.
But people spying casually on HTTP traffic in general do exist. People able to spy on HTTP traffic in general casually is one of the main reasons we care about HTTPS in the first place. Even though people can do a targeted content length analysis for nearly all other the stuff we read/watch/download online, too. We still care about HTTPS for all of that. And we should probably care for that with APT too, if only a little bit.
TL;DR: HTTPS gives you potentially more confidentiality, but it is not guaranteed, as a known vulnerability exists which an advanced attacker can exploit. You should not assume confidentiality when using APT over HTTPS. The severity of this issue in CVSS terms is going to be very low because it is only an information leak.
ISPs, for example, eavesdrop on us all the time, and they do it quite casually. They will modify your unprotected HTTP requests, inject ads, log everything they are able to, and sell the data if they can.
Wouldn't help. If almost all of the files' sizes are unique (I'm pretty sure compressed package sizes aren't even block-aligned), and you know the sizes ahead of time, and you can make test requests for samples of what headers are being sent/received, it's trivial to calculate which combination of packages would result in a given stream length using pipelining. You'd have to add countermeasures like padding or fake data.
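A sketch of that calculation over a modest candidate set (plain brute force; a real tool would prune using the dependency graph rather than enumerate blindly):

    from itertools import combinations

    def matching_bundles(size_index, observed_total, per_file_overhead=500,
                         max_packages=6, slack=64):
        """Which sets of packages (plus a guessed per-request overhead) add up
        to the observed pipelined transfer?  size_index maps package name ->
        compressed size; all the constants here are assumptions."""
        names = list(size_index)
        hits = []
        for k in range(1, max_packages + 1):
            for combo in combinations(names, k):
                total = sum(size_index[n] for n in combo) + k * per_file_overhead
                if abs(total - observed_total) <= slack:
                    hits.append(combo)
        return hits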
You could quantize these with up to 10% padding and cause a very large number of collisions, but that wouldn't be useful without HTTPS. Is the core argument that privacy is not attainable, or that it is not valuable?
I think the core argument first of all is that it is not attainable trivially by "just using HTTPS". So it's a question of costs vs. benefits, where the costs are pretty big (change the whole infrastructure).
This is wrong. A great deal more privacy is attainable by trivially using HTTPS. Privacy in the presence of stream inspection is more difficult, but attainable by padding files to have quantized lengths.
How do you arrive at 120? I guess that makes sense if the biggest package is within 92709x (1.1^120) the size of the smallest. But that doesn't seem like enough range, just eyeballing it. If you have a 1kB package at the low end, I'd be surprised if Debian didn't have a package bigger than 92 MB.
Presenting it as 120 from 43,000 is a bit of an oversimplification, because the average isn't meaningful. The long tail is going to have the worst privacy and the small packages will (probably) have the most.
A scheme like this might be workable but requires being really careful about the security properties you're claiming (i.e., of those 120, probably half are unique, large packages). And obviously, this scheme requires up to 10% additional bandwidth, in the case of the chosen 10% threshold. If buckets change over time, packages moving between buckets may leak a lot of information.
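The bucket arithmetic is easy to sanity-check (a sketch; the size bounds are assumptions, plug in real archive numbers):

    import math

    def bucket_count(min_size, max_size, step=1.10):
        """Number of geometric size buckets when every file is padded up by at
        most `step` (i.e. 10% here)."""
        return math.ceil(math.log(max_size / min_size, step))

    # 1.1**120 is roughly 92,709, so 120 buckets only span about a 1 kB..93 MB
    # range; stretching to 1 GB (an assumed upper bound) needs more buckets:
    print(bucket_count(1_000, 1_000_000_000))     # ~145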
This sounds less like some sort of massively impossible barrier to overcome and more like a Project Euler problem, and one not all that far into the sequence, either.
One of the things you have to overcome if you want to think like a security person is that, yes, there are attackers that will put some effort into attacking you if you are a target of any consequence, certainly effort far exceeding what you just described. I've watched some people at the company I work for have to overcome that handicap myself. Yes, there are attackers that are not just script kiddies and actually, like, have skills and such.
Attackers won't jump through infinite hoops, but getting a foothold on a network somewhere where they'd like more access, seeing that they can watch a new system in your network getting provisioned, and cross-checking that against a list of known vulnerabilities by looking at package sizes would be boringly mundane for them, not something wildly exotic.
There are two reasons you want to compromise a host:
1) You're building a botnet (or, these days, are crypto mining). In that case you're not targeting a specific machine, you just want many of them.
2) You want to exfiltrate information from a host, or sabotage it. In that case you're targeting a specific machine.
I'd argue that in both cases, the proposed attack vector of inferring installed software versions through apt downloads is inferior, or at least more involved. In case 1) you're better off scanning for known vulnerabilities or making use of Shodan and the like.
In case 2) you're probably going to probe the server anyway. It might take a little more time than if you just had a complete list of installed packages and their versions (given you were somehow able to eavesdrop on the host in the first place), but you'll most likely determine at least what OS is running and what technology stack their internet-facing services run on after some nmapping.
Or, looking at it from the other perspective, I wouldn't really feel much safer if apt were using https. I'm not against it, but I don't think it's a priority, especially if it needs a lot of coordination between different people, which always turns out to be very time consuming. Just being fast with updating packages seems a better investment of that time.
> Or, looking at it from the other perspective, I wouldn't really feel much safer if apt were using https. I'm not against it, but I don't think it's a priority, especially if it needs a lot of coordination between different people, which always turns out to be very time consuming. Just being fast with updating packages seems a better investment of that time.
This is exactly my position, to be fair. We're all bike-shedding here as far as I'm concerned - including this very website. I think the position that HTTPS doesn't help you is a little bit disingenuous, and the only fair position is that "coordinating this stuff takes time and effort we don't feel is worth the negligible advantages" (as you say, and as this website says) is a more acceptable argument than "the negligible advantages don't exist" (as this website seems to also want to say).
A: Assuming package privacy could somehow be protected (a question I leave to part B), then I agree that there is no improvement in security against capable attackers: they would have to focus on hacking the APT servers, on which the attack surface can be minimized, and monitoring and logging any deviations can be tracked and published for all to inspect.
B: If the packages are concatenated with each other and a random-length noise string, we can substantially frustrate nation-state / ISP-level attackers, to the point of forcing them to get this information from the endpoints themselves: either the end user or the APT server must already have been compromised.
1) End user not yet compromised: in order to capture these, they must attack the APT server.
2) End user already compromised: on each update of all compromised users, information is sent to C&C, so this would produce lots of opportunity for attentive users to discover the implant.
When focusing on the APT server, which would give the cleanest record of attack surfaces, the community can put manpower into designing minimalist APT servers, and inspecting published deviations in communications can lead to uncovering 0-days.
EDIT: changed disagree to agree, as I (incorrectly) thought you were arguing it would not make attacks more expensive, woops!
A self-signed certificate? Then an attacker who is already capable of intercepting packets can just serve whatever they want and proxy your request. Your counter to the main argument falls short just the same.
Steps 1-4 are very easy; they just require some dev work. Thinking an attacker won't bother because it sounds annoying to implement is security by obscurity.
No, it's security against a specific class of attacker that your threat model is considering.
We know there are nation states that build profiles of each user based on their HTTP requests, but we don't know of any that have written custom software specifically to target Debian users.
It would take them half an hour to write that custom software. Under what threat model is "nation-states that can't spend half an hour writing code" a meaningful adversary? Note in particular that they can identify users by previous traffic statistics (and just normal traffic flow statistics that might be kept as a routine log by any network administrator, even); they don't need to write the code in advance. Given the number of bytes transferred, the times, and an archive of the state of the Debian archive (which is publicly available), they can always identify past downloads.
(Would it help if I wrote that code right now and put it on GitHub?)
> It would take them half an hour to write that custom software.
It would take a software engineer half an hour to write that custom software. It would take a government years to amass the political will to target such a small section of the population, and then potentially hundreds of thousands of dollars for a government contractor to offer a solution and implement it.
There's always going to be a big difference in threat level between a piece of software which already exists and a piece of software that could exist. For example, when you're snatched off the streets by the secret police, and they go to investigate what you've been doing in the country, they might be able to request from HQ a list of HTTP addresses fetched from the IP address associated with your apartment, but they're unlikely to be able to request that HQ write some software to go back retrospectively and count bytes of individual connections you made.
> (Would it help if I wrote that code right now and put it on GitHub?)
No, but it would help if you wrote a patch for APT which made it use HTTP range requests to hide the size of the files it downloads. That should only take half an hour, right?
I continue to be confused by this threat model where all government agencies you're worried about are plagued by massive levels of US-style governmental bureaucracy and can't get anything done, yet they're capable of being meaningful threats. (Also where the only entities you're worried about are government entities.)
The entities I'm worried about have bought off-the-shelf surveillance tools which record the HTTP requests associated with each IP/MAC address. This is a minimum viable product for governments and ISPs (not to mention businesses, like hotels and coffee shops), and it is reasonable to think that such a system is deployed on orders of magnitude more networks than a system that tries to infer Debian package downloads from counting bytes of HTTPS traffic.
The thing to note here is that the only reason this seems easy is that there is tooling readily available for such a task. If you didn't have such tooling, you'd find it more difficult to implement than your HTTPS case even _with_ tooling.
The same principle applies to your HTTPS case. Your argument disappears as soon as there is tooling. That tooling only needs to be written once. Perhaps it already has been done and exists in the circles where people want to surveil apt users. One possible reason such tooling isn't widely available is that apt doesn't use HTTPS by default, and one outcome may be that if apt switched to HTTPS the tooling would appear.
I have half a mind to write the tooling and publish it just to eliminate this argument. It really isn't very difficult.
There's a big difference, privacy-wise, between being able to say "X is talking to this debian mirror and thus probably running debian" and "X is downloading exactly these packages from this debian mirror".
The point being made is that the fact that X just talked Y bytes to this Debian mirror is enough to know exactly which packages were downloaded.
The argument that https obfuscates which packages you download is not a good one, and may cause users to unnecessarily worry about the implications (and conversely, that they end up "more safe" if that was not the case). If that type of privacy is desirable you should probably use something like Tor.
When I say `apt-get install foo` and it brings in 47 other packages, the problem gets exponentially more difficult. "He downloaded 1,432,509,104 bytes; what packages were those?" is more or less O(2^n): https://en.wikipedia.org/wiki/Knapsack_problem
If you download exactly one package, it may be easy to deduce which one it was (assuming that the protocol overhead is identical each time, and that changing timestamps and nonces doesn't affect the byte length whatsoever, etc.). If you download more than one at a time, which is common with Debian, then the problem is a whole lot harder.
An HTTPS stream can be broken down via side channels (i.e. size, timing) into black-box HTTP/1 requests quite easily. Remember, even with Connection: keep-alive, you still have to request every file synchronously after you're done with the previous one.
They can but they do not because servers are broken (and terribly so) and it brings less than zero gain for big files.
See, debian does not want to manage specific mirror server features if they don't have to. If they were in that position they'd make their own protocol.
It's not because a naive implementation of privacy-enhanced APT would fail that all implementations would fail at protecting privacy. Concatenating all the packages and a variable-length noise string together should go a long way.
It would be more fruitful to discuss the different ways an attacker might deduce what software was installed from a naive implementation: download sizes, download dates (i.e. a new update becomes available for package P, so a substantial fraction of users downloading from the server that day were probably installing P), etc.
In theory an onion router might substantially improve the situation if the attacker has a hard time identifying which server the user is talking to, and thus making it hard to identify if the user is even installing anything at all...
Sadly I don't trust Tor as long as I can't exclude a specific attack scenario I have always suspected about Tor but never actually known to be present...
> It's not because a naive implementation of privacy-enhanced APT would fail that all implementations would fail at protecting privacy. Concatenating all the packages and a variable-length noise string together should go a long way.
Sure, but then you are not talking about "just use HTTPS", you're talking about creating your own protocol and requiring all APT packet sources to speak that protocol, requiring a specialized server software, where currently they can just use whatever HTTP server they want. Switching the whole infrastructure and installed base over to that would be a massive multi-year project, not just a handful code and configuration changes.
Note I use the phrase "naive implementation" and was never talking about nor clamoring for "just use HTTPS".
The real discussion is not "blindly use HTTPS, or leave it like it is"; for me the real interesting question is: can we design a package distribution system that preserves privacy against nation-state-level actors? Can we virtually force those to attack the APT servers themselves? Could we use oblivious transfer? Could we design a fresh minimalist onion router (as opposed to bloated Tor) for package distribution?
> Also, can we do it on top of the current Debian infrastructure (HTTP and everything)?
I'm pretty sure that is not possible, because the current infrastructure is just plain old HTTP file servers (anything that can sling bits will do) run by whoever fancies being a part of it.
you're replying downstream of my comment containing "for me the real interesting question is:..." where I generalized the question away from the false dichotomy "keep apt as it is, or make https default in apt"
> The real discussion is not "blindly use HTTPS, or leave it like it is"; for me the real interesting question is: can we design a package distribution system that preserves privacy against nation-state-level actors?
That is certainly an interesting theoretical question, but in practice there is also the question of the cost of something that requires you to change and complicate the whole distributed infrastructure vs. the benefits - are there actually real people who need privacy against "nation-state-level actors" specifically concerning the Linux packages they install?
EDIT: Just adding that we also don't know what the cost of the most efficient privacy-preserving distribution method actually is. Only when people investigate and try will we find out.
> variable length noise
Just thinking... could this be achieved by simply adding some extra response headers on the mirror side? That is, the response could contain one or more junk headers whose values are random-length filler (see the sketch below).
This could be enough to insert random-length noise without the need to invent any new protocol. Of course, this would only be effective for smaller packages, as I assume header size is limited, so it would significantly change the perceived download size only for smaller packages.
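Something along these lines on the mirror side, for instance (a sketch only; "X-Padding" is a made-up header name and WSGI is just a convenient stand-in for whatever the mirror actually runs):

    import secrets

    def padding_middleware(app, max_pad=4096):
        """Wrap a WSGI app and append a random-length junk header to every
        response, blurring the on-the-wire response size a little."""
        def wrapped(environ, start_response):
            def padded_start(status, headers, exc_info=None):
                pad = secrets.token_hex(secrets.randbelow(max_pad) + 1)
                headers = list(headers) + [("X-Padding", pad)]   # hypothetical header name
                return start_response(status, headers, exc_info)
            return app(environ, padded_start)
        return wrapped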
This can easily be solved by inflating the wire size of the file beyond the actual download: transmitting junk packets that fail to decode but still look like encrypted noise to an eavesdropper.
That doesn't sound like an argument against it to me. It just means that in addition to HTTPS, they also need to make "significant other changes to the architecture".
...which would be a huge undertaking given the vast installed base and the fact that packet sources currently don't run any custom server software at all, which would need to change.
I'm not so sure about that. Assuming all the packages are downloaded over a single connection, you can easily pad the response client-side via HTTP range requests, for example. No special server software required; just a normal HTTP server.
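A minimal sketch of that client-side trick, assuming any file already present on the mirror can serve as filler (the URL and the bucket size are placeholders):

    import requests

    BUCKET = 8 * 1024 * 1024   # pad each apt run up to the next 8 MB (arbitrary bucket)

    def pad_transfer(session, bytes_so_far,
                     filler_url="https://mirror.example.org/debian/ls-lR.gz"):  # placeholder file
        """Issue extra Range requests against an existing file and throw the
        bytes away, so the total on-the-wire size lands on a bucket boundary."""
        remaining = (-bytes_so_far) % BUCKET
        while remaining > 0:
            chunk = min(remaining, 1 << 20)                  # at most 1 MiB per request
            resp = session.get(filler_url,
                               headers={"Range": f"bytes=0-{chunk - 1}"})
            got = len(resp.content)                          # thrown away
            if got == 0:
                break                                        # server ignored us; give up
            remaining -= got

Usage would be something like pad_transfer(requests.Session(), total_bytes_downloaded) at the end of a run.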
There is a huge difference between the third-party knowing what you're doing, and everyone in between knowing what you're doing.
The HTTPS-everywhere movement is attempting to make privacy the default rather than the exception, but it is of course done knowing that the server will always know what's up. The point is to make it so that only the two parties concerned, the server and the client, comprehend the communication, rather than the entire world.
Okay, suppose I want to know what packages you're installing and updating. I have two routes:
1. I gain access to a router near you.
2. I rent a sizable server at the hetzner/ovh/… location that's closest to you and volunteer to run a mirror for the OS you're using.
Both are somewhat uncertain (your traffic might flow via a different route, you might be load-balanced to a different mirror) but the uncertainty seems comparable. Option 2 seems so much easier that I have a real problem seeing the point of even attempting option 1 if all I want is the information option 2 would give. Perhaps someone can explain?
You'd have to take control over my mirror. Making a new mirror will not get you any traffic unless people actively choose to use it. Therefore, your option 1. is to control a part of the public route, and option 2. is to control the mirror of my choice.
However, any argument against using encryption for privacy for APT can equally be applied to any other traffic. Do you trust your public internet route enough to let your traffic run authenticated, but unencrypted? Chats, news, bank statements, software updates?
Even if content cannot be modified, it can still be blocked or made public. There are quite a few nosy governments that would like to know or block certain types of content, software packages included.
Lots of people use the nice hostnames like ftp.us.debian.org. It wouldn't be too hard to get included in that host name if you're determined. I haven't looked into the requirements, but I'm pretty sure it's a) be technically competent enough to run a mirror (it's not hard) b) have a lot of bandwidth. c) be organizationally competent enough to convince Debian that a and b will hold for a long time.
You will instantly spot blocking, because apt reports connection failures and hash failures, and a replay attack on the update list is ineffective.
As for privacy, eh? It is visible that you're connecting to a Debian mirror and what size of update list you're getting. Beyond that, indexing packages by size is trivial.
You want true privacy, you'd have to use Tor or such.
Option 1 is hard for individuals but easy for state actors.
For example, they might want to know what versions you're running (by looking at what updates you _didn't_ download) so they can target you or lots of people at once.
Since it's known which host you connect to, and the pattern of access is known too, it's a reasonable guess that it would be possible to infer the list of packages you're downloading by observing the encrypted traffic.
Instead of JUST that third party knowing what (possibly vulnerable) packages you have, you are now letting everyone know what possibly vulnerable packages you have.
> you are now letting everyone know what possibly vulnerable packages you have.
Erm, what?
The number of people who can listen to (much less modify) your traffic is very small. It is basically your ISP (who is supposed to offer you services in good faith, not spy on you) and a number of engineers who maintain the Internet backbone. That's far from "everyone". Some SSL evangelists make it sound like everyone's traffic is permanently broadcast to everyone else in the world, but it is not.
As for "vulnerable packages", the most certain sign, that someone does not install security updates, is lack of traffic between them and update servers. But that's orthogonal to use of encryption.
"everyone" in this context obviously means everyone on the path, and any attackers that have compromised nodes along the path. See the Belgacom hack by 5 eyes...
there is no proof that this is true in general, so it is worth trying to find 1) an inefficient way in order to 2) postulate an efficient way...
for example, overlay onion routing, size blurring by appending random length random bits,... with oblivious transfer even the APT-server does not know what you downloaded (but that would require a large amount of information..., nevertheless oblivious transfer might still be a useful tool when used as a primitive, perhaps just to send a list of bootstrap addresses for p2p hosting of the signed files etc...)
Couldn't this be used to identify hosts that are likely running exploitable software? So you watch a target and keep track of their installed packages. You also monitor for zero-day exploits. The instant you have identified a zero-day, you also have a list of high-probability targets to test it out on or exploit. Privacy is more than just whether I know you have blue eyes or brown eyes.
> The cost for Debian and its mirror network is very high.
I'm curious how high it actually is. They say it's high, but that could well just be hand-waving. Sure, prior to things like LetsEncrypt those SSL certs would have been a notable financial burden. There's also some extra cost on infrastructure covering the cryptographic workload, but increasingly the processors in servers are capable of handling that without any notable effort.
Certificate cost is trivial. Let's encrypt makes it free, but with a small change to the host names (country code.ftp.debian.org instead of ftp.countrycode.debian.org), all mirrors could have been covered with a single certificate. Some CAs will let you buy one wildcard cert and issue unlimited duplicate certificates with the same name. So, that would cost some money, but probably not too much.
The real costs are organizational and technical.
Organizing all the different volunteers who are running the mirrors to get certificates installed and updated and configured properly is work. Maybe let's encrypt automation helps here.
From a technical perspective, assuming mirrors get any appreciable traffic, adding https adds significantly to the CPU required to provide service. TLS handshaking is pretty expensive, and it adds to the cost of bulk transfer as well.
I get the feeling that a lot of the volunteer mirrors are running on oldish hardware that happens to have a big enough disk and a nice 10G ethernet. I've run a bulk HTTP download service that enabled HTTPS, and after that our dual Xeon 2690 (v1) systems ran out of CPU instead of out of bandwidth. CPUs newer than 2012 (Sandy Bridge) do better with TLS tasks, but mirrors might not be running a dual-CPU system either.
Old hardware will eventually die and need replacing. I run infrastructure for a CDN setup, and we actually _reduced_ the CPU overhead with TLS 1.3 + HTTP/2.
When someone says the cost is high, most people jump to monetary issues. The cost is in the time and effort required to make the changes, and to have those changes synchronised across every single APT mirror.
But! As mentioned above, outside entities being able to monitor exactly which versions of which packages are being installed to which hosts is a significant security risk.
This sort of comment isn't helpful. Switching to HTTPS will require a tremendous amount of work from volunteers. You need to convince me that (1) your use case exists and (2) your use case can be remedied with HTTPS.
We deploy our software packages to our own infrastructure and clients using a private APT repository and basic HTTP auth. Obviously we're running it with apt-transport-https installed for making the latter not completely insecure.
I see no reason to do that for signed packages from the main repositories, however.
Yes, but privacy is a whole other thing that is maybe not worth it. With mirrors and so on, getting HTTPS to work properly is not trivial. Sure, it would be nice.
HTTPS is really quite trivial, especially with the advent of letsencrypt. This is especially true for simple package protocols like APT, where a repository is simply a dumb HTTP server coupled with a bunch of shell scripts that update the content.
Assuming that we consider SSH-ing into a server a negligible effort, then adding HTTPS to an APT repository or mirror is also a negligible effort.
As for whether privacy is worth it: Absolutely, especially in this day and age. There is very rarely a cost too high when it comes to privacy, and in this instance, it comes for free.
The problem is, HTTPS is not designed for privacy in any meaningful term.
1) TLS session negotiation leaks all sorts of useful data about both systems, not to mention the TCP and IP stack on which it sits. This data is grabbed in 5 minutes with an existing firewall filter. Combined with the IP, it shows the exact machine and web browser (incl. APT version) downloading the file in many cases.
2) It does nothing to prevent time, host and transfer size fingerprinting.
3) Let's Encrypt helps with deployment, but you get rotating automated server certificates. It is reasonably easy to obtain a fake Let's Encrypt certificate, so without pinning it is worthless for authentication, and pinning a rotating certificate is hard too.
Debian does not have resources to handle impostor mirrors.
It's not trivial if we are talking about Linux boxes serving as servers: Let's Encrypt has a good chance of not working out of the box, especially with older boxes. And then there are other things, like needing an HTTP server for obtaining the cert, rotating it, distributing it.
And you lose the ability to use a proxy, and so on. With HTTPS you are still not protected from them knowing where you get things from, only from them knowing what you did there.
It would be great to have the ability to use HTTPS, but for APT in its current form and for what it is used for, the cost-benefit of adding HTTPS is not that compelling to me.
Analogy: it's like you're hanging on a rope and you also add a safety net. If the rope breaks, you only fall onto the safety net instead of the ground.
All software has bugs, so if you add two buggy security solutions, you might only be able to exploit a bug in one, but the other still gives you the safety.
That's not always true though, and it can be really hard to tell if a particular combination is going to make things better or worse. e.g. the "Breach" HTTPS+Compression vulnerability.
To continue your analogy: the rope gets tangled in the safety net, forcing you to jump or to climb on with a loose rope, because you can no longer move the rope.
Digital security strength is measured on orders of magnitude, and two mechanisms providing security with very different orders of magnitude do not add in any practical sense.
If you've seen today's bug yesterday, or carefully looked at the previous CVEs, you would have seen that https would have significantly reduced the probability of exploitability.
> All software has bugs, so if you add two buggy security solutions, you might only be able to exploit a bug in one, but the other still gives you the safety.
I am unconvinced about the last part. More commonly an exploit in either will cause security to fail, so adding more steps just adds more attack surface and leads to less security.
TM1: attacker does not posses zerodays to installed software
TM2: attacker possesses specific (perhaps OS, perhaps library, perhaps userland) zerodays, usage of which (including unsuccessful attempts) should be minimized to avoid detection
in TM1: it's ok to use HTTP in the clear, as long as signatures are verified
in TM2: everything should be fetched over encrypted HTTPS, since HTTP would leak information about available attack surface
EDIT: not only would this increase security by not revealing what a user installs (perhaps download some noise as well such that it becomes harder to detect what a user is installing?), it could also improve security by turning the APT servers into honeypots, so that monitoring these can reveal zerodays...
TM3: attacker has a 0-day against a complex https server and they replace packages on some mirrors.
TM4: attacker can impersonate a server using a Let's Encrypt certificate and bypass their automated verification, creating a fake mirror or a bunch of them. (HTTP has the same vector.) They can also make DNS fail or reroute.
TM5: attacker has a 0-day against the more complex https client (e.g. curl).
TM6: Attacker fingerprints network connections to given servers by size and os or os + tls fingerprinting data
TM3 and TM5 are specific subcases for TM2, and turn the APT server / client into a honeypot
TM6: we should have an overlay onion router. I agree that the current complexity is worrying; I'd love to see a minimalist version of Tor (by minimal I don't necessarily mean the code size should be small, but minimal assumptions, such that the safety of the system can be verified from those assumptions).
TM4: I don't understand, Lets Encrypt does not calculate private keys for public keys...
nQUIC is maybe an interesting approach for some specialist applications but more likely a dead end.
The idea you can replace QUIC with nQUIC is like when Coiners used to show up telling us we're going to be using Bitcoin to buy a morning newspaper. Remember newspapers?
nQUIC doesn't have a way for Bob to prove to Alice that he's Bob beyond "Fortunately Alice already knew that" which is the assumption in that ACM paper. So that's a non-starter for the web.
nQUIC also doesn't have a 0-RTT mode. Noise proponents can say "That's a good thing, 0-RTT is a terrible idea". Maybe so, but you don't have one and TLS does. If society decides it hates 0-RTT modes because they're a terrible idea, we just don't use the TLS 0-RTT mode and nothing is lost. But if as seems far more likely we end up liking how fast it is, Noise can't match that. Doesn't want to.
Noise is a very applicable framework for some problems, and I can see why you might think APT fits but it doesn't.
> nQUIC is maybe an interesting approach for some specialist applications but more likely a dead end.
Adding on to this, nQUIC (and Noise specifically) is significantly better for use-cases where CAs and traditional PKI don't make sense, e.g., p2p, VPN, TOR, IPFS, etc...
I agree that APT is not one of these cases. Currently APT has a root trust set that is disjoint from the OS's root CA set, but they could easily do HTTPS and just explicitly change the root CA set for those connections.
EDIT: from the nQUIC paper:
> In particular nQUIC is not intended for the traditional Web setting where interoperability and cryptographic agility is essential.
On another note, I think it would be helpful to expand some points for other readers:
> nQUIC also doesn't have a 0-RTT mode. Noise proponents can say "That's a good thing, 0-RTT is a terrible idea".
0-RTT is dangerous because of replay attacks. It pushes low-level implementation details up the stack and requires users to be aware of and actively avoid sending non-idempotent messages in the first packet.
> Maybe so, but you don't have one and TLS does. If society decides it hates 0-RTT modes because they're a terrible idea, we just don't use the TLS 0-RTT mode and nothing is lost.
One major point of using Noise protocol is to _simplify_ the encryption and auth layers, remove everything that's not absolutely necessary, and make it hard to fuck up in general. Things like ciphersuite negotiation, x509 certificate parsing and validation, and cryptographic agility have been the source of many many security critical bugs.
From an auditability perspective, Noise wins easily. You can write a compliant Noise implementation in <10k loc, vs. OpenSSL ~400k loc.
> But if as seems far more likely we end up liking how fast it is, Noise can't match that. Doesn't want to.
HTTP is insecure, but faster than HTTPS. Most sites now use HTTPS regardless. 0-RTT is insecure and while it might be OK for browsing HN, removing 0-RTT makes it much harder to fuck up.
>I can see that they don't want to make demands on a service they get for free, when HTTPS isn't even necessary for the use case
You could imagine a situation where HTTPS is optional for APT mirrors. Then the package manager would have a config flag to use any mirror or only HTTPS-enabled mirrors (probably enabled by default). This would allow using HTTPS without putting any demands on the organizations that host those mirrors - if they can, they would enable it, but it would not be required. The HTTPS-enabled hosts could also provide plain HTTP for backwards compatibility.
The argument isn't correct: what does a user do when the download is damaged by an injection? A re-download results in exactly the same tampered-with file.
Yeesh, as someone who has had to troubleshoot more than a few times HTTP payloads that were mangled by shitty ISP software to inject ads or notifications I would love HTTPS as a default to prevent tampering. I get the arguments against it, but I have 100% seen Rogers in Canada fuck up HTTP payloads regardless of MIME types while rolling out "helpful" bandwidth notifications. Signatures will tell you yes this is corrupt but end-to-end encryption means that the person in the middle can't even try.
Likewise, integrity of the download is the primary reason I've switched downloads to HTTPS too. The argument that signed downloads are enough fails to address what the user is supposed to do after the integrity check has failed. A re-download can result in the same tampered-with file. This isn't hypothetical btw, it happens in the real world. I've had ISPs in Ireland and South Africa damage downloads due to their injections, and users don't care if it's their ISP; they get pissed off at you, unfortunately.
I myself use the HTTPS mirror provided by Amazon AWS (https://cdn-aws.deb.debian.org/). I do so because my ISP sometimes forwards to its login page when I browse HTTP URLs. Also, it does sometimes inject ads (yeah, it's really bad, but it does remind me that I'm being watched).
Yeah true, but the arguments for tls default ring a bit hollow, to me at least.
Someone who really wants the defense-in-depth should probably be switching to onion sources anyway, I was impressed with how quick they were.
As the article says, replay attacks are voided and an adversary could simply work out package downloads from the metadata anyway.
I personally use https out of general paranoia, but understand the arguments for not changing. It's two extra lines in a server setup script.
infosec twitter is crap like any other twitter subculture, full of drama queens and clickbait to increase their fav/rt count. What's even sadder is that they make no money off it.
Indeed, something similar happened just last week. The Hacker News (not to be confused with HN) Twitter account lashed out at VLC for not updating over HTTPS, even though VLC essentially uses the same(ish?) code signing as described for APT. A bit of a shit show.
HN also had a big thread participating in the fray...
I'm seriously tempted to start flagging links that point to "bad"/"outrage" bugtracker decisions like this, wide public distribution seems to make things quite a bit worse.
They use 1024-bit DSA with SHA-1; it is not cryptographically secure! Thus they would really benefit from HTTPS, which would provide another layer of protection against tampering.
Oh, and we haven't even addressed that their "secure signing" doesn't protect first installs, which could be insecurely downloaded.
Most Debian mirrors support HTTPS. But HTTPS alone does not help you on a fresh connection if it is a rotating certificate like Let's Encrypt with a dubious authentication chain.
Egypt or Turkey can issue valid fake certificates, so you would have to check that the certificate is not from one of those.
What a coincidence. Just earlier this week I was installing yarn in a docker container using their official instructions (https://yarnpkg.com/lang/en/docs/install/#debian-stable) and found out I had to install apt-transport-https for it to work.
Since the image was already apt-get install'ing a bunch of other packages at that point and everything seemed to work, the obvious question that popped in my head was: does this mean none of the other packages I've been downloading used https? That's what led me to this website.
If your personal ISP injected into HTTPS, it'd be broken too. So this is purely a complaint about the particular behavior of your ISP in that it serves HTTPS more faithfully than HTTP.
My corporate ISP hijacks HTTPS (MITM with self-signed CA), but not HTTP. Any system that uses any HTTPS security properties will verify certificates and fail on my work's network.
The argument about poorly behaved ISPs for one particular protocol but not the other cuts both ways — there are different kinds of poorly behaved ISP.
Right, some years ago I was involved in deployment of an update mechanism which (like APT) used signed bundles transferred in the clear. (Originally this was for privacy concerns: our users were more concerned about verifying the content of the "phone home" connections than about hiding their activity from an observer.) Anyway, some fraction of the time it'd fail because of an ISP or corporate injectobox. That stuff all goes over TLS now, not because there's any large benefit but it is very easy to add. We still get a fraction of failures due to injectoboxes, TLS or no TLS.
Moral of the story, I think, is that having a shorter chain of trust is good. In our case, the chain of trust started with a certificate in the original (sometimes OOB) download, the key for which we directly controlled. But for TLS, there are several links in between: the client host's cert store (under the control of OS vendors, hardware vendors a la Superfish, local administrators, etc.), the mess that is the TLS PKI community, your CA, several hundred other CAs, and finally you.
It didn't support it until not too long ago, so I had to set up my server to explicitly not redirect to HTTPS for one particular location, because otherwise people would need to install apt-transport-https for it.
We used FAI[1] to install it into the boot images we used and then ran it that way (other methods), but there still is the verification of the packages you put on those. Short of manually auditing the code and compiling that yourself then there's not much else in the trust chain. It's not really that necessary though, realistically, with the other protection methods. We just did it as it was fun to do and well, we could!
One reason to prefer HTTPS is that in the event of a vulnerability in the client code, an attacker cannot trigger that vulnerability using a MITM attack if HTTPS is in use. One such vulnerability was recently found in apk-tools: https://nvd.nist.gov/vuln/detail/CVE-2018-1000849
While I agree with your point, a counterpoint is that vulnerabilities in the HTTPS implementation are also possible, and by introducing HTTPS code you are increasing the surface area of possible vulnerabilities.
I don't believe that increases the attack surface. As long as apt supports https repos and redirects:
An attacker who wants to exploit a buggy pre-auth (or improperly cert-validating) client-side ssl implementation, when the connection is http, can just MITM the http connection and redirect to https.
That's a great point, assuming APT follows redirects by default!
I don't know enough about it to know either way. I do know that you need to install a package to get HTTPS support for most connections, but I'm not sure if that package is just "switching" to using HTTPS by default or if it actually adds the ability for APT to read HTTPS endpoints.
The argument is that the package you are downloading is already signed with public-key crypto and verified during the update process. Its integrity is "secured". However, there could be bugs in that implementation, and bugs can be exploited to (in one of the worst-case scenarios) gain remote code execution during a MITM attack.
A "solution" is to protect the endpoint with HTTPS, making MITM attacks impossible. Except that it's also possible that the HTTPS code could suffer from vulnerabilities which can lead to remote code execution. And if I'm being honest, the code which implements HTTPS is much larger and more complicated than the code which is doing the signature checking in APT right now, so by that measure it's actually a downgrade to something "less secure" since it's just adding on more complexity while not improving security much at all.
In reality I believe HTTPS is more heavily scrutinized than the signature verification code in APT is, and therefore could improve security, and there are additional other benefits to HTTPS aside from added security against implementation bugs (like an improvement to secrecy, even if it's small, and better handling by middleboxes which often try to modify HTTP requests but know to not try with HTTPS requests).
This isn't accurate. If the SSL code incorrectly trusts the wrong server, then you're no better off but also no worse off. If the code has a RCE vulnerability caused by bad parsing logic, then you're worse off than you would be without it.
My counter-counterpoint is that while OpenSSL had (has?) horrible security issues, it's still worth using HTTPS in principle, because a modern internet-connected system that has no trustworthy SSL library is always going to have security problems. Whether it's hardening OpenSSL, shipping BoringSSL, or anything else, systems just have to get this right, and once they do, applications like apt can take advantage of it.
Yes, but this comparison doesn't favor APT/PGP at all. Using OpenSSL or similar for your HTTPS implementation means you're running code that the entire world already depends on for security, and which your own OS probably also depends on in other scenarios. Using PGP means you have some kind of custom transport implementation that you're responsible for. To the extent that you're solving the same problem, not using HTTPS is much riskier than using it.
> "Furthermore, even over an encrypted connection it is not difficult to figure out which files you are downloading based on the size of the transfer"
Is it really not difficult? I bet if you sorted all the ".deb" packages on a mirror by size, a lot of them would have a similar or the same size, so you wouldn't be able to tell them apart based on the size of the download.
Furthermore, when I update Debian I usually have to download updates for some number N of packages. I don't know if this is now done over a single keep-alive connection. If it is, then figuring out what combination of data was downloaded gets a lot harder.
Finally, this dismisses out of hand a now-trivial attack (just sniff the URLs being downloaded with tcpdump) by pointing out that a much harder attack is theoretically possible for a really dedicated attacker.
Now if you use Debian your local admin can see you're downloading Tux racer, but they're very unlikely to be dedicated enough to figure out from downloaded https sizes what package you retrieved.
As I found at https://news.ycombinator.com/item?id=18960239 there can be duplication which is irrelevant for the point being discussed, as it is one version of a package duplicating another version of the same package, meaning that the size is still a unique identifier of the package. It is worth checking that.
> "Furthermore, even over an encrypted connection it is not difficult to figure out which files you are downloading based on the size of the transfer"
>> "Is it really not difficult? I bet if you sorted all the ".deb" packages on a mirror by size a lot of them would have a similar or the same size, so you wouldn't be able to tell them apart based on the size of the dialog."
Human readable sizes: Sure.
Byte size info: not so much. And even if it were: things would become very clear to the attacker after one update cycle for each package.
If you really want to mitigate information about downloaded packages you would have to completely revamp apt to randomize package names and sizes, and also randomize read access on mirrors...
> If you really want to mitigate information about downloaded packages you would have to completely revamp apt to randomize package names and sizes, and also randomize read access on mirrors...
There isn't a need to randomize package names, or randomize read access on the mirror, given fetching deb files from a remote HTTP apt repository is a series of GET requests. Randomizing order of these requests can be done completely on the client side.
Package sizes are still problematic. Here's a suggestion: if each deb file was padded to nearest megabyte, and there was a handful of fixed-size files (say, 1MB, 10MB and 100MB), the apt-get client could request a suitably small number of the padding files with each download. This would improve privacy with a minimum of software changes and bandwidth wastage.
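A rough sketch of that padding suggestion, purely as illustration (the 1 MiB rounding and the 1/10/100 MB padding files are the assumptions from the comment above, not anything apt actually does):

```python
# Purely hypothetical sketch of the padding idea above: round each download
# up to the next MiB and fill the gap with requests against fixed-size
# dummy "padding" files assumed to exist on the mirror.
MIB = 1024 * 1024
PAD_FILES_MIB = [100, 10, 1]   # assumed padding files, largest first

def padding_plan(package_bytes):
    padded = -(-package_bytes // MIB) * MIB        # round up to whole MiB
    remaining = padded - package_bytes
    plan = []
    for size in PAD_FILES_MIB:
        chunk = size * MIB
        while remaining >= chunk:
            plan.append(chunk)                     # fetch a whole padding file
            remaining -= chunk
    if remaining:
        plan.append(remaining)                     # partial range from the 1 MiB file
    return padded, plan

print(padding_plan(1_572_864))   # a 1.5 MiB package would be padded to 2 MiB
```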
If each file were padded to the nearest MiB, the total download size of the packages containing the nosh toolset would increase by almost 3000% from 1.5MiB to 46MiB. No package is greater than 0.5MiB in size.
I am fairly confident that this case is not an outlier. Out of the 847 packages currently in the package cache on one of my machines, 621 are less than 0.5MiB in size.
You're abusing the notion of a straw man, which this is not.
I am pointing out the consequences of Shasheene's idea as xe explicitly posited it. Xe is free to think about different sizes in turn, but needs to measure and calculate the consequences of whatever size xe then chooses.
No, it would not apply the same with different sizes. Think! This is engineering, and different block sizes make different levels of trade-off. The lower the block size, for example, the fewer packages end up being the same rounded-up size and the easier it is to identify specific packages.
(Hint: One hasn't thought about this properly until one has at least realized that there is a size that Debian packages are already blocked out to, by dint of their being ar archives.)
It's still useful to be able to connect to the local mirror without tor (and enjoy the fast transfer speeds), but still mitigate privacy leaks from analysis of the transfer and timings.
Transferring apt packages over tor is unlikely to ever become the default, so it's worth trying to improve the non-tor default.
They could also improve the download client to fix this.
For example, if the download client uses the byte-range HTTP requests to download files in chunks, there is nothing stopping it from randomly requesting some additional bytes from the server. Then the attacker would have a very weak probability estimate of what was actually downloaded.
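A minimal sketch of that idea, assuming the mirror honours HTTP Range requests (the URL, chunk size, and overlap amounts are placeholders, not real apt behaviour):

```python
# Sketch only: fetch a file in byte-range chunks, re-requesting a random
# amount of overlap so the bytes on the wire no longer equal the file size.
# Assumes the server honours Range requests; URL and sizes are placeholders.
import random
import urllib.request

def fetch_blurred(url, total_size, chunk=256 * 1024):
    data = bytearray()
    pos = 0
    while pos < total_size:
        extra = random.randint(0, chunk // 2)          # redundant bytes for blurring
        start = max(0, pos - extra)
        end = min(total_size - 1, pos + chunk - 1)
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
        data += body[pos - start:]                     # keep only the new bytes
        pos = end + 1
    return bytes(data)
```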
I'm somewhat surprised that no-one has (yet) linked to this post [1] by Joe Damato of Packagecloud, which digs into attacks against GPG signed APT repos, or the paper which initially published them [2].
The post makes clear in the third paragraph: "The easiest way to prevent the attacks covered below is to always serve your APT repository over TLS; no exceptions."
Both sides sometimes argue disingenuously. It's true that caching is harder with layered HTTPS and that performance is worse. It's also true that layering encryption is more secure. (It's what the NSA and CIA do. Are you smarter than them?)
Personally, I'd default to security because I'm a software dev. If I were a little kid on a shoddy, expensive third world internet connection I'd probably prefer the opposite.
I think it's important to reiterate that HTTPS only adds network privacy about what you download. The signing of repos and their packages means they're guaranteed to be a pure copy of what's on the server.
That same mechanism means you can easily make trusted mirrors on untrusted hosting.
If someone steals a signing key then they also need to steal the HTTPS cert. Or control the DNS records and generate a new one or switch to HTTP.
Adding an extra layer of encryption is like adding more characters on a password. Sometimes it saves your bacon, sometimes it was useless and came with performance drawbacks.
If you still disagree with me, that's fine. But I want to hear why you continue to hold this opinion when you worked for 1Password during Cloudbleed.
In a scalar sense, sure. In a binary "do we have enough security" sense, less so. I realise that's a shitty quality of argument and I could have been more explicit but you can always add more security. Why, for example, aren't you demanding reproducible builds and signed source uploads?
Simply put —and this is, I think, where we disagree— the signing of packages is enough. The design of Apt is such that it doesn't matter where you get your package from, it's that it matches an installed signature.
Somebody could steal the key but they would then either need access to the repo or a targeted MitM on you. Network attacks are far from impossible but by the point you're organised to also steal a signing key, hitting somebody with a wrench or a dozen other plans become a much easier vector.
The problem with the binary sense is that people misunderstand risk. For example, the blackout in 2003 was partially caused by the windows worm earlier in the day. Even though the computers that were used to control the power grid weren't running windows, the computers that monitored them were. So a routine alarm system bug ended up cascading into a power outage that lasted over a week in some places, including my home at the time. This was classified for a while.
The people that programmed Windows before 2003 probably didn't consider their jobs with the full national security implications.
Then you take something simple, like Linux on a simple IoT device. Say a smart electrical socket. Many of these devices went without updates for years. Doesn't seem all that bad, right? Just turn off a socket or turn it on? How bad could it be?
At some point someone noticed that they were getting targeted and said: "But why?" The reason is simple. You turn off 100k smart sockets all at once and the change in energy load can blow out certain parts of the grid.
The point isn't that someone will get the key. The point is that we know the network is hostile. We know people lose signing keys. We know people are lazy with updates. From an economics perspective why is non-HTTPS justified? Right? A gig of data downloaded over HTTPS with modern ciphers costs about a penny for most connections in the developed world.
Although I would not class this as even potentially in-line with Blaster or the imminent death of the internet under an IoT Botnet, I see your broader point. The deployment cost approaches zero and it does plug —however small— a possible vector.
I do think it would cause a non-zero amount of pain to deploy, though. Local (e.g. corporate) networks that expect to transparently cache the packages would need to move to an explicit apt proxy or face massive surge bandwidth requirements and slower updates.
That said, if you can justify the cost, there is absolutely nothing stopping you from hosting your own mirror or proxy accessible via HTTPS.
I'm not against this, I just don't see the network as the problem if somebody steals a signing key. I think there are other —albeit harder to attain— fruits like reproducible builds that offer us better enduring security. And that still doesn't account for the actions of upstream.
This is a good synthesis here -- downloading and trusting a key over HTTP is folly, but then, so is trusting much of anything that "just works."
If the whole PKI approach is to work, the client has got to get the trust in that public key right. In regular practice, that probably means checking it against an HTTPS-delivered version of the same key from an authoritative domain.
(How far down the rabbit hole do we go? Release managers speaking key hashes into instagram videos while holding up the day's New York Times?)
This page is wrong, for example they claim there's no privacy increase here but quite clearly there's a huge difference between an attacker being able to tell "you're downloading updates" and an attacker being able to tell "you're downloading version X.Y of Z package" - worse, that information could actually later be used to attack you based on the fact that the attacker now knows what version to find vulnerabilities in, including non-exposed client software for email, documents, browsers, etc.
It's a relatively insignificant security benefit for most, but could prove an important one for those who targeted attacks are used against.
They're speculating that you can do this using the size of the full connection - there's truth to that, but under HTTPS padding will occur, rounding up to the block size of the cipher - meaning that there's a higher chance of overlap in package sizes.
It might actually become quite difficult to do such analysis, especially if multiple requests were made for packages at once in a connection that's kept open. You won't get direct file sizes either, you'd have to perform analysis for overhead and such - in any case it's significantly less trivial than an HTTP request logger.
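One way to sanity-check that overlap claim is to count collisions after rounding; a sketch, assuming sizes are only observable rounded up to a 16-byte block and ignoring TLS record overhead entirely (the Packages path is illustrative):

```python
# Sketch: how many packages share a size once an observer can only see sizes
# rounded up to a 16-byte block? The block size and the local Packages file
# are assumptions, and TLS record/header overhead is ignored entirely.
from collections import Counter

def collision_stats(packages_path, block=16):
    buckets = Counter()
    with open(packages_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.startswith("Size:"):
                size = int(line.split(":", 1)[1])
                buckets[-(-size // block) * block] += 1   # round up to block
    total = sum(buckets.values())
    ambiguous = sum(n for n in buckets.values() if n > 1)
    return total, ambiguous

print(collision_stats("Packages"))   # (all packages, packages sharing a bucket)
```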
Even then, one is guessing and the other is establishing a fact. That part of the argument didn't sit right with me, and I can't imagine somebody in InfoSec or Legal conflating the two.
But it does, because HTTPS can ensure you always get up to date data.
They have a solution involving timestamps for HTTP, but that is still clearly less secure than the guarantee from HTTPS.
I've personally experienced this too: using apt in the presence of a captive portal replaces random bits of `/var/cache/apt` with HTML pages, breaking future updates until you manually find and fix the problem yourself.
The reverse argument also works: using HTTPS may lead you to link and expose, say, OpenSSL where you otherwise would not have needed to. OpenSSL has had dozens of vulnerabilities in the past: https://www.openssl.org/news/vulnerabilities.html
Some of these vulnerabilities have the potential for arbitrary code execution, leaving you worse off than the simpler solution based on the verification of cryptographic signatures that has fewer vulnerabilities by virtue of doing less.
The discussion at https://whydoesaptnotusehttps.com is about the protocol. You can add implementation bug risks to the discussion if you want, but then include the risks from both the approaches being discussed.
You've proposed a reverse argument to an argument that was never made. ctz never said anything about vulnerabilities or implementation issues, they said a captive portal is a problem for apt over HTTP but not HTTPS. This is also true of ISPs that like to insert things into HTTP sessions.
One important factor this article left out is upgrades. If a given HTTPS implementation is broken because of now-insecure protocols, insecure ciphers, etc., older systems can't update from a mirror that has been upgraded to a 'secure' HTTPS configuration while they only support the 'vulnerable' one. If HTTPS is left insecure, then it is not much different from using HTTP.
APT's methodology avoids this: as the current signing and protection mechanisms are file-based, the worst-case scenario is introducing a new file with a new cryptographic signature alongside the old scheme, to keep supporting updates on a system running the old security mechanism.
In comparison, trying to run multiple HTTPS servers with different configurations for specific versions of the system being updated would be a significant engineering effort, especially for mirrors.
Huh? All you would do is configure the web server running your apt mirror site to serve the same content on both the HTTP and HTTPS ports. If the client wants to use TLS, it connects to HTTPS. If it wants to use plain HTTP, it connects to HTTP. Both sites serve the same content, which is just a series of flat files. AFAIK, the client is responsible for determining the correct versions for the installed distro based on the indices.
If your installed version is configured for https, but is incapable of using TLS 1.2, because it's rather old, at some point soon, a modern mirror would no longer allow it to connect as 2019 (or maybe 2020) seems to be shaping up as the year to kill support for TLS 1.0 and 1.1. Meanwhile, an http config would continue to work.
>This can lead to a replay attack where an attacker substitutes an archive with an earlier—unmodified—version of the archive. This would prevent APT from noticing new security updates which they could then exploit.
>To mitigate this problem, APT archives includes a timestamp after which all the files are considered stale[4].
> The Valid-Until field may specify at which time the Release file should be considered expired by the client. Client behaviour on expired Release files is unspecified.
Well, of course the client behavior is under-specified -- sometimes the client is a human constructing a URL to download a .deb in a web browser over a corp-approved proxy, and then hand-installing the package with `dpkg -i`, bypassing all the security checks. Or sometimes there's a caching proxy (or three) between the client and the server. Or maybe IT has modified apt to only connect to repositories maintained by the IT department, and rejects sources from other domains.
Besides the privacy issue of sending package names clear-text, there is a second non-mitigated issue: Censorship.
An MitM could selectively block certain packages from being installed / updated. Imagine using this to prevent Bitcoin from being installed / enforce a ban on crypto without backdoors / block torrent installations.
This doesn't work as well with the 'recognize package size' method because you need to download the entire package before you know the size. Given the need for Ack in TCP, an MitM can't just buffer data until they have the entire package size.
> This doesn't work as well with the 'recognize package size' method because you need to download the entire package before you know the size. Given the need for Ack in TCP, an MitM can't just buffer data until they have the entire package size.
All they have to do is corrupt the final packet and the package checksum fails. An attacker only needs to buffer a single packet worth of data.
I bet there are many FLOSS advocates who don't read that page as being the result of a cost-benefit analysis. They read it as an inspiring story of the rebels winning one against the https Empire. Because I never see the caveat wrt apt that, "of course, we are a super edgy edge-case that should not be used as a model to rationalize a knee-jerk refusal to use SSL for common cases."
I say this because I've corresponded with such advocates about a completely common case for SSL -- setting up a Let's Encrypt certificate, say. The response I often get doesn't make any sense unless I assume they read a page like this and remembered the feels while forgetting all the relevant details that separate apt from their common case.
Even though the packages are signed cryptographically, there are possible risks when using an unencrypted connection.
A man-in-the-middle attack could work simply by serving you a signed but outdated package list, preventing your distribution from updating and leaving you vulnerable to security holes. It's the same attack an evil mirror could carry out as well.
So if you want to be really sure you should probably use two independent mirrors over an HTTPS connection.
The website mentions this towards the end (Replay attacks) : To mitigate this problem, APT archives includes a timestamp after which all the files are considered stale
The time stamp is described here[1], but it is not clear how the expiration date is decided.
Well, defining the expiration date is up to the server. Debian picks a week. Ubuntu does not use it; I started a Launchpad branch last year, but, um, I don't know Launchpad, so it might take some more years ;)
More precisely, an expiration timestamp is embedded in the repository metadata.
Packages in debian and derivatives are not signed. Instead, the manifest that lists all the available packages and their checksums is signed. That's also where the expiration data is stored.
Even more precisely still, not even the lists of packages are signed. Only InRelease and Release are signed, and they only contain the list of Package files. It's FreeBSD that has the approach of just one signed file containing everything. APT has moved closer to it over the years, but it is not there yet.
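For the curious, checking that freshness window by hand is straightforward; a sketch that reads the Date and Valid-Until fields out of an already-signature-verified Release/InRelease file (the path is illustrative):

```python
# Sketch: read the Date and Valid-Until fields from an already-verified
# Release/InRelease file and check whether it is stale. Path is illustrative.
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def freshness(release_path):
    fields = {}
    with open(release_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            for key in ("Date:", "Valid-Until:"):
                if line.startswith(key):
                    fields[key.rstrip(":")] = parsedate_to_datetime(
                        line.split(":", 1)[1].strip())
    expired = ("Valid-Until" in fields
               and datetime.now(timezone.utc) > fields["Valid-Until"])
    return fields, expired

print(freshness("InRelease"))
```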
"HTTPS does not provide meaningful privacy for obtaining packages. As an eavesdropper can usually see which hosts you are contacting, if you connect to your distribution's mirror network it would be fairly obvious that you are downloading updates."
It is a dangerous mistake to decide what kind of privacy people need. Privacy should be absolute and without conditions.
What if you live in Iran? Some Ubuntu packages are already inaccessible due to the government's pornography keyword censorship. E.g. I can't download "libjs-hooker" from this http link http://archive.ubuntu.com/ubuntu/pool/universe/n/node-hooker... from Iran. What if the government decides to censor the "tor" package?
Do we now have a custom domain name on a per article basis?
I find it strange to have a site that is just about one thing that is not that important to most people on a custom domain. If there were pages and pages of information then yes this might make sense but there isn't.
Coming soon...
howtotieyourownshoelaces.com
The premise of this article per domain reminds me of 1998 when everyone thought that instead of search engines people would be typing in URLs, e.g. 'yescupofteaplease.com' so URLs like 'pets.com' were seen as goldmines-to-be.
I see it as more akin to a vanity plate; no one expects it to be functionally useful, but it's something people will see and so it is somewhat decorative to put something there.
As an exploit analyst currently focusing on network traffic, can we stop all this fascination with SSL/TLS? TLS is incredible, but let's use the right tool for the job. Contrary to Let's Encrypt motto, applying TLS to _everything_ can be bad for security.
Let's Encrypt, when are you going to revoke placeimg.com's certificate? The site has been pushing Exploit Kit's malicious payloads since Jan 18 2019 via SSL. Many Flash/IE users are getting infected because most firewalls are unable to peer into SSL tunnels signed by you.
(To be fair, Let's Encrypt is not the only cert authority getting abused (Comodo, yes you))
>To mitigate this problem, APT archives includes a timestamp after which all the files are considered stale
How often is this, practically? If I'm understanding this right, each new timestamp would come only with a package upgrade, meaning the time period is quite a long time indeed, long enough for a replay attack to work. I would argue that there should be a mechanism requiring a signed the-latest-package-is-X message updated at least every day or so.
Edit: it looks like this is actually what's going on. The page wasn't clear, but it is a metadata "Releases" file that is timestamped, not the packages themselves.
The security repository generally serves up a field in its metadata saying that the data shouldn't be trusted for more than 7 days, if it hasn't changed since 2014 when I encountered this duration as part of my day-job work. It's safe to assume the trusted duration hasn't increased, at least.
Even though replay attacks will be of no use after some time, one could MITM during the vulnerable timeframe to prevent a critical (0day) security update from happening and therefore gaining control over the system, so if you're specially targetable, I believe it's totally worth switching to an HTTPS mirror instead.
Other than that, most should be safe ig.
One annoying thing is that the page name is misleading. apt does use/support https, it's just that Debian chooses for its default mirrors to it be optional.
We like to talk about, say, the compromise of integrity and/or authenticity, information leaks and so on, but they are not the only things TLS/HTTPS was prepared for. Indeed, it is often overlooked that we have two kinds of exploitation in this space. I tend to label them as "active" and "passive". One of the best-known passive exploitations is JavaScript injection by an ISP---Comcast did it in 2017 [1], for example. Alone they are typically harmless or annoying at best, but they are indicative of the real security problem lurking around, and often can evolve into active exploitations.
It is probably true that APT is a simple service that does not require the full TLS capability. But APT is only prepared for active exploitations. Passive exploitations will effectively compromise availability, by compromising integrity in a relatively predictable way. I don't think APT is prepared for passive exploitations, either---casual users will be much more prone to them.
"Max Justicz discovered that APT incorrectly handled certain parameters during redirects. If a remote attacker were able to perform a man-in-the-middle attack, this flaw could potentially be used to install altered packages."
Far easier to do a MITM attack when apt isn't using https by default
>Furthermore, even over an encrypted connection it is not difficult to figure out which files you are downloading based on the size of the transfer[2]. HTTPS would therefore only be useful for downloading from a server that also offers other packages of similar or identical size.
That seems like a false dilemma imho.
What's preventing APT from splitting up downloads into identically sized chunks of say 4kb?
The "Date, Valid-Until" timestamp mitigates replay-attack, cool.
But what if a vulnerability is discovered in a package and an MITM attack prevent you from downloading the new patched version, making you believe your system is up to date although it's not ?
There's also a current and ongoing "discussion" (I say pejoratively) regarding the same issue with VideoLAN. They can't (easily) use HTTPS because they don't control the mirrors. However the package download/update itself is also signed with GPG, thus guaranteeing tamper resistance.
Apparently Windows Update uses TLS without encryption (null cipher) which still provides integrity and authentication: https://twitter.com/swiftonsecurity/status/74533301869746995.... Debian's signing achieves the same result by signing the package list containing the packages' hashes with a known key.
> providing a huge worldwide mirror network available over SSL is not only a complicated engineering task
Nobody said each mirror couldn't have its own certificate and its own domain. The host names of these mirrors are usually obtained from a trusted source.
HTTPS in addition to GPG signatures does no harm. It also means that one would have to compromise two entities, my distro's GPG key and my mirror's certificate.
Beware of strangers bearing gifts. There is a lot of https scaremongering from a certain subset of people with objectives one finds difficult to understand.
If they were genuinely concerned about privacy, security and surveillance they would have a lot to say about the out of control surveillance economy and its participants including their employers, the security model of browsers, mobile operations systems, javascript, systemic user stalking and hoovering up data.
Yet those stories are mainly driven by people outside the tech ecosystem, and within it there is mostly silence interspersed with vague economic justifications, so there is something truly bizarre about HTTPS extremism on the grounds of privacy.
If your packages are already signed why do you need a middleman? That is a better trust model for a distributed Internet than a centralized CA that is not accountable to individuals and can be compromised by power. Here people are neck deep in business models stalking people online 24/7 and tracking their location and some are 'concerned' about protecting the list of packages you download?
> Here people are neck deep in business models stalking people online 24/7 and tracking their location and some are 'concerned' about protecting the list of packages you download?
That's believable. "There are a thousand hacking at the branches of evil to one who is striking at the root." - Henry David Thoreau. Nice word "hacking".
At a research institute I worked for, we had a proxy server that intercepted http. The antivirus software on the proxy made it impossible to download some security updates once in a while.
Fortunately it was possible to change the Ubuntu URIs to https:// to use HTTPS, and the broken antivirus software did not intervene anymore.
I wish APT used HTTPS, if only to be able to tell my proxy "only allow HTTPS" and never have to worry about unencrypted traffic again. Currently it's the only HTTP traffic my servers are generating. Plus it's polluting my proxy logs: instead of having only the domain, as is the case for all the other logs, I have the full URLs :)
So they don't consider privacy beyond which host I contact, which is weird, because I'm pretty sure which versions of packages I have is relevant to an attacker.
And in case of a zero-day exploit it would be really handy not having to wait for some global timeout during which I won't see any updates.
I think HTTPS by default would be a wonderful addition. I understand the complexities from the project's perspective. I understand users' desire for privacy. If the project not defaulting to HTTPS is that much of a privacy issue for you, set up a local mirror and point your boxes at that. Yes, someone could potentially still snoop your local network. But if they can do that, you've got bigger issues to worry about than the fact that they can see what packages you're installing.
Those arguments are invalid.
Because Debian/Ubuntu fail to use HTTPS, regimes like Egypt/Syria can track people who try to install Tor, and they can selectively block repos based on package name.
I'm sorry for not liking your favorite color and distro. Please deal with it.
And btw, they don't digitally sign their packages either (they sign a separate metadata file containing checksums, which is not equivalent to embedding a signature inside the package and validating it).
Compare that to yum/rpm, which use HTTPS plus signed RPMs and signed metadata (both the medium and the payload are secured).
Your individual statements are correct, but they do not add up to valid argument in this case.
Kazakhstan forces its citizens to install a government-issued certificate to use SSL. This allows Kazakhstan to track its citizens, which proves that a regime can track its citizens even in the presence of SSL encryption. In other words, using SSL/PKI does not inherently prevent tracking by powerful entities. You need to create your own government for that.
It is naive to think that regimes like Egypt/Syria/the US can't track people while at the same time being able to exert overwhelming physical force over the exact same people. If you can force someone to hand over encryption keys, you can track them. Different countries do the same thing, everyone just picks their preferred way: physically controlling Certificate Authorities in the case of the US, forcing people to hand over encryption keys in the case of Great Britain.
> Compare that to yum/rpm which use secure https and signed rpm and signed metadata
No, using more "secure" technologies does not amount to better security.
You can block based on a name because you see the name before the package is sent.
You can't block based on package length, because you need to let the entire update through before you know the length. At that point, it's too late to block. Buffering the entire message doesn't work because TCP expects ACKs.
A) you can buffer and send acks to the server and then trickle the data to the client
B) in the interest of memory usage, you could not buffer, and send selective acks to the server -- once you decide to allow it, stop blocking the first data packet, and let the client ack that without the sack and let the server retransmit.
C) like B, but for network efficiency, actually let the client receive all packets but the first, and sack them itself --- then when you do allow the first packet, the rest of the packets won't need to be retransmitted.
That doesn't make sense? If you have the capability to refuse all HTTP packages, you can still refuse all HTTPS packages coming from Debian. My comment was about refusing specific packages over HTTPS: buffer the first packet and ACK it, then wait for the rest and count the bytes. If bytes == N, then I know this person is downloading Tor, so I withhold that first packet such that they can never download Tor.
It is. If you want to download Tor in some hard-to-track way, you probably shouldn't use an easily trackable source. There are better options, including getting it sent by email, and torrents.
And given the scope of the attacker, fingerprinting by size and server is trivial, so HTTPS adds nothing related to anonymity or security.
My corporate network used a transparent https man-in-the-middle proxy. That is, transparent as long as you're using a Windows machine that has been configured by the help desk and using applications that utilize the Windows certificate store, which doesn't include git, pip, or npm.
If pip or npm used gpg signing of packages rather than https, this wouldn't be a big deal. But as it stands, it's a nightmare on various Windows systems, Linux boxen, and Docker images to get the code you need.
HTTP/2 is more likely to be a downside than an upside for apt. You would have additional framing overhead, and I can't see a benefit over pipelined HTTP/1.1 (which is definitely doable because all the requests are GETs for static files). Maybe header compression of the small headers could be a very minor win. Multiplexing within a TCP stream is at best neutral for this case, but probably negative: on the client side, it means the possibility of multiple partial files instead of one; on the server, it means more parallel demand on the file system.
The design of HTTP/2 helps websites more. It supports push, since webpages include resources that a server knows a client is going to need anyway; this avoids request latency for websites. It also allows connection reuse via multiplexing, which again is mostly helpful for websites, decreasing latency by avoiding the TCP handshake. It also allows for slightly smaller headers, since it's a binary protocol.
However, pretty much none these really help with large file downloads, and other use cases like APT. Even the headers will be dwarfed by the files themselves, and the header information is probably already pretty minimal.
Also, HTTP/2 does not require TLS; it's just that pretty much all the browsers ignore that the standard does not require it. However, they are trying to push for more encrypted traffic, a decision I don't really think browsers should be making.
> HTTPS does not provide meaningful privacy for obtaining packages.
That's a very subjective and circumstantial statement being passed off as fact.
Just because one can list a few scenarios where HTTPS wouldn't prevent a malicious actor from achieving their goals doesn't mean HTTPS does not increase security/privacy for apt in other situations.
Further, in certain contexts, guessing and proving are not fungible.
Layering HTTPS on top of the existing authentication system adds the obvious benefit of confidential communication, such that an eavesdropper can't see what packages you're downloading. There's an argument that you can infer the package from the size of download, but this is defeated by downloading partial regular sized chunks of the file with some overlap to mask the actual size.
The only sensible argument against HTTPS seems to be infrastructure cost. This made more sense in the past, but nowadays hardware crypto acceleration (e.g. AES-NI) is commonplace, and certificates can be obtained for free.
Ultimately it's up to mirrors to decide if they're willing to provide HTTPS.
An attacker can simply count the bytes. They can buffer one packet, send an ACK, and reject it based on the bytes transferred. Also, it's trivial to know when the next package starts: apt doesn't stream all packages at once, it first sends the first package over HTTP keep-alive and then waits until the client orders the second package.
Nothing justifies Debian making you `apt-get install apt-transport-https` over `http` before you can add your own repos, which are HTTPS-only of course. But hey, it's Debian.
My local Debian mirrors support HTTPS, and I assume other mirror sites worth their salt do too. Easy enough for Debian to redirect to your local mirror.
The local apt-cacher-ng instance I run in my office network cannot be redirected to by Debian, because Debian cannot be aware of it. The apt client would need to build in support for local proxies.
As it stands right now, apt-cacher-ng cannot work with https sources.
Fedora handles this use case beautifully with MirrorManager, which covers the EPEL repos as well. All of the logic is server-side: when a yum/dnf client connects to the Metalink server to fetch a mirror list from our IP block, it gets sent our internal mirror. I wish more distros had similar setups.
Yes, the mirror is configured as private and is only served to machines in my IP range - since it’s on the internal network it does nobody else good to have access.
I am seeing quite a bit of misinformation about how package managers work so I'd love to share what I have learned. I work with index files on a daily basis, and we might possibly generate more index files than any other organization on the planet. Here is my chance to share some of this knowledge!
TLDR/Summary
We can trust the Release file because it was signed by Ubuntu. We can trust the Packages file because it has the correct size and checksum found in the Release file. We can trust the package we just downloaded because it is referenced in the Packages file, which is referenced in the Release file, which is signed by Ubuntu.
Some basic package manager principles
I work with APK, DEB, and RPM based package managers, and they all behave very similarly. Each repository has a top-level file, signed by the repository's maintainer, that lists the files found in the repository along with their checksums. When your package manager does an update, it looks for this top-level file.
For DEB based systems, this is the Release file
For APK based systems, this is the APKINDEX.tar.gz file
For RPM based systems, this is the repodata/repomd.xml file
These files are all signed by the repository's gpg key. So the Release file found at http://us.archive.ubuntu.com/ubuntu/dists/bionic/Release is signed by Ubuntu, and the gpg key is included in your distribution. Let's hope Ubuntu doesn't let their gpg key into the wild. Assuming that Ubuntu's gpg key is safe, this means that the system can verify that the Release file did in fact come from Ubuntu. If you are interested, you can click on the previous link, or navigate to Ubuntu's repository and open up one of their Release files.
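A minimal sketch of that signature check, roughly what apt does internally. The detached signature sits next to the Release file as Release.gpg, and the keyring path below is an assumption that varies by distribution:

```python
# Hypothetical sketch: verify a downloaded Release file against its detached
# Release.gpg signature using gpgv. The keyring path is an assumption; on
# Ubuntu the archive keyring normally lives under /usr/share/keyrings/.
import subprocess

KEYRING = "/usr/share/keyrings/ubuntu-archive-keyring.gpg"  # assumed path

result = subprocess.run(
    ["gpgv", "--keyring", KEYRING, "Release.gpg", "Release"],
    capture_output=True,
    text=True,
)
print("signature OK" if result.returncode == 0 else result.stderr)
```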
Release file
In the Release file you'll see a list of files and their checksums. Example:

55f3fa01bf4513da9f810307b51d612a 6214952 main/binary-amd64/Packages
The left column is the checksum, the middle column is the size of the file in bytes, and the right column is the location of the file within the repository. So we can download the files referenced in the Release file and check that they have the correct size and checksum. The Packages or Packages.gz file is the one we care about in this example: it contains information about the packages available to the package manager (apt in this case, but again, almost all package managers behave very similarly).
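A minimal sketch of that check, assuming both files have already been downloaded into the current directory and using the field layout of the example line above:

```python
# Hypothetical sketch: confirm that a downloaded Packages file matches the
# checksum and size recorded in the (already signature-verified) Release file.
import hashlib
import os

release_entry = "55f3fa01bf4513da9f810307b51d612a 6214952 main/binary-amd64/Packages"
expected_md5, expected_size, path = release_entry.split()

actual_size = os.path.getsize("Packages")
with open("Packages", "rb") as f:
    actual_md5 = hashlib.md5(f.read()).hexdigest()

assert actual_size == int(expected_size), "size mismatch"
assert actual_md5 == expected_md5, "checksum mismatch"
print(f"{path}: size and checksum match the Release file")
```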
Packages file
Since we know that we can trust the Release file (because we have proven it was signed by Ubuntu's gpg key), we can then proceed to download the files it references. Let's look at the Packages file specifically, as it contains a list of packages, each with its size and checksums.
The Packages file includes a list of packages with information about where the file can be found, the size of the file, and various checksums of the file. If you download a file through commands like apt install and any of these fields are incorrect, apt will throw an error and not add it to the apt database.
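To round out the chain, here is a sketch of verifying a downloaded .deb against one (made-up) Packages stanza. The field names (Filename, Size, SHA256) are the ones apt checks; the values are invented:

```python
# Hypothetical sketch: verify a downloaded .deb against its Packages stanza
# before accepting it. Values below are placeholders for illustration.
import hashlib

stanza = {
    "Package": "hello",
    "Filename": "pool/main/h/hello/hello_2.10-1_amd64.deb",
    "Size": "56132",
    "SHA256": "0" * 64,  # placeholder digest
}

with open("hello_2.10-1_amd64.deb", "rb") as f:
    blob = f.read()

ok = (
    len(blob) == int(stanza["Size"])
    and hashlib.sha256(blob).hexdigest() == stanza["SHA256"]
)
print("package verified" if ok else "verification failed: refusing to install")
```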
It's time to debunk some myths!
Can an attacker send me a fake Release file?
Sure, but apt will throw it out because it's not signed by Ubuntu (or whoever your repository maintainer is: CentOS, RHEL, Alpine, etc.)
Can an attacker send me an old index from an earlier date that was signed by Ubuntu that has old packages in it with known exploits?
Sure, but apt will throw it out because it will have a Date field (in the Release file) that is older than what is stored in the apt database. For example, the current bionic main Release file has this date in it: Date: Thu, 26 Apr 2018 23:37:48 UTC. So if you supply a Release file older than that timestamp, apt will throw it out because it is older than what it currently knows about.
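A tiny sketch of that freshness check; it mirrors the idea rather than apt's exact implementation, and the "attacker" date is invented:

```python
# Hypothetical sketch of the anti-rollback check: reject any Release file
# whose Date field is older than the one already trusted.
from email.utils import parsedate_to_datetime

stored_date = parsedate_to_datetime("Thu, 26 Apr 2018 23:37:48 UTC")
offered_date = parsedate_to_datetime("Mon, 01 Jan 2018 00:00:00 UTC")  # replayed file

if offered_date < stored_date:
    raise ValueError("Release file is older than the one already in the apt database")
```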
I hope this helps clear the air!
Shameless plug. If you are serious about security and not just compliance, check out our Polymorphic Linux repositories. https://polyverse.io/ We provide "scrambled" or "polymorphic" repositories for Alpine, Centos, Fedora, RHEL, and Ubuntu. We use the original source packages provided in the official repositories and build the packages but with memory locations in different places and ROP chains broken.
Installation
Installation is a one line command that installs our repository in your sources.list or repo file. There is no agent or running process installed. It is literally just adding our repository to your installation. The next time you do an `apt install httpd` or `yum install docker` you'll get a polymorphic version of the package from our repository. You can see it in action in your browser with our demo: https://polyverse.io/learn/
What does it do?
Many of the replies in this post referenced an attacker tricking a server into installing an older version of a package that has a known exploit. We stop this. Even if you are running an old version of a package with a known exploit, memory-based attacks will not work on the scrambled package because the ROP chain has been broken, or as we call it, "scrambled". So with our packages, you can run older versions of a package and not be affected by the known exploits. This also means that you are protected from zero-day attacks just by having our version of the package.
FREE! Individuals and open source organizations can use our repositories for free. I hope you try it out!
Let's Encrypt is also "hilariously" centralised. What do you do if Let's Encrypt is blocked in your country, or they refuse to issue you a certificate because of US sanctions?
Block Let's Encrypt? How would that even work? They'd have to block all addresses where the webserver uses Let's Encrypt certificates, or MITM your SSL connections. If Let's Encrypt is blocked, you're not getting the full Internet. Complain to your ISP or government if they do that!
If Let's Encrypt refuses to issue certificates for your domains, go to another provider. They're not the only one.
I still don't see how that would be a problem Debian should weigh in its considerations around SSL. After all, if your server's upstream blocks access to Let's Encrypt's servers, why are you trying to operate an official Debian mirror there? What's the point of having a Debian mirror connected to such a shitty upstream?
There is also the possibility that Let's Encrypt will not issue a certificate for your TLD if, for example, there are US sanctions against that country.
There is a strong need for a second "Let's Encrypt" based in another country other than the US. Preferably two or three based in Europe and Asia.
Should be HTTPS by default, just for the privacy of downloading packages without any intermediaries knowing the package names. That information could easily be collected and used in attacks against users of vulnerable versions.
Did you even read the article? This is explained at length. HTTPS doesn't give you that benefit, since the attacker can still single out the package based on its length, and the hostname is sent in plaintext anyway.
I feel the HTTPS hysteria is going too far. For some strange reason people have started to consider it a security panacea, often without understanding its limitations. The accusations against APT are a perfect example (its built-in security mechanisms are superior to what HTTPS has to offer).
Being secure against modification is not the same as being secure against information leakage; it doesn't give you privacy.
Using a non-encrypted connection means that it's trivial to work out what packages you download. Using a secure connection at least makes it 1 step harder to infer that information.
However, whether packages should be kept private is altogether another question. I'd argue that OS updates and related packages do not need to be kept private, but application packages do.
> Using a non-encrypted connection means that it's trivial to work out what packages you download. Using a secure connection at least makes it 1 step harder to infer that information.
It is still trivial in the case of APT. That's exactly my point: people start believing HTTPS will protect them against many attack vectors it doesn't. It's OK for uninformed people to believe in the magic of a padlock in the address bar, but technical folks should really know better.
If I have access to your network traffic and intend to see which packages you download by apt, I will do it irrespective of whether you use HTTPS or not.
That’s like saying I don’t need a helmet because I have all of this body armour.
At a minimum, HTTPS prevents leakage of information about your configuration but there are several direct attack vectors listed in other threads. Please stop calling it “hysteria”.
I trust the file signatures, but if you need to write a full article arguing that something is secure, then it could be made more secure by making the system simpler and more standard.
One thing that wasn't touched on: the mirror network. There are hundreds of mirrors for all the major distros, run by third parties. If e.g. Debian wanted apt over HTTPS, they would need to hand out a debian.org SSL cert+key to all of them.
(And convince them all to take the CPU overhead hit of TLS)
EDIT: Since I can't reply to all the downvoters, I'll add here: LetsEncrypt does not solve this. http://us.archive.ubuntu.com/ likely resolves to tens of different mirrors. Which one will the LetsEncrypt verification call hit?
That was largely the case until recently, but I don't think it would be difficult now to set up LetsEncrypt and do HTTP-01 challenges. A slightly more complicated setup, but one entirely within the Debian org's means, would be to use the DNS-01 challenge.
But to the article's point, the DNS requests and IPs and file sizes would all be largely transparent, and that's probably enough to figure out what's being downloaded.
On the other hand, HTTP/2 (over TLS) could improve throughput and ensure proxies aren't tampering or replaying.
Actually, no, the mirrors may not be able to verify themselves with let's encrypt.
A hostname like ftp.us.debian.org resolves to many different mirrors, and may not resolve consistently around the world -- if that's the case, let's encrypt will not be able to verify the hosts through http challenges.
Also, 1gbps is pretty small. I can't find any documentation on traffic, but I'd imagine mirrors in popular places are at least on 10gig.
I have a 10Gbps uplink on my personal server and I can easily saturate it without getting capped on CPU or RAM.
LE offers other challenges to verify hosts, like DNS verification. DNS verification can be done easily with an external API for mirror owners to hit (most ACME clients support the DNS challenge via the standard dynamic update protocol for DNS, which can be secured appropriately).
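As a hedged sketch of that "standard update protocol" route (RFC 2136 dynamic updates, here via dnspython), where the zone, TSIG key, token, and nameserver are invented placeholders rather than anything Debian actually runs:

```python
# Hypothetical sketch: publish an ACME DNS-01 challenge token via an RFC 2136
# dynamic update. Zone, key, token, and server below are placeholders.
import dns.query
import dns.tsigkeyring
import dns.update

keyring = dns.tsigkeyring.from_text({"acme-mirror-key.": "c2VjcmV0LXNlY3JldA=="})
token = "example-acme-challenge-token"  # value handed out by the ACME server

update = dns.update.Update("debian.org", keyring=keyring)
update.replace("_acme-challenge.ftp.us", 60, "TXT", token)

response = dns.query.tcp(update, "ns1.example.org")
print(response.rcode())  # 0 (NOERROR) means the record was accepted
```

The coordination problem the parent describes is exactly here: whoever holds that TSIG key can publish challenge records, so scoping it to a single mirror's name is the hard part.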
Mirror owners don't control the DNS for debian.org.
Debian could get the certificates, but getting the certificates was never the issue -- some CA would be happy to issue certificates for little or no cost to help Debian and gain mindshare. Coordination between 3rd party, volunteer mirror owners and the Debian organization is the issue.
What kind of CPU and traffic patterns are you using to hit 10 gbps of TLS protected traffic?
>What kind of CPU and traffic patterns are you using to hit 10 gbps of TLS protected traffic?
Mainly serving a file directory with apache with files ranging between 100M and 1G in size. Should be easily comparable to Debian or Ubuntu repositories.
Debian mirrors are going to get a lot of _very_ short connections from people frequently checking whether there are any updates. Those are effectively zero bandwidth, but each needs a full TLS handshake, which is the most expensive part.
A dedicated server of decent size at the right hoster starts at 25€ a month. That is not that much and I bet most of the volunteer servers are already above this price range.
I love how complex this has become. Debian would now need to build a custom API, or use DDNS secured just right, for LE verification records (which would give mirrors full access to obtain any debian.org cert, since the DNS-01 challenge doesn't contain enough info to filter on. You either control DNS or you don't - DNS-01 doesn't have a middle ground of owning a single record AFAIK, and it certainly doesn't have a concept of shared ownership of a single record without race conditions...).
Now multiply that by the 1000 or more projects that each mirror syncs content from, all with something different because nothing standard exists.
LetsEncrypt is great, I love it, I have tens of certs for personal stuff from them. I think they've completely changed the CA landscape, hopefully forever.
However I'll say it again, LetsEncrypt does not solve this problem. That's OK. LetsEncrypt doesn't have to solve every problem with TLS!
>DNS-01 doesn't have a middle ground of owning a single record AFAIK,
With DNS-01 you only own up to the domain you verified. If you verify ftp.de.debian.org then you can't issue certs for de.debian.org or debian.org but you can issue for www.ftp.de.debian.org.
I see an issue with that - but it's possible I'm too paranoid when it comes to many third parties being able to issue certificates for my domain on names I wasn't expecting.. Each to their own!
Either way - assuming restricting issuance to exactly one name is a solved problem - this:
> What? No.
> The mirrors would just need to install a letsencrypt-compatible client and setup SSL via that.
is still a far cry from reality thanks to all the other issues.
If anything, it's: What? No. Certs are just the tip of the iceberg, even if LetsEncrypt solved that problem neatly (and they don't), you have ignored the massive complexity of the issue, both the technical and organisational issues.
It will have to be solved, considering a major vulnerability was disclosed today that allows an attacker to get root-level RCE by manipulating the HTTP response.
HTTPS as default would have severely reduced the attack surface for this bug.
That's only for the mirrors which have a shared debian.org hostname. If you use e.g. mirrorbrain it'll redirect to the closest mirror.
Based upon https://www.debian.org/mirror/list, it seems they all have pretty much unique hostnames (ftp.<COUNTRY>.debian.org). You can easily get a certificate for that.
In brief: It's not a huge issue to get a certificate.
Another scary thing is that, despite it not being included by default, when you do have a need for it you're advised to install apt-transport-https[0]. I can't remember the specific package that required me to go this route, but it always reminds me "oh, they're not using https". I'm surprised they don't pull over SSH or something; doesn't git work with SSH as well?
Apt supports SSH (and HTTPS). The reasons not to use it for the default repos are explained in the article: caching, and the complexity of keeping HTTPS certificates up to date.
Caching, or more precisely the ability to cache HTTP response payloads (on shared caching proxies, for example), is, to my surprise, actually not mentioned in the article. I'd expect it to be the number one reason to stick with HTTP in this case, rather than the described hassle and futility of HTTPS.
In my experience, the number one reason to NOT stick with plain HTTP would be "transparent" shared caching proxies which cache older copies of files long after they've been replaced on the server, cache incomplete downloads as if they were complete downloads, and misbehave in several other annoying and hard-to-diagnose ways. By using TLS, these broken middleboxes often can be bypassed.
Ah, that's an interesting insight, and sure, debugging with such a broken middlebox must be a serious bummer. Are they really that frequent? (I'm not a devops person, so I'm asking in all sincerity, since I can't recall dealing with one myself.)
Anyway, I assume there are very few (if any?) scenarios with no way to work around such a bug in a broken proxy (yes, after that exhausting investigation, but still), and there are obvious potential benefits in the scenario where such a middlebox does its job well.
With TLS (as I understand it) those potential benefits, as well as the potential bugs, are just tossed away together (I'd not use the term 'bypassed' here), and every single download is forced to travel the full wire length, which could be pretty nasty in some locations. Eric Meyer recently wrote an interesting article [0] on this topic. (I understand it is all quite obvious stuff, but well expressed IMO.)
The point of the article is that Debian's trust model does not privilege the package servers - anyone can set one up, using whatever hare-brained technology they want (even, say, BitTorrent). The packages are individually signed and verified. SSL doesn't get you anything. It's the end-to-end principle in action.
With TLS, Eve cannot see (as easily) which packages Alice is downloading from Bob's mirror. If she could, Eve could use that information to decide which exploitable applications to target on Alice's machine.
This was discussed elsewhere in the same thread - basically, the size of a download would be too similar for many packages, and the keep-alives would make it look like one impossible-to-decrypt stream with no per-package sizing data anyway.
HTTPS might prevent some surveillance entities (ISPs, governments, any other MITMs) from determining what packages you're installing, and this might be beneficial, but plain old unencrypted HTTP is often faster if those aren't a big concern (they likely aren't in most developed nations and for most occupations). The lack of encryption overhead, as well as transparent proxies being able to serve files, are huge boons.
The article points out that HTTPS offers almost no protection against this kind of surveillance, since packages are almost uniquely identifiable by size.
With HTTPS you would be able to use HTTP/2 which means you can multiplex a single connection and once you download more than 2 packages, identifying which you installed is impossible.
With plain HTTP it remains possible no matter how much pipelining you do.
No you would not be able to use HTTP/2. HTTP/2 is not implemented in apt, and probably won't be for a very long time, as it directly clashes with apt internals.
That said, pipelining over HTTPS is surely possible too, and reduces the risk.
That said, if you're installing security updates automatically, as you should, anyone will know anyway, as there are only about 3-5 possible combinations of updates you'll be downloading on a particular day in one session.
I'm aware that APT does not use HTTP/2 but it would be able to use it with HTTPS.
With automatic security updates, an attacker finding out what packages you have is less valuable, considering you are installing the latest patches.
It would be more interesting if that doesn't happen, in which case an attacker can learn what you have installed and wait until exploits appear. Automatic updates negate this attack model.
edit: As I've demonstrated in a sibling comment, even 5 packages is already out of reach, as working out which packages they are is a task of millennia. With 4 it could possibly be done by throwing a supercomputer at it for a few months.
Well, maybe; how do HTTP/2 servers allocate bandwidth? If they do it per-stream, you can still identify the size of each package by watching the total connection bandwidth decrease when each package ends.
If they allocate it for the whole connection, then all they get is a total, which still gives the attacker some information (there's a limited combination of packages that sum up to that number).
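A toy sketch of that "limited combinations" observation, with invented package sizes (a real attacker would build the index from the public Packages file): finding which subsets sum to roughly the observed total is a small subset-sum search, which is why mixing in more packages blows up the search space as the sibling comments argue.

```python
# Hypothetical sketch: enumerate package combinations whose sizes sum to
# (approximately) an observed total transfer size. Sizes here are invented.
from itertools import combinations

sizes = {"openvpn": 547_212, "nmap": 5_436_776, "cowsay": 20_060, "tor": 1_983_448}

def matching_sets(observed_total, size_index, max_packages=3, tolerance=4096):
    hits = []
    names = list(size_index)
    for k in range(1, max_packages + 1):
        for combo in combinations(names, k):
            if abs(sum(size_index[n] for n in combo) - observed_total) <= tolerance:
                hits.append(combo)
    return hits

print(matching_sets(2_530_660, sizes))  # -> [('openvpn', 'tor')] with these toy numbers
```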
You are assuming that all packages are equally interesting (and your math does not account for package dependencies).
In practice, the attacker either does not care about your packages, in which case hiding that information gains you nothing, or wants to be alerted when you (or anybody else) install one of a few specific packages. Those combinations can be computed in advance and identified in traffic.
Not necessarily, I think you underestimate the complexity of this attack.
Even if you were only interested in a few packages, if any additional packages are mixed in, or if dependencies are already installed, the problem becomes a lot harder again.
While HTTPS doesn't make such an "attack" impossible, it makes it very hard, and compared to HTTP the attacker cannot inject or replace data (replay attacks are possible with APT over plain HTTP).
> once you download more than 2 packages, identifying which you installed is impossible.
Not impossible, just marginally harder (if timing attacks can be thought of as "hard").
Keeping your specific choice of packages secret does not buy you anything anyway. An attacker with access to your traffic will always know when you perform system updates, which is more important than the names of specific packages.
Because why bother? They’re checking hashes of packages and digital signatures of the hashes. Why tax your system with needless session encryption of large data transmissions?
Speaking of "scary things"... Don't they default to decentralized, peer-to-peer updates in Windows 10? Sounds like simply joining the swarm will disclose to anyone what update packages you download and when.
Does that mean that Windows 10 uses a "less secure" protocol for downloading its own updates than for downloading WSL updates with apt?