I guess I can copy over a comment I made when this previously made rounds:
I have a few problems with this. The short summary of these claims is “APT checks signatures, therefore downloads for APT don’t need to be HTTPS”.
The whole argument relies on the idea that APT is the only client that will ever download content from these hosts. This is however not true. Packages can be manually downloaded from packages.debian.org and they reference the same insecure mirrors. At the very least Debian should make sure that there are a few HTTPS mirrors that they use for the direct download links.
Furthermore Debian also provides ISO downloads over the same HTTP mirrors, which are also not automatically checked. While they can theoretically be checked with PGP signatures it is wishful thinking to assume everyone will do that.
Finally the chapter about CAs and TLS is - sorry - baseless fearmongering. Yeah, there are problems with CAs, but deducing from that that “HTTPS provides little-to-no protection against a targeted attack on your distribution’s mirror network” is, to put it mildly, nonsense. Compromising a CA is not trivial and due to CT it’s almost certain that such an attempt will be uncovered later. The CA ecosystem has improved a lot in recent years, please update your views accordingly.
The other big problem is that people can see what you're downloading. Might not be a big deal but consider:
1. You're in China and you download some VPN software over APT. A seemingly innocuous call to a package server is now a clear violation of Chinese law.
2. Even in the US, it can leak all kinds of information about your work habits, what you're working on, etc.
3. If it's running on a server, it could leak what vulnerable software you have installed or what versions of various packages you're running to make exploiting known vulnerabilities easier
I mean, deducing the package from the download size is way harder than just seeing the name in the open. Security is rarely perfect and is more like an arms race; making things more difficult is a big deal. This kind of "you can hack Y too" argument doesn't make any sense if it's way harder to hack Y than X.
Is that even true? In practice rather than downloading a single package you'd download/update a bunch of packages over the same connection, and an attacker would only see the accumulated size, right?
No they aren't. HTTPS fingerprinting is easy. It's been done by lcamtuf years ago and it's available as a layer 7 filter in Linux... TLS adds more information because it prevents proxies and has specific server implementations.
This is addressed in the "privacy" section. Basically, your premise is wrong. TLS does not provide additional privacy for this use case. The short summary is: the size of the transmitted data makes inferring what you've downloaded from a public file mirror trivial for a passive observer, even with TLS.
How many bits of privacy are you willing to give up here? Debian only has about 48,000 packages. That's almost 16 bits, total, with perfect privacy — all packages enlarged to the same size.
You can select a list of sizes trading off collisions (2 packages with same size => 1 bit of privacy; 4 => 2 bits, etc). But the most you ever get is (nearly) 16. The amount of padding you need to even get 2 bits of privacy (giving up almost 14) on the long tail of large packages is going to be "a lot" and it grows as you want more bits.
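As a back-of-the-envelope check on that arithmetic (a sketch; the 48,000 figure is just the number quoted above, not a measurement):

    import math

    # Figures from the thread, not measurements: ~48,000 packages total.
    n_packages = 48_000

    # With perfect padding (every package padded to one uniform size) an
    # observer learns only that "some package was fetched", i.e. about
    # log2(N) bits of identity are hidden.
    max_bits = math.log2(n_packages)              # ~15.6 bits

    # If padding only groups packages into collision sets of size k, you keep
    # log2(k) bits of privacy and give up the rest.
    for k in (2, 4, 16, 256):
        kept = math.log2(k)
        print(f"collision sets of {k:>3}: keep {kept:4.1f} bits, give up {max_bits - kept:4.1f}")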
Some further pros and cons of padding are discussed elsewhere in the thread.
Providing privacy for the packages that, including dependencies, are less than 100MB in size is something that's probably worth doing. The cost of padding an apt-get process to the nearest say, 100MB, is not necessarily infeasible as far as bandwidth goes.
Instead of padding individual files, how about a means to arbitrarily download some number of bytes from an infinite stream? That would appear to be sufficient to prevent file size analysis (but probably not timing attacks).
Exposing something like /dev/random via a symbolic link and allowing the apt-get client to close the stream after the total transfer reaches 100MB would appear to make it harder to infer packages based on the transferred bytes, without being very difficult to roll out.
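A rough sketch of that idea, assuming a hypothetical /padding endpoint on the mirror that streams filler bytes (both the endpoint name and the 100MB bucket are made up for illustration):

    import requests  # third-party, but ubiquitous

    BUCKET = 100 * 1024 * 1024  # pad every apt run up to the next 100 MB

    def fetch_with_padding(package_urls,
                           padding_url="https://mirror.example.org/padding"):  # hypothetical endpoint
        total = 0
        for url in package_urls:
            resp = requests.get(url)
            resp.raise_for_status()
            total += len(resp.content)          # real package bytes we keep
        # Round the observable transfer up to the next bucket boundary by
        # streaming filler bytes and throwing them away.
        remaining = (-total) % BUCKET
        if remaining:
            with requests.get(padding_url, stream=True) as pad:
                for chunk in pad.iter_content(chunk_size=65536):
                    remaining -= len(chunk)     # discarded filler
                    if remaining <= 0:
                        break
        return total

Of course this only buys anything if the connection is also encrypted; over plain HTTP the requests name the packages anyway.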
Also, their claim that HTTPS doesn't hide the hosts that you are visiting is about to stop being true. Encrypted SNI is now part of the TLS 1.3 RFC, so HTTPS will actually hide one's internet browsing habits quite well. The only holes in privacy left on the web's stack are in DNS.
Can you point out where encrypted SNI is in the RFC? I've read the RFC, and I don't recall it being in there. I do see that there is an extension published, which I haven't reviewed in depth.
From a brief review, I see two potential issues:
a) the encrypted sni record contains a digest of the public key structure. This digest is transmitted in the clear (as it must be at this phase of the protocol), so a determined attacker could create a database of values for the top N mirror sites.
b) in order to be useful, the private key for the public keys would need to be shared across all servers supporting that hostname. That's not a big deal for a normal deployment, but it's not great for a volunteer mirrors system -- lots of diverse organizations own and operate the individual mirrors and we need to count on all of those to keep it secure. Also, it adds an extra layer of key management, which is an organizational and operational burden.
Yeah, your parent is wrong about it being in the RFC. ESNI is something that they decided wasn't possible and ruled out of scope for TLS 1.3 but then somebody had a brainwave and Rescorla plus some people at Cloudflare wrote IDs and did live fire testing. The drafts are maybe at the "this is the rough shape of a thing" stage, more than ambitions but not a basis on which to announce specific plans.
It's also pointless without DPRIVE. If people can see all your DNS lookups they can guess exactly what you're up to. That's why that Firefox build did both eSNI and DoH
Doesn't really matter either way - there's a bunch of crawlers and scanners out there such that you can pretty much Google any IP and find a list of sites that are hosted on it.
Not quite the ones I was referring to - many services just look at DNS and get the A record for every domain, then offer reverse lookup - complete lists of domains are purchasable for all major TLDs. The only defense against this would be to host your content on a subdomain.
DNSlytics, DomainTools, W3Advisor and others offer this.
Yes but if I want to specifically look for traffic to the Debian mirrors, I can use DNS to build a list of the IPs and then see if you're connecting to one of them.
Most services report what they are, and even the server often, when you connect. If you connect to an IP and it's serving a website then I don't know why you'd care that reverse-lookup isn't configured correctly, you're not hiding anything?
Yeah, but not being implemented now doesn't mean that they never will be. Migrating/implementing now will mean that the mirrors will automatically support it once the features are in place and the clients and servers can agree on the improved privacy feature sets.
> When I was a 19 year old idiot, I was responsible for a mirror server.
Me too! It was even ftp.kr.debian.org! It still is!
Seriously, people, who do you think has root on the official Debian mirror servers hosted by universities? University students. Who are 19 years old. This is literally true.
In my country, the ccTLD registry is run by a university. While the professors have done an excellent job, the NIC itself was hacked a few times in the past, there is no admin UI (call a 19-year-old kid and set your nameservers using NATO phonetics), and they still have some non-functional root nameservers.
>Packages can be manually downloaded from packages.debian.org and they reference the same insecure mirrors.
That's a reasonable complaint. I think it would make sense for the individual packages to be signed as well (and checked during install). This way you'd get a warning if you install an untrusted package regardless of the source. I'm not sure why it doesn't work that way.
>Furthermore Debian also provides ISO downloads over the same HTTP mirrors, which are also not automatically checked. While they can theoretically be checked with PGP signatures it is wishful thinking to assume everyone will do that.
I do do that. If you care about such an attack vector, why wouldn't you? And if you don't, why should Debian care for you? There are plenty of mirrors for Debian installers, I often get mine from BitTorrent, and trusting a PGP sig makes much more sense than relying on HTTPS for that IMO.
>Finally the chapter about CAs and TLS is - sorry - baseless fearmongering.
I don't think it's baseless (we have a long list of shady CAs and I'm sure many government agencies can easily generate forged certificates) but it's rather off-topic. Their main argument is that the trust model of HTTPS doesn't make sense for APT; if that's true, then whether or not HTTPS is potentially hackable is irrelevant.
But the point the page makes is that HTTPS wouldn't be good enough anyway. As such it's not a replacement for checking the PGP signature. I think it's consistent.
If HTTPS could be used to replace PGP signature checks then I'd agree with you but it's not the case. So I go back to my initial point, if you worry about your image being tampered with HTTPS is not enough. If you don't care then you don't care either way.
In a way, not using HTTPS is kind of an implicit disclaimer on Debian's part: "Don't trust what you get from this website". If they feel like they can't guarantee the security of whatever server is hosting the CD images, adding HTTPS might actually be a bad thing, because people who might otherwise have checked the signature may think "well, it's over HTTPS, it's good enough".
>It should be assumed that any step that requires manual intervention will be skipped by most people.
Indeed.
1. If you don't care about security, it still doesn't hurt to have HTTPS. Think of it as "extra" that you get for free.
2. If you care about security, you might still not have the know-how to make sure everything is secure, and not have the time to get into it as you're trying to get things done.
3. Even if you care about security AND have the know-how, you might still forget. Nobody's perfect. So it's good that the HTTPS is there.
To be fair to the apt developers/maintainers - the security _is_ automatic when using their tool to talk to their repos.
It's not their responsibility to automate security for people using their repos via different tools.
If the solution were just "install certbot on the server and use a free HTTPS cert" then perhaps you could argue they should just do it. But when the problem space includes aggressively using a global (largely volunteer) mirror network and supporting local caching proxies, I can completely understand why they'd say "Nope. Not our problem, not our responsibility to provide a solution. We've got other more productive ways to spend our and our mirror volunteers' time and effort".
> I think it would make sense for the individual packages to be signed as well (and checked during install). This way you'd get a warning if you install an untrusted package regardless of the source. I'm not sure why it doesn't work that way.
Actually, debs are not signed in Debian or Ubuntu. The accepted practice in Debian is that only repository metadata is signed.
The argument is that the design of Debian packages (as in, the package format) makes it difficult to reproducibly validate a deb, add a signature, strip the signature, and validate it again.
Personally, I'm not sure I buy it, as we don't have problems signing both RPMs and RPM repository metadata. Technically, yes, the RPM file format is structured differently to make this easier ('rpm' is a type of 'cpio' archive), but Debian packages are fundamentally 'ar' archives with tarballs inside, and it isn't hard to do similar things with those. For reproducible builds, Koji (Fedora's build system) and OBS (Open Build Service, openSUSE's build system) are able to strip and add signatures in a binary-predictable way for RPMs.
Fedora goes the extra step of shipping checksums via metalink for the metadata files before they are fetched, to ensure they weren't tampered with before processing. But even with all that, RPMs _are_ signed so that they can be independently verified purely with 'rpm(8)'.
> And if you don't, why should Debian care for you?
Those who don’t go out of their way to defend themselves don’t deserve security — seriously, that’s your attitude? Then you don’t deserve to be in any position to make security decisions for other people.
>Compromising a CA is not trivial and due to CT it’s almost certain that such an attempt will be uncovered later. The CA ecosystem has improved a lot in recent years, please update your views accordingly.
This is simply not true. Governments can simply compel your CA to do as they want. Not to mention that "uncovered later" is pretty damn worthless.
That said, I do agree that there should be HTTPS mirrors.
> Not to mention that "uncovered later" is pretty damn worthless.
It is not worthless as a deterrent to the CA. Proof of a fraudulently issued certificate is grounds to permanently distrust the CA. So, yes, they can do it, but hopefully only once.
The justifications for why APT does not use HTTPS (by default; it is possible to add the HTTPS transport) are just mind-blowing. It is however not at all surprising considering how broken Debian's secure package distribution methodology is -- I'm saying this as someone who had to implement workarounds for it for a company that was willing to spend a significant amount of resources on making it work.
Here are some low-level gems:
1. I have installed package X. I want to validate that the files that are listed in a manifest for package X have not changed on a host.
APT answer: Handwave! This is not a valid question. If you are asking this question you already lost.
2. I want to have more than one version of a package in a distribution.
APT answer: Handwave! You don't need it. You can just have multiple distributions! It is because of how we sign things - we sign collections!
3. I want to have a complicated policy where some packages are signed, some are not signed, and some are signed with specific keys.
APT answer: Handwave! You should have an all-or-nothing policy! A nothing policy, actually, because we mostly just sign collections rather than the individual packages.
> 2. I want to have more than one version of a package in a distribution.
> APT answer: Handwave! You don't need it. You can just have multiple distributions! It is because of how we sign things - we sign collections!
This is not true at all. If you need to distribute multiple versions of the same package then all you need to do is provide multiple versions of the same package. Just get your act straight, learn how to package software, create your packages so that you can deploy them simultaneously without breaking downstream software, and you're set.
> AFAIK, dpkg doesn't allow to have multiple versions of the same package installed at the same time.
That's true, but multiple versions of the same package is not the same as multiple versions of the same software. Debian and Ubuntu have access to multiple minor version releases of GCC and they can all coexist side by side.
Correct. Multi-version packages are not supported by dpkg. They are supported by rpm in limited circumstances (no file conflicts allowed!). Fedora and openSUSE kernel packages work this way, as an example.
The way Debian works around this is by doing "<name>-<version>" as the package name. This is a valid approach, though it makes package name discovery a bit more difficult at times...
As for multiple versions of packages in the repo, "createrepo"/"createrepo_c" (for RPM repositories) does not care.
This is somewhat at your peril, as I've observed APT getting confused when it parses metadata from repositories produced by dpkg-scanpackages that allows multiple versions in the repository.
However, reprepro does not support this at all, so most deployments with semi-large Debian repositories will not have this option available to them anyway.
> No, those are <somepackage>-<someversion> where <somepackage> is different.
It seems there is a significant semantics gap between how deb packages work and what's supposed to be a package version.
In deb packages, package versions are lexicographically ordered descriptions of a version ID that is used to guide auto-upgrades.
If a packager wishes for multiple minor version releases to be present on a system, then he should build his packages to reflect that, which is exactly what Python packages, and especially some libraries, do. For example, Python packages are independent at the major version level, but GCC packages are independent at the minor version level.
> If a packager wishes for multiple minor version releases to be present on a system, then he should build his packages to reflect that, which is exactly what Python packages, and especially some libraries, do. For example, Python packages are independent at the major version level, but GCC packages are independent at the minor version level.
Not in a system. In a repo. Standard debian tools ( not hacked, not outside the tree, not outside the debian main tree ) do not support this.
If you have a repo with <packagename>-<packageversion> and add a package <packagename>-<packageversion1> where packageversion1 is higher than the package version, the previous version gets deleted from the repo.
Can it be worked around? Sure, you can:
a) have multiple repos. If you want to keep up to a hundred versions, you can just create a hundred repos.
b) you can redefine the meaning of a package name and incorporate the version into the name of the package.
(b) sounds like a good solution, except that an org that decided to use something like .deb for distributing artifacts probably uses other software too. Let's say it uses Puppet, which supports Debian package management out of the box - except that now you need to change how Puppet handles version numbers, because once we start embedding the version into the package name, "nginx-1.99.22" and "nginx-1.99.23" become two different packages, not two different versions of the same package.
> Not in a system. In a repo. Standard debian tools ( not hacked, not outside the tree, not outside the debian main tree ) do not support this.
That's the semantic gap I've mentioned.
Your statement is patently wrong, as deb packages do enable distributions such as Debian and other Debian-based distros such as Ubuntu to provide multiple major, minor, and even point releases through their official repos to be installed and run side by side.
Take, for example, GCC. Debian provides official deb packages for multiple major and minor release versions of GCC, and they are quite able to coexist in the very same system.
All it takes for someone to build deb packages for multiple versions of a software package is to get to know deb, build their packages so that they can coexist, and set their package and version names accordingly.
> Your statement is patently wrong, as deb packages do enable distributions such as Debian and other Debian-based distros such as Ubuntu to provide multiple major, minor, and even point releases through their official repos to be installed and run side by side.
Through multiple repos. Not through one repo. That's why testing packages live in a separate repo. That's why security packages live in a separate repo. That's why updates live in a separate repo.
If you work around this with careful versioning in the package name, using <packagename>-<version> as the convention, you are breaking other tools, including tools distributed with Debian. Puppet's package { "nginx": ensure => installed } will not work if you decide to let nginx have multiple versions by using "nginx-1.99" as a special package name.
Wrong. The official debian package repository hosts projects which provide multiple major and even minor versions of the same software package to be installed independently and to coexist side-by-side.
Seriously, you should get to know debian and its packaging system before making any assertion about them. You can simply browse debian's package list and search for software packages such as GCC to quickly acknowledge that your assertion is simply wrong.
It is not. You are talking about a hack/workaround/different methodology. That hack cannot be natively integrated with other software that uses the regular idea of what a package name and a package version are (for example, Puppet, which ships with Debian). It is a garbage approach, just like the APT approach of using a repo manifest for signatures is garbage, and just like the approach of not storing crypto hashes of every file in a .deb inside the .deb is garbage.
I have provided the challenge in another reply:
I have in a repo:
package: nginx
version: 1.99.77
I need to add to the repo:
package: nginx
version: 1.99.78
Both of the versions must remain in the repo. Packages must be signed. The repo must be signed. The name of the package cannot change and neither can the version number. Both of the packages must be installable using a flag that specifies a version number, passed to one of the standard Debian package install tools (the tool must be in the "main" collection).
What is a tool that can be used that is listed at https://wiki.debian.org/DebianRepository/Setup as existing in the "main" part of Debian? For half a point, you can use any tool listed in the wiki.
Unless I'm missing something, this is 100% possible. Now, if you want to carefully control the version constraints of these, welcome to pinning hell, but that's a different problem.
I need to add to the repo <thispackage>-<this-version>-<that-patchlevel>.architecture
Packages are signed and the repo will be signed. At the end I should be able to install the package using the flag that selects a specific version, without adding another repo.
A listing of all of these packages side by side, along with information on how it is built that includes the very scripts used to do so (including constructing that listing), is one of the value-added bonuses in the GOPHER version of the repository:
2) To access the "methods" i need to install a gopher client
That's pretty much a definition of the "handwave! Workaround!"
P.S. I have implemented a workaround. It works. It is just not a solution. I should not need to assign people to re-sign the packages with our own keys and create utilities that duplicate Debian tools, just to provide nominal access to functionality regularly required by a large organization that has packages with hundreds of dependencies and could have 5-20 versions of its apps in different production/test/validation/qa environments. Joe Random Engineer expects his knowledge of how apt-get works, how dpkg works, and how puppet works to be portable.
You clearly do not understand, especially given that gibberish about not being listed in some wiki, and utter irrelevancies about dpkg and puppet, which have no bearing upon the matter of publishing one Debian repository with multiple versions of packages at all.
I'm just not going to laboriously re-type it all into Hacker News when you can just go to the published repository and see it all explicitly laid out right there in the repository itself, with the exact scripts and commands that get run to produce the very repository that you are seeing -- quite the opposite of either handwaving or workarounds.
You could get to it with an FTP or an HTTP client, too. But that would just leave you with the raw files in no particular order, rather than the annotated and organized GOPHER listings. A value-added bonus for the GOPHER version, as I said.
You don't actually have any justification at all for your claim that this is somehow impossible with Debian repositories, given that people like me are doing it with a few simple scripts and even publishing them for the world to see; and clearly neither handwaving nor workarounds for anything are required.
First of all, it is the only thing that is relevant. Repo needs to be functional using tools provided by the distribution that uses that repo format. No one cares that one can download tarballs and compile them -- this is not 1997. In 2019 it is "Enter this command and get this result". Feed this result into the orchestration/management framework. Find a bug? Fix the bug. The rest of the system will continue to function.
Second of all, you still have not provided a link to the doc that someone can read without installing a gopher client. Come on, you said you already have it!
Also, as the article points out - it's not exactly trivial to deploy HTTPS across their global mirror network or to make it work with local caching proxies. That's an easy thing if you've got a handful of servers or a few load balancers, but not so easy or practical for their use case.
(Also, remember most of the apt development had already happened way before free SSL certs became a thing. While "Why don't they just use certbot/Let's Encrypt?" is an easy criticism, give them credit for having actually built a GPG-sig-secured distributed software delivery system years before Let's Encrypt existed...)
And ultimately none of those ways to detect this are useful against a sufficiently targeted attack with direct access to the signing key.
It's one thing if you sign a bad key for Google.com, publish it in CT logs and then put it up on the public internet - it's quite another if you sign a bad key for midsizecompany.com, keep it out of the CT system and use it only in a targeted attack against non-technical individuals who are unlikely to examine a certificate or use things like Certificate Watch.
With that said, I still believe serving it over HTTPS would be a substantial improvement. Perhaps pin the cert or at least the CA out of the box to prevent such attacks.
CT monitors are primarily for site operators, not end users. When site operators spot a rogue cert issuance they can flag it and ultimately get the cert revoked.
Precisely my point - it's not something an end user would notice. And if it doesn't even appear in CT logs or for the end user it would likely go completely unnoticed.
There really aren't "so many ways to detect this" - there's about 3: the user examines the certificate, CT logs catch it later and detection in browsers of major changes on the most high profile sites. Anything falling outside of those will almost certainly go unnoticed.
When you (a CA, but also just anybody who finds a new one) log a new certificate or a "pre-certificate" (which is essentially a signed document equivalent to the final certificate but not usable as a certificate) the log gives you a receipt, a Signed Certificate Timestamp, saying it commits to publish a consistent log containing this certificate within a period of time (today 24 hours).
The SCT proves that this particular log saw a signed document with these specific contents, at this specific moment.
Chrome (for a long while now), Safari (announced for early 2019) and Firefox (announced but a bit vague on when) check SCTs for publicly trusted certificates.
The browser can look at the SCT and verify that:
* It was signed by a log this browser trusts
* It matches the contents of the leaf certificate (DNS names, dates, keys, etcetera: Distinguished Encoding means there is only one correct way to write any certificate so there can't be any ambiguity)
* It has an acceptable timestamp (not too old, in some cases not too new)
It can also contemplate the set of SCTs and decide if they meet further criteria e.g. Google requires at least one Google log and at least one non-Google log.
If any of these is wrong, the site doesn't work and an appropriate error message occurs, no user effort is needed or useful here.
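Schematically, those checks amount to something like this (a sketch with hypothetical parsed objects, not any real browser or TLS library API):

    from datetime import datetime, timedelta

    def sct_acceptable(sct, leaf_cert, trusted_logs, max_age=timedelta(days=365)):
        """Schematic version of the checks above; sct, leaf_cert and
        trusted_logs are hypothetical parsed objects, not a real TLS API."""
        log = trusted_logs.get(sct.log_id)          # signed by a log this client trusts?
        if log is None:
            return False
        # The signature covers the (pre)certificate contents, so any mismatch
        # with the presented leaf certificate fails here.
        if not log.public_key.verify(sct.signature, sct.signed_data(leaf_cert)):
            return False
        now = datetime.utcnow()
        # Timestamp must be plausible; the exact bounds are policy-dependent.
        if sct.timestamp > now or now - sct.timestamp > max_age:
            return False
        return True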
We know this works because Google managed to do it to themselves by accident once already, blocking Chrome access to a new Google site for - I think it was several hours - because their dedicated in-house certificate group screwed up and didn't log a new certificate.
There is more to do to defend the system completely:
1. You could compel a log operator to emit an SCT, but then not actually log your certificate. Certificate Transparency can detect this, a browser would need to remember SCTs it has seen and periodically ask a log for a proof which allows it to verify that the log really has included this certificate.
2. You could go further and compel the log to bifurcate, showing some clients a history with your certificate in, and others a parallel history without that certificate. This can only be detected using what is called "gossip" in which observers of the log have a way to discuss their knowledge of the log state and find out systematically if there are inconsistencies.
Once both these things are in place, there's basically no way around just admitting what you did. Which of course doesn't mean any negative consequences for you, but it does make _deniability_ (much desired by outfits like the NSA and Mossad) hard to achieve, if that's something you care about.
The arguments are correct. APT does not need HTTPS to be secure. That said, if APT was designed today I'm sure it would use HTTPS. It's now the default thing to do, and Let's Encrypt makes it free and easy.
However Debian, where APT is from, relies on the goodwill of various universities and companies to host their packages for free. I can see that they don't want to make demands on a service they get for free, when HTTPS isn't even necessary for the use case.
Also, since APT and Debian were created in the pre-universal-HTTPS days, it does things like map something.debian.org to various mirrors owned by different parties. That makes certificate handling complicated.
As explained on the website, HTTPS would not add meaningful privacy at all, because without significant other changes to the architecture, what you're doing is still downloading files from a very limited set. The size of the files is in most cases unique, so an onlooker can tell what you downloaded, encrypted or not.
I find this argument not very convincing. Suppose an attacker wants to track people downloading stuff over APT. This is what they would need to do:
In case of HTTP - Step 1: Read the HTTP request payload. Step 2: There is no step 2.
In case of HTTPS - Step 1: Build an index of all possible packages and their sizes. Step 2: Reassemble HTTPS response traffic into individual HTTP responses. Step 3: Look up the response length to find the corresponding package. Step 4: In case of identical file sizes, build some sort of model to work out which package it is likely to be, based on the other packages downloaded (?).
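For concreteness, steps 1 and 3 might look something like this (a sketch; the mirror URL and the overhead/slack constants are assumptions you'd calibrate with test requests):

    import gzip
    import requests

    # Step 1: build a size -> package index from the archive's own metadata;
    # each stanza in a Packages file carries a "Size:" field for the .deb.
    # The URL is an assumption - any mirror's Packages.gz will do.
    PACKAGES_GZ = "https://deb.debian.org/debian/dists/stable/main/binary-amd64/Packages.gz"

    def build_size_index():
        raw = requests.get(PACKAGES_GZ).content
        text = gzip.decompress(raw).decode("utf-8", "replace")
        index, name = {}, None
        for line in text.splitlines():
            if line.startswith("Package: "):
                name = line.split(": ", 1)[1]
            elif line.startswith("Size: ") and name:
                index.setdefault(int(line.split(": ", 1)[1]), []).append(name)
        return index

    # Step 3: map an observed response length back to candidate packages.
    # The per-response overhead is a guess to be calibrated with test requests.
    def candidates(index, observed_bytes, overhead=1200, slack=64):
        body = observed_bytes - overhead
        return [pkg for size, pkgs in index.items()
                if abs(size - body) <= slack for pkg in pkgs]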
Yes, it's still possible to track people's packages all the same. But you need a way more determined and prepared attacker - it cannot be done as easily through casual eavesdropping. It's a false equivalence to say it would not add meaningful privacy, as your attacker model changes from casual eavesdroppers to more determined attackers.
You might not care for that particular distinction, and I agree people should have the choice to use HTTP or FTP for APT when selecting mirrors. Unencrypted APT is plenty secure, but encrypted APT is really a little better. In my opinion, there should not be so much resistance to HTTPS in default configurations (e.g. the Debian project could easily require this for their official mirrors around the world). Let's Encrypt makes this so easy, there's no argument anymore in my opinion.
That's because you are glossing over a lot of packet inspection deeper than the TCP level with a glib "Read the HTTP request".
You might want to think about existing surveillance systems. Analysis of telephone traffic is often done purely on the CDR (the caller, callee, and length of call, in simple terms) without the equivalent of deep packet inspection to read the HTTP request, which would be analysis of the actual audio data themselves. The HTTPS case would likewise need just the total octets transferred over the TCP connection for fingerprinting.
There's a lot of glib handwaving in this discussion about identical sizes, not based upon actual measurements of the Debian archive. I quickly looked at the package cache on one of my Debian machines:
    jdebp% ls -l|awk 'x[$5]++'
    -rw-r--r-- 1 root root 3314 Feb 16 2018 nosh-run-freedesktop-system-bus_1.37_amd64.deb
    -rw-r--r-- 1 root root 35190 Dec 14 2016 redo_1.3_amd64.deb
    -rw-r--r-- 1 root root 1114546 Feb 25 2018 udev_232-25+deb9u2_amd64.deb
    jdebp%
It turns out that in practice, size alone almost uniquely identifies a package in this sample. The other file that is 35190 bytes is version 1.2 of the same package, leaving just 2 possible ambiguities out of 847 packages. It seems likely that this holds after encryption as well.
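For anyone who wants to repeat the measurement without the awk, a quick equivalent sketch:

    import os
    from collections import Counter

    # Python equivalent of the awk one-liner above: how many cached .debs
    # share an exact file size with another cached .deb?
    cache = "/var/cache/apt/archives"
    sizes = Counter(os.path.getsize(os.path.join(cache, f))
                    for f in os.listdir(cache) if f.endswith(".deb"))
    shared = sum(n for n in sizes.values() if n > 1)
    print(f"{sum(sizes.values())} packages, {shared} share a size with another")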
So the remaining question is how much HTTP pipelining ameliorates this, which no-one here has yet actually analysed.
I'm not sure I understand your reply. Are you replying to me as if I said that determined attackers cannot trace back HTTPS traffic to individual APT packages? Because I said no such thing.
I just made the distinction between casual eavesdroppers and determined attackers. Those determined attackers exist and are quite capable, I'm sure. I said as much in my post.
You might also want to look into your use of the word 'glib' here. I find it an uncharitable interpretation of my post to call it 'glib' or 'glib handwaving', to be honest. Makes it seem to me as if I should be defending something I said, but I'm not sure what.
There's really no difference between a "casual" eavesdropper and a "serious" one. In what world does the former camp even exist? No one is casually spying on your apt updates, and anyone who is "seriously" spying on your apt updates can trivially identify them by size. HTTPS really doesn't add anything here.
Of course no-one is spying casually on HTTP APT traffic specifically. Nobody is arguing that strawman - nobody here is "living in that world", give me some credit please.
But people spying casually on HTTP traffic in general do exist. People able to spy on HTTP traffic in general casually is one of the main reasons we care about HTTPS in the first place. Even though people can do a targeted content length analysis for nearly all other the stuff we read/watch/download online, too. We still care about HTTPS for all of that. And we should probably care for that with APT too, if only a little bit.
TL;DR: HTTPS gives you potentially more confidentiality, but it is not guaranteed, as a known vulnerability exists which an advanced attacker can exploit. You should not assume confidentiality when using APT over HTTPS. The severity of this issue in CVSS terms is going to be very low because it is only an information leak.
ISPs, for example, eavesdrop on us all the time, and they do it quite casually. They will modify your unprotected HTTP requests, inject ads, log everything they are able to, and sell the data if they can.
Wouldn't help. If almost all of the files' sizes are unique (I'm pretty sure compressed package sizes aren't even block-aligned), and you know the sizes ahead of time, and you can make test requests for samples of what headers are being sent/received, it's trivial to calculate which combination of packages would result in a given stream length using pipelining. You'd have to add countermeasures like padding or fake data.
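A sketch of that calculation over a modest candidate set (plain brute force; a real tool would prune using the dependency graph rather than enumerate blindly):

    from itertools import combinations

    def matching_bundles(size_index, observed_total, per_file_overhead=500,
                         max_packages=6, slack=64):
        """Which sets of packages (plus a guessed per-request overhead) add up
        to the observed pipelined transfer?  size_index maps package name ->
        compressed size; all the constants here are assumptions."""
        names = list(size_index)
        hits = []
        for k in range(1, max_packages + 1):
            for combo in combinations(names, k):
                total = sum(size_index[n] for n in combo) + k * per_file_overhead
                if abs(total - observed_total) <= slack:
                    hits.append(combo)
        return hits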
You could quantize these with up to 10% padding and cause a very large number of collisions, but that wouldn't be useful without HTTPS. Is the core argument that privacy is not attainable, or that it is not valuable?
I think the core argument first of all is that it is not attainable trivially by "just using HTTPS". So it's a question of costs vs. benefits, where the costs are pretty big (change the whole infrastructure).
This is wrong. A great deal more privacy is attainable by trivially using HTTPS. Privacy in the presence of stream inspection is more difficult, but attainable by padding files to have quantized lengths.
How do you arrive at 120? I guess that makes sense if the biggest package is within 92709x (1.1^120) the size of the smallest. But that doesn't seem like enough range, just eyeballing it. If you have a 1kB package at the low end, I'd be surprised if Debian didn't have a package bigger than 92 MB.
Presenting it as 120 from 43,000 is a bit of an oversimplification, because the average isn't meaningful. The long tail is going to have the worst privacy and the small packages will (probably) have the most.
A scheme like this might be workable but requires being really careful about the security properties you're claiming (i.e., of those 120, probably half are unique, large packages). And obviously, this scheme requires up to 10% additional bandwidth, in the case of the chosen 10% threshold. If buckets change over time, packages moving between buckets may leak a lot of information.
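The bucket arithmetic is easy to sanity-check (a sketch; the size bounds are assumptions, plug in real archive numbers):

    import math

    def bucket_count(min_size, max_size, step=1.10):
        """Number of geometric size buckets when every file is padded up by at
        most `step` (i.e. 10% here)."""
        return math.ceil(math.log(max_size / min_size, step))

    # 1.1**120 is roughly 92,709, so 120 buckets only span about a 1 kB..93 MB
    # range; stretching to 1 GB (an assumed upper bound) needs more buckets:
    print(bucket_count(1_000, 1_000_000_000))     # ~145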
This sounds less like some sort of massively impossible barrier to overcome and more like a Project Euler problem, and one not all that far into the sequence, either.
One of the things you have to overcome if you want to think like a security person is that, yes, there are attackers that will put some effort into attacking you if you are a target of any consequence, certainly effort far exceeding what you just described. I've watched some people at the company I work for have to overcome that handicap myself. Yes, there are attackers that are not just script kiddies and actually, like, have skills and such.
Attackers won't jump through infinite hoops, but getting a foothold on a network somewhere where they'd like more access, seeing that they can watch a new system in your network getting provisioned, and cross-checking that against a list of known vulnerabilities by looking at package sizes would be boringly mundane for them, not something wildly exotic.
There are two reasons you want to compromise a host:
1) You're building a botnet (or, these days, are crypto mining). In that case you're not targeting a specific machine, you just want many of them.
2) You want to exfiltrate information from a host, or sabotage it. In that case you're targeting a specific machine.
I'd argue that in both cases, the proposed attack vector of inferring installed software versions through apt downloads is inferior, or at least more involved. In case 1) you're better off scanning for known vulnerabilities or making use of Shodan and the like.
In case 2) you're probably going to probe the server anyway. It might take a little more time than if you just had a complete list of installed packages and their versions (given you were somehow able to eavesdrop on the host in the first place), but you'll most likely determine at least what OS is running and what technology stack their internet-facing services run on after some nmapping.
Or, looking at it from the other perspective, I wouldn't really feel much safer if apt were using https. I'm not against it, but I don't think it's a priority, especially if it needs a lot of coordination between different people, which always turns out to be very time consuming. Just being fast with updating packages seems a better investment of that time.
> Or, looking at it from the other perspective, I wouldn't really feel much safer if apt were using https. I'm not against it, but I don't think it's a priority, especially if it needs a lot of coordination between different people, which always turns out to be very time consuming. Just being fast with updating packages seems a better investment of that time.
This is exactly my position, to be fair. We're all bike-shedding here as far as I'm concerned - including this very website. I think the position that HTTPS doesn't help you is a little bit disingenuous, and the only fair position is that "coordinating this stuff takes time and effort we don't feel is worth the negligible advantages" (as you say, and as this website says) is a more acceptable argument than "the negligible advantages don't exist" (as this website seems to also want to say).
A: Assuming package privacy could somehow be protected (a question I leave to part B), then I agree that there is no improvement in security against capable attackers: they would have to focus on hacking the APT servers, on which the attack surface can be minimized, and monitoring and logging any deviations can be tracked and published for all to inspect.
B: If the packages are concatenated with each other and a random-length noise string, we can substantially frustrate nation-state / ISP-level attackers, to the point of forcing them to get this information from the endpoints themselves: either the end user or the APT server must already have been compromised.
1) End user not yet compromised: in order to capture these, they must attack the APT server.
2) End user already compromised: on each update of all compromised users, information is sent to C&C, so this would produce lots of opportunity for attentive users to discover the implant.
When focusing on the APT server, which would give the cleanest record of attack surfaces, the community can put manpower into designing minimalist APT servers, and inspecting published deviations in communications can lead to uncovering 0-days.
EDIT: changed disagree to agree, as I (incorrectly) thought you were arguing it would not make attacks more expensive, woops!
A self-signed certificate? Then an attacker who is already capable of intercepting packets can just serve whatever they want and proxy your request. Your counter to the main argument falls short just the same.
Steps 1-4 are very easy; they just require some dev work. Thinking an attacker won't bother because it sounds annoying to implement is security by obscurity.
No, it's security against a specific class of attacker that your threat model is considering.
We know there are nation states that build profiles of each user based on their HTTP requests, but we don't know of any that have written custom software specifically to target Debian users.
It would take them half an hour to write that custom software. Under what threat model is "nation-states that can't spend half an hour writing code" a meaningful adversary? Note in particular that they can identify users by previous traffic statistics (and just normal traffic flow statistics that might be kept as a routine log by any network administrator, even); they don't need to write the code in advance. Given the number of bytes transferred, the times, and an archive of the state of the Debian archive (which is publicly available), they can always identify past downloads.
(Would it help if I wrote that code right now and put it on GitHub?)
> It would take them half an hour to write that custom software.
It would take a software engineer half an hour to write that custom software. It would take a government years to amass the political will to target such a small section of the population, and then potentially hundreds of thousands of dollars for a government contractor to offer a solution and implement it.
There's always going to be a big difference in threat level between a piece of software which already exists and a piece of software that could exist. For example, when you're snatched off the streets by the secret police, and they go to investigate what you've been doing in the country, they might be able to request from HQ a list of HTTP addresses fetched from the IP address associated with your apartment, but they're unlikely to be able to request that HQ write some software to go back retrospectively and count bytes of individual connections you made.
> (Would it help if I wrote that code right now and put it on GitHub?)
No, but it would help if you wrote a patch for APT which made it use HTTP range requests to hide the size of the files it downloads. That should only take half an hour, right?
I continue to be confused by this threat model where all government agencies you're worried about are plagued by massive levels of US-style governmental bureaucracy and can't get anything done, yet they're capable of being meaningful threats. (Also where the only entities you're worried about are government entities.)
The entities I'm worried about have bought off-the-shelf surveillance tools which record the HTTP requests associated with each IP/MAC address. This is a minimum viable product for governments and ISPs (not to mention businesses, like hotels and coffee shops), and it is reasonable to think that such a system is deployed on orders of magnitude more networks than a system that tries to infer Debian package downloads from counting bytes of HTTPS traffic.
The thing to note here is that the only reason this seems easy is that there is tooling readily available for such a task. If you didn't have such tooling, you'd find it more difficult to implement than your HTTPS case even _with_ tooling.
The same principle applies to your HTTPS case. Your argument disappears as soon as there is tooling. That tooling only needs to be written once. Perhaps it already has been done and exists in the circles where people want to surveil apt users. One possible reason such tooling isn't widely available is that apt doesn't use HTTPS by default, and one outcome may be that if apt switched to HTTPS the tooling would appear.
I have half a mind to write the tooling and publish it just to eliminate this argument. It really isn't very difficult.
There's a big difference, privacy-wise, between being able to say "X is talking to this debian mirror and thus probably running debian" and "X is downloading exactly these packages from this debian mirror".
The point being made is that the fact that X just talked Y bytes to this Debian mirror is enough to know exactly which packages were downloaded.
The argument that https obfuscates which packages you download is not a good one, and may cause users to unnecessarily worry about the implications (and conversely, that they end up "more safe" if that was not the case). If that type of privacy is desirable you should probably use something like Tor.
When I say `apt-get install foo` and it brings in 47 other packages, the problem gets exponentially more difficult. "He downloaded 1,432,509,104 bytes; what packages were those?" is more or less O(2^n): https://en.wikipedia.org/wiki/Knapsack_problem
If you download exactly one package, it may be easy to deduce which one it was (assuming that the protocol overhead is identical each time, and that changing timestamps and nonces doesn't affect the byte length whatsoever, etc.). If you download more than one at a time, which is common with Debian, then the problem is a whole lot harder.
An HTTPS stream can be broken down via side channels (i.e. size, timing) into black-box HTTP/1 requests quite easily. Remember, even with Connection: keep-alive, you still have to request every file synchronously after you're done with the previous one.
They can but they do not because servers are broken (and terribly so) and it brings less than zero gain for big files.
See, debian does not want to manage specific mirror server features if they don't have to. If they were in that position they'd make their own protocol.
It's not because a naive implementation of privacy-enhanced APT would fail that all implementations would fail at protecting privacy. Concatenating all the packages and a variable-length noise string together should go a long way.
It would be more fruitful to discuss the different ways an attacker might deduce what software was installed from a naive implementation: download sizes, download dates (i.e. a new update becomes available for package P, so a substantial fraction of users downloading from the server that day were probably installing P), etc.
In theory an onion router might substantially improve the situation if the attacker has a hard time identifying which server the user is talking to, and thus making it hard to identify if the user is even installing anything at all...
Sadly I don't trust Tor as long as I can't exclude a specific attack scenario I have always suspected about Tor but never actually known to be present...
> It's not because a naive implementation of privacy-enhanced APT would fail that all implementations would fail at protecting privacy. Concatenating all the packages and a variable-length noise string together should go a long way.
Sure, but then you are not talking about "just use HTTPS", you're talking about creating your own protocol and requiring all APT packet sources to speak that protocol, requiring a specialized server software, where currently they can just use whatever HTTP server they want. Switching the whole infrastructure and installed base over to that would be a massive multi-year project, not just a handful code and configuration changes.
Note I use the phrase "naive implementation" and was never talking about nor clamoring for "just use HTTPS".
The real discussion is not "blindly use HTTPS, or leave it like it is"; for me the real interesting question is: can we design a package distribution system that preserves privacy against nation-state-level actors? Can we virtually force those to attack the APT servers themselves? Could we use oblivious transfer? Could we design a fresh minimalist onion router (as opposed to bloated Tor) for package distribution?
> Also, can we do it on top of the current Debian infrastructure (HTTP and everything)?
I'm pretty sure that is not possible, because the current infrastructure is just plain old HTTP file servers (anything that can sling bits will do) run by whoever fancies being a part of it.
you're replying downstream of my comment containing "for me the real interesting question is:..." where I generalized the question away from the false dichotomy "keep apt as it is, or make https default in apt"
> The real discussion is not "blindly use HTTPS, or leave it like it is"; for me the real interesting question is: can we design a package distribution system that preserves privacy against nation-state-level actors?
That is certainly an interesting theoretical question, but in practice there is also the question of the cost of something that requires you to change and complicate the whole distributed infrastructure vs. the benefits - are there actually real people who need privacy against "nation-state-level actors" specifically concerning the Linux packages they install?
EDIT: Just adding that we also don't know what the cost of the most efficient privacy-preserving distribution method actually is. Only when people investigate and try will we find out.
> variable length noise
Just thinking... could this be achieved by simply adding some extra response headers on the mirror side? That is, the response could contain one or more junk headers whose values are random-length filler (see the sketch below).
This could be enough to insert random-length noise without the need to invent any new protocol. Of course, this would only be effective for smaller packages, as I assume header size is limited, so it would significantly change the perceived download size only for smaller packages.
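Something along these lines on the mirror side, for instance (a sketch only; "X-Padding" is a made-up header name and WSGI is just a convenient stand-in for whatever the mirror actually runs):

    import secrets

    def padding_middleware(app, max_pad=4096):
        """Wrap a WSGI app and append a random-length junk header to every
        response, blurring the on-the-wire response size a little."""
        def wrapped(environ, start_response):
            def padded_start(status, headers, exc_info=None):
                pad = secrets.token_hex(secrets.randbelow(max_pad) + 1)
                headers = list(headers) + [("X-Padding", pad)]   # hypothetical header name
                return start_response(status, headers, exc_info)
            return app(environ, padded_start)
        return wrapped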
This can easily be solved by inflating the wire size of the file beyond the actual download: transmitting junk packets that fail to decode but still look like encrypted noise to an eavesdropper.
That doesn't sound like an argument against it to me. It just means that in addition to HTTPS, they also need to make "significant other changes to the architecture".
...which would be a huge undertaking given the vast installed base and the fact that packet sources currently don't run any custom server software at all, which would need to change.
I'm not so sure about that. Assuming all the packages are downloaded over a single connection, you can easily pad the response client-side via HTTP range requests, for example. No special server software required; just a normal HTTP server.
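A minimal sketch of that client-side trick, assuming any file already present on the mirror can serve as filler (the URL and the bucket size are placeholders):

    import requests

    BUCKET = 8 * 1024 * 1024   # pad each apt run up to the next 8 MB (arbitrary bucket)

    def pad_transfer(session, bytes_so_far,
                     filler_url="https://mirror.example.org/debian/ls-lR.gz"):  # placeholder file
        """Issue extra Range requests against an existing file and throw the
        bytes away, so the total on-the-wire size lands on a bucket boundary."""
        remaining = (-bytes_so_far) % BUCKET
        while remaining > 0:
            chunk = min(remaining, 1 << 20)                  # at most 1 MiB per request
            resp = session.get(filler_url,
                               headers={"Range": f"bytes=0-{chunk - 1}"})
            got = len(resp.content)                          # thrown away
            if got == 0:
                break                                        # server ignored us; give up
            remaining -= got

Usage would be something like pad_transfer(requests.Session(), total_bytes_downloaded) at the end of a run.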
There is a huge difference between the third-party knowing what you're doing, and everyone in between knowing what you're doing.
The HTTPS-everywhere movement is attempting to make privacy the default rather than the exception, but it is of course done knowing that the server will always know what's up. The point is to make it so that only the two parties concerned, the server and the client, comprehend the communication, rather than the entire world.
Okay, suppose I want to know what packages you're installing and updating. I have two routes:
1. I gain access to a router near you.
2. I rent a sizable server at the hetzner/ovh/… location that's closest to you and volunteer to run a mirror for the OS you're using.
Both are somewhat uncertain (your traffic might flow via a different route, you might be load-balanced to a different mirror) but the uncertainty seems comparable. Option 2 seems so much easier that I have a real problem seeing the point of even attempting option 1 if all I want is the information option 2 would give. Perhaps someone can explain?
You'd have to take control over my mirror. Making a new mirror will not get you any traffic unless people actively choose to use it. Therefore, your option 1. is to control a part of the public route, and option 2. is to control the mirror of my choice.
However, any argument against using encryption for privacy for APT can equally be applied to any other traffic. Do you trust your public internet route enough to let your traffic run authenticated, but unencrypted? Chats, news, bank statements, software updates?
Even if content cannot be modified, it can still be blocked or made public. There are quite a few nosy governments that would like to know or block certain types of content, software packages included.
Lots of people use the nice hostnames like ftp.us.debian.org. It wouldn't be too hard to get included in that host name if you're determined. I haven't looked into the requirements, but I'm pretty sure it's a) be technically competent enough to run a mirror (it's not hard) b) have a lot of bandwidth. c) be organizationally competent enough to convince Debian that a and b will hold for a long time.
You will instantly spot blocking, because apt reports connection failures and hash failures, and a replay attack on the update list is ineffective.
As for privacy, eh? It is visible that you're connecting to a Debian mirror and what size of update list you're getting. Beyond that, indexing packages by size is trivial.
You want true privacy, you'd have to use Tor or such.
Option 1 is hard for individuals but easy for state actors.
For example, they might want to know what versions you're running (by looking at what updates you _didn't_ download) so they can target you or lots of people at once.
Since it's known which host you connect to, and the pattern of access is known too, it's a reasonable guess that it would be possible to infer the list of packages you're downloading by observing the encrypted traffic.
Instead of JUST that third party knowing what (possibly vulnerable) packages you have, you are now letting everyone know what possibly vulnerable packages you have.
> you are now letting everyone know what possibly vulnerable packages you have.
Erm, what?
The number of people who can listen to (much less modify) your traffic is very small. It is basically your ISP (who is supposed to offer you services in good faith, not spy on you) and a number of engineers who maintain the Internet backbone. That's far from "everyone". Some SSL evangelists make it sound like everyone's traffic is permanently broadcast to everyone else in the world, but it is not.
As for "vulnerable packages", the most certain sign, that someone does not install security updates, is lack of traffic between them and update servers. But that's orthogonal to use of encryption.
"everyone" in this context obviously means everyone on the path, and any attackers that have compromised nodes along the path. See the Belgacom hack by 5 eyes...
there is no proof that this is true in general, so it is worth trying to find 1) an inefficient way in order to 2) postulate an efficient way...
for example, overlay onion routing, size blurring by appending random length random bits,... with oblivious transfer even the APT-server does not know what you downloaded (but that would require a large amount of information..., nevertheless oblivious transfer might still be a useful tool when used as a primitive, perhaps just to send a list of bootstrap addresses for p2p hosting of the signed files etc...)
Couldn't this be used to identify hosts that are likely running exploitable software? So you watch a target and keep track of their installed packages. You also monitor for zero-day exploits. The instant you have identified a zero-day, you also have a list of high-probability targets to test it out on or exploit. Privacy is more than just whether I know you have blue eyes or brown eyes.
> The cost for Debian and its mirror network is very high.
I'm curious how high it actually is. They say it's high, but that could well just be hand-waving. Sure, prior to things like LetsEncrypt those SSL certs would have been a notable financial burden. There's also some extra cost on infrastructure covering the cryptographic workload, but increasingly the processors in servers are capable of handling that without any notable effort.
Certificate cost is trivial. Let's encrypt makes it free, but with a small change to the host names (country code.ftp.debian.org instead of ftp.countrycode.debian.org), all mirrors could have been covered with a single certificate. Some CAs will let you buy one wildcard cert and issue unlimited duplicate certificates with the same name. So, that would cost some money, but probably not too much.
The real costs are organizational and technical.
Organizing all the different volunteers who are running the mirrors to get certificates installed and updated and configured properly is work. Maybe let's encrypt automation helps here.
From a technical perspective, assuming mirrors get any appreciable traffic, adding https adds significantly to the CPU required to provide service. TLS handshaking is pretty expensive, and it adds to the cost of bulk transfer as well.
I get the feeling that a lot of the volunteer mirrors are running on oldish hardware that happens to have a big enough disk and a nice 10G ethernet. I've run a bulk HTTP download service that enabled HTTPS, and after that our dual Xeon 2690 (v1) systems ran out of CPU instead of out of bandwidth. CPUs newer than 2012 (Sandy Bridge) do better with TLS tasks, but mirrors might not be running a dual-CPU system either.
Old hardware will eventually die and need replacing. I run infrastructure for a CDN setup, and we actually _reduced_ the CPU overhead with TLS 1.3 + HTTP/2.
When someone says the cost is high, most people jump to monetary issues. The cost is in the time and effort required to make the changes, and to have those changes synchronised across every single APT mirror.
But! As mentioned above, outside entities being able to monitor exactly which versions of which packages are being installed to which hosts is a significant security risk.
This sort of comment isn't helpful. Switching to HTTPS will require a tremendous amount of work from volunteers. You need to convince me that (1) your use case exists and (2) your use case can be remedied with HTTPS.
We deploy our software packages to our own infrastructure and clients using a private APT repository and basic HTTP auth. Obviously we're running it with apt-transport-https installed for making the latter not completely insecure.
I see no reason to do that for signed packages from the main repositories, however.
Yes, but privacy is a whole other thing that is maybe not worth it. With mirrors and so on, getting HTTPS to work properly is not trivial. Sure, it would be nice.
HTTPS is really quite trivial, especially with the advent of letsencrypt. This is especially true for simple package protocols like APT, where a repository is simply a dumb HTTP server coupled with a bunch of shell scripts that update the content.
Assuming that we consider SSH-ing into a server a negligible effort, then adding HTTPS to an APT repository or mirror is also a negligible effort.
As for whether privacy is worth it: Absolutely, especially in this day and age. There is very rarely a cost too high when it comes to privacy, and in this instance, it comes for free.
The problem is, HTTPS is not designed for privacy in any meaningful term.
1) TLS session negotiation leaks all sorts of useful data about both systems, not to mention the TCP and IP stack on which it sits. This data is grabbed in 5 minutes with an existing firewall filter. Combined with the IP, it shows the exact machine and web browser (incl. APT version) downloading the file in many cases.
2) It does nothing to prevent time, host and transfer size fingerprinting.
3) Let's Encrypt helps with deployment, but you get rotating automated server certificates. It is reasonably easy to obtain a fake Let's Encrypt certificate, so without pinning it is worthless for authentication, and pinning a rotating certificate is hard too.
Debian does not have resources to handle impostor mirrors.
It's not trivial if we are talking about Linux boxes serving as servers: Let's Encrypt has a good chance of not working out of the box, especially with older boxes. And then there are other things, like needing an HTTP server for obtaining the cert, rotating it, distributing it.
And you lose the ability to use a proxy, and so on. With HTTPS you are still not protected from them knowing where you get things from, only from them knowing what you did there.
It would be great to have the ability to use HTTPS, but for APT in its current form and for what it is used for, the cost-benefit of adding HTTPS is not that compelling to me.
Analogy: it's like you're hanging on a rope and you also add a safety net. If the rope breaks, you only fall onto the safety net instead of the ground.
All software has bugs, so if you add two buggy security solutions, you might only be able to exploit a bug in one, but the other still gives you the safety.
That's not always true though, and it can be really hard to tell if a particular combination is going to make things better or worse. e.g. the "Breach" HTTPS+Compression vulnerability.
To continue your analogy: the rope gets tangled in the safety net, forcing you to jump or to climb on with a loose rope, because you can no longer move the rope.
Digital security strength is measured on orders of magnitude, and two mechanisms providing security with very different orders of magnitude do not add in any practical sense.
If you've seen today's bug yesterday, or carefully looked at the previous CVEs, you would have seen that https would have significantly reduced the probability of exploitability.
> All software has bugs, so if you add two buggy security solutions, you might only be able to exploit a bug in one, but the other still gives you the safety.
I am unconvinced about the last part. More commonly an exploit in either will cause security to fail, so adding more steps just adds more attack surface and leads to less security.
TM1: attacker does not posses zerodays to installed software
TM2: attacker possesses specific (perhaps OS, perhaps library, perhaps userland) zerodays, usage of which (including unsuccessful attempts) should be minimized to avoid detection
in TM1: it's ok to use HTTP in the clear, as long as signatures are verified
in TM2: everything should be fetched over encrypted HTTPS, since HTTP would leak information about available attack surface
EDIT: not only would this increase security by not revealing what a user installs (perhaps download some noise as well such that it becomes harder to detect what a user is installing?), it could also improve security by turning the APT servers into honeypots, so that monitoring these can reveal zerodays...
TM3: attacker has a 0-day against a complex https server and they replace packages on some mirrors.
TM4: attacker can impersonate a server using a Let's Encrypt certificate and bypass their automated verification, creating a fake mirror or a bunch of them. (HTTP has the same vector.) They can also make DNS fail or reroute.
TM5: attacker has a 0-day against the more complex https client (e.g. curl).
TM6: Attacker fingerprints network connections to given servers by size and os or os + tls fingerprinting data
TM3 and TM5 are specific subcases for TM2, and turn the APT server / client into a honeypot
TM6: we should have an overlay onion router. I agree that the current complexity is worrying; I'd love to see a minimalist version of Tor (by minimal I don't necessarily mean the code size should be small, but minimal assumptions, such that the safety of the system can be verified from those assumptions).
TM4: I don't understand, Lets Encrypt does not calculate private keys for public keys...
nQUIC is maybe an interesting approach for some specialist applications but more likely a dead end.
The idea you can replace QUIC with nQUIC is like when Coiners used to show up telling us we're going to be using Bitcoin to buy a morning newspaper. Remember newspapers?
nQUIC doesn't have a way for Bob to prove to Alice that he's Bob beyond "Fortunately Alice already knew that" which is the assumption in that ACM paper. So that's a non-starter for the web.
nQUIC also doesn't have a 0-RTT mode. Noise proponents can say "That's a good thing, 0-RTT is a terrible idea". Maybe so, but you don't have one and TLS does. If society decides it hates 0-RTT modes because they're a terrible idea, we just don't use the TLS 0-RTT mode and nothing is lost. But if as seems far more likely we end up liking how fast it is, Noise can't match that. Doesn't want to.
Noise is a very applicable framework for some problems, and I can see why you might think APT fits but it doesn't.
> nQUIC is maybe an interesting approach for some specialist applications but more likely a dead end.
Adding on to this, nQUIC (and Noise specifically) is significantly better for use-cases where CAs and traditional PKI don't make sense, e.g., p2p, VPN, TOR, IPFS, etc...
I agree that APT is not one of these cases. Currently APT has a root trust set that is disjoint from the OS's root CA set, but they could easily do HTTPS and just explicitly change the root CA set for those connections.
EDIT: from the nQUIC paper:
> In particular nQUIC is not intended for the traditional Web setting where interoperability and cryptographic agility is essential.
On another note, I think it would be helpful to expand some points for other readers:
> nQUIC also doesn't have a 0-RTT mode. Noise proponents can say "That's a good thing, 0-RTT is a terrible idea".
0-RTT is dangerous because of replay attacks. It pushes low-level implementation details up the stack and requires users to be aware of and actively avoid sending non-idempotent messages in the first packet.
> Maybe so, but you don't have one and TLS does. If society decides it hates 0-RTT modes because they're a terrible idea, we just don't use the TLS 0-RTT mode and nothing is lost.
One major point of using Noise protocol is to _simplify_ the encryption and auth layers, remove everything that's not absolutely necessary, and make it hard to fuck up in general. Things like ciphersuite negotiation, x509 certificate parsing and validation, and cryptographic agility have been the source of many many security critical bugs.
From an auditability perspective, Noise wins easily. You can write a compliant Noise implementation in <10k loc, vs. OpenSSL ~400k loc.
> But if as seems far more likely we end up liking how fast it is, Noise can't match that. Doesn't want to.
HTTP is insecure, but faster than HTTPS. Most sites now use HTTPS regardless. 0-RTT is insecure and while it might be OK for browsing HN, removing 0-RTT makes it much harder to fuck up.
>I can see that they don't want to make demands on a service they get for free, when HTTPS isn't even necessary for the use case
You could imagine a situation where HTTPS is optional for APT mirrors. Then the package manager would have a config flag to use any mirror or only HTTPS-enabled mirrors (probably enabled by default). This would allow using HTTPS without putting any demands on the organizations that host those mirrors - if they can, they would enable it, but it would not be required. The HTTPS-enabled hosts could also provide plain HTTP for backwards compatibility.
The argument isn't correct: what does a user do when the download is damaged by an injection? A re-download results in exactly the same tampered-with file.
Yeesh, as someone who has had to troubleshoot more than a few times HTTP payloads that were mangled by shitty ISP software to inject ads or notifications I would love HTTPS as a default to prevent tampering. I get the arguments against it, but I have 100% seen Rogers in Canada fuck up HTTP payloads regardless of MIME types while rolling out "helpful" bandwidth notifications. Signatures will tell you yes this is corrupt but end-to-end encryption means that the person in the middle can't even try.
Likewise, integrity of the download is the primary reason I've switched downloads to HTTPS too. The argument that signed downloads are enough fails to address what the user is supposed to do after the integrity check has failed. A re-download can result in the same tampered-with file. This isn't hypothetical btw, it happens in the real world. I've had ISPs in Ireland and South Africa damage downloads due to their injections, and users don't care if it's their ISP; they get pissed off at you, unfortunately.
I myself use the HTTPS mirror provided by Amazon AWS (https://cdn-aws.deb.debian.org/). I do so because my ISP sometimes forwards to its login page when I browse HTTP URLs. Also, it does sometimes inject ads (yeah, it's really bad, but it does remind me that I'm being watched).
Yeah true, but the arguments for tls default ring a bit hollow, to me at least.
Someone who really wants the defense-in-depth should probably be switching to onion sources anyway, I was impressed with how quick they were.
As the article says, replay attacks are voided and an adversary could simply work out package downloads from the metadata anyway.
I personally use https out of general paranoia, but understand the arguments for not changing. It's two extra lines in a server setup script.
infosec twitter is crap like any other twitter subculture, full of drama queens and clickbait to increase their fav/rt count. What's even sadder is that they make no money off it.
Indeed, something similar happened just last week. The Hacker News (not to be confused with HN) Twitter account lashed out at VLC for not updating over HTTPS, even though VLC essentially uses the same(ish?) code signing as described for APT. A bit of a shit show.
HN also had a big thread participating in the fray...
I'm seriously tempted to start flagging links that point to "bad"/"outrage" bugtracker decisions like this, wide public distribution seems to make things quite a bit worse.
They use 1024-bit DSA with SHA-1; it is not cryptographically secure! Thus they would really benefit from HTTPS, which would provide another layer of protection against tampering.
Oh, and we haven't even addressed that their "secure signing" doesn't protect first installs, which could be insecurely downloaded.
Most Debian mirrors support HTTPS. But HTTPS alone does not help you on a fresh connection if it is a rotating certificate like Let's Encrypt with a dubious authentication chain.
Egypt or Turkey can issue valid fake certificates, so you would have to check that the certificate is not from one of those.
What a coincidence. Just earlier this week I was installing yarn in a docker container using their official instructions (https://yarnpkg.com/lang/en/docs/install/#debian-stable) and found out I had to install apt-transport-https for it to work.
Since the image was already apt-get install'ing a bunch of other packages at that point and everything seemed to work, the obvious question that popped in my head was: does this mean none of the other packages I've been downloading used https? That's what led me to this website.
If your personal ISP injected into HTTPS, it'd be broken too. So this is purely a complaint about the particular behavior of your ISP in that it serves HTTPS more faithfully than HTTP.
My corporate ISP hijacks HTTPS (MITM with self-signed CA), but not HTTP. Any system that uses any HTTPS security properties will verify certificates and fail on my work's network.
The argument about poorly behaved ISPs for one particular protocol but not the other cuts both ways — there are different kinds of poorly behaved ISP.
Right, some years ago I was involved in deployment of an update mechanism which (like APT) used signed bundles transferred in the clear. (Originally this was for privacy concerns: our users were more concerned about verifying the content of the "phone home" connections than about hiding their activity from an observer.) Anyway, some fraction of the time it'd fail because of an ISP or corporate injectobox. That stuff all goes over TLS now, not because there's any large benefit but it is very easy to add. We still get a fraction of failures due to injectoboxes, TLS or no TLS.
Moral of the story, I think, is that having a shorter chain of trust is good. In our case, the chain of trust started with a certificate in the original (sometimes OOB) download, the key for which we directly controlled. But for TLS, there are several links in between: the client host's cert store (under the control of OS vendors, hardware vendors a la Superfish, local administrators, etc.), the mess that is the TLS PKI community, your CA, several hundred other CAs, and finally you.
It didn't support it until not too long ago, so I had to set up my server to explicitly not redirect to HTTPS for one particular location, because otherwise people would need to install apt-transport-https for it.
We used FAI[1] to install it into the boot images we used and then ran it that way (other methods), but there still is the verification of the packages you put on those. Short of manually auditing the code and compiling that yourself then there's not much else in the trust chain. It's not really that necessary though, realistically, with the other protection methods. We just did it as it was fun to do and well, we could!
One reason to prefer HTTPS is that in the event of a vulnerability in the client code, an attacker cannot trigger that vulnerability using a MITM attack if HTTPS is in use. One such vulnerability was recently found in apk-tools: https://nvd.nist.gov/vuln/detail/CVE-2018-1000849
While I agree with your point, a counterpoint is that vulnerabilities in the HTTPS implementation are also possible, and by introducing HTTPS code you are increasing the surface area of possible vulnerabilities.
I don't believe that increases the attack surface. As long as apt supports https repos and redirects:
An attacker who wants to exploit a buggy pre-auth (or improperly cert-validating) client-side ssl implementation, when the connection is http, can just MITM the http connection and redirect to https.
That's a great point, assuming APT follows redirects by default!
I don't know enough about it to know either way. I do know that you need to install a package to get HTTPS support for most connections, but I'm not sure if that package is just "switching" to using HTTPS by default or if it actually adds the ability for APT to read HTTPS endpoints.
The argument is that the package you are downloading is already signed with public-key crypto and verified during the update process. Its integrity is "secured". However, there could be bugs in that implementation, and bugs can be exploited to (in one of the worst-case scenarios) gain remote code execution during a MITM attack.
A "solution" is to protect the endpoint with HTTPS, making MITM attacks impossible. Except that it's also possible that the HTTPS code could suffer from vulnerabilities which can lead to remote code execution. And if I'm being honest, the code which implements HTTPS is much larger and more complicated than the code which is doing the signature checking in APT right now, so by that measure it's actually a downgrade to something "less secure" since it's just adding on more complexity while not improving security much at all.
In reality I believe HTTPS is more heavily scrutinized than the signature verification code in APT is, and therefore could improve security, and there are additional other benefits to HTTPS aside from added security against implementation bugs (like an improvement to secrecy, even if it's small, and better handling by middleboxes which often try to modify HTTP requests but know to not try with HTTPS requests).
This isn't accurate. If the SSL code incorrectly trusts the wrong server, then you're no better off but also no worse off. If the code has a RCE vulnerability caused by bad parsing logic, then you're worse off than you would be without it.
My counter-counterpoint is that while OpenSSL had (has?) horrible security issues, it's still worth using HTTPS in principle, because a modern internet-connected system that has no trustworthy SSL library is always going to have security problems. Whether it's hardening OpenSSL, shipping BoringSSL, or anything else, systems just have to get this right, and once they do, applications like apt can take advantage of it.
Yes, but this comparison doesn't favor APT/PGP at all. Using OpenSSL or similar for your HTTPS implementation means you're running code that the entire world already depends on for security, and which your own OS probably also depends on in other scenarios. Using PGP means you have some kind of custom transport implementation that you're responsible for. To the extent that you're solving the same problem, not using HTTPS is much riskier than using it.
> "Furthermore, even over an encrypted connection it is not difficult to figure out which files you are downloading based on the size of the transfer"
Is it really not difficult? I bet if you sorted all the ".deb" packages on a mirror by size, a lot of them would have a similar or the same size, so you wouldn't be able to tell them apart based on the size of the download.
Furthermore, when I update Debian I usually have to download updates for some number N of packages. I don't know if this is now done over a single keep-alive connection. If it is, then figuring out what combination of data was downloaded gets a lot harder.
Finally, this dismisses out of hand a now-trivial attack (just sniff the URLs being downloaded with tcpdump) by pointing out that a much harder attack is theoretically possible for a really dedicated attacker.
Now if you use Debian your local admin can see you're downloading Tux racer, but they're very unlikely to be dedicated enough to figure out from downloaded https sizes what package you retrieved.
As I found at https://news.ycombinator.com/item?id=18960239 there can be duplication which is irrelevant for the point being discussed, as it is one version of a package duplicating another version of the same package, meaning that the size is still a unique identifier of the package. It is worth checking that.
> "Furthermore, even over an encrypted connection it is not difficult to figure out which files you are downloading based on the size of the transfer"
>> "Is it really not difficult? I bet if you sorted all the ".deb" packages on a mirror by size a lot of them would have a similar or the same size, so you wouldn't be able to tell them apart based on the size of the dialog."
Human readable sizes: Sure.
Byte size info: not so much. And even if it were: things would become very clear to the attacker after one update cycle for each package.
If you really want to mitigate information about downloaded packages you would have to completely revamp apt to randomize package names and sizes, and also randomize read access on mirrors...
> If you really want to mitigate information about downloaded packages you would have to completely revamp apt to randomize package names and sizes, and also randomize read access on mirrors...
There isn't a need to randomize package names, or randomize read access on the mirror, given fetching deb files from a remote HTTP apt repository is a series of GET requests. Randomizing order of these requests can be done completely on the client side.
Package sizes are still problematic. Here's a suggestion: if each deb file was padded to nearest megabyte, and there was a handful of fixed-size files (say, 1MB, 10MB and 100MB), the apt-get client could request a suitably small number of the padding files with each download. This would improve privacy with a minimum of software changes and bandwidth wastage.
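A rough sketch of that padding suggestion, purely as illustration (the 1 MiB rounding and the 1/10/100 MB padding files are the assumptions from the comment above, not anything apt actually does):

```python
# Purely hypothetical sketch of the padding idea above: round each download
# up to the next MiB and fill the gap with requests against fixed-size
# dummy "padding" files assumed to exist on the mirror.
MIB = 1024 * 1024
PAD_FILES_MIB = [100, 10, 1]   # assumed padding files, largest first

def padding_plan(package_bytes):
    padded = -(-package_bytes // MIB) * MIB        # round up to whole MiB
    remaining = padded - package_bytes
    plan = []
    for size in PAD_FILES_MIB:
        chunk = size * MIB
        while remaining >= chunk:
            plan.append(chunk)                     # fetch a whole padding file
            remaining -= chunk
    if remaining:
        plan.append(remaining)                     # partial range from the 1 MiB file
    return padded, plan

print(padding_plan(1_572_864))   # a 1.5 MiB package would be padded to 2 MiB
```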
If each file were padded to the nearest MiB, the total download size of the packages containing the nosh toolset would increase by almost 3000% from 1.5MiB to 46MiB. No package is greater than 0.5MiB in size.
I am fairly confident that this case is not an outlier. Out of the 847 packages currently in the package cache on one of my machines, 621 are less than 0.5MiB in size.
You're abusing the notion of a straw man, which this is not.
I am pointing out the consequences of Shasheene's idea as xe explicitly posited it. Xe is free to think about different sizes in turn, but needs to measure and calculate the consequences of whatever size xe then chooses.
No, it would not apply the same with different sizes. Think! This is engineering, and different block sizes make different levels of trade-off. The lower the block size, for example, the fewer packages end up being the same rounded-up size and the easier it is to identify specific packages.
(Hint: One hasn't thought about this properly until one has at least realized that there is a size that Debian packages are already blocked out to, by dint of their being ar archives.)
It's still useful to be able to connect to the local mirror without tor (and enjoy the fast transfer speeds), but still mitigate privacy leaks from analysis of the transfer and timings.
Transferring apt packages over tor is unlikely to ever become the default, so it's worth trying to improve the non-tor default.
They could also improve the download client to fix this.
For example, if the download client uses the byte-range HTTP requests to download files in chunks, there is nothing stopping it from randomly requesting some additional bytes from the server. Then the attacker would have a very weak probability estimate of what was actually downloaded.
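A minimal sketch of that idea, assuming the mirror honours HTTP Range requests (the URL, chunk size, and overlap amounts are placeholders, not real apt behaviour):

```python
# Sketch only: fetch a file in byte-range chunks, re-requesting a random
# amount of overlap so the bytes on the wire no longer equal the file size.
# Assumes the server honours Range requests; URL and sizes are placeholders.
import random
import urllib.request

def fetch_blurred(url, total_size, chunk=256 * 1024):
    data = bytearray()
    pos = 0
    while pos < total_size:
        extra = random.randint(0, chunk // 2)          # redundant bytes for blurring
        start = max(0, pos - extra)
        end = min(total_size - 1, pos + chunk - 1)
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
        data += body[pos - start:]                     # keep only the new bytes
        pos = end + 1
    return bytes(data)
```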
I'm somewhat surprised that no-one has (yet) linked to this post [1] by Joe Damato of Packagecloud, which digs into attacks against GPG signed APT repos, or the paper which initially published them [2].
The post makes clear in the third paragraph: "The easiest way to prevent the attacks covered below is to always serve your APT repository over TLS; no exceptions."
Both sides sometimes argue disingenuously. It's true that caching is harder with layered HTTPS and that performance is worse. It's also true that layering encryption is more secure. (It's what the NSA and CIA do. Are you smarter than them?)
Personally, I'd default to security because I'm a software dev. If I were a little kid on a shoddy, expensive third world internet connection I'd probably prefer the opposite.
I think it's important to reiterate that HTTPS only adds network privacy about what you download. The signing of repos and their packages means they're guaranteed to be a pure copy of what's on the server.
That same mechanism means you can easily make trusted mirrors on untrusted hosting.
If someone steals a signing key then they also need to steal the HTTPS cert. Or control the DNS records and generate a new one or switch to HTTP.
Adding an extra layer of encryption is like adding more characters on a password. Sometimes it saves your bacon, sometimes it was useless and came with performance drawbacks.
If you still disagree with me, that's fine. But I want to hear why you continue to hold this opinion when you worked for 1Password during Cloudbleed.
In a scalar sense, sure. In a binary "do we have enough security" sense, less so. I realise that's a shitty quality of argument and I could have been more explicit but you can always add more security. Why, for example, aren't you demanding reproducible builds and signed source uploads?
Simply put —and this is, I think, where we disagree— the signing of packages is enough. The design of Apt is such that it doesn't matter where you get your package from, it's that it matches an installed signature.
Somebody could steal the key but they would then either need access to the repo or a targeted MitM on you. Network attacks are far from impossible but by the point you're organised to also steal a signing key, hitting somebody with a wrench or a dozen other plans become a much easier vector.
The problem with the binary sense is that people misunderstand risk. For example, the blackout in 2003 was partially caused by the windows worm earlier in the day. Even though the computers that were used to control the power grid weren't running windows, the computers that monitored them were. So a routine alarm system bug ended up cascading into a power outage that lasted over a week in some places, including my home at the time. This was classified for a while.
The people that programmed Windows before 2003 probably didn't consider their jobs with the full national security implications.
Then you take something simple, like Linux on a simple IoT device. Say a smart electrical socket. Many of these devices went without updates for years. Doesn't seem all that bad, right? Just turn off a socket or turn it on? How bad could it be?
At some point someone noticed that they were getting targeted and said: "But why?" The reason is simple. You turn off 100k smart sockets all at once and the change in energy load can blow out certain parts of the grid.
The point isn't that someone will get the key. The point is that we know the network is hostile. We know people lose signing keys. We know people are lazy with updates. From an economics perspective why is non-HTTPS justified? Right? A gig of data downloaded over HTTPS with modern ciphers costs about a penny for most connections in the developed world.
Although I would not class this as even potentially in-line with Blaster or the imminent death of the internet under an IoT Botnet, I see your broader point. The deployment cost approaches zero and it does plug —however small— a possible vector.
I do think it would cause a non-zero amount of pain to deploy, though. Local (e.g. corporate) networks that expect to transparently cache the packages would need to move to an explicit apt proxy or face massive surge bandwidth requirements and slower updates.
That said, if you can justify the cost, there is absolutely nothing stopping you from hosting your own mirror or proxy accessible via HTTPS.
I'm not against this, I just don't see the network as the problem if somebody steals a signing key. I think there are other —albeit harder to attain— fruits like reproducible builds that offer us better enduring security. And that still doesn't account for the actions of upstream.
This is a good synthesis here -- downloading and trusting a key over HTTP is folly, but then, so is trusting much of anything that "just works."
If the whole PKI approach is to work, the client has got to get the trust in that public key right. In regular practice, that probably means checking it against an HTTPS-delivered version of the same key from an authoritative domain.
(How far down the rabbit hole do we go? Release managers speaking key hashes into instagram videos while holding up the day's New York Times?)
This page is wrong, for example they claim there's no privacy increase here but quite clearly there's a huge difference between an attacker being able to tell "you're downloading updates" and an attacker being able to tell "you're downloading version X.Y of Z package" - worse, that information could actually later be used to attack you based on the fact that the attacker now knows what version to find vulnerabilities in, including non-exposed client software for email, documents, browsers, etc.
It's a relatively insignificant security benefit for most, but could prove an important one for those who targeted attacks are used against.
They're speculating that you can do this using the size of the full connection - there's truth to that, but under HTTPS padding will occur, rounding up to the block size of the cipher - meaning that there's a higher chance of overlap in package sizes.
It might actually become quite difficult to do such analysis, especially if multiple requests were made for packages at once in a connection that's kept open. You won't get direct file sizes either, you'd have to perform analysis for overhead and such - in any case it's significantly less trivial than an HTTP request logger.
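One way to sanity-check that overlap claim is to count collisions after rounding; a sketch, assuming sizes are only observable rounded up to a 16-byte block and ignoring TLS record overhead entirely (the Packages path is illustrative):

```python
# Sketch: how many packages share a size once an observer can only see sizes
# rounded up to a 16-byte block? The block size and the local Packages file
# are assumptions, and TLS record/header overhead is ignored entirely.
from collections import Counter

def collision_stats(packages_path, block=16):
    buckets = Counter()
    with open(packages_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.startswith("Size:"):
                size = int(line.split(":", 1)[1])
                buckets[-(-size // block) * block] += 1   # round up to block
    total = sum(buckets.values())
    ambiguous = sum(n for n in buckets.values() if n > 1)
    return total, ambiguous

print(collision_stats("Packages"))   # (all packages, packages sharing a bucket)
```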
Even then, one is guessing and the other is establishing a fact. That part of the argument didn't sit right with me, and I can't imagine somebody in InfoSec or Legal conflating the two.
But it does, because HTTPS can ensure you always get up to date data.
They have a solution involving timestamps for HTTP, but that is still clearly less secure than the guarantee from HTTPS.
I've personally experienced this too: using apt in the presence of a captive portal replaces random bits of `/var/cache/apt` with HTML pages, breaking future updates until you manually find and fix the problem yourself.
The reverse argument also works: using HTTPS may lead you to link and expose, say, OpenSSL where you otherwise would not have needed to. OpenSSL has had dozens of vulnerabilities in the past: https://www.openssl.org/news/vulnerabilities.html
Some of these vulnerabilities have the potential for arbitrary code execution, leaving you worse off than the simpler solution based on the verification of cryptographic signatures that has fewer vulnerabilities by virtue of doing less.
The discussion at https://whydoesaptnotusehttps.com is about the protocol. You can add implementation bug risks to the discussion if you want, but then include the risks from both the approaches being discussed.
You've proposed a reverse argument to an argument that was never made. ctz never said anything about vulnerabilities or implementation issues, they said a captive portal is a problem for apt over HTTP but not HTTPS. This is also true of ISPs that like to insert things into HTTP sessions.
One important factor this article left out is upgrades. If a given HTTPS implementation is broken because of now-insecure protocols, insecure ciphers, etc., older systems can't update from a mirror that has been upgraded to a 'secure' HTTPS configuration while they only support the 'vulnerable' one. If HTTPS is left insecure, then it is not much different from using HTTP.
APT's methodology avoids this: as the current signing and protection mechanisms are file-based, the worst-case scenario is introducing a new file with a new cryptographic signature alongside the old scheme, to keep supporting updates on a system running the old security mechanism.
In comparison, trying to run multiple HTTPS servers with different configurations for specific versions of the system being updated would be a significant engineering effort, especially for mirrors.
Huh? All you would do is configure the web server running your apt mirror site to serve the same content on both the HTTP and HTTPS ports. If the client wants to use TLS, it connects to HTTPS. If it wants to use plain HTTP, it connects to HTTP. Both sites serve the same content, which is just a series of flat files. AFAIK, the client is responsible for determining the correct versions for the installed distro based on the indices.
If your installed version is configured for https, but is incapable of using TLS 1.2, because it's rather old, at some point soon, a modern mirror would no longer allow it to connect as 2019 (or maybe 2020) seems to be shaping up as the year to kill support for TLS 1.0 and 1.1. Meanwhile, an http config would continue to work.
>This can lead to a replay attack where an attacker substitutes an archive with an earlier—unmodified—version of the archive. This would prevent APT from noticing new security updates which they could then exploit.
>To mitigate this problem, APT archives includes a timestamp after which all the files are considered stale[4].
> The Valid-Until field may specify at which time the Release file should be considered expired by the client. Client behaviour on expired Release files is unspecified.
Well, of course the client behavior is under-specified -- sometimes the client is a human constructing a URL to download a .deb in a web browser over a corp-approved proxy, and then hand-installing the package with `dpkg -i`, bypassing all the security checks. Or sometimes there's a caching proxy (or three) between the client and the server. Or maybe IT has modified apt to only connect to repositories maintained by the IT department, and rejects sources from other domains.
Besides the privacy issue of sending package names clear-text, there is a second non-mitigated issue: Censorship.
An MitM could selectively block certain packages from being installed / updated. Imagine using this to prevent Bitcoin from being installed / enforce a ban on crypto without backdoors / block torrent installations.
This doesn't work as well with the 'recognize package size' method because you need to download the entire package before you know the size. Given the need for Ack in TCP, an MitM can't just buffer data until they have the entire package size.
> This doesn't work as well with the 'recognize package size' method because you need to download the entire package before you know the size. Given the need for Ack in TCP, an MitM can't just buffer data until they have the entire package size.
All they have to do is corrupt the final packet and the package checksum fails. An attacker only needs to buffer a single packet worth of data.
I bet there are many FLOSS advocates who don't read that page as being the result of a cost-benefit analysis. They read it as an inspiring story of the rebels winning one against the https Empire. Because I never see the caveat wrt apt that, "of course, we are a super edgy edge-case that should not be used as a model to rationalize a knee-jerk refusal to use SSL for common cases."
I say this because I've corresponded with such advocates about a completely common case for SSL -- setting up a Let's Encrypt certificate, say. The response I often get doesn't make any sense unless I assume they read a page like this and remembered the feels while forgetting all the relevant details that separate apt from their common case.
Even though the packages are signed cryptographically, there are possible risks when using an unencrypted connection.
A man-in-the-middle attack could work simply by serving you a signed but outdated package list, preventing your distribution from updating and leaving you vulnerable to security holes. It's the same attack an evil mirror could carry out as well.
So if you want to be really sure you should probably use two independent mirrors over an HTTPS connection.
The website mentions this towards the end (Replay attacks) : To mitigate this problem, APT archives includes a timestamp after which all the files are considered stale
The time stamp is described here[1], but it is not clear how the expiration date is decided.
Well, defining the expiration date is up to the server. Debian picks a week. Ubuntu does not use it; I started a Launchpad branch last year, but, um, I don't know Launchpad, so it might take some more years ;)
More precisely, an expiration timestamp is embedded in the repository metadata.
Packages in debian and derivatives are not signed. Instead, the manifest that lists all the available packages and their checksums is signed. That's also where the expiration data is stored.
Even more precisely still, not even the lists of packages are signed. Only InRelease and Release are signed, and they only contain the list of Package files. It's FreeBSD that has the approach of just one signed file containing everything. APT has moved closer to it over the years, but it is not there yet.
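For the curious, checking that freshness window by hand is straightforward; a sketch that reads the Date and Valid-Until fields out of an already-signature-verified Release/InRelease file (the path is illustrative):

```python
# Sketch: read the Date and Valid-Until fields from an already-verified
# Release/InRelease file and check whether it is stale. Path is illustrative.
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def freshness(release_path):
    fields = {}
    with open(release_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            for key in ("Date:", "Valid-Until:"):
                if line.startswith(key):
                    fields[key.rstrip(":")] = parsedate_to_datetime(
                        line.split(":", 1)[1].strip())
    expired = ("Valid-Until" in fields
               and datetime.now(timezone.utc) > fields["Valid-Until"])
    return fields, expired

print(freshness("InRelease"))
```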
"HTTPS does not provide meaningful privacy for obtaining packages. As an eavesdropper can usually see which hosts you are contacting, if you connect to your distribution's mirror network it would be fairly obvious that you are downloading updates."
It is a dangerous mistake to decide what kind of privacy people need. Privacy should be absolute and without conditions.
What if you live in Iran? Some Ubuntu packages are already inaccessible due to the government's pornography keyword censorship. E.g. I can't download "libjs-hooker" from this http link http://archive.ubuntu.com/ubuntu/pool/universe/n/node-hooker... from Iran. What if the government decides to censor the "tor" package?
Do we now have a custom domain name on a per article basis?
I find it strange to have a site that is just about one thing that is not that important to most people on a custom domain. If there were pages and pages of information then yes this might make sense but there isn't.
Coming soon...
howtotieyourownshoelaces.com
The premise of this article per domain reminds me of 1998 when everyone thought that instead of search engines people would be typing in URLs, e.g. 'yescupofteaplease.com' so URLs like 'pets.com' were seen as goldmines-to-be.
I see it as more akin to a vanity plate; no one expects it to be functionally useful, but it's something people will see and so it is somewhat decorative to put something there.
As an exploit analyst currently focusing on network traffic, can we stop all this fascination with SSL/TLS? TLS is incredible, but let's use the right tool for the job. Contrary to Let's Encrypt motto, applying TLS to _everything_ can be bad for security.
Let's Encrypt, when are you going to revoke placeimg.com's certificate? The site has been pushing Exploit Kit's malicious payloads since Jan 18 2019 via SSL. Many Flash/IE users are getting infected because most firewalls are unable to peer into SSL tunnels signed by you.
(To be fair, Let's Encrypt is not the only cert authority getting abused (Comodo, yes you))
>To mitigate this problem, APT archives includes a timestamp after which all the files are considered stale
How often is this, practically? If I'm understanding this right, each new timestamp would come only with a package upgrade, meaning the time period is quite a long time indeed, long enough for a replay attack to work. I would argue that there should be a mechanism requiring a signed the-latest-package-is-X message updated at least every day or so.
Edit: it looks like this is actually what's going on. The page wasn't clear, but it is a metadata "Releases" file that is timestamped, not the packages themselves.
The security repository generally serves up a field in its metadata saying that the data shouldn't be trusted for more than 7 days, if it hasn't changed since 2014 when I encountered this duration as part of my day-job work. It's safe to assume the trusted duration hasn't increased, at least.
Even though replay attacks will be of no use after some time, one could MITM during the vulnerable timeframe to prevent a critical (0day) security update from happening and therefore gaining control over the system, so if you're specially targetable, I believe it's totally worth switching to an HTTPS mirror instead.
Other than that, most should be safe ig.
One annoying thing is that the page name is misleading. apt does use/support https, it's just that Debian chooses for its default mirrors to it be optional.
We like to talk about, say, the compromise of integrity and/or authenticity, information leaks and so on, but they are not the only things TLS/HTTPS was prepared for. Indeed, it is often overlooked that we have two kinds of exploitation in this space. I tend to label them as "active" and "passive". One of the best-known passive exploitations is JavaScript injection by an ISP---Comcast did it in 2017 [1], for example. Alone they are typically harmless or annoying at best, but they are indicative of the real security problem lurking around, and often can evolve into active exploitations.
It is probably true that APT is a simple service that does not require the full TLS capability. But APT is only prepared for active exploitations. Passive exploitations will effectively compromise availability, by compromising integrity in a relatively predictable way. I don't think APT is prepared for passive exploitations, either---casual users will be much more prone to them.
"Max Justicz discovered that APT incorrectly handled certain parameters during redirects. If a remote attacker were able to perform a man-in-the-middle attack, this flaw could potentially be used to install altered packages."
Far easier to do a MITM attack when apt isn't using https by default
>Furthermore, even over an encrypted connection it is not difficult to figure out which files you are downloading based on the size of the transfer[2]. HTTPS would therefore only be useful for downloading from a server that also offers other packages of similar or identical size.
That seems like a false dilemma imho.
What's preventing APT from splitting up downloads into identically sized chunks of say 4kb?
The "Date, Valid-Until" timestamp mitigates replay-attack, cool.
But what if a vulnerability is discovered in a package and an MITM attack prevent you from downloading the new patched version, making you believe your system is up to date although it's not ?
There's also a current and ongoing "discussion" (I say pejoratively) regarding the same issue with VideoLAN. They can't (easily) use HTTPS because they don't control the mirrors. However the package download/update itself is also signed with GPG, thus guaranteeing tamper resistance.
Apparently Windows Update uses TLS without encryption (null cipher) which still provides integrity and authentication: https://twitter.com/swiftonsecurity/status/74533301869746995.... Debian's signing achieves the same result by signing the package list containing the packages' hashes with a known key.
> providing a huge worldwide mirror network available over SSL is not only a complicated engineering task
Nobody said each mirror couldn't have its own certificate and its own domain. The host names of these mirrors are usually obtained from a trusted source.
HTTPS in addition to GPG signatures does no harm. It also means that one would have to compromise two entities, my distro's GPG key and my mirror's certificate.
Beware of strangers bearing gifts. There is a lot of https scaremongering from a certain subset of people with objectives one finds difficult to understand.
If they were genuinely concerned about privacy, security and surveillance they would have a lot to say about the out of control surveillance economy and its participants including their employers, the security model of browsers, mobile operations systems, javascript, systemic user stalking and hoovering up data.
Yet those stories are mainly driven by people outside the tech ecosystem, and within it there is mostly silence interspersed with vague economic justifications, so there is something truly bizarre about HTTPS extremism on the grounds of privacy.
If your packages are already signed why do you need a middleman? That is a better trust model for a distributed Internet than a centralized CA that is not accountable to individuals and can be compromised by power. Here people are neck deep in business models stalking people online 24/7 and tracking their location and some are 'concerned' about protecting the list of packages you download?
> Here people are neck deep in business models stalking people online 24/7 and tracking their location and some are 'concerned' about protecting the list of packages you download?
That's believable. "There are a thousand hacking at the branches of evil to one who is striking at the root." - Henry David Thoreau. Nice word "hacking".
At a research institute I worked for, we had a proxy server that intercepted http. The antivirus software on the proxy made it impossible to download some security updates once in a while.
Fortunately it was possible to change the Ubuntu URIs to https:// to use HTTPS, and the broken antivirus software did not intervene anymore.
I wish APT used HTTPS, if only to be able to tell my proxy "only allow HTTPS" and never have to worry about unencrypted traffic again. Currently it's the only HTTP traffic my servers are generating. Plus it's polluting my proxy logs: instead of having only the domain, as is the case for all the other logs, I have the full URLs :)
So they don't consider privacy beyond which host I contact, which is weird, because I'm pretty sure which versions of packages I have is relevant to an attacker.
And in case of a zero-day exploit it would be really handy not having to wait for some global timeout during which I won't see any updates.
I think HTTPS by default would be a wonderful addition. I understand the complexities from the project's perspective. I understand users' desire for privacy. If the project not defaulting to HTTPS is that much of a privacy issue for you, set up a local mirror and point your boxes at that. Yes, someone could potentially still snoop your local network. But if they can do that, you've got bigger issues to worry about than the fact that they can see what packages you're installing.
Those arguments are invalid.
Because Debian/Ubuntu fail to use HTTPS, regimes like Egypt/Syria can track people who try to install Tor, and they can selectively block repos based on package name.
I'm sorry for not liking your favorite color and distro. Please deal with it.
And btw, they don't digitally sign their packages either (they sign a separate metadata file containing checksums, which is not equivalent to embedding a signature inside the package and validating it).
Compare that to yum/rpm, which use HTTPS plus signed RPMs and signed metadata (both the medium and the payload are secured).
Your individual statements are correct, but they do not add up to valid argument in this case.
Kazakhstan forces its citizens to install a government-issued certificate to use SSL. This allows Kazakhstan to track its citizens, which proves that a regime can track its citizens even in the presence of SSL encryption. In other words, using SSL/PKI does not inherently prevent tracking by powerful entities. You need to create your own government for that.
It is naive to think that regimes like Egypt/Syria/the US can't track people while at the same time being able to exert overwhelming physical force over the exact same people. If you can force someone to hand over encryption keys, you can track them. Different countries do the same thing, everyone just picks their preferred way: physically controlling Certificate Authorities in the case of the US, forcing people to hand over encryption keys in the case of Great Britain.
> Compare that to yum/rpm which use secure https and signed rpm and signed metadata
No, using more "secure" technologies does not amount to better security.
You can block based on a name because you see the name before the package is sent.
You can't block based on package length, because you need to let the entire update through before you know the length. At that point, it's too late to block. Buffering the entire message doesn't work because TCP expects ACKs.
A) you can buffer and send acks to the server and then trickle the data to the client
B) in the interest of memory usage, you could not buffer, and send selective acks to the server -- once you decide to allow it, stop blocking the first data packet, and let the client ack that without the sack and let the server retransmit.
C) like B, but for network efficiency, actually let the client receive all packets but the first, and sack them itself --- then when you do allow the first packet, the rest of the packets won't need to be retransmitted.
That doesn't make sense? If you have the capability to refuse all HTTP packages, you can still refuse all HTTPS packages coming from Debian. My comment was about refusing specific packages over HTTPS: buffer the first packet and ACK it, then wait for the rest and count the bytes. If bytes == N, then I know this person is downloading Tor, so I withhold that first packet such that they can never download Tor.
It is. If you want to download Tor in some hard-to-track way, you probably shouldn't use an easily trackable source. There are better options, including getting it sent by email, and torrents.
And given the scope of the attacker, fingerprinting by size and server is trivial, so HTTPS adds nothing related to anonymity or security.
My corporate network used a transparent https man-in-the-middle proxy. That is, transparent as long as you're using a Windows machine that has been configured by the help desk and using applications that utilize the Windows certificate store, which doesn't include git, pip, or npm.
If pip or npm used gpg signing of packages rather than https, this wouldn't be a big deal. But as it stands, it's a nightmare on various Windows systems, Linux boxen, and Docker images to get the code you need.
HTTP/2 is more likely to be a downside than an upside for apt. You would have additional framing overhead, and I can't see a benefit over pipelined HTTP/1.1 (which is definitely doable because all the requests are GETs for static files). Maybe header compression of the small headers could be a very minor win. Multiplexing within a TCP stream is at best neutral for this case, but probably negative: on the client side, it means the possibility of multiple partial files instead of one; on the server, it means more parallel demand on the file system.
The design of HTTP/2 helps websites more. It supports push, since webpages include resources that a server knows a client is going to need anyway; this avoids request latency for websites. It also allows connection reuse via multiplexing, which again is mostly helpful for websites, decreasing latency by avoiding the TCP handshake. It also allows for slightly smaller headers, since it's a binary protocol.
However, pretty much none these really help with large file downloads, and other use cases like APT. Even the headers will be dwarfed by the files themselves, and the header information is probably already pretty minimal.
Also, HTTP/2 does not require TLS; it's just that pretty much all the browsers ignore that the standard does not require it. However, they are trying to push for more encrypted traffic, a decision I don't really think browsers should be making.
> HTTPS does not provide meaningful privacy for obtaining packages.
That's a very subjective and circumstantial statement being passed off as fact.
Just because one can list a few scenarios where HTTPS wouldn't prevent a malicious actor from achieving their goals doesn't mean HTTPS does not increase security/privacy for apt in other situations.
Further, in certain contexts, guessing and proving are not fungible.
Layering HTTPS on top of the existing authentication system adds the obvious benefit of confidential communication, such that an eavesdropper can't see what packages you're downloading. There's an argument that you can infer the package from the size of download, but this is defeated by downloading partial regular sized chunks of the file with some overlap to mask the actual size.
The only sensible argument against HTTPS seems to be infrastructure cost. This made more sense in the past, but nowadays hardware crypto acceleration (e.g. AES-NI) is commonplace, and certificates can be obtained for free.
Ultimately it's up to mirrors to decide if they're willing to provide HTTPS.
An attacker can simply count the bytes. They can buffer one packet, send an ACK, and reject it based on the bytes transferred. Also, it's trivial to know when the next package starts: apt doesn't stream all packages at once, it first sends the first package over HTTP keep-alive and then waits until the client orders the second package.
Nothing justifies Debian making you `apt-get install apt-transport-https` over `http` before you can add your own repos, which are HTTPS-only of course. But hey, it's Debian.
My local Debian mirrors support HTTPS, and I assume other mirror sites worth their salt do too. Easy enough for Debian to redirect to your local mirror.
The local apt-cacher-ng instance I run in my office network cannot be redirected to by Debian, because Debian cannot be aware of it. The apt client would need to build in support for local proxies.
As it stands right now, apt-cacher-ng cannot work with https sources.
Fedora handles this use case beautifully with MirrorManager, which covers the EPEL repos as well. All of the logic is server-side: when a yum/dnf client connects to the Metalink server to fetch a mirror list from our IP block, it gets sent our internal mirror. I wish more distros had similar setups.
Yes, the mirror is configured as private and is only served to machines in my IP range - since it’s on the internal network it does nobody else good to have access.
I am seeing quite a bit of misinformation about how package managers work so I'd love to share what I have learned. I work with index files on a daily basis, and we might possibly generate more index files than any other organization on the planet. Here is my chance to share some of this knowledge!
TLDR/Summary
We can trust the Release file because it was signed by Ubuntu. We can trust the Packages file because it has the correct size and checksum found in the Release file. We can trust the package we just downloaded because it is referenced in the Packages file, which is referenced in the Release file, which is signed by Ubuntu.
Some basic package manager principles
I work with APK, DEB, and RPM based package managers, and they all behave very similarly. Each repository has a top-level file, signed by the repository's maintainer, that lists the files found in the repository along with their checksums. When your package manager does an update, it looks for this top-level file.
For DEB based systems, this is the Release file
For APK based systems, this is the APKINDEX.tar.gz file
For RPM based systems, this is the repodata/repomd.xml file
These files are all signed by the repository's gpg key. So the Release file found at http://us.archive.ubuntu.com/ubuntu/dists/bionic/Release is signed by Ubuntu, and the gpg key is included in your distribution. Let's hope Ubuntu doesn't let their gpg key into the wild. Assuming that Ubuntu's gpg key is safe, this means that the system can verify that the Release file did in fact come from Ubuntu. If you are interested, you can click on the previous link, or navigate to Ubuntu's repository and open up one of their Release files.
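A minimal sketch of that signature check, roughly what apt does internally. The detached signature sits next to the Release file as Release.gpg, and the keyring path below is an assumption that varies by distribution:

```python
# Hypothetical sketch: verify a downloaded Release file against its detached
# Release.gpg signature using gpgv. The keyring path is an assumption; on
# Ubuntu the archive keyring normally lives under /usr/share/keyrings/.
import subprocess

KEYRING = "/usr/share/keyrings/ubuntu-archive-keyring.gpg"  # assumed path

result = subprocess.run(
    ["gpgv", "--keyring", KEYRING, "Release.gpg", "Release"],
    capture_output=True,
    text=True,
)
print("signature OK" if result.returncode == 0 else result.stderr)
```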
Release file
In the Release file you'll see a list of files and their checksums. Example:

55f3fa01bf4513da9f810307b51d612a 6214952 main/binary-amd64/Packages
The left column is the checksum, the middle column is the size of the file in bytes, and the right column is the location of the file within the repository. So we can download the files referenced in the Release file and check that they have the correct size and checksum. The Packages or Packages.gz file is the one we care about in this example: it contains information about the packages available to the package manager (apt in this case, but again, almost all package managers behave very similarly).
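A minimal sketch of that check, assuming both files have already been downloaded into the current directory and using the field layout of the example line above:

```python
# Hypothetical sketch: confirm that a downloaded Packages file matches the
# checksum and size recorded in the (already signature-verified) Release file.
import hashlib
import os

release_entry = "55f3fa01bf4513da9f810307b51d612a 6214952 main/binary-amd64/Packages"
expected_md5, expected_size, path = release_entry.split()

actual_size = os.path.getsize("Packages")
with open("Packages", "rb") as f:
    actual_md5 = hashlib.md5(f.read()).hexdigest()

assert actual_size == int(expected_size), "size mismatch"
assert actual_md5 == expected_md5, "checksum mismatch"
print(f"{path}: size and checksum match the Release file")
```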
Packages file
Since we know that we can trust the Release file (because we have proven it was signed by Ubuntu's gpg key), we can then proceed to download the files it references. Let's look at the Packages file specifically, as it contains a list of packages, each with its size and checksums.
The Packages file includes a list of packages with information about where the file can be found, the size of the file, and various checksums of the file. If you download a file through commands like apt install and any of these fields are incorrect, apt will throw an error and not add it to the apt database.
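To round out the chain, here is a sketch of verifying a downloaded .deb against one (made-up) Packages stanza. The field names (Filename, Size, SHA256) are the ones apt checks; the values are invented:

```python
# Hypothetical sketch: verify a downloaded .deb against its Packages stanza
# before accepting it. Values below are placeholders for illustration.
import hashlib

stanza = {
    "Package": "hello",
    "Filename": "pool/main/h/hello/hello_2.10-1_amd64.deb",
    "Size": "56132",
    "SHA256": "0" * 64,  # placeholder digest
}

with open("hello_2.10-1_amd64.deb", "rb") as f:
    blob = f.read()

ok = (
    len(blob) == int(stanza["Size"])
    and hashlib.sha256(blob).hexdigest() == stanza["SHA256"]
)
print("package verified" if ok else "verification failed: refusing to install")
```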
It's time to debunk some myths!
Can an attacker send me a fake Release file?
Sure, but apt will throw it out because it's not signed by Ubuntu (or whoever your repository maintainer is: CentOS, RHEL, Alpine, etc.)
Can an attacker send me an old index from an earlier date that was signed by Ubuntu that has old packages in it with known exploits?
Sure, but apt will throw it out because it will have a Date field (in the Release file) that is older than what is stored in the apt database. For example, the current bionic main Release file has this date in it: Date: Thu, 26 Apr 2018 23:37:48 UTC. So if you supply a Release file older than that timestamp, apt will throw it out because it is older than what it currently knows about.
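A tiny sketch of that freshness check; it mirrors the idea rather than apt's exact implementation, and the "attacker" date is invented:

```python
# Hypothetical sketch of the anti-rollback check: reject any Release file
# whose Date field is older than the one already trusted.
from email.utils import parsedate_to_datetime

stored_date = parsedate_to_datetime("Thu, 26 Apr 2018 23:37:48 UTC")
offered_date = parsedate_to_datetime("Mon, 01 Jan 2018 00:00:00 UTC")  # replayed file

if offered_date < stored_date:
    raise ValueError("Release file is older than the one already in the apt database")
```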
I hope this helps clear the air!
Shameless plug. If you are serious about security and not just compliance, check out our Polymorphic Linux repositories. https://polyverse.io/ We provide "scrambled" or "polymorphic" repositories for Alpine, Centos, Fedora, RHEL, and Ubuntu. We use the original source packages provided in the official repositories and build the packages but with memory locations in different places and ROP chains broken.
Installation
Installation is a one line command that installs our repository in your sources.list or repo file. There is no agent or running process installed. It is literally just adding our repository to your installation. The next time you do an `apt install httpd` or `yum install docker` you'll get a polymorphic version of the package from our repository. You can see it in action in your browser with our demo: https://polyverse.io/learn/
What does it do?
Many of the replies in this post referenced an attacker tricking a server into installing an older version of a package that has a known exploit. We stop this. Even if you are running an old version of a package with a known exploit, memory-based attacks will not work on the scrambled package because the ROP chain has been broken, or as we call it, "scrambled". So with our packages, you can run older versions of a package and not be affected by the known exploits. This also means that you are protected from zero-day attacks just by having our version of the package.
FREE! Individuals and open source organizations can use our repositories for free. I hope you try it out!
Let's Encrypt is also "hilariously" centralised. What do you do if Let's Encrypt is blocked in your country, or they refuse to issue you a certificate because of US sanctions?
Block Let's Encrypt? How would that even work? They'd have to block all addresses where the webserver uses Let's Encrypt certificates, or MITM your SSL connections. If Let's Encrypt is blocked, you're not getting the full Internet. Complain to your ISP or government if they do that!
If Let's Encrypt refuses to issue certificates for your domains, go to another provider. They're not the only one.
I still don't see how that would be a problem Debian should weigh in its considerations around SSL. After all, if your server's upstream blocks access to Let's Encrypt's servers, why are you trying to operate an official Debian mirror there? What's the point of having a Debian mirror connected to such a shitty upstream?
There is also the possibility that Let's Encrypt will not issue a certificate for your TLD if, for example, there are US sanctions against that country.
There is a strong need for a second "Let's Encrypt" based in another country other than the US. Preferably two or three based in Europe and Asia.
Should be HTTPS by default, just for the privacy of downloading packages without any intermediaries knowing the package names. That information could easily be collected and used in attacks against users of vulnerable versions.
Did you even read the article? This is explained at length. HTTPS doesn't give you that benefit, since the attacker can still single out the package based on its length, and the hostname is sent in plaintext anyway.
I feel the HTTPS hysteria is going too far. For some strange reason people have started to consider it a security panacea, often without understanding its limitations. The accusations against APT are a perfect example (its built-in security mechanisms are superior to what HTTPS has to offer).
Being secure against modification is not the same as being secure against information leakage; it doesn't give you privacy.
Using a non-encrypted connection means that it's trivial to work out what packages you download. Using a secure connection at least makes it 1 step harder to infer that information.
However, whether packages should be kept private is altogether another question. I'd argue that OS updates and related packages do not need to be kept private, but application packages do.
> Using a non-encrypted connection means that it's trivial to work out what packages you download. Using a secure connection at least makes it 1 step harder to infer that information.
It is still trivial in the case of APT. That's exactly my point: people start believing HTTPS will protect them against many attack vectors it doesn't. It's OK for uninformed people to believe in the magic of a padlock in the address bar, but technical folks should really know better.
If I have access to your network traffic and intend to see which packages you download by apt, I will do it irrespective of whether you use HTTPS or not.
That’s like saying I don’t need a helmet because I have all of this body armour.
At a minimum, HTTPS prevents leakage of information about your configuration but there are several direct attack vectors listed in other threads. Please stop calling it “hysteria”.
I trust the file signatures, but if you need to write a full article arguing that something is secure, then it could be made more secure by making the system simpler and more standard.
One thing that wasn't touched on: the mirror network. There are hundreds of mirrors for all the major distros, run by third parties. If e.g. Debian wanted apt over HTTPS, they would need to hand out a debian.org SSL cert+key to all of them.
(And convince them all to take the CPU overhead hit of TLS)
EDIT: Since I can't reply to all the downvoters, I'll add here: LetsEncrypt does not solve this. http://us.archive.ubuntu.com/ likely resolves to tens of different mirrors. Which one will the LetsEncrypt verification call hit?
That was largely the case until recently, but I don't think it would be difficult now to set up LetsEncrypt and do HTTP-01 challenges. A slightly more complicated setup, but one entirely within the Debian org's means, would be to use the DNS-01 challenge.
But to the article's point, the DNS requests and IPs and file sizes would all be largely transparent, and that's probably enough to figure out what's being downloaded.
On the other hand, HTTP/2 (over TLS) could improve throughput and ensure proxies aren't tampering or replaying.
Actually, no, the mirrors may not be able to verify themselves with let's encrypt.
A hostname like ftp.us.debian.org resolves to many different mirrors, and may not resolve consistently around the world -- if that's the case, let's encrypt will not be able to verify the hosts through http challenges.
Also, 1gbps is pretty small. I can't find any documentation on traffic, but I'd imagine mirrors in popular places are at least on 10gig.
I have a 10Gbps uplink on my personal server and I can easily saturate it without getting capped on CPU or RAM.
LE offers other challenges to verify hosts, like DNS verification. DNS verification can be done easily with an external API for mirror owners to hit (most ACME clients support the DNS challenge via the standard dynamic update protocol for DNS, which can be secured appropriately).
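As a hedged sketch of that "standard update protocol" route (RFC 2136 dynamic updates, here via dnspython), where the zone, TSIG key, token, and nameserver are invented placeholders rather than anything Debian actually runs:

```python
# Hypothetical sketch: publish an ACME DNS-01 challenge token via an RFC 2136
# dynamic update. Zone, key, token, and server below are placeholders.
import dns.query
import dns.tsigkeyring
import dns.update

keyring = dns.tsigkeyring.from_text({"acme-mirror-key.": "c2VjcmV0LXNlY3JldA=="})
token = "example-acme-challenge-token"  # value handed out by the ACME server

update = dns.update.Update("debian.org", keyring=keyring)
update.replace("_acme-challenge.ftp.us", 60, "TXT", token)

response = dns.query.tcp(update, "ns1.example.org")
print(response.rcode())  # 0 (NOERROR) means the record was accepted
```

The coordination problem the parent describes is exactly here: whoever holds that TSIG key can publish challenge records, so scoping it to a single mirror's name is the hard part.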
Mirror owners don't control the DNS for debian.org.
Debian could get the certificates, but getting the certificates was never the issue -- some CA would be happy to issue certificates for little or no cost to help Debian and gain mindshare. Coordination between 3rd party, volunteer mirror owners and the Debian organization is the issue.
What kind of CPU and traffic patterns are you using to hit 10 gbps of TLS protected traffic?
>What kind of CPU and traffic patterns are you using to hit 10 gbps of TLS protected traffic?
Mainly serving a file directory with apache with files ranging between 100M and 1G in size. Should be easily comparable to Debian or Ubuntu repositories.
Debian mirrors are going to get a lot of _very_ short connections from people frequently checking whether there are any updates. Those are effectively zero bandwidth, but each needs a full TLS handshake, which is the most expensive part.
A dedicated server of decent size at the right hoster starts at 25€ a month. That is not that much and I bet most of the volunteer servers are already above this price range.
I love how complex this has become. Debian would now need to build a custom API, or use DDNS secured just right, for LE verification records (which would give mirrors full access to obtain any debian.org cert, since the DNS-01 challenge doesn't contain enough info to filter on. You either control DNS or you don't - DNS-01 doesn't have a middle ground of owning a single record AFAIK, and it certainly doesn't have a concept of shared ownership of a single record without race conditions...).
Now multiply that by the 1000 or more projects that each mirror syncs content from, all with something different because nothing standard exists.
LetsEncrypt is great, I love it, I have tens of certs for personal stuff from them. I think they've completely changed the CA landscape, hopefully forever.
However I'll say it again, LetsEncrypt does not solve this problem. That's OK. LetsEncrypt doesn't have to solve every problem with TLS!
>DNS-01 doesn't have a middle ground of owning a single record AFAIK,
With DNS-01 you only own up to the domain you verified. If you verify ftp.de.debian.org then you can't issue certs for de.debian.org or debian.org but you can issue for www.ftp.de.debian.org.
I see an issue with that - but it's possible I'm too paranoid when it comes to many third parties being able to issue certificates for my domain on names I wasn't expecting.. Each to their own!
Either way - assuming restricting issuance to exactly one name is a solved problem - this:
> What? No.
> The mirrors would just need to install a letsencrypt-compatible client and setup SSL via that.
is still a far cry from reality thanks to all the other issues.
If anything, it's: What? No. Certs are just the tip of the iceberg, even if LetsEncrypt solved that problem neatly (and they don't), you have ignored the massive complexity of the issue, both the technical and organisational issues.
It will have to be solved, considering a major vulnerability was disclosed today that allows an attacker to get root-level RCE by manipulating the HTTP response.
HTTPS as default would have severely reduced the attack surface for this bug.
That's only for the mirrors which have a shared debian.org hostname. If you use e.g. mirrorbrain it'll redirect to the closest mirror.
Based upon https://www.debian.org/mirror/list, it seems they all have pretty much unique hostnames (ftp.<COUNTRY>.debian.org). You can easily get a certificate for that.
In brief: It's not a huge issue to get a certificate.
Another scary thing is that, despite it not being included by default, when you do have a need for it you're advised to install apt-transport-https[0]. I can't remember the specific package that required me to go this route, but it always reminds me "oh, they're not using https". I'm surprised they don't pull over SSH or something; doesn't git work with SSH as well?
Apt supports SSH (and HTTPS). The reasons not to use it for the default repos are explained in the article: caching, and the complexity of keeping HTTPS certificates up to date.
Caching, or more precisely the ability to cache HTTP response payloads (on shared caching proxies, for example), is, to my surprise, actually not mentioned in the article. I'd expect it to be the number one reason to stick with HTTP in this case, rather than the described hassle and futility of HTTPS.
In my experience, the number one reason to NOT stick with plain HTTP would be "transparent" shared caching proxies which cache older copies of files long after they've been replaced on the server, cache incomplete downloads as if they were complete downloads, and misbehave in several other annoying and hard-to-diagnose ways. By using TLS, these broken middleboxes often can be bypassed.
Ah, that's an interesting insight, and sure, debugging with such a broken middlebox must be a serious bummer. Are they really that frequent? (I'm not a devops person, so I'm asking in all sincerity, since I can't recall dealing with one myself.)
Anyway, I assume there are very few (if any?) scenarios with no way to work around such a bug in a broken proxy (yes, after that exhausting investigation, but still), and there are obvious potential benefits in the scenario where such a middlebox does its job well.
With TLS (as I understand it) those potential benefits, as well as the potential bugs, are just tossed away together (I'd not use the term 'bypassed' here), and every single download is forced to travel the full wire length, which could be pretty nasty in some locations. Eric Meyer recently wrote an interesting article [0] on this topic. (I understand it is all quite obvious stuff, but well expressed IMO.)
The point of the article is that Debian's trust model does not privilege the package servers - anyone can set one up, using whatever hare-brained technology they want (even, say, BitTorrent). The packages are individually signed and verified. SSL doesn't get you anything. It's the end-to-end principle in action.
With TLS, Eve cannot see (as easily) which packages Alice is downloading from Bob's mirror. If she could, Eve could use that information to decide which exploitable applications to target on Alice's machine.
This was discussed elsewhere in the same thread - basically, the size of a download would be too similar for many packages, and the keep-alives would make it look like one impossible-to-decrypt stream with no per-package sizing data anyway.
HTTPS might prevent some surveillance entities (ISPs, governments, any other MITMs) from determining what packages you're installing, and this might be beneficial, but plain old unencrypted HTTP is often faster if those aren't a big concern (they likely aren't in most developed nations and for most occupations). The lack of encryption overhead, as well as transparent proxies being able to serve files, are huge boons.
The article points out that HTTPS offers almost no protection against this kind of surveillance, since packages are almost uniquely identifiable by size.
With HTTPS you would be able to use HTTP/2 which means you can multiplex a single connection and once you download more than 2 packages, identifying which you installed is impossible.
With plain HTTP it remains possible no matter how much pipelining you do.
No you would not be able to use HTTP/2. HTTP/2 is not implemented in apt, and probably won't be for a very long time, as it directly clashes with apt internals.
That said, pipelining over HTTPS is surely possible too, and reduces the risk.
That said, if you're installing security updates automatically, as you should, anyone will know anyway, as there are only about 3-5 possible combinations of updates you'll be downloading on a particular day in one session.
I'm aware that APT does not use HTTP/2 but it would be able to use it with HTTPS.
With automatic security updates, an attacker finding out what packages you have is less valuable, considering you are installing the latest patches.
It would be more interesting if that doesn't happen, in which case an attacker can learn what you have installed and wait until exploits appear. Automatic updates negate this attack model.
edit: As I've demonstrated in a sibling comment, even 5 packages is already out of reach, as working out which packages they are is a task of millennia. With 4 it could possibly be done by throwing a supercomputer at it for a few months.
Well, maybe; how do HTTP/2 servers allocate bandwidth? If they do it per-stream, you can still identify the size of each package by watching the total connection bandwidth decrease when each package ends.
If they allocate it for the whole connection, then all they get is a total, which still gives the attacker some information (there's a limited combination of packages that sum up to that number).
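A toy sketch of that "limited combinations" observation, with invented package sizes (a real attacker would build the index from the public Packages file): finding which subsets sum to roughly the observed total is a small subset-sum search, which is why mixing in more packages blows up the search space as the sibling comments argue.

```python
# Hypothetical sketch: enumerate package combinations whose sizes sum to
# (approximately) an observed total transfer size. Sizes here are invented.
from itertools import combinations

sizes = {"openvpn": 547_212, "nmap": 5_436_776, "cowsay": 20_060, "tor": 1_983_448}

def matching_sets(observed_total, size_index, max_packages=3, tolerance=4096):
    hits = []
    names = list(size_index)
    for k in range(1, max_packages + 1):
        for combo in combinations(names, k):
            if abs(sum(size_index[n] for n in combo) - observed_total) <= tolerance:
                hits.append(combo)
    return hits

print(matching_sets(2_530_660, sizes))  # -> [('openvpn', 'tor')] with these toy numbers
```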
You are assuming that all packages are equally interesting (and your math does not account for package dependencies).
In practice, the attacker either does not care about your packages, in which case hiding that information gains you nothing, or wants to be alerted when you (or anybody else) install one of a few specific packages. Those combinations can be computed in advance and identified in traffic.
Not necessarily, I think you underestimate the complexity of this attack.
Even if you were only interested in a few packages, if any additional packages are mixed in, or if dependencies are already installed, the problem becomes a lot harder again.
While HTTPS doesn't make such an "attack" impossible, it makes it very hard, and compared to HTTP the attacker cannot inject or replace data (replay attacks are possible with APT over plain HTTP).
> once you download more than 2 packages, identifying which you installed is impossible.
Not impossible, just marginally harder (if timing attacks can be thought of as "hard").
Keeping your specific choice of packages secret does not buy you anything anyway. An attacker with access to your traffic will always know when you perform system updates, which is more important than the names of specific packages.
Because why bother? They’re checking hashes of packages and digital signatures of the hashes. Why tax your system with needless session encryption of large data transmissions?
Speaking of "scary things"... Don't they default to decentralized, peer-to-peer updates in Windows 10? Sounds like simply joining the swarm will disclose to anyone what update packages you download and when.
Does that mean that Windows 10 uses a "less secure" protocol for downloading its own updates than for downloading WSL updates with apt?