It won't last, at least for China. Their government is working on a clone of Wikipedia, scheduled for 2018[0]. Once that's done they'll likely ban the original completely.
Wikipedia publishes database dumps every couple of days[1]. So it shouldn't be that expensive for smaller governments to create and host their own censored mirror. You'd maintain a list of banned and censored articles, then pull from Wikipedia once a month. You'd have to check new articles by hand (maybe even all edits), but a lot of that should be easily automated, and if you only care about Wikipedia in your native tongue (and it's not English) that's much less work.
The academics will bypass censorship anyway, since it's so easy[2], so an autocrat won't worry about intellectually crippling their country by banning Wikipedia. Maybe they don't do this because the list of banned articles would be trivial to get.
Better machine translation might solve this by helping information flow freely[3]. We have until 2018 I guess.
Wikipedia editors are pretty strict about what gets to remain a page. Everyone knows they delete articles unless they have lots of sources and public interest.
With 6x the articles on Baike, I can't imagine that there is that level of quality control. Unless there are 6x as many things worth documenting in China vs the rest of the world.
That doesn't surprise me; in Japan there are various Japanese wiki sites, usually with less information than even the Japanese versions of Wikipedia, but still with more articles. Usually they even have comment sections below the articles, which can become quite toxic, at least on certain articles.
Won't work - their moderation is really closed off. A few years ago they even renamed this article (https://en.wikipedia.org/wiki/Kievan_Rus%27) because they have an identity crisis - they try to pose as the oldest of the Slavic nations, the core nation that must therefore be obeyed (literally). So to shift history, they renamed the article to "Ancient Rus" to make people forget the Kiev part. (Not the only thing they do, of course.)
The Russian version of this article denies any involvement of Russia in the Russian-Ukrainian war, however weird that may sound. They are either complicit or so deep in denial that it is impossible to talk to them about the war.
Currently the Russian Wikipedia segment can't be trusted except for bare facts and non-political entries.
The Russian version of that article is currently Киевская Русь [1] (Kievan Rus), though Дре́вняя Русь (Ancient Rus) is listed as a synonym. So it seems that specific change has been reverted, right?
I don't believe so. I just tested one of my own websites, which only serves over HTTPS, from Hong Kong (admittedly a special case), and Beijing. It worked fine from both. Surprisingly, because I thought Adsense was blocked, an advert even appeared on the Beijing screenshot. On the other hand, it reported as temporarily unavailable from Shanghai.
Okay, thanks for testing for me. When I lived in Shanghai 5 yrs ago, I had a lot of trouble connecting to https and whenever possible, I would try and connect unencrypted.
That's interesting. My admittedly flawed understanding is that the Great Firewall of China isn't implemented with a unified set of policies, but rather varies from province to province, which might help explain your experience and my test results.
Can an expert comment on side-channel attacks on HTTPS and whether they're less viable on HTTP/2?
My assumption is that because wikipedia has a known plaintext and a known link graph it's plausible to identify pages with some accuracy and either block them or monitor who's reading what.
I also assume that the traffic profile of editing looks different from viewing.
> My assumption is that because wikipedia has a known plaintext and a known link graph it's plausible to identify pages with some accuracy
At least in theory, the latest versions of TLS should not be vulnerable to a known plaintext attack. TLS also is capable of length-padding, which would reduce the attack surface here as well for an eavesdropper.
My understanding is that HTTP/2 makes it even more difficult to construct an attack on this basis, because HTTP/2 means multiple requests can get rolled into one.
Of course, all this is assuming an eavesdropper without the ability to intercept and modify traffic. In practice, governments will probably just MITM the connection - we have precedent for governments abusing CAs like this in the past - and unless Wikipedia uses HPKP and we trust the initial connection and we trust that the HPKP reporting endpoint isn't blocked, then it's still possible to censor pages, without anybody else knowing[0].
[0] ie, the government censors will know, and the person who attempted to access the page will know, but neither Wikipedia nor the browser vendor would be able to detect the censorship automatically.
TLS 1.2 doesn't have an effective padding scheme, and with most sites (including Wikipedia) moving to AES-GCM and ChaCha20, the situation is actually worse than with the primitive CBC padding, which provided some protection.
TLS 1.3, which is still a draft, does have support for record-level padding, but I haven't seen any of the experimental deployments using it.
HTTP/2 does have support for padding, but again, it's not common to see it being used, at least not in the kind of sizes it would take to obscure content fingerprints.
Wikipedia is a particularly hard case to defend against traffic-analysis fingerprinting. First, the combination of page size and image sizes is just highly unique, even modulo large block/padding sizes. But more importantly, anyone can edit a Wikipedia page, so if the size of a target page isn't unique, it's very easy to go ahead and edit it to make it so. It would take very large amounts of padding to defeat this.
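To make that concrete, here's a toy sketch (the article sizes below are invented for illustration; a real observer would precompute them from the public dumps): a single observed transfer size already narrows the candidate set sharply.

    # Toy sketch of size-based fingerprinting; the sizes are made up.
    article_sizes = {
        "Cat": 48_312,
        "Dog": 51_007,
        "Tiananmen Square protests of 1989": 183_455,
    }

    def candidates(observed_bytes, slack=2_000):
        """Articles whose known size is within `slack` bytes of the
        observed (encrypted) transfer size."""
        return [title for title, size in article_sizes.items()
                if abs(size - observed_bytes) <= slack]

    print(candidates(184_100))  # -> ['Tiananmen Square protests of 1989']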
So it's definitely possible to fingerprint which Wikipedia page someone is browsing. But it's probably not easy to block it; the fingerprint is only detectable after the page has been downloaded. So it's not very useful for censorship.
> But it's probably not easy to block it; the fingerprint is only detectable after the page has been downloaded. So it's not very useful for censorship
Well, it's detectable after the request has been made and Wikipedia sends the response. Assuming that a government has the capabilities to block delivery of that response (which they do), they can still implement censorship at this level, before the page reaches the end user.
Except China has its own browser, made by a state-controlled company, that a lot of people use. This browser has already been demonstrated to accept the government CA, and ordinary people in China don't care.
And one thing to note is that people generally don't randomly pad the length of articles, so it's not _very_ difficult to figure out what articles you might be reading -- even over TLS.
Random padding wouldn't really help; an active attacker can force retries, so the random distribution can be mapped (and then subtracted out). To defeat traffic analysis you need to pad to a fixed length for all cases, or for a very large number of cases.
E.g. if every wikipedia page, plus all of the content it includes, came to exactly 10K, 20K, 30K, ... in size, then you could obscure what the user is reading.
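A minimal sketch of that bucket scheme (the 10K bucket size is just the arbitrary choice from the example above): every response is padded up to the next multiple of the bucket, so many different pages end up with identical on-the-wire lengths.

    # Sketch of fixed-bucket padding; the bucket size is an arbitrary choice.
    BUCKET = 10 * 1024

    def pad_response(body, bucket=BUCKET):
        """Pad `body` with filler bytes up to the next multiple of `bucket`."""
        remainder = len(body) % bucket
        if remainder:
            body += b" " * (bucket - remainder)
        return body

    # Any page between ~10K and 20K now looks the same size on the wire.
    assert len(pad_response(b"x" * 12_345)) == 2 * BUCKET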
I've seen the theory that you could work out which pages are loaded from wikipedia over SSL by looking at other metrics like content length etc, but thanks to stuff like gzip compression, caching headers etc, this is much harder to exploit in practice. Plus there's the huge overhead of maintaining a database to link the frequently changing metrics back to the appropriate page on wikipedia. There's a great link somewhere (which of course I now can't find) where somebody prototyped this idea and found it really pretty hard to implement.
In the event this was even tried, it would presumably be trivial to defeat with injection of random content somewhere in the server responses anyway. This of course all assumes we can trust the root certificate authority though :P
Yeah, good point. I presumed it would be a pain, but I never thought to see if someone actually tried it.
Though you shouldn't be compressing things over TLS. I think the only proper solution is to pad out all articles (and images) to the nearest 2kB or something so that you can't figure out the length (randomness can be thwarted by forcing refreshes).
The government could force PC manufacturers to deploy a root CA that it controls and then run a MITM proxy to read everything the user is doing. It could also redirect the Wikipedia domain to another domain that just acts as a reverse proxy, and deploy a legit cert on that other site.
AFAIK HSTS doesn't break TLS MITM. A valid x509 certificate is generated by the attacker (using a Certificate Authority trusted by the victim's browser) for the domain the victim is visiting and all is well for both TLS sessions (Client<->Attacker, Attacker<->Server). This all relies on the attacker having access to sign certs from the trusted CA.
Certificate pinning in the HTTPS client would mitigate TLS MITM (HPKP).
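For anyone wondering what a pin actually is: an HPKP pin is just the base64-encoded SHA-256 hash of the certificate's SubjectPublicKeyInfo. A rough sketch of computing one, assuming a PEM file on disk (the filename is a placeholder) and the `cryptography` package:

    # Compute an HPKP-style pin: sha256 over the SPKI, base64-encoded.
    # "wikipedia.pem" is a placeholder; requires the `cryptography` package.
    import base64, hashlib
    from cryptography import x509
    from cryptography.hazmat.primitives import serialization

    with open("wikipedia.pem", "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())

    spki = cert.public_key().public_bytes(
        encoding=serialization.Encoding.DER,
        format=serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    pin = base64.b64encode(hashlib.sha256(spki).digest()).decode()
    print('Public-Key-Pins: pin-sha256="%s"; max-age=5184000' % pin)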
There were a few censored pages on the Turkish Wikipedia when it was on HTTP. They were the "vagina" article and an election prediction article. Only those pages were censored.
Last month there were some articles on the English Wikipedia about ISIS and Erdoğan (whether true or not, I don't care). Then they blocked all of Wikipedia (all languages), because they were unable to block those individual pages.
Yup. Was there 2 weeks ago working with a group of Turkish engineers - I went online to get some technical information about a particular stream cipher, and WHOOPS! - Wikipedia is blocked, completely.
Fired up my VPN, accessed the page, thank you very much.
"The Net interprets censorship as damage and routes around it." - John Gilmore
That's just it; they can't! When you visit Wikipedia over HTTPS, the only thing actually visible in plain text is wikipedia.org, and that's only if your browser is using Server Name Indication (SNI).
Since the rest of the request, including the URL, is hidden, governments and other malicious agents between you and the server cannot actually see what pages you're requesting directly. They can only see that you are accessing wikipedia.org and transmitting some data. You may still be somewhat vulnerable to timing attacks that try to identify what pages you're viewing, but censorship can't happen at the page level over HTTPS; you have to block the whole thing in one go.
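You can see that split with a few lines of Python: the hostname passed as server_hostname goes out in the clear inside the TLS ClientHello (SNI), while the GET line with the path is only sent after the handshake, i.e. encrypted. (Just an illustrative sketch; the article path is an arbitrary example.)

    # Only the SNI hostname crosses the wire in clear text; the path does not.
    import socket, ssl

    ctx = ssl.create_default_context()
    with socket.create_connection(("en.wikipedia.org", 443)) as raw:
        # server_hostname is sent unencrypted in the TLS ClientHello (SNI).
        with ctx.wrap_socket(raw, server_hostname="en.wikipedia.org") as tls:
            # From here on the handshake is done, so the request path is encrypted.
            tls.sendall(b"GET /wiki/Censorship HTTP/1.1\r\n"
                        b"Host: en.wikipedia.org\r\nConnection: close\r\n\r\n")
            print(tls.recv(200))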
> Although countries like China, Thailand and Uzbekistan were still censoring part or all of Wikipedia by the time the researchers wrapped up their study
The top comment might be asking about the "were still censoring part" of the article.
Oh, huh! I missed that entirely, now I'm curious too. HTTPS should make that difficult, but China has been known to employ all sorts of weird shenanigans -- perhaps they're running a "trusted" MitM as part of the Great Firewall?
I know that certain companies (like Google and Microsoft) will actively censor themselves to continue to operate within China, but I figured Wikipedia would be against that practice on principle. Now I'm curious as to how it's done.
When I visited China a bunch of years ago, zh.wikipedia was completely blocked, and on English Wikipedia, only certain articles were blackholed (Tiananmen Square...)
Nitpick: Google opted to pull out of mainland China instead of self-censoring. They moved Chinese operations to Hong Kong, but operate uncensored there.
Google was perfectly willing to self-censor in China until they were hacked by the Chinese government in 2010. That's when Google China moved to Hong Kong.
This is how "domain fronting" (https://en.wikipedia.org/wiki/Domain_fronting) works as well -- encryption makes blocking an all-or-nothing deal, and blocking everything that goes to an extremely popular IP range / SNI causes too much collateral damage :)
For committed governments like China, TLS may just be an extra hurdle but they can get around it if they want. Basically China could simply implement a massive proxy that terminates TLS.
If your internet traffic is going to flow through infrastructure that a curious government owns, then you'll know that they're monitoring the traffic but there is no way to keep them from seeing what you're doing.
No, TLS is not vulnerable to a MITM unless a) your client trusts the certificates issued by the attacker, or b) the attacker successfully forges the certificate of the website you are trying to visit.
That is, assuming you don't click away your browser's security warning.
> TLS is not vulnerable to a MITM unless a) your client trusts the certificates issued by the attacker,
Or in other words it is vulnerable.
China can (and probably does) issue a certificate that all Chinese browsers must install; they can then MITM HTTPS, using that certificate to sign the substituted certs.
Companies do this routinely BTW. Since it's their equipment, it's considered just fine. (But be aware of it if you are using a company computer.)
"On Friday, March 20th, we became aware of unauthorized digital certificates for several Google domains. The certificates were issued by an intermediate certificate authority apparently held by a company called MCS Holdings. This intermediate certificate was issued by CNNIC."
There is a concern dating back many years that a government will mandate that UAs trust a particular government-controlled CA (that eventually, but maybe not at first, openly performs MITMs). This is one reason that browsers really want to keep control of their root programs and not be mandated by governments to include any particular trusted roots -- including to maintain a remedy against roots that do appear to deliberately facilitate MITMs.
Although there have been lots of concerns about CNNIC, I don't believe that the Chinese government currently either (1) routinely uses CNNIC to perform MITMs for censorship or mass surveillance purposes, or (2) purports to require UAs to trust CNNIC or another Chinese root in order to be used by Chinese users. I'm happy to be corrected if someone knows otherwise.
They wouldn't "routinely" abuse their root to monitor large populations. That would be too obvious and result in near-immediate loss of their precious root.
What's more dangerous, and much more likely, is that they might use forged certificates against specific individuals for a short period of time, for example, to intercept login credentials. The attack will go unnoticed as long as they also block the corresponding HPKP reporting URL (if the targeted site uses HPKP at all).
Revoking the root outside China will have no bearing within it. All devices sold and used in China could be forced to include that root. There is not a lot a user could do, especially on mobile if you have a locked phone and only access to the official app store.
There's a fine line between cartoon-villain evil, exemplified by people like Kim Jong Un who just doesn't seem to give a fuck, and just-enough-to-achieve-your-objectives-but-not-enough-to-make-too-many-people-notice evil, which is what China is aiming at.
Lots of people travel in and out of China with all sorts of computing devices. China does care about the reputation of their root and of their highly profitable electronic exports.
It isn't, but if you live in China and want to use the internet, you'll likely be forced to use a proxy that MITMs and serves its own certificate... My point is that TLS is not a solution to prevent government interference when the user has to rely on government infrastructure for access.
According to the paper, the answer is subdomains. For example, in one instance China blocked zh.wikipedia.org (the entire subdomain - they can't see what page you're visiting), but left the other 291 subdomains unblocked.
I wonder about this. If a government can hack into a server and steal the private encryption key, then they could just look like any other server in the server farm, right?
Given the recent Shadow Brokers release of the NSA tools, it seems to me that this was not only possible, but probable (not necessarily with Wikipedia, but any website).
Well, they can block it whole. Once they figure out they can't block parts, that's exactly what they will do. Either that, or a re-host on their own infrastructure with the offending parts removed, conveniently seizing whatever domain names Wikipedia has in that country for added authenticity.
Who says they can't see the URL? A sufficiently motivated government would probably be able to create forged certificates, and mass interception isn't really out of the question. Especially with browsers homogenizing on fast ciphers like AES-GCM and ChaCha20-Poly1305, I bet it's much more economical than you would think.
Cert pinning (HPKP) is one type of solution, but it's tricky to get right, especially for a large site like Wikipedia.
"In Turkey, Wikipedia articles about female genitals have been banned; Russia has censored articles about weed; in the UK, articles about German metal bands have been blocked..."
> Critics of this plan argued that this move would just result in more total censorship of Wikipedia and that access to some information was better than no information at all
I'm no critic of this plan but I still don't understand why this wouldn't result in more total censorship. Someone explain please?
Because Wikipedia is too useful. Note that it required a certain self-confidence that this was the case for Wikipedia to implement this strategy. And it's self-fulfilling - if Wikipedia allowed itself to be censored, then it would have fewer contributors and its usefulness would suffer.
There's a rather interesting analogy to be made with the GPL here. Critics argue that companies shy away from it because they cannot control it. Yet its entire goal is to not be controlled, and it draws its strength from the conviction that the body of GPL software is too useful to ignore. And again, that's self-fulfilling.
It takes courage, but it's important to know when you have the power to say "all of me, or none of me".
> Critics argue that companies shy away from it because they cannot control it.
No, they don't. Critics point out that companies avoid it, and non-critics ascribe this avoidance to "can't control it", which is false, because nothing under a third-party copyright under any non-exclusive license can be controlled by the licensee, but businesses avoiding the GPL don't generally avoid all non-exclusive licenses.
I think "can't control" refers to sublicensing in this context. People's dislike over copyleft stems from wanting to make software proprietary (or proprietary-friendly through lax licensing). Copyleft removes that control, and the GPL's main strength is that it is so ubiquitous that you cannot practically avoid it (in most cases).
Insofar as companies avoid it, they do so because it constrains their behaviour in some way. Call it what you will; my wording was perhaps sloppy.
For the increasing number of companies that do participate in the GPL ecosystem, they do so because the opportunity cost of not participating outweighs the concomitant behavioural constraints. This produces a strong network effect as GPL software gains contributors, making GPL software more useful.
Wikipedia's anti-censorship strategy is analogous in that the switch to HTTPS raised the opportunity cost of censorship to the loss of the entire Wikipedia "ecosystem", which for many regimes is more severe than the "cost" of not censoring. This too produces a network effect as Wikipedia gains more contributors, thus further increasing its value.
Yes, I understand that. I mean, why don't these censors block the whole wikipedia.org access then?
If they don't want their population to access a Wikipedia topic/article and can't block/determine if someone is accessing it, the easiest thing to do would be just block it right away. So why they won't do it?
(PS: I'm in no way in favor of censorship, I'm just trying to understand such mindset)
If you censor too much people may be pissed. It's much easier to decide "we censor specific articles about specific subjects" than "we censor all of wikipedia". Censoring a popular mainstream webpage may cause too much opposition. Maybe even the politicians who make the decision and their families like to look up things on wikipedia.
Then https will force them to either extreme which I think is a good thing. No option to slowly raise the temperature so the frogs won't jump out of the pot.
As well as forge an SSL certificate for *.wikipedia.org.
Last time I checked, Wikipedia had HSTS enabled. So trying to forge their DNS without also forging their SSL certificate would be equivalent to total censorship for anybody who has previously visited Wikipedia.
I think it's a fun/educational process to interact with some daemons over telnet. You can telnet into port 80 and create an HTTP request, for instance.
Certificate negotiation happens before the GET request happens, which means that the "URL" (or, rather, everything after the domain) is encrypted.
You can also see some of this process with curl, e.g. with its verbose flag (curl -v https://en.wikipedia.org/).
Telnet is a great way to realize that HTTP is just some simple text commands and not some mysterious binary protocol.
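If telnet isn't handy, a raw socket does the same job; here's a minimal stand-in for that exercise. Over plain HTTP everything, including the request path, travels in clear text.

    # Minimal stand-in for the telnet exercise: plain HTTP over a raw socket.
    import socket

    with socket.create_connection(("example.com", 80)) as s:
        s.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        print(s.recv(300).decode(errors="replace"))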
WireShark also provides a good visualization of the HTTPS negotiation process and the various layers of HTTPS requests and responses. It does take a lot more to figure out than telnet though.
For all those who are not aware of what HTTPS encrypts:
HTTPS encrypts basically the whole protocol. This includes your request (the URL, your fingerprint -- e.g. browser, plugins installed, preferred languages) and the response (the content, the type of the response (text, video, audio file), and some other not-so-important things).
What HTTPS does not encrypt is the domain and the IP. The domain is leaked through DNS. DNSSEC will not help either, because it does not encrypt the DNS request; it merely signs it so that you can be sure it is authentic (not tampered with), but everyone can still read it. This includes the wifi hotspot you use, your ISP, your government, and anyone who taps the wires (theoretically even your neighbor and nearby people if you use mobile data, since the link between your device and your ISP is not strongly protected[1]).
Even if you were to encrypt the DNS traffic (or just use the host's IP directly), whoever intercepts your traffic could simply build a database of IP addresses and the DNS entries they correspond to (or do a reverse lookup; however, not every IP address has a reverse lookup configured for the domain you are visiting).
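As a small illustration: even without seeing the DNS query, an observer holding only the destination IP can often get a hostname back with a reverse lookup (or from a pre-built IP-to-domain map).

    # Forward lookup mimics the DNS leak; the reverse lookup is what an observer
    # could try with nothing but the IP (raises socket.herror if no PTR record).
    import socket

    ip = socket.gethostbyname("de.wikipedia.org")
    print(ip)
    try:
        print(socket.gethostbyaddr(ip)[0])
    except socket.herror:
        print("no reverse record for", ip)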
In Wikipedia's case, this can still be pretty bad. For instance, if an oppressive government notices that you visit the Wikipedia edition in a particular language pretty frequently (compared to the rest of the population), they might make assumptions about you and profile you. When you visit the German Wikipedia, you are actually visiting de.wikipedia.org instead of en.wikipedia.org, and that can be intercepted and seen.
This gets worse for static file servers which serve different images from different subdomains (e.g. static512.domain.tld). If DNS requests go out to static523, static123, static721, and static132, an attacker might be able to guess which article you are reading (or at least narrow down the choices), because there will not be many articles whose images are served by exactly those file servers. Thankfully Wikipedia does not do that - everything is served through upload.wikimedia.org - but newspapers, forums, etc. might not, or an article might even pull in content from a unique domain (e.g. an embedded chart or video that comes from a unique third party and is loaded automatically).
So all in all, HTTPS is pretty good, but you still leak a lot of metadata (the DNS requests are just the tip of the iceberg) that can be used to learn a lot about you. If you want to be safe, use Tor or a VPN. If you use a VPN, be aware that you just shift the trust from your current location to another one (the VPN provider, their ISP, and the government where the VPN server is located can read all that metadata, which might be no big deal or even worse, depending on where you actually live). Furthermore, some VPNs have been known to be easily broken, and your ISP/government still sees that you are using a VPN or even Tor.
[1]: One exception is LTE, but an attacker could still downgrade the connection to 3G or EDGE to intercept the domain.
This is important for some censorship circumvention schemes and also because some people have suggested that encrypting SNI is useless because DNS leaks the hostname [however, not necessarily along the same network path!!], while some people have also suggested that encrypting DNS queries is useless because SNI leaks the hostname.
Ever been to a beach in Europe, particularly a naturist beach? Not sure what the problem is here; apparently some puritans are fine with violence but consider the naked body disgusting.
To be fair to the Scorpions, here's a quote from Wikipedia about the original concept for the song:
'...Time is the virgin killer. A kid comes into the world very naive, they lose that naiveness and then go into this life losing all of this getting into trouble. That was the basic idea about all of it'
Different times...
https://en.wikipedia.org/wiki/Virgin_Killer
Not, strictly speaking, the UK government. The Internet Watch Foundation, a non-governmental organisation, placed the article/image in question on its blacklist, a list which most major UK ISPs use (notable exceptions at the time were the UK universities' and military networks IIRC).
AFAIK, whether or not the image is actually illegal under English law is somewhat unclear (the definition of "indecent" is rather woolly), though it's certainly a poor choice for an album cover.
Edit: "to its blacklist" -> "on"; added "a non-governmental organisation"
Currently HTTPS sends the domain in clear text before establishing a connection. That allows hosting (and blocking) websites by domain, not by IP. Maybe HTTPS should have an optional extension to send the URI in clear text before establishing the connection. That way, if censors decide to block Wikipedia, users could opt in to this behaviour and get Wikipedia unblocked except for a few selected articles.
> Absolutely not. The response to censorship should not be to make things easier for the censor.
It's not about making things easier for the censor. It's already easy. It's about making life easier for people who have to live with censorship (pretty much the entire world, I guess?).
> Anyway, the idea is unworkable as the user's client could simply lie about what URI it's going to send after the encrypted connection is setup.
- Unlike the host, a URI is a property of the request, not the connection, so sending it as part of the connection handshake doesn't really make sense.
- Unlike the host, there is a very long history of putting secret things into the URI. Even if the extension is built with this in mind, the number of security breaches that will result is greater than zero, with probability one. That's probably not the correct price to pay for convenient censorship infrastructure.
[0] https://news.vice.com/story/china-is-recruiting-20000-people...
[1] https://dumps.wikimedia.org/backup-index.html
[2] https://www.wired.co.uk/article/china-great-firewall-censors...
[3] https://blogs.wsj.com/chinarealtime/2015/12/17/anti-wikipedi...