Securing access to Wikimedia sites with HTTPS by default (wikimedia.org)
160 points by freeasinspeech on June 12, 2015 | 47 comments



User-generated content sites are among the sites that most need to implement HTTPS.

Consider reddit for example. The entire site is HTTP. That means your ISP can pull up a list of your entire reddit browsing history. They can see every post you read, every comment you wrote, etc. Collectively this data could form quite an accurate picture of a person's mental makeup. Is that the kind of data you want floating around?

HTTPS encrypts the URL path and query string, so upstream providers cannot see which specific pages you browse.

Of course, HTTPS still shows domain names... but it's a start. Next step is DNSSEC for everybody!

p.s. Anyone interested in going down the rabbit hole of exploring mass tracking of individuals should google "entity mapping llc" and poke around the results...


https://www.reddit.com/prefs/security/

You can force it to https only for your user account.


Thanks for that tip. I just enabled it for my account. I wonder why they don't redirect to https by default.


From what I remember reading, when they first started with HTTPS they weren't able to easily handle the added overhead. They've since switched to CloudFlare and are working on rolling out HTTPS across the site. One of the changes involves removing the reddit bar.


For what it's worth, Reddit supports HTTPS but doesn't enforce it or enable it by default.


Our HTTPS Everywhere software will change that default (in your browser) for Reddit and thousands of other sites.

https://www.eff.org/https-everywhere


I am a fan of the sentiment behind HTTPS Everywhere, but I wish you would spend more time educating people about root certificates and trusted CAs.

Why are there 200 root certificates in my Apple keychain? That is at least 200 entities who can MITM my SSL connections without my knowledge.

The trust assignment protocol is a vital aspect of communication security. What good is end-to-end encryption if I don't know which "ends" to trust?


The main EFF contribution to this problem right now is the SSL Observatory.

https://www.eff.org/observatory

You can allow your copy of HTTPS Everywhere to send us certs, which can help researchers understand what CAs are doing and potentially detect misissued certs.

Two other important mechanisms are Certificate Transparency and HPKP.

http://www.certificate-transparency.org/

https://en.wikipedia.org/wiki/HTTP_Public_Key_Pinning

The former is a way -- I hope! -- to eventually require the open publication of all issued certs that the public is expected to trust. The latter is a way for sites that you successfully connect to at one point to prevent other CAs that they don't have any relationship with from helping to MITM your future connections.
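
To make HPKP a bit more concrete: it is just a response header listing hashes of keys the site commits to using. A minimal sketch in Go follows; the pin values, max-age, and cert/key file names are placeholders, not anything a real site sends.

  // Minimal sketch of a server sending an HPKP header over HTTPS.
  // The pin-sha256 values are placeholders, not real key hashes.
  package main

  import "net/http"

  func main() {
      http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
          w.Header().Set("Public-Key-Pins",
              `pin-sha256="PLACEHOLDER_PRIMARY_KEY_HASH="; `+
                  `pin-sha256="PLACEHOLDER_BACKUP_KEY_HASH="; `+
                  `max-age=5184000; includeSubDomains`)
          w.Write([]byte("pinned\n"))
      })
      // Browsers only honour the header over a valid HTTPS connection;
      // cert.pem and key.pem are assumed to exist.
      http.ListenAndServeTLS(":443", "cert.pem", "key.pem", nil)
  }

A browser that has seen this header will refuse future connections whose chain doesn't include one of the pinned keys, even if some other CA issued an otherwise "valid" cert.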


Maybe I'm mistaken, but what does SSH have to do with certificates in the trust store?


Yeah that was a typo sorry.

For the answer to what SSL has to do with certificates in the trust store, the best demonstration is by example. Try to set up mitmproxy on EC2 to MITM your own HTTPS connections. In order to do so, you will need to install a trusted root certificate on your device.
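
The same point can be illustrated from the client side with a rough Go sketch: the client refuses the connection unless the presented cert chains to a root in its trust pool, so the MITM only "works" once you add the proxy's CA to that pool (the CA file name below is illustrative).

  // Sketch: a TLS client only accepts certs that chain to a trusted root.
  // Adding the proxy's CA to the pool is what makes the MITM "invisible".
  package main

  import (
      "crypto/tls"
      "crypto/x509"
      "fmt"
      "io/ioutil"
      "net/http"
  )

  func main() {
      pool, err := x509.SystemCertPool() // the roots the OS already trusts
      if err != nil || pool == nil {
          pool = x509.NewCertPool()
      }
      if pem, err := ioutil.ReadFile("mitmproxy-ca-cert.pem"); err == nil {
          pool.AppendCertsFromPEM(pem) // explicitly trust the proxy's CA
      }
      client := &http.Client{Transport: &http.Transport{
          TLSClientConfig: &tls.Config{RootCAs: pool},
      }}
      resp, err := client.Get("https://example.com/")
      if err != nil {
          fmt.Println("handshake failed:", err) // untrusted certs end up here
          return
      }
      fmt.Println("status:", resp.Status)
  }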


That should probably read "SSL".


This is a great move and has inspired me to think of how I can do this for the sites I run.

The problem I have is that I run forums which accept user generated content, and the links in the content are then parsed and embeds are put in.

For example, YouTube links have the embedded video placed below the link.

YouTube supports https, but a lot of smaller sites that offer really useful tools don't yet support https. An example is something like http://www.bikely.com, which does not support https at all, and yet its embedded maps are common enough on the cycling forums I run.

I cannot just proxy via an https domain, as these sites usually have JavaScript that requires permission to talk to their own domains.

I've decided to start emailing all of the 3rd parties to ask that they add https to their site.

Almost 40% of my traffic is now over https, and it's hard to increase that: every time I try, I receive support complaints about mixed-content warnings, missing content, etc.

If you also run a site that has embeds in user generated content, please consider emailing the 3rd parties and explaining why they should move to https.

There are only a few things holding back a lot of sites and it's no longer the cost of a cert and SSL termination:

1) Embedded widgets that are http only today

2) Advert widgets/scripts that are http only today

Those things hold the vast majority of news sites back too. I checked the Guardian yesterday and there was no https available. The only news site that was https was The Verge, but I'm not sure tech news is really news.


Exactly. The problem is that news sites rely on advertising.

Due to the generally low technical quality of ad companies and their suppliers/customers, you will often find an image/flash/js asset served over plain HTTP or with a bad certificate.

That single HTTP bit will immediately trigger the terroristic browser behaviours we've learned to accept and love, including but not limited to: red signs, technical-jargon pop-ups, fully red pages with jargon disclaimers of all sorts, and "get me outta here" buttons.

Funnily enough, I remember reading a technical blog post written by Guardian devs themselves, citing this very problem.


The browser isn't exactly wrong for triggering those 'terroristic' behaviours. If a single http resource is included and rendered on an https page, the entire security of the page is useless against a MITM attacker sitting between you and the source. That single http resource can carry whatever JavaScript the attacker needs to redesign the page, steal content, and serve tampered content.


Replacing embeds with plain hyperlinks seems like it would be better from a user privacy perspective.


Or at the very least with iframes. If you run a forum and you're allowing directly embedded content, you're trusting that content not to break your security.


I use Markdown for user content. It is passed through a Go library I wrote to strip out iframes, embeds, etc. (https://github.com/microcosm-cc/bluemonday), and then, as a post-processing task once I trust the content, I find the links that I know how to handle (YouTube, Bikely, etc.) and embed third-party content in iframes.

This is basically the equivalent of Twitter cards; it respects the JavaScript and web security model, but it does mean that the iframes contain http content on a page that is https.

Where I'm trying to get to is to have all iframes, etc. be https.
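
For anyone curious, a rough sketch of that flow (bluemonday's UGCPolicy/Sanitize are its actual API; the Markdown step and the embed post-processing are only hinted at here):

  // Rough sketch of the sanitise-then-embed pipeline described above.
  package main

  import (
      "fmt"

      "github.com/microcosm-cc/bluemonday"
  )

  func renderUserContent(renderedHTML string) string {
      // Strip iframes, scripts, object/embed tags and other dangerous
      // markup from whatever the Markdown renderer produced.
      safe := bluemonday.UGCPolicy().Sanitize(renderedHTML)

      // A separate, trusted post-processing step would then scan `safe`
      // for known links (YouTube, Bikely, ...) and wrap them in iframes.
      return safe
  }

  func main() {
      dirty := `<a href="https://www.youtube.com/watch?v=abc">clip</a><script>alert(1)</script>`
      fmt.Println(renderUserContent(dirty))
  }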


> I find the links that I know how to handle (YouTube, Bikely, etc) and embed third party content in iframes.

Ah, that makes sense. From your previous comment, I didn't realize you were recognizing and explicitly handling sites like Bikely.


I think iframes will result in an embedded-content warning?


Yes, they will, but they're an improvement over script tags (which will also result in a mixed-content warning).


Yes, they do.


Not just small tools: last.fm's artist avatars are only served over http, and its https cert is self-signed/not trusted.


I'm not sure what forum software you run but the majority can be modified. The way I would approach this is to modify the output and replace all instances of supported embed URLs with https.
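
Something along these lines, assuming the forum output can be post-processed and you keep a hand-maintained list of providers known to serve the same content over https (the list here is illustrative):

  // Sketch: upgrade embed URLs to https, but only for providers known
  // to support it; anything else is left untouched.
  package main

  import (
      "fmt"
      "strings"
  )

  var httpsCapable = []string{
      "http://www.youtube.com/",
      "http://youtube.com/",
      "http://player.vimeo.com/",
  }

  func upgradeEmbeds(html string) string {
      for _, prefix := range httpsCapable {
          https := "https://" + strings.TrimPrefix(prefix, "http://")
          html = strings.Replace(html, prefix, https, -1)
      }
      return html
  }

  func main() {
      fmt.Println(upgradeEmbeds(`<iframe src="http://www.youtube.com/embed/abc"></iframe>`))
  }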


But that doesn't work for sites that don't support HTTPS. It would either break the embed by trying to connect over HTTPS anyways, or try to load a bunch of HTTP content into an HTTPS page which browsers will complain about.


I think most of the major embed providers support https.


I cited an example: http://www.bikely.com is used by users on the forums I host and does NOT support https.

Neither do several other embedded route/map providers like gpsies.com and so on.


I've had a few conversations with other advocacy nonprofits who are not using https, even when they know they are being targeted for surveillance. It seems the biggest reason they don't switch is that their IT staff would bitch about the work, and they don't want to sour the relationship between management and IT. Typical underfunded IT bureaucracy and politics.

What I've started to do is offer to play bad cop, and come in as a consultant and take the heat from IT when I propose the work to switch to https. That way, IT bitches about the consultant and not management.


Very cool. Well done you. How do we scale you? :)


On Wikipedia, one of the issues is that a lot of the behind-the-scenes JavaScript gadgets (for routine administration/maintenance tasks) are written by community members.

When HTTPS was first rolled out on Wikipedia (before that we had the joy that was a single secure server for all language sites - secure.wikimedia.org), a lot of these scripts wouldn't work because they hard-coded protocols and so on. Fixing all those edge cases was quite difficult. The scripts are often abandonware but essential for admins to do their jobs.

One of the downsides of community projects like Wikipedia...


Not perfect, but wikimedia scores a respectable A on ssllabs:

https://dev.ssllabs.com/ssltest/analyze.html?d=wikimedia.org...

Same IP as en.wikipedia.org, so does that mean they're using SNI to serve different certs? http://caniuse.com/#search=sni


Nope, it's a wildcard cert -- if you view the cert contents you see that the subject includes

CN = *.wikipedia.org


Actually, it's a bit more complicated than that. We don't just serve *.wikipedia.org from our frontends, but also e.g. *.wikisource.org etc.

So yes, we are using SNI, but for wildcard certs :)
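
A quick way to see which cert each hostname gets (a sketch using Go's standard TLS client; the hostnames are just the ones mentioned in this thread):

  // Sketch: connect with SNI and print the certificate each frontend serves.
  package main

  import (
      "crypto/tls"
      "fmt"
  )

  func main() {
      for _, host := range []string{"en.wikipedia.org", "en.wikisource.org"} {
          conn, err := tls.Dial("tcp", host+":443", &tls.Config{ServerName: host})
          if err != nil {
              fmt.Println(host, "error:", err)
              continue
          }
          cert := conn.ConnectionState().PeerCertificates[0]
          fmt.Printf("%s -> CN=%s SANs=%v\n", host, cert.Subject.CommonName, cert.DNSNames)
          conn.Close()
      }
  }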


This is an interesting site: https://httpswatch.com/global


You should also check out the Trustworthy Internet Movement's SSL Pulse report on SSL/TLS deployment quality.[0] The big increase in B and C grades this month (compared to last month) is due to Qualys's SSL Labs tester becoming more strict about DH parameter size (less than 2048 bits caps your grade at B; see also Logjam), lack of TLS 1.2 support (caps your grade at C), and RC4 deployment with TLS 1.1 and 1.2 (caps your grade at C; RC4 deployed for TLS 1.0 still only caps you at B).[1]

[0] https://www.trustworthyinternet.org/ssl-pulse/ [1] https://community.qualys.com/blogs/securitylabs/2015/05/21/s...


This seems good, but too limited. Why not make it so that wikipedia can only be served over HTTPS?

I remember when the UK censored wikipedia because one page had an image they considered child porn, and they transparently redirected the domain to point to a proxy. If Wikipedia is only served over HTTPS, this should hopefully break such a scheme - which isn't the perfect solution (that would be the very painful death of those involved with the censorship), but at least it makes it clear what is happening.


> This seems good, but too limited. Why not make it so that wikipedia can only be served over HTTPS?

That's what they're doing. From the second paragraph: "We will also use HTTP Strict Transport Security (HSTS)"
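
For reference, HSTS is a single response header; once a browser has seen it over HTTPS, it refuses to use plain HTTP for that domain until max-age expires. A minimal sketch (the max-age and file names are illustrative, not Wikimedia's actual configuration):

  // Sketch: redirect plain HTTP to HTTPS and send an HSTS header on
  // the secure side so the browser stops trying HTTP at all.
  package main

  import "net/http"

  func main() {
      // Port 80 only redirects; it never serves content.
      go http.ListenAndServe(":80", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
          http.Redirect(w, r, "https://"+r.Host+r.RequestURI, http.StatusMovedPermanently)
      }))

      // Port 443 serves content and sets Strict-Transport-Security;
      // cert.pem and key.pem are assumed to exist.
      http.ListenAndServeTLS(":443", "cert.pem", "key.pem",
          http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
              w.Header().Set("Strict-Transport-Security", "max-age=31536000; includeSubDomains")
              w.Write([]byte("secure\n"))
          }))
  }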


  * HSTS requires HTTPS connections  
  * Internet.org (currently) requires HTTP only [1]  
  * But Wikipedia is a member of Internet.org [2]  
I'm curious how Wikipedia will handle this. Perhaps allowing Facebook to MITM?

Or maybe the security mentioned in the original article isn't intended for those who can't afford it[3].

  [1] https://developers.facebook.com/docs/internet-org/platform-technical-guidelines
  [2] https://internet.org/press/introducing-the-internet-dot-org-app
  [3] https://en.wikipedia.org/wiki/Wikipedia_Zero


HSTS only works after you connect over HTTPS. If you can only access a site over HTTP, HSTS doesn't do anything. So maybe people hitting WP from Internet.org can never get it over HTTPS in the first place?


This is all great, and I can see the benefit of having HTTPS available for all the sites.

That being said, various documentation sites have started serving docs over HTTPS only, which means I cannot access them from work.

There are a lot of scenarios where having HTTPS will just impede people doing their work.

And finally, how are we to trust that, for example, VeriSign or Thawte are not influenced by the likes of the NSA, making it possible for them to decrypt our traffic with ease?


>how are we to trust that, for example, VeriSign or Thawte are not influenced by the likes of the NSA, making it possible for them to decrypt our traffic with ease

Abandon all hope that HTTPS will safeguard you from the NSA or any major foreign intelligence agency.


And with that I absolutely agree.

So what is the level of paranoia that SSL is useful for? Since this is what the article says:

> Encryption makes it more difficult for governments and other third parties to monitor your traffic. It also makes it harder for Internet Service Providers (ISPs) to censor access to specific Wikipedia articles and other information.

And we agree it doesn't really help with government surveillance?

Do ISPs randomly censor access? Or do they, again, do it on government requests? Because if a government decides your site needs censorship, why would they not just block the whole site? Another thing that is harder for ISPs to do with SSL is caching.

Maybe I'm not the brightest child on the block, so I'm still struggling to figure out what the benefit of having HTTPS everywhere is.

And having the likes of Google punish non-SSL sites just makes this fad worse. I don't need SSL on StackOverflow, Django or Python documentation. Does anyone?


HTTPS does help with government surveillance. It won't save you if the NSA is targeting you individually, to the point where they're prepared to use targeted active exploits whose detection and identification would cost them both technically and PR-wise... but it will prevent (some of) your data from being passively vacuumed up en masse along with everyone else's, which for most people is a more pressing concern.

Well, unless the NSA has some magic passive SSL strip attack, which is not out of the question, but very unlikely.


Yes, I believe I do. It is no ones business but mine and the site's what I am reading or contributing.


Fair point; I am not pretending that everyone will have the same requirements and opinions. But even with SSL, the domain at least is still visible. And in some cases there are ways to infer what URL you actually visited.

I also see companies using MITM successfully in a way that, unless you check the cert yourself, seems legit. I still use HTTPS when I go to Google, but I can see the cert is spoofed.

And what about the people who don't care and are effectively prohibited from using a public-data site at all because the site decided to use HTTPS only? Do we say we don't care about them? A few years back we wanted our sites to be available to everyone, on old browsers, new browsers, mobiles and so on.

And having people smarter than me (like Roy Fielding) agree that this does not do much for privacy, but rather for content confidentiality (and actually makes communication less private), is not making me any more convinced.

Bottom line, and I don't expect everyone to agree, is that I am all for using SSL even by default, but for public data I would still want to have access to it over plain HTTP.

I want/need that choice; otherwise we are hindering corporate employees and people living in countries whose governments do mass surveillance. I think it is important for people to realise that SSL is not the ultimate solution for data integrity and especially privacy that it is often posed to be.

Thanks to all expressing your views in comments.


This is good.


If you go to the Turkish Wikipedia you will see that at the top there is a banner. It says that some of the pages, such as vagina and human penis, are blocked in Turkey. This is a great move, I mean the SSL. But Turkey will then block the entire domain. Then people will go to buy VPNs from foreign countries etc. Bad for Turkey. Fuck those who are ass hairs!


I don't understand why you downvote me.




