Google to offer encrypted search next week (cnet.com)
49 points by rooshdi on May 14, 2010 | 44 comments



It's cool that Google is already starting to copy features from Duck Duck Go ;-) Next stop: promising not to keep a log of our searches, or assuming I want to be "signed in" on Search, just because I am on Reader..?


In fairness, implementing it on Google's scale is not really the same "feature", and since Duck Duck Go isn't a self-contained search engine, you can still question how end-to-end secure its final implementation is.


Why would it be any different on Google's scale? I mean, presumably the solution is the same install-ssl-plugin-and-point-it-to-a-cert fare that you would go through for any web server. Am I missing something here?


I think you're missing how expensive it is to make SSL work on a large scale. That being the reason it isn't the default everywhere.


Does the cost of SSL grow super linearly when you scale up?


Super linearly is a new term to me, but it doesn't grow linearly. However, it does grow, and Google is huge. The cost, in terms of labor, focus bandwidth, and money, is large at their scale.


The cost of TLS is linear in every way I can think of. The TLS handshake is basically constant overhead--in space and time--and O(n)--in space and time--transform of the actual data.
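
A rough way to picture that claim, as a sketch only: the cost constants below are made-up placeholders, not measurements, and `tls_cost` is a hypothetical helper name.

```python
def tls_cost(connections, bytes_per_connection,
             handshake_cost=1.0, per_byte_cost=0.001):
    """Toy cost model: constant handshake plus O(n) work on the payload.

    handshake_cost and per_byte_cost are arbitrary illustrative units.
    """
    per_connection = handshake_cost + per_byte_cost * bytes_per_connection
    return connections * per_connection


# Doubling the number of connections doubles the total cost in this model.
small = tls_cost(1_000, 10_000)
large = tls_cost(2_000, 10_000)
print(large / small)
```

Under this model the total grows in direct proportion to traffic; any super-linear cost would have to come from something outside the protocol itself.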


The constant factors are high relative to simply serving HTML, especially on a site that tracks non-encrypted performance down to the millisecond.


Sure, but talking about it being expensive 'on a large scale' is a bit odd - since revenue/profits also grow with scale. It seems to imply a greater than linear growth in costs.


Google has a completely different back-end architecture than a dinky website, and the costs come from the sophistication and complexity of the systems required at that scale.

Google has multiple data centers, dedicated caching and web clusters, complex internal network routing and load-balancing, etc. Then there's stuff like JavaScript and image hosting optimizations that you've got to make sure work without triggering SSL warnings across thousands of servers. Enabling SSL support for something like Google search could potentially touch tens of thousands of systems. In short, maybe it's as simple as flipping a bit to enable SSL on the web farm, but I doubt it. There's probably a massive amount of internal engineering and organization behind the change, and that cost is what you're neglecting.

To give you perspective, it's trivial to enable SSL on a single-box website; it can be done in an afternoon. When we turned it on at Justin.tv, it took a few days (let's call it a week) of initial work, and ongoing maintenance costs to make sure that SSL sessions weren't breaking as we made improvements across the site. The costs go up as you get bigger and more complicated.


If it takes 4 hours to get SSL working for a site with 5,000 users then linear growth to a site with 250,000,000 users is 200,000 hours. IMO, Google can get SSL working in less than 200,000 hours.
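
Spelling out the linear extrapolation as a quick check (the constant and function names are just labels for the figures in the comment):

```python
# Figures taken from the comment: 4 hours for 5,000 users, scaled to 250,000,000 users.
SMALL_SITE_USERS = 5_000
SMALL_SITE_HOURS = 4
LARGE_SITE_USERS = 250_000_000


def linear_hours(users, base_users=SMALL_SITE_USERS, base_hours=SMALL_SITE_HOURS):
    """Scale the setup effort in direct proportion to the user count."""
    return base_hours * users // base_users


scale_factor = LARGE_SITE_USERS // SMALL_SITE_USERS
print(scale_factor)                      # user ratio between the two sites
print(linear_hours(LARGE_SITE_USERS))    # hours under strictly linear growth
```

The user ratio is 50,000x, so 4 hours scales to 200,000 hours under strictly linear growth.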


spideroak.com has always been an all SSL site, and there are a few additional annoyances:

- You need a cert for example.com, and maybe a wildcard cert for *.example.com, and for any additional levels of subdomains such as *.{x,y,z,etc}.example.com.

- Not all search engines seem to credit links to http://example.com/ and https://example.com equally.

- Even modern browsers sometimes abort downloads over SSL silently, without even an indication to the user that the download failed (despite sending a content-length header.) That's the really frustrating one.
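
On the first point, the reason extra subdomain levels need their own certs is that a wildcard is generally matched against a single leftmost label only. A minimal sketch of that rule, assuming the common single-label interpretation (`wildcard_covers` is a hypothetical helper, not any library's API):

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name such as '*.example.com' covers hostname.

    Assumes '*' stands for exactly one non-empty leftmost label.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # a wildcard never spans more than one label
    if p_labels[0] == "*":
        return h_labels[0] != "" and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


print(wildcard_covers("*.example.com", "www.example.com"))    # one level down
print(wildcard_covers("*.example.com", "example.com"))        # bare domain
print(wildcard_covers("*.example.com", "a.x.example.com"))    # two levels down
print(wildcard_covers("*.x.example.com", "a.x.example.com"))  # needs its own cert
```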


I'm sure that the SEO problem isn't a big deal for Google. ;)


Google's log of searches is an amazingly useful feature. Sometimes I google for some obscure library, find it, go to lunch, and then completely forget what it was called or what search terms I used to find it. With the search history, this is not a problem.

I also use Google search for package tracking, and it's nice to get the numbers out of the search history instead of the merchant's site.

I know everyone wants to think that Google is collecting this information so that they know who to come for first when they take over the world... but it's also possible that the feature exists simply because it's useful.


How far are we from just encrypting everything on the web because it has a negligible cost? Is this purely CPU-limited or is there another reason (does it take much more bandwidth? screw with caching?)? Is Moore's Law taking care of this quickly, or is it still far off?


"Screw with caching" is probably the biggest issue. In order to negotiate the SSL connection, any proxy (on either end) needs to have your certificate. There's also the issue that some browsers (Firefox, I'm looking at you) do their best to scare users about self-signed certificates, which means that you need to pay for a cert.

It also breaks the shared-hosting model: your entire HTTP stream is encrypted, which means that key negotiation occurs before the server has your HOST header. This, in turn, means that the TCP/IP data (IPs and ports) are the only pieces of information that the server can use to determine which certificate to use. In practice, this means that a server can only host multiple distinct sites if it has multiple distinct IPs.


  There's also the issue that some browsers (Firefox, I'm looking at you) do their best to scare users about self-signed certificates, which means that you need to pay for a cert.
Firefox is absolutely in the right, here. Self-signed certificates are worthless; if you'd like a free certificate, use StartCom or CACert.

  It also breaks the shared-hosting model: your entire HTTP stream is encrypted, which means that key negotiation occurs before the server has your HOST header.
This is only true for older servers; modern TLS implementations support name-based certificate negotiation.


They're worthless except insofar as they let you get the encryption benefit out of SSL without paying the cost necessary to get the authentication benefit.


What's the point of encryption if you don't know what server you're connected to? Somebody could be intercepting your traffic, and you wouldn't realize it.


The point is to stop one particular class of attacker: the one that's passively sniffing and archiving network traffic while trying to remain invisible (and thus isn't actually altering the traffic profile). This is a benefit (over in-the-clear HTTP), but Firefox and others effectively tell users that it's more dangerous than regular HTTP.


What's the benefit of using CACert when browsers don't include those?

https://cacert.org gives a big fat warning in chrome.


My browsers include CACert's root certificate -- therefore, I use it for securing any of my private websites. It's just like using a self-signed certificate, except not stupid.

If you need to support the general public, pay the $20 or whatever for a certificate with a widespread root.


Unfortunately there is more to this than server support. IE6 and every browser on XP, including Firefox and Chrome, don't support the TLS extension. Even Apache doesn't ship with it. For all practical purposes, name-based virtual hosting still doesn't work for SSL. You'll need a separate IP address for each name, unless you can use a wildcard.


SNI is supposed to fix the SSL + virtual hosts problem, but XP doesn't support it, unfortunately.

http://en.wikipedia.org/wiki/Server_Name_Indication

http://en.wikipedia.org/wiki/Transport_Layer_Security#Suppor...
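
For a sense of what SNI looks like on the server side, here is a rough sketch using Python's standard `ssl` module (`SSLContext.sni_callback`, available in Python 3.7+). The `contexts_by_name` mapping and both function names are hypothetical, and loading each site's certificate with `load_cert_chain` is left out.

```python
import ssl


def make_sni_callback(contexts_by_name):
    """Build a callback that swaps in the SSLContext for the requested host name.

    contexts_by_name is a hypothetical dict of host name -> SSLContext, each
    assumed to be already loaded with that site's certificate chain.
    """
    def callback(ssl_socket, server_name, default_context):
        # server_name is None when the client sent no SNI extension,
        # in which case the default context's certificate is used.
        chosen = contexts_by_name.get(server_name)
        if chosen is not None:
            ssl_socket.context = chosen
        return None  # None tells the ssl module to continue the handshake

    return callback


def build_server_context(contexts_by_name):
    """Create the default server context and attach the SNI callback."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.sni_callback = make_sni_callback(contexts_by_name)
    return context
```

Clients without SNI support never send a name, so they always get the default certificate, which is why one IP per site is still needed to serve them.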


Would it be possible to devise a caching system that works with encrypted data without weakening security too much?


Anything's possible. But if it isn't compatible with existing proxies, it isn't gonna happen any time soon.


How about Server Name Indication?


Now we feel safe knowing only Google's watching and analyzing everything we do!


"Google encrypted all Gmail accounts in response to the hacking incidents that prompted its decision to move its Chinese-language search operation from Beijing to Hong Kong."

It's a shame they couldn't do it because customers have been asking for it.


I believe gmail always had HTTPS as an option - in 2008 they added an option to selectively make it the default if you want - now they've reversed that and made it the default from the start. There is strong reasoning, backed up by user experience testing that the small delays introduced by SSL on web pages add up to big drops in retention... and business is business. HTTPS connection setup is significantly slower than a regular connection - you may not think that sub-second delays matter but they absolutely do in the competitive web-app world.


Do you have evidence that HTTPS results in user retention problems for gmail? I have heard that it may for search. These are two very different use cases. I can recall users and articles in years past asking for https to be default for gmail. Yes, we geeks knew it's been an option for a while. My original post was simply pointing out that the decision should be made based on what users want...not as an excuse so the media or Google can publish inflammatory digs in their issues with China.


I imagine they would offer it as an option in your Google account settings, similar to GMail before it was made default. That will certainly make censorship MUCH more difficult, and if they include Google cache in this (which was used for a time to get around the Great FireWall of China) we would have something of a winning combination.

On the other hand, the Chinese government might just ban Google search outright for such a move, although they haven't done this for GMail yet.


More difficult but still far from impossible... All China needs is a Chinese CA to issue them a certificate for Google and then they can do a nice MITM without the user seeing even a single warning.


There have been totally unproved speculations that they've been doing that. But if they did it on a scale like that their CA would be revoked from all default browser installations.

They could still force the issue and mandate that their CA be patched back in on all computers sold in mainland China.

Either case seems very unlikely, just outright blocking things they don't like seems to be their current approach. I don't see why they'd change that for Google.


How will this affect web traffic analytics? Will encrypted searches still be able to be logged as referrers, for SEO/PPC/analytics/etc. purposes?


The only difference is in the transport - HTTPS instead of HTTP. Everything else about the protocol is the same - browsers can still send referrer information in their requests, cookies can still be set, and so on - it's just that a passive observer sniffing the network will not be able to see anything other than an encrypted connection to google.

This really is google just playing their cards at the right time and cashing in on the current privacy bandwagon in the media, casting themselves in a positive light. It doesn't alleviate any real privacy issues, whatever they are, with google.

There isn't a downside for us users - it's a good option for google to provide - it will cost them money in infrastructure, and will slow down the user experience by a significant factor - (you might not think a half second matters, but it does, and google knows this - so this is possibly a convenient time for them to introduce this feature while people are focused on WHY privacy is important).


Actually... the browser won't send a referer to an HTTP site if coming from HTTPS. From the RFC: http://www.w3.org/Protocols/rfc2616/rfc2616-sec15.html#sec15...

"Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol."
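
Written out as a decision rule, that recommendation looks roughly like this (a sketch; `should_send_referer` is a made-up name, and real browsers may apply additional policies):

```python
from urllib.parse import urlsplit


def should_send_referer(referring_url: str, target_url: str) -> bool:
    """Apply the RFC 2616 recommendation quoted above.

    The Referer is withheld only when going from a secure page to a
    non-secure request.
    """
    from_secure = urlsplit(referring_url).scheme.lower() == "https"
    to_secure = urlsplit(target_url).scheme.lower() == "https"
    return not (from_secure and not to_secure)


# An HTTPS results page linking to a plain HTTP site: no Referer.
print(should_send_referer("https://www.google.com/search?q=x", "http://example.com/"))
# An HTTPS results page linking to an HTTPS site: Referer may be sent.
print(should_send_referer("https://www.google.com/search?q=x", "https://example.com/"))
```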

I wonder if Google will implement POST forms instead of GET to prevent caching of request-uri data by the browser as well as proper Cache-Control/Pragma/Expires headers to prevent caching of search results.
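
For the header side of that, a typical "don't cache this" set would look something like the following. This is a generic sketch of the three headers named above, not anything Google is known to send, and `no_cache_headers` is a hypothetical helper.

```python
def no_cache_headers() -> dict:
    """Response headers asking browsers and proxies not to store the page."""
    return {
        # HTTP/1.1: do not store the response at all, and revalidate if held
        "Cache-Control": "no-store, no-cache, must-revalidate",
        # HTTP/1.0 clients that do not understand Cache-Control
        "Pragma": "no-cache",
        # An invalid date such as "0" is treated as already expired
        "Expires": "0",
    }


for name, value in no_cache_headers().items():
    print(f"{name}: {value}")
```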


"Now, if you are on an airport Wi-Fi or other public network everything you search for is in the clear."

ok, and most search results you click on will take you somewhere not secure. There may be little value add here.

From re-reading this article, it actually contains no value except the headline itself. cnet should start just doing tweets instead of bothering to try to fill a page.


Seriously? You don't see a use for secured searches?

How do you think the Chinese government will like this? Or any government who attempts to censor searches? How will they know what's being searched for? Also, a lot of value can be gleaned from just the results, and not all lead to insecure sites.


Me personally? I want https for gmail...ok, I've had it since around 2008. For web search? If I'm going to search on something top secret I wouldn't want Google to have identifiable info on me either so I wouldn't use Google or I would try to obfuscate my identity from Google through other means. So, for me, https on search isn't a common use case.

As to China: I think China will block https search on Google. I think any government that censors search will block https search on Google. Playing the China card on this new feature isn't interesting. Google and/or the press should not try to score points through inflammatory China comments with this new feature rollout. Https search should be added simply because it's a feature users want; and I'm sure some do.


What we need is the equivalent of "privacy mode" that browsers have. Hit a switch somewhere or go to some URL, and suddenly you're searching using an encrypted connection and no logs at all are being kept.


This is the obvious point, and I have to wonder if it would defeat the money-making potential of search data once it becomes common practice.

Maybe it's just another feature that makes google the best search provider. If they don't do it, someone else will.


Most people never change the default setting on anything, so I doubt it would have a huge impact on them. But it would certainly make power-users happy.


So if SSL has such little overhead today regardless of scale (and can be hardware accelerated) what the heck is PayPal's excuse for being so horribly slow? Gmail can run rings around it with far more data.



