Google Has Most of My Email Because It Has All of Yours

arb99 · on May 12, 2014

Same with Google Analytics.

GA is the standard analytics for a huge % of websites - so even if a website doesn't use GA for tracking traffic, Google still has the referral data. The same goes for things like Google AdSense (I'm pretty sure that sends back the referral data too, for tracking click fraud).

There is no way really to avoid Google knowing a lot about you/your website anymore.

Smerity · on May 12, 2014

You hit the nail on the head.

Google Analytics is on a substantial proportion of the Internet. 65% of the top 10k sites, 63.9% of the top 100k, and 50.5% of the top million[1]. My own results from a research project I did using the Common Crawl[2] corpus estimates approximately 39.7% of the 535 million pages processed so far have GA on them.

The real key to tracking is the referrer data. For the vast majority of clicks, you land on a site that has Google Analytics or you've just left one that did. As Google Analytics tracks your referrer, that means they still have your full browsing history if you jump from GA => !GA => GA => !GA => ...

According to my research[3], Google gets activity information on 51.43% of the 42 billion links analyzed in the 535 million page corpus as either the start or end of the link uses Google Analytics. This activity means they can accurately track browsing history on most sites, even those that don't use GA, simply as timing information, referrers, and knowledge of the web graph end up leaking user activity.
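As a rough illustration of how a coverage figure like this is counted, here is a toy sketch. It is not the Hadoop job from the write-up; the page names, GA flags, and the `visible_fraction` helper are all made up for the example.

```python
# Hypothetical toy web graph: which pages carry GA (made-up data).
uses_ga = {"a.example": True, "b.example": False, "c.example": True, "d.example": False}

# Directed links (source page, destination page).
links = [
    ("a.example", "b.example"),  # GA -> !GA: counted, the source runs GA
    ("b.example", "c.example"),  # !GA -> GA: counted, the destination sees the referrer
    ("b.example", "d.example"),  # !GA -> !GA: not counted
    ("c.example", "a.example"),  # GA -> GA: counted
]

def visible_fraction(links, uses_ga):
    """Fraction of links where the source or the destination runs GA."""
    if not links:
        return 0.0
    seen = sum(
        1 for src, dst in links
        if uses_ga.get(src, False) or uses_ga.get(dst, False)
    )
    return seen / len(links)

print(visible_fraction(links, uses_ga))  # 0.75
```

Only the link between the two non-GA pages escapes, which is why the share of covered links can be higher than the share of pages carrying GA.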

Used in an anonymized fashion, this is beneficial as it helps Google understand real world web traffic and hence rank search results accordingly (far better than simulated activity based upon PageRank or similar). In the theoretical situation you drop anonymization is where this gets troublesome.

If you're interested, there are more details at "Measuring the impact of Google Analytics"[3], though much of the discussion is on Hadoop + Common Crawl. For a privacy focused write-up (primarily worried about the NSA using Google Analytics), refer to "Google, make Google Analytics HTTPS by default"[4].

P.S. Everyone who notes "Google Analytics is easy to evade" are correct but missing the broader point -- the majority of web users will never do that.

[1]: http://trends.builtwith.com/analytics/Google-Analytics

[2]: http://commoncrawl.org/

[3]: http://smerity.com/cs205_ga/

[4]: http://smerity.com/articles/2013/google_analytics_and_nsa.ht...

tombrossman · on May 12, 2014

> In the theoretical situation you drop anonymization is where this gets troublesome.

Here's something even more troubling. Take an 'anonymous' crime reporting site[1] and put Google Analytics on it. Put it on every single page, even on the page with the forms to submit anonymously. Not bad enough? How about a similar site, only this one aimed at reporting corruption[2] and try the same thing. What could possibly go wrong?

Both sites are well aware of the issue and have written me back when I pointed this out. This level of trust in an American ad company is curious. All I can do now is hope that whistle-blowers wanting to report corruption are savvy enough to avoid the web forms.

Imagine you are a government worker somewhere and you see evidence of corruption and report it. From the same machine you signed in to GMail with. Now consider that your local government can order Google to secretly hand over tracking data and forbid them from notifying the crime reporting site(s).

[1]: https://crimestoppers-uk.org/give-information/give-informati...

[2]: https://forms.theiline.co.uk/integrityline

ronaldx · on May 12, 2014

> P.S. Everyone who notes "Google Analytics is easy to evade" are correct but missing the broader point -- the majority of web users will never do that.

This is an important point here because it's in stark contrast with Google e-mail. Google's e-mail servers are exceptionally difficult to evade.

I am (apparently) able to evade Google's attempts to track my web browsing habits with Google Analytics. I can take responsibility for that myself.

This technically difficult opt-out process is IMHO against the spirit of "don't be evil", but at least there is an option.

On the other hand, I am totally unable to prevent Google from building an accurate profile of my e-mail habits. The only way I can opt-out here is by significantly curtailing my e-mail habits (or by insisting on PGP, which will have the same effect). This is significantly more frightening to me - I have no choice.

mike-cardwell · on May 12, 2014

Isn't it about time we dropped the HTTP referer header? If we lived in a World where that header didn't exist, and somebody came along today and proposed that we add it to Firefox, Chrome or IE, there would be absolute outrage. If the proponents then argued: "Yeah, but it will make tracking users easier, and we might be able to target advertising better to make more money", people would not accept that as a valid argument.

rubinelli · on May 12, 2014

The HTTP referer header was a very valuable tool for webmasters to correct broken links or prevent abuse years before ad targeting became a thing.

slashdotaccount · on May 12, 2014

I agree, cross-site referrer headers are harmful for our privacy, but same-host referrer headers can be legitimately useful.

dalek2point3 · on May 12, 2014

I'm a PhD student at MIT interested in doing research using the Commoncrawl dataset. Are you interested in working on this together or at least getting together and chatting about it? drop me a line: nagaraj@mit.edu

MattHeard · on May 12, 2014

Could a user "fake" additional site visits in between each real visit in order to obfuscate the actual visits to GA sites?

Smerity · on May 12, 2014

As Google Analytics is "self reporting" (i.e. your browser tells the GA servers what it's doing) you can avoid reporting or erroneously report whatever you'd like. It'd likely be easier for you just to block Google Analytics though if that is of concern to you.

In the unlikely event that fake activity became a problem, Google's well equipped to deal with it. They have a great deal of tech and brains in place to detect fake ad click activity, which is vaguely related.

andreasvc · on May 12, 2014

That doesn't really help if $incriminating_website is among the real visits. I think obfuscation is a waste of time given the machine learning techniques at their disposal.

atmosx · on May 12, 2014

Excellent comments and links!

eps · on May 12, 2014

Not just GA.

Next time you link to a .css with those wonderful free Google Fonts, ask yourself - what's in it for Google?

Then take a look at all those ajax.googleapis.com links pulling down jQuery libraries and wonder the same.

mike_hearn · on May 12, 2014

I wouldn't read too much into that. Google Engineering has huge budgets and all kinds of random projects get paid for with no better justification than "this is good for the web, therefore it's good for us".

I worked there for years. Seeing really deep, well thought out business plans there was a rarity especially for small projects like hosting web fonts or running DNS resolvers. Heck, even for very large projects sometimes the accounting was unbelievably carefree.

eps · on May 12, 2014

> I wouldn't read too much into that.

Pfft.. What was I thinking? It's the original Dont-Be-Evil company, right? Of course they are giving away tons of freebies just because they are awesome. They just run a money printing press for an extra minute and those huge budgets will materialize out of thin air. Yay.

mike_hearn · on May 13, 2014

Yes, that's pretty much how it is. Don't believe it if you like, but you won't convince me: I was there in some of these meetings, I read the design docs, I watched these sorts of projects get approved. They just give this stuff away because they're swimming in money and it's a place run by geeks.

yuvadam · on May 12, 2014

Google Analytics (and other tracking cookies/scripts) are actually very easy to evade, by using browser extensions such as Ghostery, Ad-block Plus and NoScript.

zerobyzero · on May 12, 2014

We should raise concerns about Facebook Like buttons too. It's impossible to find a site without a Like button, and it sends all our browsing info back to Facebook.

digitalengineer · on May 12, 2014

I was under the impression FB also tracks all your other browsing even if you log out. So I use a different browser just for FB.

zerobyzero · on May 12, 2014

I keep hearing the same. I also noticed that they set 5 different cookies if you visit any facebook.com page, even if you are not logged in.

The amount of tracking Google and Facebook do is insane. And I hate the fact that none of my non-techy friends even understand it.

us0r · on May 12, 2014

While we are on the topic of FB - they don't delete data/pictures even when you delete them.

infinite8s · on May 12, 2014

This is why I only log into Facebook through Chrome's "Incognito window" functionality. Of course, if they are tracking IP addresses then I'm screwed.

dan_bk · on May 12, 2014

> Same with Google Analytics.

I always use Piwik [0] - it's excellent, open-source and most of all: it respects your users' privacy by not letting any third party like Google track them.

My advice: Use it too and let your users know that you do so because you respect them.

[0] http://piwik.org

regecks · on May 12, 2014

It's still possible to avoid leaking information to GA, by rewriting outgoing links to go through a redirector. Many sites do this (including Google Encrypted Search, but it's not perfect).
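A minimal sketch of the link-rewriting step, assuming a redirect endpoint on your own host. The `REDIRECTOR` URL, the `url` query parameter, and the `rewrite_outgoing` helper are hypothetical names for illustration, not any particular site's implementation.

```python
from urllib.parse import urlencode, urlparse

# Hypothetical redirect endpoint on your own site (an assumption for this sketch).
REDIRECTOR = "https://mysite.example/out"

def rewrite_outgoing(href, own_host="mysite.example"):
    """Route external links through the redirector; leave same-host and relative links alone."""
    host = urlparse(href).hostname
    if host is None or host == own_host:
        return href
    # The destination then sees the redirector as the referrer, not the original page.
    return REDIRECTOR + "?" + urlencode({"url": href})

print(rewrite_outgoing("https://other.example/page?x=1"))
# https://mysite.example/out?url=https%3A%2F%2Fother.example%2Fpage%3Fx%3D1
```

The endpoint itself should only redirect to links the site actually generated (for example by signing the parameter), otherwise it becomes an open redirector that others can abuse.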

The email problem sucks though. We need end-to-end encryption, on all mail, today.

pestaa · on May 12, 2014

You are right about the outgoing links on your site to avoid leakage (this is what well implemented search engines like DuckDuckGo do as well), but I think GP talked about the incoming links of your site. Your pages will appear as exit pages in Google Analytics, which you cannot do a thing about.

Sami_Lehtinen · on May 12, 2014

Using encryption still doesn't prevent a massive leak of metadata, and wasn't it news just a few days ago that they do kill people based on metadata?

Btw. I've been also running my own mail servers for ages and have been fed up with Gmail users.

blueskin_ · on May 12, 2014

Block the .js file. Problem solved.

Or, just, you know, disable javascript entirely.

user24 · on May 12, 2014

> Problem solved.

Except it's not, because the problem's still there for everyone else. Do you really think that being shielded yourself but allowing your non-techy friends and family to be tracked is a 'solved' problem?

sillysaurus3 · on May 12, 2014

I've always wondered whether Google ever digs into communications in a situation where they're trying to decide whether to acquire a company. It seems like reading a company's email would be a reliable source of information about whether they're on a genuine trajectory or whether e.g. they're having trouble with their investors. I've never looked into whether it'd be illegal for them to do so. Surely in the EU it would be illegal, because privacy protection seems to be a serious concern there, but I don't know about the US.

If you use Google Talk, every conversation you've ever had will be recorded and indexed and tied back to you. If you use gmail, same deal. Even your drafts of unsent emails will be. If you use AIM, same deal: every conversation you've ever had on it will certainly be logged somewhere and tied back to you. Yada yada, same deal for almost every chat program, because almost every chat program has no clientside encryption. If it does, it's not very popular, or it's hard enough to use to where people will think you're paranoid if you ask them to go out of their way to "download this chat program that lets us talk without anyone logging it."

I think the endgame here is to watch what you say. It's safest to assume every text conversation is public. How many of us have said something in text to our families or friends that we'd be extremely uncomfortable saying publicly? It's a little unsettling.

Then again, hopefully when the TextSecure people ship their browser-based chat program things will improve somewhat, because you'll be able to talk to someone else without the conversation being duly noted. (There will probably still be metadata that ties you to the fact that you're talking to someone, but at least the content will be protected.) Hopefully it will be easy to use... I wonder if they need any help in that capacity.

paul · on May 12, 2014

No. That's not just evil, and most likely illegal, it's stupid too. It's too easy for something like that to leak, and the damage would be enormous.

sillysaurus3 · on May 12, 2014

Hi Paul. Sorry, I didn't mean to write a conspiratorial comment about Google. I meant to call attention to the fact that every textual conversation you've ever had has probably been logged, and just because our legal and cultural framework presently frowns upon digging through those logs, that may not always be the case in the future. The logs will persist even after our cultural norms change.

So it seems important to come up with a technological solution to the problem of how to communicate without all of it being logged. It's a difficult problem because it's hard to get other people to actually use whatever you come up with. That's why I'm crossing my fingers that TextSecure's browser plugin will take off, because if it's as easy to use as email and as powerful as email, it could have a very tiny chance of becoming the next popular communications platform. At that point no one would have to trust any company to preserve privacy, which seems valuable.

EDIT: I'm confused why my comments were moved to the bottom of this thread, because they don't seem offtopic. For example, the second topmost comment is also about encryption: https://news.ycombinator.com/item?id=7731216

nutjob2 · on May 12, 2014

"the fact that every textual conversation you've ever had is logged"

That's patently untrue. What are you basing that on besides your own paranoia?

sillysaurus3 · on May 12, 2014

The fact that if law enforcement demands access to your conversations, companies can readily produce them.

moultano · on May 12, 2014

Naturally you only hear about the times companies could produce them, and not all of the times they couldn't.

ErikRogneby · on May 12, 2014

How many people delete anything from their ever growing inbox these days? Not just logged but readily accessible in the cloud.

mike_hearn · on May 12, 2014

Email that you delete from Gmail (once it falls out of the trash) IS deleted permanently after a while. It's not immediate because it has to wait for backups to cycle around and rewrite old storage, but it does happen.

frozenport · on May 12, 2014

>>damage would be enormous

Would it? The NSA is reading everything and I haven't seen many changes. I think we as a society need to sit down and figure out if these services should be in the private domain or instead implemented in a public domain that matches our values. We spend billions, why can't we spend a few to make an internet that doesn't spy on us?

I don't think I could stop using Google. I don't know what would make me stop.

paul · on May 12, 2014

The government also kills people with drone strikes, but that doesn't mean Google could get away with that.

Where is this public domain that you trust so much and why are they more honest than Google?

frozenport · on May 13, 2014

>>that doesn't mean Google could get away with that

You piqued my interest, what would happen if they killed somebody? Probably whoever asked for it would get in trouble (unless they deleted the logs...), but I doubt I would or could stop using Google. We use web-search like a utility.

>>Where is this public domain that you trust so much

Whether it works or not is another story, but we have a de jure mechanism for government accountability, whereas nothing exists for corporations. We might be able to make a publicly accountable internet infrastructure, but there is no similar mandate for a corporation.

paul · on May 13, 2014

If Google killed someone, the people who ordered and executed the operation would likely go to jail. The government gets away with it because they are effectively above the law, despite the veneer of accountability.

hershel · on May 12, 2014

We've heard news about Google and Microsoft spying on users' email. Nothing happened. That's not that different.

icebraining · on May 12, 2014

What news? We heard that they have automated indexers that use the info for showing you ads, but that wasn't "news", it was part of the ToS from the beginning.

nl · on May 12, 2014

There was an incident where Microsoft went through a hotmail user's email to investigate an internal leak. I'm not aware of any similar incidents at Google.

anon1385 · on May 12, 2014

Here you go: http://www.wired.com/2010/09/google-spy/ In brief, a sex predator was working at Google and accessing the email accounts, GTalk and Google Voice accounts of minors to manipulate them.

Considering the usual public reaction to that kind of thing I'm surprised it doesn't get mentioned more often.

icebraining · on May 12, 2014

When people say that a company did something, it usually means that it was an executive decision, not the unauthorized actions of a single employee. It'd be like saying that the NSA divulged its own secrets, just because Snowden was their employee.

anon1385 · on May 12, 2014

The issue that started off this thread was:

>I've always wondered whether Google ever digs into communications in a situation where they're trying to decide whether to acquire a company. It seems like reading a company's email would be a reliable source of information about whether they're on a genuine trajectory or whether e.g. they're having trouble with their investors.

Now there is a range of ways that could happen, from Larry Page looking up the emails personally, to some middle manager involved in the deal getting their friend on the Gmail team to unofficially take a peek to look for some specific thing. Things happening inside a company can happen in all sorts of ways that aren't official executive decisions.

Individuals' motivations do not even have to be aligned with Google's interests[1]. Maybe they want to be able to better position themselves if the deal goes down the tubes, or get information about any employees at takeover targets that might be going to replace them in their role. Internal politics at large corporations is endless. From the outsider's point of view the actual mechanism doesn't matter too much.

Also I think the situation where a sexual predator was attempting to coerce minors into sex is (or should be) a much more serious public scandal than an executive decision to look at the emails of a startup most people have never heard of and don't care about would be. I imagine many people would blame the startup for being foolish enough to use gmail and Google would deny it and it would all blow over in a few days.

[1] https://en.wikipedia.org/wiki/Control_fraud

nowlnowl · on May 12, 2014

Microsoft read some guy's mails to find a leak in the company.

deong · on May 12, 2014

And it's not true that "nothing happened" in that case. Microsoft was publicly beaten up a bit, and then changed their policy to say they will no longer do that.

throwaway7767 · on May 12, 2014

> Microsoft was publicly beaten up a bit, and then changed their policy to say they will no longer do that.

What? No.

They saw the PR pouncing they were taking, so they hired a former judge to rubber stamp such things in the future, and had the gall to call it a "judicial process" (even though the guy is just a microsoft employee at this point).

deong · on May 12, 2014

This (http://arstechnica.com/tech-policy/2014/03/microsoft-will-no...) is what I was referring to. Whether Ars Technica is wrong about it, I can't say.

sheetjs · on May 12, 2014

> It's safest to assume every text conversation is public.

Sadly, people who have been saying this for more than a decade were derided as paranoid.

zubairov · on May 12, 2014

Good point, however as you already noted, it's not a single app or single chat client that can change the situation - it's the acceptance of it among the bigger community. It's not a technical but an educational topic, where ecosystem leaders should take the responsibility to educate ordinary people (not just HN readers) on that. I strongly believe these kinds of initiatives should have strong legal/governmental support and/or control.

Theodores · on May 12, 2014

> I've always wondered whether Google ever digs into communications in a situation where they're trying to decide whether to acquire a company.

Go upstream to the Snowden 'allegations' and apply that idea to Lockheed Martin. They can get every government contract they want or whatever war they want because they know what their rivals' bids are (because they built the NSA). Plus they run all of the computer security in D.C. and there is no way that any communication in 'elected government' misses their eyes.

PeterisP · on May 12, 2014

Reading all gov't internal communication shouldn't be a master key to getting info on rival bids.

Even sloppy bidding processes in random third world countries include the concept of sealed bids - no one in government should have any info on any bid amounts before the bidding is closed; if Lockheed Martin wants to know my bid before making theirs, then they'd have to wiretap me.

Of course, there are many other options for fraud and espionage, but getting rival bid amounts before due time shouldn't be one of them.

nutjob2 · on May 12, 2014

Google paranoia deserves its own entry in the DSM V.

eps · on May 12, 2014

I played with the idea of off-site delivery of GMail-destined emails.

Basically instead of an actual email the recipient would get a link to an https'd page on my mail server and a brief note explaining that due to delivery policy the message is available only at the link.

The reason why I started looking at this was that I was buying a house and the broker person was using gmail to handle the transaction. From negotiation to all the forms with all juicy details. I switched him back to the fax mode, but it got me thinking that it'd be nice to have a system in place that would try and offset such negligence, automatically.

I never got past a rough prototype though, but perhaps I should've.

andreasvc · on May 12, 2014

I don't see why you would single out Gmail at this point. You're basically rejecting email as a secure medium (I don't disagree).

claudius · on May 12, 2014

E-mail between secure servers is perfectly secure (and end-to-end encryption only adds content encryption but keeps the amount of metadata generated the same). The problem is that Google’s email servers are not secure; nor are those of any other email provider. Strictly speaking, not even hosting your own dedicated server somewhere will protect you from these issues.

andreasvc · on May 12, 2014

Uh no it's not perfectly secure because if you don't use e2e encryption you only get opportunistic TLS and you can't control whether your mail will be transported over unencrypted connections. Furthermore, the contents of the email arrives unencrypted at every mail server. So you're basically agreeing with exactly what I said ...

claudius · on May 12, 2014

You get the TLS you configure the servers to use and a server that only does opportunistic TLS is certainly not a “secure” server.
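For illustration, a minimal sketch of the "refuse rather than fall back" behaviour on the sending side, using methods that exist on Python's `smtplib.SMTP` (`ehlo`, `has_extn`, `starttls`, `sendmail`). The `send_tls_only` helper is a hypothetical name, and it only protects the hop this client makes; it says nothing about how the next server relays the message.

```python
import ssl

def send_tls_only(smtp, sender, recipient, message):
    """Send via an smtplib.SMTP-like object, refusing to fall back to plaintext."""
    smtp.ehlo()
    if not smtp.has_extn("starttls"):
        # Opportunistic TLS would carry on in the clear here; this sketch stops instead.
        raise RuntimeError("peer does not offer STARTTLS; refusing to send in the clear")
    smtp.starttls(context=ssl.create_default_context())  # verifies the server certificate
    smtp.ehlo()  # re-identify over the encrypted channel
    return smtp.sendmail(sender, recipient, message)

# Usage (not run here, needs a reachable server):
#   import smtplib
#   with smtplib.SMTP("mail.example", 25) as s:
#       send_tls_only(s, "me@mysite.example", "you@other.example", "Subject: hi\r\n\r\nhello")
```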

andreasvc · on May 12, 2014

A mail server that only talks TLS is not following the SMTP protocol and is not a part of the global system commonly understood with the term e-mail. Maybe it would be a great idea to migrate the whole world to such a configuration, but in practice it wouldn't give me much confidence. If my server A hands something off to B for it to be delivered to C, then I have no control over whether the link between B and C is secured, so e2e is the only way to be sure.

eps · on May 12, 2014

No, it's not about rejecting email as a secure medium. It's about denying Google access to the contents of emails I sent to people with mailboxes on their system.

andreasvc · on May 12, 2014

So why single out GMail? Why not worry about Hotmail, Yahoo? How about $LOCAL_ISP_UNDER_GOVERNMENT_SURVEILANCE?

dkersten · on May 12, 2014

Reminds me of a time I had to call a company about an order of hardware parts and they wanted my credit card details over the phone. Having worked in telecoms in the past (on carrier server software), this isn't something I like doing, so instead I gave them a URL to a https page containing the details they needed. It also self destructed so you could only view it once.

A general purpose email version of this (both with and without self destruct (which should have options such as timer, N views etc)) would be awesome, especially if it automatically intercepted emails and moved them there. Would have to be something I can host myself, of course.
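A rough sketch of the self-destruct part, assuming an in-memory store behind whatever HTTPS page serves the link. The `OneTimeStore` class and its `max_views` and `ttl_seconds` parameters are hypothetical names; a real version would need persistence and an actual web front end.

```python
import secrets
import time

class OneTimeStore:
    """In-memory store: each message can be read a limited number of times before it expires."""

    def __init__(self):
        self._items = {}

    def put(self, message, max_views=1, ttl_seconds=3600.0):
        # Unguessable token that would form the path of the link sent to the recipient.
        token = secrets.token_urlsafe(32)
        self._items[token] = [message, max_views, time.monotonic() + ttl_seconds]
        return token

    def get(self, token):
        entry = self._items.get(token)
        if entry is None:
            return None
        message, views_left, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._items[token]  # timer-based self destruct
            return None
        if views_left <= 1:
            del self._items[token]  # last allowed view: destroy after serving
        else:
            entry[1] = views_left - 1
        return message

store = OneTimeStore()
token = store.put("order details")
print(store.get(token))  # order details
print(store.get(token))  # None
```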

gargron · on May 12, 2014

If you're sending a link to Gmail, Gmail could simply crawl that link, so I don't see how that would solve anything (except awareness, possibly? At the cost of convenience).

sspiff · on May 12, 2014

You can block IPs belonging to Google, Microsoft, ... And make the link expire after a while (you can keep the content, just make it inaccessible).

It's not foolproof or without problems (there are not only crawlers and bots at these companies, there are also people who receive email), but it would solve the problem in most cases.
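A minimal sketch of the IP check, using Python's standard `ipaddress` module. The networks below are placeholders from the reserved documentation ranges, not real provider ranges; those would have to be looked up and kept current, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Placeholder networks (documentation ranges), standing in for a provider's crawler ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

def is_blocked(client_ip: str) -> bool:
    """True if the client address falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)

print(is_blocked("192.0.2.10"))    # True
print(is_blocked("198.51.100.7"))  # False
```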

sgift · on May 12, 2014

The problem I see is that you could edit all your "mails" to me at any time, even after I've read them. If that could be fixed somehow I think it would be fine.

klez · on May 12, 2014

Including a digital signature in the actual email should do the trick.

im3w1l · on May 12, 2014

Include a checksum in the email.
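A minimal sketch of the checksum idea, assuming SHA-256 and that the digest travels in the notification email while the content stays on the sender's server. The function names are hypothetical.

```python
import hashlib
import hmac

def content_digest(body: bytes) -> str:
    """Hex digest the sender would include in the notification email."""
    return hashlib.sha256(body).hexdigest()

def matches(body: bytes, digest_from_email: str) -> bool:
    """Recipient-side check that the page content is what the email committed to."""
    return hmac.compare_digest(content_digest(body), digest_from_email)

original = b"Offer accepted at the agreed price."
digest = content_digest(original)
print(matches(original, digest))                # True
print(matches(b"Offer withdrawn.", digest))     # False
```

A bare checksum only shows that the content changed relative to the email the recipient kept; a digital signature would additionally tie the content to the sender's key.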

blueskin_ · on May 12, 2014

Sounds great; I'd love to have this in an interface compatible with Postfix.

cromwellian · on May 12, 2014

I think if you're really concerned about the Feds snooping on email, you need to use end-to-end encryption. Any large ISP or portal is going to be a juicy target, and since the majority of people don't want to run their own email servers, the only recourse is not to depend on trusting the servers. Even if you managed to convince everyone to leave G-Mail, they'd still congeal back into another 2-3 big services that the NSA can target.

fphhotchips · on May 12, 2014

And let's face it, if you run your own email server, it would be a full time job trying to harden it such that the NSA (and others) couldn't get in.

ds9 · on May 12, 2014

I'm pretty sure that would be a wasted effort. Basically, you could be nearly NSA-proof only if you use your own cert and are prepared to shut down the whole thing when the Feds demand private keys. But then it wouldn't be useful for general email: it will get blacklisted by various parties you would want to exchange email with, unless you use a CA.

Really the only readily-practical approach for NSA-proofing will be something that's user-friendly in the UI and encrypts end to end. This prevents snooping in transit, but still exposes metadata, and of course there are endpoint attacks.

nutjob2 · on May 12, 2014

If the NSA is targeting you, or you think they're targeting you, then your email provider is the least of your problems.

dredmorbius · on May 12, 2014

As Bruce Schneier has commented: he's not against targeted surveillance, undertaken with cause and a specific warrant.

It's bulk surveillance of everyone, everywhere and at all times, that's a problem.

Taking measures which raise the cost of surveillance helps ensure that if they get your data, it's because they really want it. Not just because they can.

anilgulecha · on May 12, 2014

This standard response misses the nuance of end-to-end encryption: If the default everywhere was EoE encryption, it becomes significantly harder for anyone to target you (for an average value of you)

harlanlewis · on May 12, 2014

Targeted surveillance always gets its man.

Global EoE encryption's benefit is making it significantly (and perhaps prohibitively) more expensive to engage in bulk surveillance.

XorNot · on May 12, 2014

Well, every second startup proposition on HN is "you give us access to your..."

So if the default was EoE, it still wouldn't matter.

lifeisstillgood · on May 12, 2014

this is really pedantic but why is End To End Encryption shortened as EoE?

XorNot · on May 13, 2014

No idea - I was just grabbing some terminology used in the post I was responding to. EtE would make more sense.

uptown · on May 12, 2014

Perhaps this helps to explain the mega price tags being placed on platforms like WhatsApp. If future generations are expected to rely less on email, and more on messaging platforms, then owning the dominant network in that space gives you a competitive angle to take on Google.

rmrfrmrf · on May 12, 2014

E-mail is not and has never been a secure method of communication.

autodidakto · on May 12, 2014

Correct. Running your own server, etc, doesn't matter. If what you're doing on the internet isn't encrypted by you and decrypted by a trusted and competent recipient... consider it more or less public.

blueskin_ · on May 12, 2014

"Peter pointed out that if all of your friends use GMail, Google has your email anyway."

Peter reminds me of the old "If you have nothing to hide..." fallacy. I'd have expected more from the EFF.

Yes, anything really sensitive should be PGP'd anyway, but using Gmail still gives Google the opportunity to do analytics.

ronaldx · on May 12, 2014

I also found it surprising that Peter Eckersley would be satisfied to justify his use of Gmail in this way.

Peter seems to believe that using Gmail makes his friends' privacy incrementally worse, and yet he is contributing to this problem.

What he really means is: Gmail is more convenient than the other options - there is no better solution to this. And, that's the problem.

ilolu · on May 12, 2014

Why is Google being targeted with all such write-ups while Facebook gets a pass? Facebook has many of my photos because it has all of yours. Facebook knows my browsing habits because all of you have a Like button on your site, etc.

noahm · on May 12, 2014

Because facebook only has my pictures if I choose to post them there. This is easy to avoid. Facebook only has my browsing habits if I choose to allow content from their servers while viewing non-facebook content. This is also avoidable, albeit less easily. Avoiding sending email to gmail users is far more difficult. Avoiding receiving email from gmail users is even more difficult. Additionally, the cost (at least measured subjectively in terms of inconvenience) of avoiding all contact with gmail users is far greater than the cost of avoiding facebook.

So, to answer your question more directly, facebook doesn't "get a pass". Facebook simply doesn't get used.

ericleung · on May 12, 2014

Not quite sure I agree with that. I think the argument is that Facebook has pictures of me as long as any one of my friends has a picture of me, and has posted it on their own Facebook.

Similar to the problem with Google having access to a large percentage of emails, Facebook will have pictures of me regardless of whether I choose to personally have a Facebook account or not. Assuming that I do have a Facebook, with a network of friends, they can also tag me in their pictures (and therefore available to Facebook) even if my own privacy settings are turned all the way up.

camus2 · on May 12, 2014

> Because facebook only has my pictures if I choose to post them there.

Facebook has your pictures when your friends that are on Facebook post pictures of you.

And even if you are not on Facebook, I'm pretty sure Facebook has a "shadow account" system to track people even if they don't sign up.

ben1040 · on May 12, 2014

I have a Facebook account with zero friends on it. I used it to "own" an API key for an app I built for a freelance client.

The "People You May Know" screen on that Facebook account has plenty of people I do in fact know.

I imagine through people uploading their address books and then Facebook mining shared connections, they inferred a bunch of my network without me doing anything at all.

ilolu · on May 12, 2014

I use Ghostery and block Facebook content from other sites. But what I don't like about Google's or Facebook's behavior is that none of my non-techy friends know that they are being tracked everywhere. And you can't expect them to understand it either. I find that behavior bad.

icebraining · on May 12, 2014

> Facebook only has my browsing habits if I choose to allow content from their servers while viewing non-facebook content. This is also avoidable, albeit less easily.

To most people, it's black magic. In fact, they don't even know that such tracking is possible.

dasmithii · on May 12, 2014

I find it incredibly odd that I've never considered this before. I suppose end-to-end encryption is today's only defense against top-down surveillance.

That said, I wonder if meshnet protocol could be utilized as an alternative. Although the traditional mesh network is impractical at scale, a virtual version, or an email-serving proxy network of some sort, could be beneficial.

Well, beneficial if you'd consider keeping email off Google's centralized servers a good thing.

dredmorbius · on May 12, 2014

We had something of a meshnet protocol with regards to email previously, or at least, it was generally tenable to run a mailserver on any arbitrary IP address at one time. That ended pretty much by the late 1990s due to the ever growing onslaught of spam.

Today it can be (and often is) frustrating even for established companies to get their mail delivered to all destinations. I've had repeated frustrations especially with Yahoo, but also AOL (both continue to have a large number of addresses, if not active accounts -- problems in scrubbing old email addresses are another challenge). Larger companies may have their own idiosyncrasies regarding accepting email -- even with SPF and DKIM records, I've not infrequently encountered companies (some of which, granted, do things involving making littler things out of little things called atoms) who requested (and presumably require) the specific IP address of our outbound mailservers for communications.

More generally, email badly wants to have some sort of reputation layer put on top of it, though how to accomplish this has eluded general solution (SPF and DKIM are only band-aids, and already break a lot of legacy behavior). Total encryption would be good, including of headers. It's a bit of a mess.

mike_hearn · on May 12, 2014

All major mail providers already use sophisticated reputation systems. The difficulty of calculating global reputations for the entire internet, quickly and with statistically meaningful results, is one of the reasons email consolidates under the control of a handful of big companies. You really don't want to try and replicate that on your own.

Source: I was part of the Gmail spam/abuse team for several years.

dredmorbius · on May 12, 2014

The approach I've been considering for quite some time is to focus less on the bad guys than the good guys.

Any given user, and often large groups of users (a company or organization) are going to have traffic patterns which strongly favor a small number of other hubs (mailservers), in general. That's going to be, generally, high-reputation and high-value traffic. You want to ensure that it gets through. That solves most of your problem right there.

Some of those sources are also spammers or low-value -- email marketing and the like.

Everything else is, well, everything else. Might be spam, might not. But as a first pass it tends to be less valuable. Which means you've got an immediate and low-cost option: deny first delivery on a nonpermanent basis.

If it's a well-behaved system, the delivery system will retry the transmission in about 4 minutes. If it's a spammer, odds are that it will simply bail on delivery, or fail to honor the usual retry fall-back schedule. In the first case, problem solved; in the second, you've now got an additional datapoint for the source: it fails to adhere to conventions.

All of this is happening largely at the host-to-host level, not individual senders, so that you're both getting a large level of aggregation (a new user or service transmitting through a known host isn't a blank slate, you've already got a delivery history), and the overhead is smaller.
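A rough sketch of that deny-first-delivery idea (usually called greylisting), keyed per sending host as described above. The class name, the `"tempfail"`/`"accept"` return values, and the 240-second minimum delay are illustrative assumptions, not any particular MTA's behavior; in a real server "tempfail" would map to a temporary 4xx SMTP reply, and many implementations key on more than just the host.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Greylist:
    """Host-level greylist: temp-fail a host's first attempt, accept a later retry."""

    # Assumption: require roughly the 4-minute retry gap mentioned above.
    min_retry_delay: float = 240.0
    # Hosts that already have a successful delivery history.
    known_good: Set[str] = field(default_factory=set)
    # Host -> timestamp of its first (deferred) attempt.
    first_seen: Dict[str, float] = field(default_factory=dict)

    def check(self, host: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now

        # Known hosts are not a blank slate: let them straight through.
        if host in self.known_good:
            return "accept"

        seen = self.first_seen.get(host)
        if seen is None:
            # First contact: defer on a nonpermanent basis and remember when.
            self.first_seen[host] = now
            return "tempfail"

        if now - seen < self.min_retry_delay:
            # Retried too soon; keep deferring.
            return "tempfail"

        # The host came back after the delay, so it honors retry conventions.
        self.known_good.add(host)
        del self.first_seen[host]
        return "accept"
```

Hosts that never return simply stay in `first_seen`, which is itself the extra datapoint: they failed to follow the usual retry schedule.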

Yes, there are also reputation and other systems (IronPort / Senderbase, now part of Cisco, for example, as well as the DNSBLs), many of which are accessible via DNS queries, though the cost of those queries for a busy system is itself considerable (you probably want to cache results; fortunately, DNS allows for that).
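For illustration, a minimal sketch of a cached DNSBL check. A DNSBL lookup reverses the IPv4 octets, appends the list's zone, and asks for an A record; an answer means "listed". The zone `dnsbl.example`, the class name, and the fixed 300-second cache lifetime are assumptions made for this example. A production setup would normally rely on a caching resolver that honors the record's own TTL, which the standard `socket` module does not expose.

```python
import ipaddress
import socket
import time
from typing import Callable, Dict, Optional, Tuple


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def system_resolver(name: str) -> bool:
    """Return True if the name resolves (i.e. the address is listed)."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        # Simplification: any lookup failure is treated as "not listed".
        return False


class CachedDnsbl:
    """Caches listed/not-listed answers per IP for a fixed number of seconds."""

    def __init__(self, zone: str,
                 resolver: Callable[[str], bool] = system_resolver,
                 ttl: float = 300.0) -> None:
        self.zone = zone
        self.resolver = resolver
        self.ttl = ttl  # assumed fixed lifetime, not the DNS record's TTL
        self._cache: Dict[str, Tuple[float, bool]] = {}

    def is_listed(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        cached = self._cache.get(ip)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still fresh: skip the DNS query
        listed = self.resolver(dnsbl_query_name(ip, self.zone))
        self._cache[ip] = (now, listed)
        return listed
```

The resolver is passed in as a parameter so the caching logic can be exercised without touching the network.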

And all of that logic can be rolled up pretty readily within an MTA. That's one of the powers of free software: aggregating brains and experience.

adrianN · on May 12, 2014

There is something like an "email serving proxy network". It's called an "anonymous remailer", but nobody uses it, and the low number of servers that participate casts doubt on the level of anonymity that is reached in practice.

https://en.wikipedia.org/wiki/Anonymous_remailer

lewisflude · on May 12, 2014

I don't really see this as a problem. The kind of language makes me think of those that wear tinfoil-hats.

I always ask myself, who cares? Worst case scenario, Google will sell this data to a government and I'll go to jail. Securing email at this point isn't worth the time or effort it'd take to maintain.

reedlaw · on May 12, 2014

Why not offer to host the email accounts of those you contact most frequently? Probably most of them couldn't host their own email server, and if you've already gone to the effort to do so, you can help them increase their privacy as well.

deptadapt · on May 12, 2014

Because running a mail server can be a pretty big responsibility, especially if you're letting others send mail from it. The more people using your server, the more important it will be to monitor for abuse and deal with abuse reports.

When I first set up my mail server I was pretty excited about being able to help everyone I know get their email away from Google, Hotmail etc. But once I had it running, I quickly realized that I didn't really want to give access to very many people. Even if I trust all of my friends not to abuse it, I cannot trust all of their computers.

mlinksva · on May 12, 2014

They might prefer that unknown Googlers have access to their email rather than a friend or family member.

perlpimp · on May 12, 2014

Not if you use GPG or PGP.

PeterisP · on May 12, 2014

What would that change?

In any GPG/PGP solution that I'd use, the encryption would be automated and transparent. In practice, the web-mail-client would anyway decrypt, store and index that email for convenience - no matter what you do, if your recipients/senders use some 3rd party email service, that email service would have access to your emails after the GPG/PGP layer is removed.

What GPG/PGP achieves is defense against MITM/phishing impersonation and secrecy while in transit between email providers; what it doesn't necessarily achieve is secrecy in storage and defense from your e-mail client software developer. Coincidentally, these are the exact same security characteristics that a gmail user mailing another gmail user has - there are no third parties in transit; gmail can prevent insertions in the middle with a faked sender; but the stored emails are vulnerable to google itself and legal requests made to them.