How your data is collected and commoditised via “free” online services (troyhunt.com)
217 points by matthewwarren on March 16, 2016 | 68 comments



He's on the right track, but absolutely wrong about where the bulk of this data comes from.

It's not shady "Take a free trip!" websites.

It's your Domino's Pizza order, it's your trip to Target, it's everything you do. (A salesperson for one of the big companies listed those specific groups, but said it was basically everyone.)

The big companies in the industry (LexisNexis, Thomson Reuters' CLEAR) buy everyone's data. Anything they can get their hands on.

As far as I can tell, data on everyone in the country is being stored: everywhere you have ever lived, all of your phone numbers, voting records, debts, legal history, marriages, past names and aliases, email addresses, everything.

It's all in a handful of huge private systems. Systems that were originally marketed to law enforcement agencies, but are now being shopped to private companies as well. Instant Checkmate is just a tiny fraction of what is actually captured by these data brokers.

Source: You'll have to take my word for it.

Interesting tidbit -- They have alerts set up in the system; e.g., if you search for the wrong person you'll get a call from headquarters asking why you're pulling up Kanye West's home phone number, etc.


OK, I'll take your word for it.

Everything I do is being tracked, recorded and passed around by companies big and small. Yep, got it.

Anyone care to speculate on how we actually change this situation? Or is it that there's really not much utility in privacy for most people, so we can just forget about it permanently? I realize that the paths to change are either regulatory or technical (and I find the latter more likely), but seriously: paint the picture.

Should we just accept this as the norm?

(edit: grammar)


It's not a dichotomy between being able to do something about it vs. not getting utility from privacy.

Privacy is extremely important. We get lots of utility from privacy and in fact require it to reach states of adequate mental and physical health.

But people with guns and money can prevent us from having it, despite how much utility it provides, despite the fact that it is considered a basic right, and regardless of whether we want to change it.


That's an interesting way to think about it. It's true that in our daily lives we absolutely depend on privacy and are aware of it being compromised.

Perhaps therein lies the real problem: for the most part these oversteps are barely noticeable. 'Doing something' amounts to a kind of activism. I can accept that as the necessary course of action at this point in time. Working for the right to privacy in the future could be analogous to protecting natural resources.


Not accept, but adapt and learn to live with it.

I'm saying this because I don't see any solution. Our data can be sold. Our data can be 'shared' by our less educated friends, etc. Our data can be 'leaked' through all kinds of breaches/hacks/etc. from private companies or even government bodies (incompetence, old software).

Treat our public records as public, as an interface to the outer world. Why do we think that our names, emails, phone numbers or addresses are private? What's the point of them being private?


Avoiding abuse, that's the point, obviously. Some sensitive data is very susceptible to abuse, because its privacy is crucial to many privacy schemes, yet it is not well protected.


Pay cash.

Use anonymous prepaid credit/gift cards (bought with cash) online.

Delete your cookies.

Browse in incognito.

Disable Javascript.

Block ads.

Don't install the social media app (and if you do, definitely don't give LinkedIn/Facebook/whoever unfettered access to your contact lists and email accounts!). Use the mobile website (in mobile Chrome, in incognito. Yeah, you'll have to log in more often...)

"Dress up. Leave a false name. Be legendary. The best PT is against the law, but don't get caught. Art as crime; crime as art." - Hakim Bey, "Poetic Terrorism", T.A.Z, 1991


Inject noise into your corner of the internet.

Or pay somebody to do it for you.


This isn't a popular question, but is there really a problem?

I've heard the big-brother nightmare scenario type answers before, but I'm curious to see how they are more likely to negatively affect me than a car accident or slipping in the shower.

Even with government agencies (NSA/FBI) overstepping or trying to overstep their limits, isn't the root problem with policy makers keeping them in line?


I will answer you with a lengthy anecdote. A couple of years ago I wanted to buy something from a popular online shop. The shop offered various payment options. I won't bother with details, so let's make it simple: option A was no risk on their side but time consuming, while B was higher risk but fast procedure. I was time pressed, so I was happy to see B in their FAQ. After signup and checkout there was no option B for payment. I thought it was a bug in their site and sent them an email.

I then decided to have it shipped directly to my parents (I wanted to use the item over the holidays) and set up an account in their name. After checkout, option B was suddenly available. Yay, they fixed it. I made the order.

Back to my account: option B was still missing. Huh.

I then made several new accounts using various addresses over the city. The pattern that emerged was that in more affluent zip codes option B was available, in poorer parts it wasn't.

Several days later, my email was answered: they claimed not to offer B to any new accounts on principle. This clearly wasn't the case.

My sample set wasn't big, and maybe it was a bug or A/B testing plus coincidence, but it made me think.

Big brother will not kill you or bite you with big fanfare. Instead, you will be struck by inconveniences. Online shopping will be more pleasant for some and maybe even impossible for others. That flashy item you want will be available for others but you will be told that it's out of stock. All of this will be explained away as mistakes.


What? It sounds like the problem is they didn't have enough information about you.

If they had more info, they could avoid blanket policies such as withholding riskier shipping options from all new customers, and offer them to customers they can prove are not fraudulent.

But here's my anecdote:

I bought a metal collar extender at Macy's with my credit card. It was an impulse buy at checkout and I have never searched, emailed, liked, etc. this product in my life. I get on Amazon a week later and I have a recommendation carousel of metal collar extenders. I was so surprised I even checked my email for a receipt to see if that's how Amazon retargeted me [there was none].

But then I just stopped caring, because I realized that if anything Amazon was incurring an opportunity cost for not recommending me something more useful, and I have more important things to do with my time than to worry about how I'm being retargeted on Amazon.


> What? It sounds like the problem is they didn't have enough information about you.

That is part of my point. It will never be perfect, but perfect enough to get a benefit for businesses. If this leaves you as part of a small percentile by the roadside, that is your problem, not theirs. In the end, you may be regularly unfairly disadvantaged because of it nevertheless.


This is an example of a Nirvana fallacy - where you are saying it should be abolished because the "perfect solution for everything" is unattainable. If anything, your anecdote proves that it should be made even better, not abolished.

Besides, the business lost a customer with you, so it's obviously not perfect enough for them either.


It is incredibly easy to see where your anecdote could go wrong. What if you had bought something much more personal or embarrassing?

Condoms? Plan B? Dildos? Lube? Imagine having to explain to a family member sitting next to you why those items are showing up in your Amazon recommendations.

I recall an example a few years ago about a store's marketing system telling a father that his teen daughter was pregnant before she ever told him. Because the store's system had correctly inferred from her previous purchases that she was pregnant and started giving recommendations for stuff like diapers.

Suppose a homosexual in the closet was buying sexually explicit things or an atheist in a religiously strict home or country was buying blasphemous materials.

Ironically, your anecdote presents much worse scenarios than the parent comment's anecdote.


Slipping down the slope of embarrassing adult toys shows a lack of understanding of how these networks work.

Again, this isn't a problem with retargeting. What if I just bought sex toys directly on Amazon and they made product recommendations for it while I was browsing at work? Or what if Facebook and Google showed me adult ads while reading Forbes.com?

That's on Amazon et al. for displaying adult sex toy recommendations. Same as your nightmare scenario where I'm a persecuted homosexual in Saudi Arabia, and Amazon makes homosexual product recommendations to me while the religious policeman is looking over my shoulder.


"That's on Amazon" but you're bearing the consequences. What consequences would there be in the nightmare scenario for Amazon? None. They get the power and not the responsibility.


They have your problem now and already address it. It isn't unique to retargeting.

They lose customers. Ad networks lose websites. Walled gardens lose users.

That's why you don't see promoted dildos on Amazon, Forbes, and Facebook even now - and why your nightmare scenario is just a bad daydream about rubber cocks.


I replaced a leaky kitchen faucet with one I bought online. For weeks afterwards, banner ads for that exact same faucet kept appearing everywhere. Did they think I had two kitchens?

I notice the targeted advertising all the time, and I've never been tempted to buy. Their predictive algorithms on what I'd like to buy next are hopelessly wrong and often (like the faucet) simply incompetent.


Is that same credit card saved in your Amazon account?


> Big brother will not kill you or bite you with big fanfare. Instead, you will be struck by inconveniences. Online shopping will be more pleasant for some and maybe even impossible for others. That flashy item you want will be available for others but you will be told that it's out of stock. All of this will be explained away as mistakes.

A boring dystopia? https://news.ycombinator.com/item?id=10787650


It is not enough for a cyberpunk action movie. The consequences for society are chilling, however. Picture a society in which everyone has even more reason to behave and live in a certain way just so they don't get misjudged by some algorithm.


Hasn't it always been like that? Just replace "some algorithm" with "social norms". I don't see a qualitative difference.

Wait, actually, one difference. The algorithm adjusts much quicker than social norms, which is why it seems to affect you more. Changing of social norms is usually only observed by people who are older than the average HN reader.


I had a similar thing just recently. I was sitting in a hackerspace trying to book a hotel. It wasn't possible there because they wanted a credit card (just as safety if I don't show up, I was going to pay in the hotel anyway).

Since I don't have a credit card, I IM'd a friend to ask if I could use his credit card for the booking. I went over to his place, and when I did the booking process at his home, suddenly I could book without a credit card.

My hypothesis is that the hotel website didn't trust me in the hackerspace since I was connecting to them over a VPN.


That has nothing to do with collected data. You gave them an address, they matched it with census data, and they offered a better option if your address was lower risk.


That has everything to do with data collection. It won't always be something as obvious as poor zip codes. When data of many people is collected it enables the establishment of certain behavioural types. Then when you come along with the few data points they know of you, the gaps can be filled. What goods you buy, travelling behaviour, waking hours can all be used to characterize you. Regardless of whether this labelling is fair.


[flagged]


In the U.S., it is an illegal practice in the context of lending.

It also leads to really ridiculous situations. For example, living in Brooklyn, NY made it impossible to rent a car in the 90s without an existing relationship with a rental agency. Living in a black neighborhood made it impossible to get a mortgage.


I'm aware that we have a multitude of (often confused and nonsensical) laws, which in some cases force businesses to ignore data and make bad decisions.

My question: what argument can you make in favor of enacting such laws if they didn't exist? And if you make such an argument, does it also imply that small females should ignore the fact that Delhi is the rape capital of India? Or that pizza shops should put their drivers in danger of robbery?


It's all a part of the governance process.

In NYC, a decision was made and actions taken to improve the safety and security of the environment so that a woman can safely walk the streets unmolested.


>I've heard the big-brother nightmare scenario type answers before, but I'm curious to see how they are more likely to negatively affect me than a car accident or slipping in the shower.

You aren't, until you are, at which point it is too late. "First they came for..." yada yada.

Power is often only given in small, seemingly harmless chunks. When someone tries to take too much power - people take notice and prevent it because they know what others often do with too much power.

The problem as I see it is that too few people notice the small chips and blocks of power they willingly give away until they realize they've given away too much and no longer hold the majority of the power. Only then do they see the foolishness in what they have done, but by then it is already too late: they are powerless.


We're possibly months away from a Trump presidency, and you're not worried that this sort of power exists in the world? Trump aside, the general trend is increasingly fascist. Maybe you're safely ensconced in the glitterati, but I am a known leftist with brown skin; I'm going to worry.


Come on, even this nightmare scenario isn't that nightmarish. The POTUS commands the nuclear arsenal, which is arguably more powerful than knowing my shopping habits.


Depends. The POTUS is not likely to use the nuclear arsenal against you. Your personal data, though, is much more actionable from his POV.


Yes, because I was implying Trump would nuke the U.S. /s

You have only to listen to his rhetoric on foreign policy to connect the dots about how giving him power is potentially harmful to the U.S. Unless you think he's going to try to sell everyone steaks and golf course timeshares as the POTUS, then I'm worried.


Find an alternative to ad-supported business models maybe. Isn't much of this data used by/for the advertising industry?


You mean, create an alternative to broadcast, mail, and free services, and migrate the entire world off of it?


If it's legal to sell, it's for sale. Credit cards definitely sell your info.

People also forget about "public data". Student organizations (sport tournaments, student bands, chess club, etc) and middle school/high school/college graduation lists are big ones for targeting young consumers. Facebook/Twitter is just the tip of the iceberg. Hyper-targeting is very real and companies know way more about you than you think.

Pay cash and be mindful of what data gets online.

* On a more positive note, I'm excited to see how digital personal assistants integrate this kind of data for personalization.


There's a lot more (Acxiom, Neustar, etc.). Just check out one of these Lumascapes--the data companies are slightly off to the right near the top in the DB Marketing, DMP and Marketing Data sections [1].

Advertisers and publishers can also sell this data directly on exchanges (a relatively new thing) like Adobe's Audience Marketplace [2].

[1] http://www.lumapartners.com/lumascapes/marketing-technology-...

[2] https://blogs.adobe.com/digitalmarketing/advertising/source-...


A lot of how personal data is collected and sold is detailed in Bruce Schneier's "Data and Goliath" (http://www.amazon.com/Data-Goliath-Battles-Collect-Control/d...) and confirms what you're saying.


link to amazon

kinda ironic


I wouldn't say the bulk comes from large stores vs. 'win-a-trip' type sites. It's both. I buy these leads on a regular basis (many millions a year) and I'd say where the most comes from depends primarily on country. Also, businesses like Domino's typically don't sell identifiable customer data; they will often present your company as an offer either via EDM, newsletter or, increasingly, post-transaction via lead aggregators.

When Domino's/Target-type organisations do sell data it's typically done in a non-PII (personally identifiable information) format, for legal, ethical and brand reasons. This way companies can utilise the data to do things like understand purchase personalities/habits by postcode, but they don't directly know who you are. There is another subset of companies that claim they can take this non-PII data and re-match it to people, but I'm skeptical about how accurate this is.
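
For anyone wondering what that kind of re-matching might look like, here is a minimal sketch in Python. Everything in it is made up for illustration: the records, the names, and the choice of postcode, birth year and gender as the join key are assumptions, not anything a real broker is known to use. It also hints at why accuracy is questionable: a row only resolves to a person when its combination of attributes is unique in the identified dataset.

```python
from collections import defaultdict

# Hypothetical "non-PII" purchase records: no names, only coarse attributes.
purchases = [
    {"postcode": "2000", "birth_year": 1980, "gender": "F", "item": "pizza"},
    {"postcode": "2000", "birth_year": 1975, "gender": "M", "item": "garden hose"},
    {"postcode": "3000", "birth_year": 1990, "gender": "F", "item": "car wax"},
]

# Hypothetical identified records, e.g. from some public register.
register = [
    {"name": "Alice Example", "postcode": "2000", "birth_year": 1980, "gender": "F"},
    {"name": "Bob Example", "postcode": "2000", "birth_year": 1975, "gender": "M"},
    {"name": "Carl Example", "postcode": "2000", "birth_year": 1975, "gender": "M"},
]

# Assumed quasi-identifiers shared by both datasets.
KEYS = ("postcode", "birth_year", "gender")


def link(purchases, register):
    """Split purchases into unique matches, ambiguous matches and misses."""
    by_key = defaultdict(list)
    for person in register:
        by_key[tuple(person[k] for k in KEYS)].append(person["name"])

    matches, ambiguous, unmatched = [], [], []
    for row in purchases:
        names = by_key.get(tuple(row[k] for k in KEYS), [])
        if len(names) == 1:
            matches.append((names[0], row["item"]))   # re-identified
        elif names:
            ambiguous.append((names, row["item"]))    # several candidates
        else:
            unmatched.append(row["item"])             # nobody fits
    return matches, ambiguous, unmatched


matches, ambiguous, unmatched = link(purchases, register)
```

With this toy data only one of the three purchases resolves to a single person; one is ambiguous between two people and one matches nobody. How often real data lands in the first bucket depends entirely on how fine-grained the shared attributes are.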

Another area of personal data storage not mentioned, and one that is ever growing, is programmatic media. Here ad servers attempt to identify you and call on stored information, which is then used in a stock-market-like real-time environment to bid for ad views. It's like when ads follow you around the internet, but with a far more sophisticated informational/algorithmic rationale about who you are. Nothing nefarious, but marketers want to show ads that are likely to be interesting and result in a sale (or other company goal).
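
A rough sketch of that bidding step, in Python, for the curious. It assumes a sealed-bid second-price auction with a floor price, which is one common design and not necessarily what any particular exchange runs; the segment names, campaigns and bid amounts are all hypothetical.

```python
def bid_for(profile, campaign):
    """Bid in cents: base bid times the number of targeted segments matched."""
    overlap = profile["segments"] & campaign["target_segments"]
    return campaign["base_bid_cents"] * len(overlap)


def run_auction(bids, floor_cents=0):
    """bids maps bidder -> cents. Return (winner, price_paid) or None."""
    eligible = sorted(
        ((amount, name) for name, amount in bids.items()
         if amount > 0 and amount >= floor_cents),
        reverse=True,
    )
    if not eligible:
        return None
    _, winner = eligible[0]
    # Second-price rule: the winner pays the runner-up's bid, or the floor
    # if nobody else cleared it.
    price = eligible[1][0] if len(eligible) > 1 else floor_cents
    return winner, price


# Stored profile the ad server looked up for this page view (made up).
profile = {"segments": {"pizza", "cars", "bay_area"}}

campaigns = {
    "car_dealer": {"target_segments": {"cars", "bay_area"}, "base_bid_cents": 40},
    "pizza_chain": {"target_segments": {"pizza"}, "base_bid_cents": 60},
    "shoe_shop": {"target_segments": {"running"}, "base_bid_cents": 90},
}

bids = {name: bid_for(profile, c) for name, c in campaigns.items()}
result = run_auction(bids, floor_cents=10)
```

Here the car dealer matches two segments and bids 80, the pizza chain bids 60, and the shoe shop matches nothing and sits out, so the car dealer wins the impression and pays 60. The point is only that the stored profile drives the bids; nobody in the auction needs your name.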


This is pretty spot on. It's also good to note that marketers like me don't get your name and everything about you. We get anonymous and aggregated data. E.g., I have no idea if you, John Doe, personally buy Domino's. I'm targeting 50k people that like Domino's, also like cars, and live in the Bay Area.


I'm not talking about marketing systems. I'm talking about systems full of PII -- like Thomson Reuters CLEAR -- which was designed for law enforcement but can be purchased by private companies as well.

http://r1.officer.com/files/base/image/OFCR/2013/04/16x9/640...


I disagree. From what I've seen, Domino's and Target sell your PII to companies who make systems primarily marketed to law enforcement, but are also available to private companies.


> everywhere you have ever lived, all of your phone numbers, voting records, debts, legal history, marriages, past names and aliases, email addresses, everything.

A lot of these things are publicly available from government entities. Voter records (available from sites like voterfind.com) include your address, and phone number in some locales. Marriage licenses are public record. If a debt goes to court, it becomes a public record. Any and all legal incidents are public record. Property taxes are public record.


Yes, technically. In the US your SSN, your mother's maiden name, and all of your school records are public data under many circumstances.

That doesn't mean they get used or treated as such.


Depends on the law. Most or all of these examples are not public record in Germany.


I saw a (I believe) 60 Minutes story about WalMart a while ago and they stated they stored everything at line-item resolution for 2 years of sales. With their volume it's an amazing amount of data. UPDATE - did some quick googling and that was 10 years ago. I have no idea how big their DW is now (was 583 TB then).


The one thing about Walmart though, and many of the largest retailers like Kroger, is that that market basket data is far too valuable to allow it into the hands of anyone else - and they won't/don't sell it.

Other smaller retailers, including Target and hundreds of others, will pool their transaction data into blind coops like Abacus, or syndicate it via companies like dunnhumby.


If you play games with Steam, someone is recording your play sessions. It's not actually much different from graphing when your Facebook friends are asleep[1].

Valve, being an NIH shop, use a UDP call instead of an API like most groups[2]. At a minimum, you can get the Steam name of every player on any server.

[1] https://news.ycombinator.com/item?id=11130688

[2] https://developer.valvesoftware.com/wiki/Server_queries
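
To make the UDP part concrete, here is a minimal Python sketch of the player query as I understand it from the wiki page in [2]. The packet layout (a four-byte 0xFF header, 0x55 for the request, 0x41 for the challenge reply, 0x44 for the player list) follows that page; the function names are my own, and the sketch ignores split (multi-packet) responses entirely. The sample packet at the bottom is hand-built, not captured from a real server.

```python
import socket
import struct

HEADER = b"\xff\xff\xff\xff"  # marks a single, non-split packet


def build_player_request(challenge=0xFFFFFFFF):
    """A2S_PLAYER request; challenge 0xFFFFFFFF asks the server to issue one."""
    return HEADER + b"\x55" + struct.pack("<I", challenge)


def parse_challenge(packet):
    """Extract the challenge number from a 0x41 reply."""
    if packet[:5] != HEADER + b"\x41":
        raise ValueError("not a challenge response")
    return struct.unpack_from("<I", packet, 5)[0]


def parse_player_response(packet):
    """Decode a 0x44 reply into a list of name/score/seconds dicts."""
    if packet[:5] != HEADER + b"\x44":
        raise ValueError("not an A2S_PLAYER response")
    count = packet[5]
    offset = 6
    players = []
    for _ in range(count):
        offset += 1  # skip the per-player index byte
        end = packet.index(b"\x00", offset)  # names are null-terminated
        name = packet[offset:end].decode("utf-8", errors="replace")
        offset = end + 1
        score, seconds = struct.unpack_from("<if", packet, offset)
        offset += 8
        players.append({"name": name, "score": score, "seconds": seconds})
    return players


def query_players(host, port, timeout=3.0):
    """Challenge handshake, then the real query. Untested against live servers."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_player_request(), (host, port))
        challenge = parse_challenge(sock.recvfrom(4096)[0])
        sock.sendto(build_player_request(challenge), (host, port))
        return parse_player_response(sock.recvfrom(4096)[0])


# Hand-built example response with two players.
sample = (
    HEADER + b"\x44" + bytes([2])
    + b"\x00" + b"alice\x00" + struct.pack("<if", 12, 90.5)
    + b"\x01" + b"bob\x00" + struct.pack("<if", -3, 0.25)
)
players = parse_player_response(sample)
```

The last field is how long each player has been connected, in seconds, which is why polling this periodically is enough to reconstruct someone's play sessions.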


That's true in the "the bulk of the data ( … that is collected and commoditized by "free" websites … )" sense, but Troy's article is talking about a specific breach he'd been given and his investigation of its provenance. _This_ bunch of data had, with a fairly high degree of certainty, come from shady "Take a free trip!" websites. (Admittedly, the headline is misleading there, but that's typical of headlines...)


> Source: You'll have to take my word for it.

Fwiw, I agree with you. ;)

Source: Someone tried to sell me this sort of thing once as part of an age verification service.


Haha so why not sell this to dudebros worried about getting brought up on statutory rape charges? Or is there already an app for that?


Because I have something resembling a soul?

Iirc, their marketing said they were ~95% accurate.


Well I think it might be a valid public service to keep people from committing statutory rape. Or are you worried about the effect on your soul when 5% of the time someone has underage sex and another person goes to jail?


> Well I think it might be a valid public service to keep people from committing statutory rape. Or are you worried about the effect on your soul when 5% of the time someone has underage sex and another person goes to jail?

That is what I'd be worried about. I gave them false assurance and I'd feel responsible.


Honest questions: How do you explain to your potential buyers (or govt agencies) when they ask you where you get the data from? "Various sources"?

I mean, sure, you can collect data however you want, but I don't think any legitimate business want it unless you collected it legitimately. Or am I wrong?


I have found https://donottrack-doc.com [made by the NFB] to be helpful in thinking about data collection. If I have time at the end of this semester I'll be taking my first-year students through the first episode.


He says it in the article but it always bears repeating: if you aren't the customer, you're the product.


No, even if you are the customer, say at a supermarket, they're still going to take your purchase data and sell it. Supermarkets are especially sneaky, they'll charge you extra if you refuse to be tracked via a "loyalty" card, and then they market that extortion as savings!


Supermarkets will give you as many 'loyalty' cards as you want. And you can put false info on them.

I assume they can still track your purchases by credit card #.


Handy tip: in the US, enter (nnn) 867-5309 (where nnn is the local area code). There's a real good chance someone already has a card assigned to that number.


Actually, BevMo is now being much more brazen about it. They scan your driver's license under the guise of age verification.

I'm several decades past 21. There's no way in hell that I'm an underage drinker. When pressed the last time I was there, I was told that age verification up to the age of 50 is store policy.


Along the same lines --

In bars, occasionally young women will come in who will give you free packs of cigarettes in exchange for scanning your ID. I've heard (mind you, this could be completely false) that the cigarette companies sell that data to insurance companies.


Ha, I don't use loyalty cards because of the tracking but I never thought of the price difference that way, it makes sense.


Just disappear. Gradually, a little at a time. Diversify identities. Take your friends with you. Make it a game. Many games. What's left public is just the boring stuff. Make it as normal and unremarkable as possible.


I've toyed with the idea of vanishing completely, but nowadays employers often want to see a web presence.... if it weren't for that, and the convenience of Facebook Messenger, I might have disappeared already.

I was just thinking about how since one usually uses the same identity for many different accounts and interests online, it would be very easy to find someone by looking for people that match those specific sets of interests. You would have to make separate identities for every sort of interest, and never mention the other from a different profile.


Well, most people on Facebook just share links. So it's not hard to look normal. Maybe a little more boring than most, but hey, you work hard and all ;)

If you're going to use multiple identities, they shouldn't have overlapping interests, or common friends, accounts, etc, etc. And they shouldn't ever refer to each other. However, it's fine to have a bunch of identities that break all of those rules, just for the fun of it. You have unassociated clouds of associated identities.

Dig into Mirimir, if you're interested. There are several more-or-less associated identities. Some of them are obvious. Some would take a little work to find. But then, Mirimir is just a relatively superficial pseudonym.


Most other countries listed have laws that completely drop the "we can sell your data" clause on the site.

If you can find a presence of the company in such countries (and for most you have to provide some sort of tax ID to buy a national domain suffix) then you can enter unique data there, and call a lawyer. Profit.



