What Surveillance Valley knows about you (pando.com)
66 points by eggspurt on Dec 27, 2013 | 19 comments



The Google bait in the lede, and the failure to address it at all, is wonderfully classic Yasha Levine. Here is the link from the article to the WSJ story on the event that prompted it; it's slightly more succinct:

http://blogs.wsj.com/digits/2013/12/19/data-broker-removes-r...


Hmm, would it be possible to actually detect whether Google has information about a visitor by creating a web page with Google ads and then snooping on which targeted ads get shown (which is possible through DOM traversal)? That way, information could leak.


Maybe users could install browser extensions and coordinate to gather datasets of which ads are displayed to whom. By cross-correlating this with personal information the users volunteer to a trustworthy third party (vetted by EFF lawyers), you could infer what Google probably knows about each user. The same service could then warn users, in a privacy-respecting way, and advise them on how to avoid further privacy invasions.
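A minimal sketch of the cross-correlation step, assuming the extension reports (user, ad category) pairs and each participant has volunteered a set of attributes. The function name, the data shapes, and the choice of "lift" as the statistic are all hypothetical. Lift compares how common an attribute is among users shown an ad category with how common it is overall, so values well above 1 hint that the targeting is keyed on something correlated with that attribute.

```python
from collections import Counter, defaultdict


def attribute_lift(ad_observations, volunteered):
    """
    ad_observations: iterable of (user_id, ad_category) pairs reported by the extension.
    volunteered: dict user_id -> set of attributes disclosed to the trusted third party.
    Returns {(ad_category, attribute): lift}, where
    lift = P(attribute | shown ad_category) / P(attribute).
    """
    users = set(volunteered)
    n_users = len(users)
    # Base rate: how many participants volunteered each attribute.
    base = Counter(attr for attrs in volunteered.values() for attr in attrs)

    # Which participants were shown each ad category (each user counted once).
    viewers = defaultdict(set)
    for user, category in ad_observations:
        if user in users:  # ignore reports from non-participants
            viewers[category].add(user)

    lift = {}
    for category, seen_by in viewers.items():
        counts = Counter(attr for u in seen_by for attr in volunteered[u])
        for attr, k in counts.items():
            p_given_ad = k / len(seen_by)
            p_attr = base[attr] / n_users  # never zero: attr came from some user
            lift[(category, attr)] = p_given_ad / p_attr
    return lift
```

With real data you would want many participants and some control for chance correlations before treating a high lift as evidence of what Google knows.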

Ironically enough, it might be illegal to do this, since it would injure Google's corporate privacy ("trade secrets").


If the ads are shown in an iframe on a different domain you won't be able to traverse them due to cross origin restrictions.


The browser itself and its installed plug-ins are not subject to the same-origin policy. You are thinking of the restrictions on the original site. If this were not the case, the browser would be unable to download content from any site but the one in its URL bar.

I came here to say that this article is dynamite. I had not followed this story before, but it is at least as important as Snowden / NSA. However, the article loses force from the middle onward, and the call-out to US liberal politics towards the end is just pathetic. Those who need to take action on the data industry are just as likely to be on the right, libertarians, or simply not aligned with parochial US political divisions.

Edit: re-reading the thread, the SOP objection is right. A plug-in is required.


Malware that can hijack the users' cookies can do that, as described in https://news.ycombinator.com/item?id=6950758 at 15m40s (http://www.youtube.com/watch?v=xv6K2GqyijM#t=15m40)


"Malware that can hijack the users' cookies" is a little different from "which is possible through DOM traversal", though. I wanted to address the point in case anyone got the impression that just using the DOM via JS would allow it.


Let's be realistic about how the "rape sufferers" incident happened. They started with some list of standard medical conditions joined against users:

    condition       user
    dermatitis      1
    rape            2
    athlete's foot  3
Then they did this:

    SELECT DISTINCT condition FROM user_sufferers

    <h1> Lists for sale </h1>
    <ul>
    {% for condition in conditions %}
      <li> {{condition}} sufferers </li>
    {% endfor %}
    </ul>
Fundamentally no different from

    ["Keep Calm and {{verb}} a Lot" t-shirts for verb in list_of_verbs]:
http://singularityhub.com/2013/03/20/keep-calm-and-rape-a-lo...
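A minimal runnable sketch of the query-plus-template pipeline, assuming Python's built-in sqlite3 and plain string formatting in place of the template engine; standard SQL spells the deduplication `SELECT DISTINCT`. The table follows the example above except that `user` is renamed `user_id`, and the function name is hypothetical. The point it illustrates is that nothing in the pipeline singles out any particular condition.

```python
import sqlite3


def build_sales_page(rows):
    """Render a "Lists for sale" page from (condition, user_id) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE user_sufferers (condition TEXT, user_id INTEGER)")
        conn.executemany("INSERT INTO user_sufferers VALUES (?, ?)", rows)
        # DISTINCT collapses the per-user rows to one row per condition;
        # ORDER BY only makes the output deterministic.
        conditions = [
            row[0]
            for row in conn.execute(
                "SELECT DISTINCT condition FROM user_sufferers ORDER BY condition"
            )
        ]
    finally:
        conn.close()
    # Same loop as the template: every condition becomes "<condition> sufferers",
    # with no step where anyone reviews which conditions appear.
    items = "\n".join(f"  <li> {c} sufferers </li>" for c in conditions)
    return f"<h1> Lists for sale </h1>\n<ul>\n{items}\n</ul>"
```

For example, `print(build_sales_page([("dermatitis", 1), ("athlete's foot", 3), ("dermatitis", 4)]))` prints one list item per distinct condition.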

The fact that things like rape, pregnancy [1], etc. are detectable by computers makes for great headlines. But we shouldn't act as if MEDbase was deliberately targeting [{{condition}} sufferers for condition in hot_button_list]; that's fundamentally dishonest.

[1] See the case where Target noticed a teenager buying products X, Y, and Z; the correlation between those purchases was driven by an underlying cause, pregnancy.


Are you saying you know this for a fact, or are you speculating about a benign explanation without any genuine knowledge of this particular case (by referring to the Keep Calm t-shirts, which are unrelated and irrelevant to this situation)?

edit: Plus I've never heard of "rape" being a diagnosed medical condition; why would that be in a list of conditions? I find your 'explanation' quite troubling.


I'm saying this is the most likely explanation. I should have been clearer on this point. Too late to edit, unfortunately.

Incidentally, here are cached copies of the list of things they sell (the site no longer displays them):

http://webcache.googleusercontent.com/search?q=cache:dI3aEj0...

http://webcache.googleusercontent.com/search?q=cache:8T2YX7h...

http://webcache.googleusercontent.com/search?q=cache:aRARHll...

I stand by my claim that they took a list of medical billing codes or something similar. Do you really think a human decided to compile lists of "Bioterrorism sufferers", "Smallpox sufferers", "Bites and Stings sufferers" or "Polio Sufferers"?

However, I'm now convinced their first step was less sophisticated than I suggested. I don't think they took a list of users joined against billing codes; I think they just took a list of billing codes.
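A small sketch of why that distinction matters, using hypothetical placeholder codes (not real billing codes) and hypothetical function names. Titles generated straight from a code list include conditions nobody in the database has, which would explain entries like "Polio Sufferers"; titles derived from patient rows only include conditions that actually occur.

```python
def titles_from_code_list(code_terms):
    """Turn a code -> term mapping into list titles without consulting patient data."""
    return [f"{term} Sufferers" for _, term in sorted(code_terms.items())]


def titles_from_patient_rows(rows):
    """Only conditions that at least one (condition, user_id) row actually has."""
    return [f"{c} Sufferers" for c in sorted({c for c, _ in rows})]


# Hypothetical placeholder codes, not real billing codes.
CODES = {"X01": "Polio", "X02": "Bites and Stings", "X03": "Smallpox"}
```

With an empty patient table, `titles_from_code_list(CODES)` still yields three titles, while `titles_from_patient_rows([])` yields none.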


> edit: Plus I've never heard of "rape" being a diagnosed medical condition; why would that be in a list of conditions? I find your 'explanation' quite troubling

The SNOMED CT coding system has an incredible number of coded clinical terms, including rape:

http://phinvads.cdc.gov/vads/http:/phinvads.cdc.gov/vads/Vie...


The condition listed in your example is "Rape trauma syndrome". That would be an acceptable medical diagnosis, as it is the trauma associated with being raped (and medical support and intervention could be necessary to deal with it). The rape itself is not a medical condition, and I think the listed definition makes that clear.

I still dislike the idea of a list of rape trauma victims being sold, as that would almost by definition add to the trauma they have suffered if they learned of it.


Sorry, I should have dug a bit deeper for the exact code. SNOMED CT covers non-clinical terms along with clinical ones. It covers tons and tons of 'stuff' from measurements, drugs, finding areas, diagnoses, substances, social contexts, foods, etc. The point was to create a coding system which captured more than just clinical information.

SNOMED CT is used to build 'clinical statements', which are combinations of SNOMED CT terms that help build a more in-depth patient record. Many of those terms are non-clinical but can be used in combination with clinical terms.

Here's 'Victim of Statutory Rape': http://bioportal.bioontology.org/ontologies/SNOMEDCT?p=class...

I run a company that develops a web-based medical practice management system. We use NLP techniques combined with the SNOMED CT database (and others) to extract clinical statements from all text entered against a medical record, whether it's a complaint, treatment, diagnosis, or even an email to the patient. So I don't doubt for a second that there are other systems out there that do something similar and have automatically associated patients with various clinical and non-clinical terms.
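A deliberately naive sketch of that auto-association, using plain phrase matching as a stand-in for real NLP. The lexicon, its concept IDs (placeholders, not real SNOMED CT identifiers), and all names are hypothetical. It shows how a non-clinical term such as a social context ends up attached to a patient simply because it appeared in an email.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicon: phrase -> (concept_id, kind).
# These IDs are placeholders, NOT real SNOMED CT identifiers.
LEXICON = {
    "sprained ankle": ("C-0001", "finding"),
    "ibuprofen": ("C-0002", "substance"),
    "lives alone": ("C-0003", "social context"),
}


@dataclass(frozen=True)
class TaggedTerm:
    patient_id: str
    concept_id: str
    kind: str
    source: str  # where the text came from: complaint, treatment, email, ...


def extract_terms(patient_id, source, text, lexicon=LEXICON):
    """Tag the patient with every lexicon concept whose phrase appears in text."""
    lowered = text.lower()
    found = []
    for phrase, (concept_id, kind) in lexicon.items():
        # Word boundaries keep "ibuprofen" from matching inside longer words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.append(TaggedTerm(patient_id, concept_id, kind, source))
    return found
```

A production system would also handle negation ("no sprained ankle"), synonyms, and combining terms into statements, none of which this sketch attempts.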


So criticizing a company that collects and indexes individuals' sensitive medical data, without insight into what they are producing and selling, is "intellectually dishonest" when we single out the fact that they are selling collections of rape victim data?

Just explaining how things might have come to be, as you do in this strawman exercise, doesn't make it acceptable, or does it?


The article attempts to imply that selling a list of "rape sufferers" was deliberate:

> The company pretended like the list was a huge mistake. A MEDbase rep tried convincing a Wall Street Journal reporter that its rape dossiers were just a “hypothetical list of health conditions/ailments.”

Given the existence of "bioterrorism sufferers" and "polio sufferers" on the list, this explanation seems almost certain to be true. They stuck every medical condition on the list in order to boost their SEO, and they didn't realize that their list of conditions included hot button topics.

If you want to argue that data brokering is unacceptable, do it honestly and without pretending they are selling lists of rape victims and lepers.


"How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did" -- http://www.forbes.com/sites/kashmirhill/2012/02/16/how-targe...

This event is referenced in TFA. I remember reading it when it first came out and was (a) amazed at the ability to analyze purchasing data and come to conclusions such as this and (b) worried.
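To make "analyze purchasing data and come to conclusions" concrete, here is a minimal sketch of one common way such an inference can work: a logistic score over which products appear in a shopper's basket. Target's actual model is not public; the products, weights, bias, and function name below are all hypothetical.

```python
import math

# Hypothetical weights: how strongly each purchase is assumed to shift the
# odds of the inferred trait. Real retailers' models and weights are not public.
WEIGHTS = {
    "unscented lotion": 1.2,
    "mineral supplements": 0.9,
    "cotton balls": 0.6,
}
BIAS = -3.0  # hypothetical baseline: the trait is assumed rare without evidence


def trait_score(purchases, weights=WEIGHTS, bias=BIAS):
    """Logistic score in (0, 1); repeated items count once, unknown items add nothing."""
    z = bias + sum(weights.get(item, 0.0) for item in set(purchases))
    return 1.0 / (1.0 + math.exp(-z))
```

No single item is telling on its own; it is the combination that pushes the score up, which is why innocuous-looking purchases can add up to a sensitive conclusion.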

Previously, I never really cared for carrying cash so I used a credit card for almost everything I purchased -- online, of course, but also in person. Nowadays, I pay cash anytime I can: at the gas station, the supermarket, at restaurants, etc. I also stopped using my "loyalty" card at the supermarket and obtained a new one that was not connected to me in any way.

I'm not buying anything I shouldn't be, of course. I'm just of the "it's none of their business" mindset. This has carried over into other parts of my life as well, such as logging as little personal information as possible at work (ISP) and retaining it for as short a period as possible.
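A minimal sketch of that log-minimization habit, assuming log entries are dicts with a `timestamp` field. The seven-day window, the field allowlist, and the function name are hypothetical policy choices, not recommendations; the idea is simply to drop old entries and keep only the fields you actually need.

```python
from datetime import datetime, timedelta

# Hypothetical policy values; choose ones that fit your own legal and operational needs.
RETENTION = timedelta(days=7)
KEEP_FIELDS = ("timestamp", "event")


def minimize_logs(entries, now, retention=RETENTION, keep=KEEP_FIELDS):
    """Drop entries older than the retention window and strip non-allowlisted fields."""
    cutoff = now - retention
    return [
        {k: e[k] for k in keep if k in e}  # e.g. the IP address is discarded here
        for e in entries
        if e["timestamp"] >= cutoff
    ]
```

An allowlist (keep only named fields) fails safe: a new field added to the logs later is dropped by default rather than retained by accident.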


    > Normally, such detailed health information would fall under
    > federal law and could not be disclosed or sold without
    > consent. But because these data harvesters rely on indirect
    > sources of information instead of medical records, they’re
    > able to sidestep regulations put in place to protect the
    > privacy of people’s health data.
What "indirect sources of information" could be used to identify thousands of rape victims?


One simple source is search data. With data from a browser toolbar, you could mine for searches like "chances rape victim getting pregnant", "what is a rape kit", etc. and visits to certain pages of WebMD. That would be plenty accurate for targeting.
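A rough sketch of that kind of keyword mining, assuming toolbar logs arrive as (user, query) pairs. The function, the segment names, and the two-hit threshold are hypothetical, and the example uses a neutral topic, though the mechanism is the same for any sensitive one. No medical record is involved at any step.

```python
from collections import defaultdict


def segment_users(search_log, segment_terms, min_hits=2):
    """
    search_log: iterable of (user_id, query) pairs, e.g. from a browser toolbar.
    segment_terms: dict segment_name -> list of lowercase phrases.
    A user joins a segment once at least `min_hits` of their queries match.
    """
    hits = defaultdict(lambda: defaultdict(int))
    for user, query in search_log:
        q = query.lower()
        for segment, phrases in segment_terms.items():
            if any(p in q for p in phrases):
                hits[segment][user] += 1
    return {
        segment: {u for u, n in users.items() if n >= min_hits}
        for segment, users in hits.items()
    }
```

Requiring more than one matching query is a crude way to cut false positives from a single curious search; it also shows how little evidence such a segment needs.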


Is it possible to purchase all the information they have on you? I'm curious to see the extent of what they have on me.



