Netflix Prize 2 Cancelled Due to a Lawsuit and FTC Inquiry (netflix.com)
103 points by gaika on March 12, 2010 | 71 comments



Plus one to Netflix for making this the first sentence:

"This is Neil Hunt, Chief Product Officer for Netflix."

No vague hiding behind an unidentified team or blog. A person has chosen to identify himself with the decision, and the company chose to present it that way.

The general tone is also positive. This is a good way to communicate.

Contrast with Amazon's anonymous whine on their kindle blog when they gave in to MacMillan. Not signed by a person, not even "the team." And full of blame and "you'll see!"

http://www.amazon.com/tag/kindle/forum/ref=cm_cd_et_md_pl?_e...


Comparing Amazon's official communication with customers to Netflix's, I think Amazon wins.

Looking at Netflix's blog they have 3 posts for the entire year.

Looking at Amazon's discussion forums, there is a swath of official announcements, including one that informs customers how to access 2 million free books outside of Amazon.

Sure, having a name in front of the post is nice I guess, but in terms of consistent and valuable customer communication I think Amazon wins. Also here's an example of an apology from Bezos himself

http://www.amazon.com/tag/kindle/forum/ref=cm_cd_ef_tft_tp?_...


I agree that a person identifying himself was admirable. But I'd rather he identified himself, and then went on to say how he really felt about it (I'm assuming here they were more disappointed to close it than he let on).

This is professional - but it also lacked much backbone.


True, although I prefer if the real person has a real job title.


Another case of lawsuits stifling innovation.

And I agree with one of the comments on that post - why doesn't Netflix have people opt-in to have their data anonymized and used for this purpose?

Edit: replaced bureaucracy with lawsuits


I think bureaucracy had little to do with it. As I recall, the first Netflix contest did indeed expose private, and sometimes embarrassing, information.

This was a privacy issue, and something legitimate to address.


Right, but I find it astonishing that it was necessary to cancel the competition altogether.

I find it surprising that there wasn't a middle-ground which anonymized or didn't include personally identifying information.

I feel the lack of a middle-ground is probably due to the lawsuit plaintiffs refusing to back down (presumably for financial gain) and thus causing Netflix to hedge their risk.


Privacy issues don't necessarily need to be dealt with by lawsuits. Most lawsuits are about money.


Not to quibble, but lawsuits are also not bureaucracy.


Good point, was at a loss for the right word when I commented.

This kind of behavior by lawyers is just like patent trolling and I couldn't pinpoint a word for that. Maybe privacy trolling? The lawyers are the same ones that sued Facebook for Beacon.


It breaks simple random sampling, meaning that you're optimizing purely for some unknown subset. That becomes a problem when someone realizes that Netflix is spending millions of dollars to have a recommendation engine that ends up being racist.


While it's not entirely the same situation, perhaps you should read up on the http://en.wikipedia.org/wiki/AOL_search_data_scandal.


why doesn't Netflix have people opt-in to have their data anonymized and used for this purpose?

Selection bias?


Selection bias would mean people who opt in get better recommendations... sounds like a good incentive and a virtuous circle to me.


Selection bias would only be a concern if you were trying to make inferences about all Netflix customers (not just the ones that opt in).
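
A minimal sketch of the concern, assuming invented ratings (none of these numbers come from Netflix, and the `users` list is purely hypothetical): if the people who opt in rate differently from those who don't, an average taken from the opt-in group alone will not describe the whole customer base.

```python
from statistics import mean

# Hypothetical users: (rating given to one movie, whether they opted in).
# All values are invented for illustration.
users = [
    (5, True), (4, True), (5, True), (4, True),
    (2, False), (1, False), (3, False), (2, False), (1, False), (3, False),
]

all_ratings = [rating for rating, _ in users]
opt_in_ratings = [rating for rating, opted_in in users if opted_in]

population_mean = mean(all_ratings)   # what you want to know
opt_in_mean = mean(opt_in_ratings)    # what the opt-in data tells you
bias = opt_in_mean - population_mean

print(f"all customers: {population_mean:.2f}")
print(f"opt-in only:   {opt_in_mean:.2f}")
print(f"bias:          {bias:+.2f}")
```

If recommendations are only ever served to the opt-in group, the gap matters less, which is the point made above.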


good idea...opt-in and we can improve our recommendations to you.

People worried that some geek will find out they watched Who's the Boss reruns will miss out.


You mean 'inquiry'


Are "inquiry" and "enquiry" not identical in all respects except spelling?


Enquiry is asking someone else a question. Inquiry is looking into it yourself.


What is the protocol for which words to begin capitalized in a headline?


I think the title used to say "enquiry."


Netflix Prize 3: How to create a meaningful database that is impossible to de-anonymize.


Why not just have the customer opt-in/out of 'algorithm participation' per movie selection?

Likely, Netflix has the power to start its own social network platform based on its existing users' movie preferences. Some users might be proud to say that they rate specific movies highly, and correlate to particular algorithms based on collective opinions.

For instance: 'Click here to add user X's preferences to your algorithm for Y genre'


Funny, yes, but I honestly think they should consider this.


It's been a while since the flurry of data de-anonymization papers back in '07-'08, but my takeaway impression was that you can't truly anonymize large datasets without destroying their utility.


There's lots of ongoing work on the topic, new conferences, etc. One of the keywords is "privacy preserving data mining".


Indeed.

See this: http://godplaysdice.blogspot.com/2009/12/uniquely-identifyin...

I imagine gender and DoB factor in heavily to something like a recommendation engine, and I'm sure zip code would come into play when trying to get those last couple percent as is the case with the Netflix prizes.


That's too bad. Sounds like an interesting research area.


How does sharing the Netflix data violate privacy? With the hundreds of thousands of records it would be difficult to trace information to an individual. The US needs more engineers and fewer lawyers.


It seems there are enough engineers, or at least there are enough to reverse the anonymization and reveal details that customers rightly considered private.

"Difficult" to de-anonymize is not enough. It must be impossible, and the burden of risk must be on Netflix, not the customer. We're sympathetic in this case because the contest is innovative and interesting. Imagine a slightly different story. In this one, the FBI asks Netflix for "anonymized" information, then de-anonymizes it and starts wiretapping people considered "suspicious." I think we would be rightly appalled at the idea of the government monitoring the movies we watch, and we would criticize anyone handing over the information they need to do it without asking or notifying us.

It's the same thing here, and Netflix should ensure opt-in with full disclosure of the potential for de-anonymization for the same reasons.

When I sign up for a service, I don't expect the vendor to publish data of my transactions to the entire world, even if they claim it is anonymized.


Netflix is also facing problems because there is a specific law (the Video Privacy Protection Act of 1988) that makes it illegal for them to reveal data that could be linked to service subscribers. In this case the burden of proof would fall on Netflix to prove the dataset could not reveal a single link between subscriber and rental history.


I agree (although I'd still rather see more engineering and less lawyering). Netflix does need to get people to opt in.



It's almost always surprising just how much private data can be exposed after data has been "anonymized."

When you're using private data in any public way, or even with "partners or affiliates," you need to be very careful, watchful and responsive.


I get "Forbidden" for both those links. Are there any other links to those papers? I hadn't heard of this result and I'm very curious.



"Let us summarize what our algorithm achieves. Given a user's /public/ IMDb ratings, which the user posted voluntarily to reveal /some/ of his (or her; but we'll use the male pronoun without loss of generality) movie likes and dislikes, we discover /all/ ratings that he entered /privately/ into the Netflix system" (em. orig.)

One's "political orientation may be revealed by his strong opinions about "Power and Terror: Noam Chomsky in Our Times" and "Fahrenheit 9/11," and his religious views by ratings on "Jesus of Nazareth" and "The Gospel of John." Even though one should not make inferences solely from someone's movie preferences, in many workplaces and social settings opinions about movies with predominantly gay themes such as "Bent" and "Queer as Folk" (both present and rated in [one individual's] Netflix record) would be considered sensitive"
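
To make the idea concrete, here is a toy linkage sketch. It is not the paper's algorithm (which, as I understand it, is considerably more careful about scoring); it just counts exact rating matches. The ids, movie titles, the name "alice", and the helpers `overlap_score` and `best_match` are all hypothetical.

```python
# Hypothetical "anonymized" release: subscriber id -> {movie: rating}.
anonymized = {
    "user_0417": {"Movie A": 5, "Movie B": 1, "Movie C": 4, "Movie D": 2},
    "user_0938": {"Movie A": 3, "Movie B": 4, "Movie E": 5},
    "user_1556": {"Movie C": 4, "Movie D": 5, "Movie F": 1},
}

# Hypothetical public profile: a named person who rated a few movies openly.
public_name = "alice"
public_ratings = {"Movie A": 5, "Movie C": 4, "Movie D": 2}


def overlap_score(public, private):
    """Count movies that carry the same rating in both records."""
    return sum(1 for movie, rating in public.items() if private.get(movie) == rating)


def best_match(public, release, min_score=3):
    """Return the release id agreeing most with the public ratings,
    or None if no record reaches min_score matches."""
    scores = {uid: overlap_score(public, ratings) for uid, ratings in release.items()}
    uid = max(scores, key=scores.get)
    return uid if scores[uid] >= min_score else None


match = best_match(public_ratings, anonymized)
if match is not None:
    # Ratings in the release that the person never made public.
    leaked = {m: r for m, r in anonymized[match].items() if m not in public_ratings}
    print(public_name, "->", match, "private ratings:", leaked)
```

Three public ratings are enough to single out one record here, and everything else in that record comes along with it.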


Interesting, thanks for the link. However, it seems like you need to be able to cross-correlate with another database in order to de-anonymize. If Netflix made a good faith effort to protect privacy, how are they liable?


If I am your bank and I make a "good faith" effort to protect your money from robbery, how am I liable if I am robbed of your money?

For certain types of agreements, good faith is not enough. Netflix chooses to go into a business where it is privy to private information about its customers. The onus is on Netflix to protect that information.

I would say the same thing had hackers cracked their security and made off with the data. Good faith efforts that fail to secure the data are not enough, they must succeed in protecting the privacy of their customers.


>>If I am your bank and I make a "good faith" effort to protect your money from robbery, how am I liable if I am robbed of your money?

Banks take the precautions ahead of time (deposit insurance) against robbery; that's part of the "good faith" of protecting a user's money. Nice try.


> Banks take the precautions ahead of time (deposit insurance) against robbery

Are you arguing in support of my point? Banks insure themselves against robbery precisely because they are held 100% accountable for it to the customer.


Insuring the deposits is not a good-faith effort to prevent the robbery. It is a guarantee that the customer will not be affected by a robbery. The difference is that the insurance guarantee is 100% solid--regardless of what occurs to the bank, the customers can get back the money in their account.

Netflix's attempt to anonymize the data is not a guarantee that the data will remain secure. It is merely an attempt to secure it, not a guarantee that the customer will not lose control of their data. In your example, this is comparable to locking the vault, not to buying deposit insurance. Note that banks are required to get FDIC insurance regardless of how tightly they lock their vault.


Again, I reiterate: the bank robbery example is irrelevant. Netflix anonymized the data from within their own db. You need to cross-correlate with another db to de-anonymize.

A better example: a customer loses their driver's license/bank card and a thief finds it and calls a bank to do transactions, using the info on the card to verify identity.

A bank can only do so much to protect their customers. If someone is willing to leave their info lying around, there isn't much that can be done.


> A better example: a customer loses their driver's license/bank card and a thief finds it and calls a bank to do transactions, using the info on the card to verify identity... A bank can only do so much to protect their customers. If someone is willing to leave their info lying around, there isn't much that can be done.

I don't understand this argument at all. Are we talking about customers leaving their information "lying around" or customers entrusting it to Netflix? I entrust my money to a bank: I give it to them. Customers entrust their transaction history to Netflix. Nobody is talking about customers leaking their private information to third parties, we are talking about the equivalent of the bank publishing the customer's driver's license information in a newspaper with their name obscured.

p.s. Now that I've established that I don't think this example is particularly relevant, I will share a story: Several years ago I returned from vacation to find a phone message from my bank. Someone had used what appeared to be my ATM card to withdraw $1,000 from my account while I was in Honduras. Of course I was the victim of some kind of skimming and cloning operation. The bank reimbursed me in full.


>> we are talking about the equivalent of the bank publishing the customer's driver's license information in a newspaper with their name obscured.

It's more like the DMV publishing driving record history with any identifying info removed, and then these people putting up some info on their driving record (along with identifying info) on a second website/db. Then someone can use this second website (and the cross-correlation algorithm) to id their record in the DMV publication and get more info on their driving record. Had those people not put any info on the second website, their record could not be identified in the DMV publication.

I'm not saying Netflix is innocent if they knew that such a cross-correlation algorithm existed. I'm just saying that I don't want to live in a society where a company can be sued for everything that can go wrong, at least when that company is taking every precaution using state of the art knowledge. If netflix failed to hire db security experts to notify them on this possibility, then yes sue them. I don't know enough about db security to say if this hole was known when Netflix launched the contest.

>>The bank reimbursed me in full. Nice.


> I don't want to live in a society where a company can be sued for everything that can go wrong

Well, it isn't for me to argue with what kind of world you want to live in. Those are your authentic feelings and you're entitled to them.

> If netflix failed to hire db security experts to notify them on this possibility, then yes sue them.

I don't know how lawsuits actually go, but in principle at least the idea of a lawsuit is for Netflix to stand up and say "We did this and this and this" and for the plaintiffs or whomever we call them to say "yes but you failed to do this and this" and for a judge and/or jury to decide the case on its merits.

That's a nice fantasy, of course. In reality there's all sorts of backroom wrangling and juries baited into hating the big bad company and what-not. But in principle, a civil suit provides both sides an opportunity to make their case just as you say.


Car manufacturers have to make sure the brakes and air-bags work. Elevator makers have to make sure they won't fall. Parachutes can't have holes. Lots of examples.

Heck, imagine if companies could do things wrong and get away with: "Hey at least I was, like, trying reeeeally hard. I thought it was ok to do this".

Acting in good faith would become the excuse to use when the shit hits the fan.

PS: "Acting in good faith" in the real world means following the standard, industry adopted, government mandated policies, processes, regulations, laws, etc. That's what lets you get away from problems. Not happy thoughts.


ditto my above comment. If netflix says data will be collected and then anonymized in its TOS, and it does so, then it acted in good faith. If a hacker reveals a weakness and Netflix pulls the plug, should they be sued?

If yes, then how can companies innovate, when they will constantly fear liability?


You think the amount of effort by Netflix to protect the identities of its customers was enough. It was not. They intentionally released an amount of information, thinking none of its customers could be traced, but guess what? They were.

Yes, it had to be used with other datasets to discover individuals, but Netflix ignored (you say acted in good faith) this possibility and decided to go ahead.

They were ignorant of the implications of the data they released. They didn't see the possibility that their customers could be found. They were stupid and reckless.

The problem is that you're thinking about this situation as the researcher, the person who wants the data set to play with. Put yourself into the company's shoes. You want to improve the recommendation algorithm. You hold a contest, which needs the customer data to work. But you know that your customers won't be happy to have their info released, so you go and anonymize the data.

See where I am going? You had an idea, executed, but the consequences were bad. Imagine if car companies acted this way, one morning an engineer comes to work and puts a new brake system in the company's car already in production thinking it'll be awesome and work ten times better than the previous brakes. Without rigorous, government and industry trials, experiments and tests.

Good idea, poor execution. Netflix doesn't have "good faith", they wanted to improve their recommendation algorithm. They wanted to profit. Now, I don't have anything against profits. But it's naive to think Netflix did this for the benefit of mankind. They had their own reasons, and to achieve that, they've broken a promise to their customers. They said: Hey, we'll keep this information in our database, but don't worry, no one will ever know it.

But then they go and _release_ customer data, _thinking_ it's sufficiently anonymized. They were wrong. Double mistake there. The "hacker" wasn't alone in this, "he" had the direct help of the company which was supposed to not let this happen.


I agree that if Netflix knew the data could be de-anonymized then they are liable. Just like if a bank knows that an identity thief is emptying a customer's bank account and does not stop it, they are liable.


The same way they do now. (Successful) companies don't fear liability: they prepare for it but realize that they will never be 100% insulated.

You weigh your options: if the benefits of doing something outweigh the costs (including lawsuit outcomes, poor public perception, etc) by enough of a margin, then you forge ahead.


It's a balancing act. You can take away the right of people to hold companies accountable for their actions, and this would really free up businesses to innovate without fear of reprisal. This would also open it up for businesses to 'innovate' in 'cost-savings' where 100% of the risks are borne by their customers. Do you advocate living in a world where corporations are free from fear, allowing them to do whatever they want? Do you really believe that corporations always operate in the best interests of their customers? Do you really think that the general population are the customers for most businesses (i.e. most people feel that they are 'customers' of the cable/television companies, but in reality the advertisers are the customers and the general population is the product)?


>> Are you arguing in support of my point? Banks insure themselves against robbery precisely because they are held 100% accountable for it to the customer.

No, I'm saying that your example was irrelevant. If Netflix said in its terms of service that it would dutifully anonymize the collected data, then are they liable? If some hacker reveals a weakness and Netflix pulls the plug, should they be sued?


I never heard of a bank that got robbed and passed the loss on to their customers. If a bank is robbed, they know exactly how much was the damage, and how much they will be in trouble when they have to repay their customers.

Netflix can't "repay" their users' privacy, can they?


Sure we released that murderer from jail, but he needs to buy a gun to shoot someone, and we didn't sell him the gun. You can't blame us.

If they know that the information can be de-anonymized using publicly available information, have they really made a good-faith effort?


>>If they know that the information can be de-anonymized using publicly available information, have they really made a good-faith effort?

If your premise that Netflix knew the db could be de-anonymized is correct, then it's not "good faith". Otherwise, Netflix could argue it did everything it said it would do in its TOS, and didn't foresee the hacker's exploit. Whether that makes them liable or not is what I'm asking. I'm not a lawyer.

The reason the bank robbery example is irrelevant is banks say in their TOS that your money is 100% protected up to the FDIC limit. So, Netflix's TOS said it would make its db internally anonymized, which it did. Clever cross-correlating made this not enough.


True, but then the question becomes: What does Netflix do now? Once they know that their efforts are not enough, what is their reaction? IIRC, they were warned about the fact that the second prize was revealing too much information, but went forward with it anyways.


>>IIRC, they were warned about the fact that the second prize was revealing too much information, but went forward with it anyways.

I did not know that. That changes my opinion about Netflix acting in "good faith".

The argument I was making is rooted in my belief that in order to have a healthy environment for business enterprise, your legal system cannot be set up to punish innovation whenever something doesn't go as planned. There needs to be balance, i.e. for medical innovation the bar is higher than for movie ratings.


Sometimes you release people (sometimes murderers) because of a thing called a constitution...a case where there is liability for NOT releasing the murderer.


You weren't meant to take that literally. It's just to illustrate a point, not necessarily describe a real-world scenario. Notice how you're arguing my metaphor's accuracy rather than actually defending your original point.


Some people from the first dataset were deanonymized. Given age, gender, & zip code is enough to identify 87% of the US.

See http://arstechnica.com/tech-policy/news/2009/09/netflix-priz... and its links.


I'm assuming you mean given name - why would this be included in the data set in the first place? Is there some significant correlation between names and movie preferences?


http://www.freedom-to-tinker.com/blog/paul/netflixs-impendin...

This post, which is linked from the Ars article, states that a given birthdate, gender, and zip code can uniquely identify most Americans. So, while the parent should probably specify birthdate instead of age, a name is definitely not needed.
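
A rough way to see why, as a sketch over invented records (the `records` list is hypothetical and far too small to say anything about real populations): count how many rows are the only ones with their particular (birthdate, gender, zip code) combination.

```python
from collections import Counter

# Hypothetical records with names already stripped:
# (birthdate, gender, zip code). All values are invented.
records = [
    ("1975-03-02", "F", "94110"),
    ("1975-03-02", "M", "94110"),
    ("1982-11-19", "F", "10027"),
    ("1982-11-19", "F", "10027"),
    ("1990-07-04", "M", "60614"),
]

# How many people share each exact combination of the three fields.
counts = Counter(records)

# A record is re-identifiable by these fields if nobody else shares them.
unique = [rec for rec in records if counts[rec] == 1]
fraction_unique = len(unique) / len(records)

print(f"{len(unique)} of {len(records)} records are unique ({fraction_unique:.0%})")
```

With a full birthdate there are so many possible combinations per zip code that most people end up alone in theirs, no name required.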


Ah, birthdate makes much more sense. My confusion stemmed from misreading the comment as "using these properties: [given age](?), [gender], [zip code]" instead of "given these properties: [age], [gender], [zip code]".


Yes, the issue is Netflix's liability. The reason it can be de-anonymized is because people reveal more info in some other database (for example IMDb) - if people willingly reveal more info in some other db, is that Netflix's fault?


Yes, it's netflix's fault unless they clearly obtain permission from people to share information that their customers could naturally assume was private.

As stated elsewhere Netflix should just have a clear opt-in for sharing the data and we can continue on.


Maybe it's somewhere in Netflix's terms of service that data on users' movie preferences will be collected, however this information will be kept anonymous (as best as can be done by Netflix).


It is. They mention in their privacy policy that they may disclose anonymized histories, so they might have a decent legal leg to stand on.

I don't think this helps them much in the court of public opinion, and it helps even less in not pissing off a segment of their customers. Making it opt-in seems like an easy compromise. The only downside I see is if they get a really low response rate.


they could incentivize it with reduced monthly rates.


The point is that it is actually fairly easy to trace information back to a user. People who are not familiar with this kind of data just don't realize how little you need to uniquely identify someone.

I am a little surprised to see that people who are generally up in arms about companies not respecting privacy think it is perfectly ok to do it if it is for "science".

This is one of the big reasons why academics cannot release data BTW, especially in the social sciences: anonymizing data almost inherently means destroying valuable information.


Dear Lawyers,

Lame

Regards

The Tech Community



