Netflix Progress Prize Awarded

FiReaNG3L · on Dec 10, 2008

As per the prize rule, they are under the obligation to describe and publish their mode (as they did last time). I really wish someone would release an open source recommender package, coded in C, with PHP bindings, based on the latest / greatest algorithms, or any algorithm (just plain SVD if anything).

Interesting links (from the winning team(s) websites about their latest research) : http://public.research.att.com/~volinsky/netflix/kdd08koren.... http://www.commendo.at/index.php?lang=0&_0=2&_1=3

icefox · on Dec 11, 2008

Well I am not going to do everything for you, but here is something to start with:

http://benjamin-meyer.blogspot.com/2006/10/netflix-prize-con...

It is a little framework I put together to play with the netflix prize framework. I implemented SVD as one of the examples.

On a side note I did think about the idea of putting together a small company that would sell recommendation systems to companies for their website. The value would not be in the algorithm, but in the ability to integrate with their software.

FiReaNG3L · on Dec 11, 2008

Very nice, I saw many Netflix Prize derived frameworks but I somehow missed this one.

Did you test RMSE of your SVD on Netflix Dataset?

icefox · on Dec 11, 2008

Nope, it is just a stock SVD, it wont win any awards, just get the job done and be an example. There are plenty of people who have tweaked SVD up and down so I didn't spent time doing it too.

aswanson · on Dec 10, 2008

It may be simply mathematically impossible to get to 10 percent given the observable variables. Maybe netflix knows this and just gets synthesized research on the cheap.

pg · on Dec 11, 2008

I'd guess the machine learning guys by this point understand the data set at least as well as Netflix does.

jderick · on Dec 11, 2008

The machine learning guys have only a subset of the full dataset to work with. They have to send their algorithms to Netflix for testing on the full dataset. So it may be possible Netflix knows something they don't. It does seem quite strange how close and yet how far that 10% mark has been.

Retric · on Dec 11, 2008

Because they are testing it vs a fixed data set teams can gain information about that data set based on their submissions. Which is why several teams mix and mach several different approaches. They can increase the weight of each approach over time and game the system.

Edit: I am not saying they are intentionally cheating rather it's one of the reasons that approach works.

pg · on Dec 11, 2008

Hmm. What could Netflix know, though? Is it even possible to quantify how hard it is to improve a recommendation algorithm by a certain amount?

aswanson · on Dec 11, 2008

Maybe there is an inherent noise floor for the optimal estimator for the dataset they have found or bounded.

aswanson · on Dec 11, 2008

Wouldn't argue with that, but they may have run up against a bound, multiplied it by a constant, and said to themselves "highly unlikely that anyone can beat this, so lets put it out there and see if someone out there can beat it."

So far no one has and they have gotten a ton of research on the cheap, like the current team's for 50 k USD. Smart.

wheels · on Dec 11, 2008

They are getting it on the cheap. They've got many teams of top-notch folks working on building an algorithm tailored for their data-set and problem definition. For what they're getting out of this, $1m is a stupidly low price. If they've incorporated the stuff that's come out so far, they've already boosted their sales by several million based on the results of the contest.

Haskell · on Dec 11, 2008

If the goal had been 5%, you would say: - It may be simply mathematically impossible to get to 5 percent...

And they are already at 9.56%.

aswanson · on Dec 11, 2008

No I wouldn't have, because I could only make a statement like that after seeing performance information. Since at least 40 teams beat that performance the first year, I would have said they set the bar too low.

Haskell · on Dec 11, 2008

And why do you think 10% is impossible but 9.5% is not?

aswanson · on Dec 11, 2008

I said they may have found a bound. I didn't say it was; it said there may be a limit. Re-read the ancestor.

Haskell · on Dec 11, 2008

You are playing with words.

The point is that there's really not much difference between 10% and 9.56%.

Netflix's Grand Prize could very well had been 9.64% and it would already had been won, despite everybody saying one year ago that 9.64% might be impossible given the dataset.

We will talk again in one year to see if 10% may be impossible.

aswanson · on Dec 12, 2008

Barring either of us getting hit by a beer truck or moving on to a new site,or the apocalypse you're on. Catch you 12/11/09!

Haskell · on Dec 11, 2008

Ops... substitute 9.64% by 9.56% in that comment. :)

Eliezer · on Dec 11, 2008

They're over 9% already. Sheesh.

aswanson · on Dec 11, 2008

http://en.wikipedia.org/wiki/Asymptote

light3 · on Dec 11, 2008

Have you seen how long they've been at over 9%? ;p

madmanslitany · on Dec 11, 2008

I started working on this about a month and a half ago. My RMSE is pathetic thus far(just started tweaking a kNN with Pearson correlation as a distance metric), but I don't really care. It's a great way to brush up on a lot of CompSci concepts at once you may be rusty on. Besides obviously machine learning, when you're trying to process 2 gb of training files on your personal computer, O(n) time/space complexity REALLY starts to matter, as does choice of implementation language...using mostly Python with C extensions for the heavy math right now...

tocomment · on Dec 11, 2008

I'd love to win just so I could settle arguments by saying "anyone who has won a million dollar data mining competition raise your hand"

staunch · on Dec 11, 2008

I keep hoping someone will swoop down with a crazy solution and snatch the prize from the people trying to solve it conventionally. So much more romantic than this slog!

tocomment · on Dec 11, 2008

cool thought. any ideas what the solution might look like?

inerte · on Dec 11, 2008

Each movie rating will be represented by a star in the sky, the constelations are the rating clusters, and on 09/20/2009 they'll align to achieve the 10% mark.

Hey, it works for astronomy!

jmtame · on Dec 10, 2008

2 years in and they still haven't given out the prize money. How's that crowd sourcing working out for you NetFlix?

boredguy8 · on Dec 11, 2008

A 9% improvement represents a substantial increase in quality of recommendations. If I remember correctly, a 10% improvement meant the difference between a complete star rating, on average. (There was a fairly substantial write-up about what the 10% meant last year.) I'd say for the people-hours put in to the project, they've gotten more than their money's worth.

jmtame · on Dec 11, 2008

Oh I bet they have. Have you calculated the number of hours the 35,000 teams put in relative to the amount of pay they received?

I bet they've gotten more than their money's worth.

mhb · on Dec 11, 2008

At what rate should we value their hours? Do you charge more or less for doing something you like?

Have you considered that voluntary trade is not zero sum?

jmtame · on Dec 11, 2008

Voluntary as in they don't get paid?

Because I'm sure that every single person has no intention of receiving any of the prize money. None whatsoever, it's purely voluntary, and nobody deserves any money for making Netflix more profitable.

Wow you guys really want to bring it out of me on this thread. That's alright though, I dig everyone's energy (and I don't mind taking a bit of a karma beating over this either).

Actually, at the time of writing this, as a result of my discussion, other people have been awarded 28 karma points (where I have suffered a loss of a few). Outside discussion accounts for 35 karma points. So relative to the overall karma, I'm helping a lot of people out by being the grumpy, grouchy, mean old Mr. Scrooge in defense of the researcher. I do it for you guys! :D

daveungerer · on Dec 11, 2008

You don't think the people entering this competition can at the very least calculate the odds of winning?

Now if only it was possible to blacklist trolls...

jmtame · on Dec 11, 2008

I don't think you can silence people who disagree with your views on HN, but goodluck trying.

siong1987 · on Dec 11, 2008

Hey, it is not easy to develop a better recommendation engine. They have problems like the "Napoleon Dynamic" which is still remained unsolved right now.

Computer can only predict human behavior which is unchanging and static. But, this is always not the case.

jmtame · on Dec 11, 2008

I didn't say it was easy, I'm not sure where you're drawing that assumption from.

I did essentially say that over 2 years, I'm glad that the collective research efforts of 35,000 teams, one of the teams has seen $100k from a corporation who will ultimately profit off their research.

inerte · on Dec 11, 2008

One of the teams mentioned in the article started they own company based on the notoriety achieved, so yeah. It seems to be working pretty well for everyone.

By the way, your post was just crawled by Google, and it probably represents a couple cents into their billion dollar quarter. How do you feel helping the big guys, huh? Working for free for the man... you're disgusting, sir!

jmtame · on Dec 11, 2008

Yes, I've been bad! I'm going to sit in time out bbs

siong1987 · on Dec 11, 2008

LOL. I misunderstood what you have said. Since the code will be open sourced, it will definitely benefit thousands of companies.

In fact, there was one graduate student who was in our campus last semester was doing research in this area. He did a very good job on this. And, he won the first ACM machine learning prize for the paper. If you are interested in this area, you can actually go to the library and find out the paper. Now, I think he is now working in the Microsoft Research Team(the usual career path in our campus).

I actually did a tiny recommendation engine using one of the collaborative filtering algorithm - slope one predictor last two days. It will be up on the web very soon. I am having problems updating my server because rubygems.org is down!

jmtame · on Dec 11, 2008

It's all good, I think I need to stop hanging out on YC today. I have to catch up to Matt on this Cocoa Programming book, and then I've got a final project to prep for, and finally a meeting tomorrow with the company to talk about some marketing stuff.

Clearly YC is not where I should be right now :)

schtog · on Dec 11, 2008

You are wrong that it can only predict static and unchanging behaviour. It might just be that there isn't enough data to draw any relevant conclusions about Napoleon dynamite.

sysop073 · on Dec 11, 2008

Terrible! We so badly want to give away a million dollars, and yet all these people will do is give us a massive improvement on recommendations without taking our money! We are pissed

- Netflix

gambling8nt · on Dec 11, 2008

Successfully giving out the money--especially if there is (for example) the romantic, swooping in ending that staunch stated that he wanted (in his post above)--will generate a lot more positive PR for Netflix than the remaining 0.44% improvement will help them at this point. Regardless of whether you think of this as a genuine research prize, or simply a corporate ploy, you have to admit that Netflix is probably genuinely unhappy that no one has managed to complete the competition yet.

zack · on Dec 11, 2008

It's working great. In fact, I'd say it's fucking awesome. The recommendation algorithm has improved greatly, and contributions to the science of statistics have been made because of this corporation.

Your comment strikes me as obnoxious. You try to gibe NetFlix, but the truth of the matter is that there is a market. The considerable prize sum is enough to interest a competent individual.

jderick · on Dec 11, 2008

You have to remember that many of these researchers are funded by government money (eg, university researchers) or shareholders (eg, corporate labs) that may not be seeing any return on their investment. In some sense, Netflix is stealing cycles that may have been put to better use elsewhere. This could be seen somewhat akin to bribing a public official, only in a 'steal one penny from a million officials' kind of way.

kleneway · on Dec 11, 2008

In addition, just think of the massive amounts of cash these researchers are stealing from the network TV execs and advertising agencies, since they may be working on this in their spare time in the evenings instead of watching TV. Despicable.

jderick · on Dec 11, 2008

Funny, but obviously my comment doesn't apply to anyone's spare time.

jmtame · on Dec 11, 2008

Sorry, you'll have to excuse that I'm not immediately a huge fan of corporations who give small amounts of money for significant amounts of research that ultimately go back to benefit the corporation.

Apparently, they're going to give all of the code out, which is good to hear. I'm proud of the machine learning community for that. As for Netflix, it's difficult for me to believe this is really about research and not about increasing their profitability.

I could be way off. Maybe $1 million over the course of 2+ years is a heck of a lot of money for a team of developers (selected from 35,000 teams), which will be used to generate more profit for NetFlix. Who knows. I'm glad that so far over 2 years, $100k was given out. That's good to see.

byrneseyeview · on Dec 11, 2008

As for Netflix, it's difficult for me to believe this is really about research and not about increasing their profitability.

Are you really more concerned with the purity of their motives that the results of their actions? I would rather interact with people who can accomplish stuff for selfish reasons than people who fail hard but feel good for trying.

jmtame · on Dec 11, 2008

I'm not concerned, I just said it's great that after 2 years of work, we've seen $100k come from it. That is wonderful news.

plinkplonk · on Dec 11, 2008

Why is this downmodded? I can understand people disagreeing, but it is a well articulated point.

unalone · on Dec 11, 2008

I downvoted after reading the first sentence.

Sorry, you'll have to excuse that I'm not immediately a huge fan of corporations who give small amounts of money for significant amounts of research that ultimately go back to benefit the corporation.

While the statement that followed was at least reasonable, that line struck me as really obnoxious.

zack · on Dec 11, 2008

Colorless green ideas sleep furiously.