As per the prize rules, they are under the obligation to describe and publish their method (as they did last time). I really wish someone would release an open-source recommender package, coded in C, with PHP bindings, based on the latest/greatest algorithms, or any algorithm (just plain SVD if anything).
It is a little framework I put together to play with the Netflix Prize data. I implemented SVD as one of the examples.
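For anyone curious what a stock SVD recommender actually looks like, here is a minimal sketch in plain Python (not the framework's actual code; the function names and hyperparameters are my own) of the SGD-trained matrix factorization commonly called "Funk SVD" in Netflix Prize circles:

```python
import random

def train_svd(ratings, n_users, n_items, n_factors=10, lr=0.01, reg=0.02, epochs=20):
    """Plain SGD matrix factorization over (user, item, rating) triples.

    Learns user factors p and item factors q so that dot(p[u], q[i])
    approximates the rating user u gave item i.
    """
    p = [[random.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    q = [[random.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(p[u][f] * q[i][f] for f in range(n_factors))
            err = r - pred
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step with L2 regularization on both factor vectors.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q

def predict_svd(p, q, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(p[u], q[i]))
```

Real entries added per-user and per-item bias terms on top of this, but the inner loop above is the whole idea.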
On a side note, I did think about the idea of putting together a small company that would sell recommendation systems to companies for their websites. The value would not be in the algorithm, but in the ability to integrate with their software.
Nope, it is just a stock SVD; it won't win any awards, it'll just get the job done and serve as an example. There are plenty of people who have tweaked SVD up and down, so I didn't spend time doing it too.
It may simply be mathematically impossible to get to 10 percent given the observable variables. Maybe Netflix knows this and is just getting research on the cheap.
The machine learning guys have only a subset of the full dataset to work with. They have to send their algorithms to Netflix for testing on the full dataset. So it may be possible Netflix knows something they don't. It does seem quite strange how close and yet how far that 10% mark has been.
Because they are testing against a fixed data set, teams can gain information about that data set based on their submissions. This is why several teams mix and match several different approaches: they can increase the weight of each approach over time and game the system.
Edit: I am not saying they are intentionally cheating; rather, it's one of the reasons that approach works.
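For what it's worth, the "mix and match" described above is usually a plain linear blend: run each predictor on a held-out probe set, then solve for the mixing weights that minimize squared error. A toy sketch for two predictors (the function name is mine; real teams blended dozens of models, not two):

```python
def blend_weights(p1, p2, truth):
    """Least-squares weights (w1, w2) minimizing the squared error of
    w1*p1 + w2*p2 against the true ratings, via the 2x2 normal equations."""
    a11 = sum(x * x for x in p1)
    a12 = sum(x * y for x, y in zip(p1, p2))
    a22 = sum(y * y for y in p2)
    b1 = sum(x * t for x, t in zip(p1, truth))
    b2 = sum(y * t for y, t in zip(p2, truth))
    det = a11 * a22 - a12 * a12  # assumed nonzero: predictors not collinear
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (b2 * a11 - b1 * a12) / det
    return w1, w2
```

Because the probe set is fixed, each submission's score leaks a little information about which weights to nudge, which is the "gaming" the parent refers to.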
Wouldn't argue with that, but they may have run up against a bound, multiplied it by a constant, and said to themselves, "It's highly unlikely that anyone can beat this, so let's put it out there and see if someone can."
So far no one has, and they have gotten a ton of research on the cheap, like the current team's for $50k. Smart.
They are getting it on the cheap. They've got many teams of top-notch folks working on building an algorithm tailored for their dataset and problem definition. For what they're getting out of this, $1M is a stupidly low price. If they've incorporated the stuff that's come out so far, they've already boosted their sales by several million based on the results of the contest.
No I wouldn't have, because I could only make a statement like that after seeing performance information. Since at least 40 teams beat that performance the first year, I would have said they set the bar too low.
The point is that there's really not much difference between 10% and 9.56%.
Netflix's Grand Prize could very well have been 9.64%, and it would already have been won, despite everybody saying one year ago that 9.64% might be impossible given the dataset.
We will talk again in one year to see if 10% may be impossible.
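To put the 10% vs. 9.56% gap in absolute terms: improvement is measured against Cinematch's published quiz-set RMSE of 0.9514, so the percentages translate directly into RMSE targets. A quick sanity check:

```python
BASELINE = 0.9514  # Cinematch's RMSE on the quiz set, per the contest rules

def rmse_for_improvement(pct):
    """RMSE a submission must reach for a given percent improvement over Cinematch."""
    return BASELINE * (1 - pct / 100.0)

# 9.56% improvement -> RMSE of about 0.8604
# 10%   improvement -> RMSE of about 0.8563
```

So the entire remaining race is over roughly 0.004 of RMSE, i.e., four thousandths of a star per prediction on average.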
I started working on this about a month and a half ago. My RMSE is pathetic thus far (I just started tweaking a kNN with Pearson correlation as a distance metric), but I don't really care. It's a great way to brush up on a lot of CompSci concepts you may be rusty on, all at once. Besides machine learning, obviously, when you're trying to process 2 GB of training files on your personal computer, O(n) time/space complexity REALLY starts to matter, as does choice of implementation language... I'm using mostly Python with C extensions for the heavy math right now.
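Since the comment above mentions kNN with Pearson correlation, here is roughly what that similarity computation looks like over two users' co-rated movies (my own sketch in plain Python; the dict-of-dicts layout is for illustration only, since a real implementation over 100M ratings needs far tighter data structures):

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users' ratings ({movie_id: rating}),
    computed over the movies both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    n = len(common)
    if n < 2:
        return 0.0  # not enough overlap to say anything
    mean_a = sum(ratings_a[m] for m in common) / n
    mean_b = sum(ratings_b[m] for m in common) / n
    num = sum((ratings_a[m] - mean_a) * (ratings_b[m] - mean_b) for m in common)
    den = sqrt(sum((ratings_a[m] - mean_a) ** 2 for m in common)) * \
          sqrt(sum((ratings_b[m] - mean_b) ** 2 for m in common))
    return num / den if den else 0.0
```

A kNN predictor then rates a movie for a user by taking a similarity-weighted average over the k most correlated neighbors who rated it.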
I keep hoping someone will swoop down with a crazy solution and snatch the prize from the people trying to solve it conventionally. So much more romantic than this slog!
Each movie rating will be represented by a star in the sky, the constellations are the rating clusters, and on 09/20/2009 they'll align to achieve the 10% mark.
A 9% improvement represents a substantial increase in the quality of recommendations. If I remember correctly, a 10% improvement meant a difference of a full star rating, on average. (There was a fairly substantial write-up about what the 10% meant last year.) I'd say for the people-hours put into the project, they've gotten more than their money's worth.
Because I'm sure that every single person has no intention of receiving any of the prize money. None whatsoever, it's purely voluntary, and nobody deserves any money for making Netflix more profitable.
Wow you guys really want to bring it out of me on this thread. That's alright though, I dig everyone's energy (and I don't mind taking a bit of a karma beating over this either).
Actually, at the time of writing this, as a result of my discussion, other people have been awarded 28 karma points (where I have suffered a loss of a few). Outside discussion accounts for 35 karma points. So relative to the overall karma, I'm helping a lot of people out by being the grumpy, grouchy, mean old Mr. Scrooge in defense of the researcher. I do it for you guys! :D
Hey, it is not easy to develop a better recommendation engine. They have problems like the "Napoleon Dynamite" problem, which still remains unsolved.
Computers can only predict human behavior that is unchanging and static. But this is not always the case.
I didn't say it was easy, I'm not sure where you're drawing that assumption from.
What I essentially said is that I'm glad that, after 2 years of collective research effort from 35,000 teams, one of the teams has seen $100k from a corporation that will ultimately profit off their research.
One of the teams mentioned in the article started their own company based on the notoriety achieved, so yeah. It seems to be working pretty well for everyone.
By the way, your post was just crawled by Google, and it probably represents a couple cents into their billion dollar quarter. How do you feel helping the big guys, huh? Working for free for the man... you're disgusting, sir!
LOL. I misunderstood what you said. Since the code will be open sourced, it will definitely benefit thousands of companies.
In fact, there was a graduate student on our campus last semester doing research in this area. He did a very good job on it, and he won the first ACM machine learning prize for the paper. If you are interested in this area, you can actually go to the library and find the paper. I think he is now working on the Microsoft Research team (the usual career path on our campus).
I actually built a tiny recommendation engine over the last two days using one of the collaborative filtering algorithms, the Slope One predictor. It will be up on the web very soon. I am having problems updating my server because rubygems.org is down!
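Slope One is simple enough to sketch in a few lines of Python. This is my own illustration of the textbook weighted variant, not the parent's code:

```python
from collections import defaultdict

def slope_one_deviations(ratings):
    """ratings: {user: {item: rating}}.
    Returns (dev, freq): dev[i][j] is the average amount item i is rated
    above item j, freq[i][j] is the number of users who rated both."""
    freq = defaultdict(lambda: defaultdict(int))
    dev = defaultdict(lambda: defaultdict(float))
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    freq[i][j] += 1
                    dev[i][j] += ri - rj
    for i in dev:
        for j in dev[i]:
            dev[i][j] /= freq[i][j]
    return dev, freq

def slope_one_predict(ratings, dev, freq, user, target):
    """Weighted Slope One prediction of `user`'s rating for the target item:
    each item the user has rated votes for (its rating + deviation),
    weighted by how many users co-rated it with the target."""
    num = den = 0.0
    for j, rj in ratings[user].items():
        count = freq.get(target, {}).get(j, 0)
        if count:
            num += (dev[target][j] + rj) * count
            den += count
    return num / den if den else None
```

The appeal is that the deviation table can be precomputed once and updated incrementally, which is why it's a popular first engine to build.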
It's all good, I think I need to stop hanging out on YC today. I have to catch up to Matt on this Cocoa Programming book, and then I've got a final project to prep for, and finally a meeting tomorrow with the company to talk about some marketing stuff.
You are wrong that they can only predict static and unchanging behaviour. It might just be that there isn't enough data to draw any relevant conclusions about Napoleon Dynamite.
Terrible! We so badly want to give away a million dollars, and yet all these people will do is give us a massive improvement on recommendations without taking our money! We are pissed!
Successfully giving out the money--especially if there is (for example) the romantic, swooping in ending that staunch stated that he wanted (in his post above)--will generate a lot more positive PR for Netflix than the remaining 0.44% improvement will help them at this point. Regardless of whether you think of this as a genuine research prize, or simply a corporate ploy, you have to admit that Netflix is probably genuinely unhappy that no one has managed to complete the competition yet.
It's working great. In fact, I'd say it's fucking awesome. The recommendation algorithm has improved greatly, and contributions to the science of statistics have been made because of this competition.
Your comment strikes me as obnoxious. You try to gibe at Netflix, but the truth of the matter is that there is a market. The considerable prize sum is enough to interest competent individuals.
You have to remember that many of these researchers are funded by government money (eg, university researchers) or shareholders (eg, corporate labs) that may not be seeing any return on their investment. In some sense, Netflix is stealing cycles that may have been put to better use elsewhere. This could be seen somewhat akin to bribing a public official, only in a 'steal one penny from a million officials' kind of way.
In addition, just think of the massive amounts of cash these researchers are stealing from the network TV execs and advertising agencies, since they may be working on this in their spare time in the evenings instead of watching TV. Despicable.
Sorry, you'll have to excuse that I'm not immediately a huge fan of corporations who give small amounts of money for significant amounts of research that ultimately go back to benefit the corporation.
Apparently, they're going to give all of the code out, which is good to hear. I'm proud of the machine learning community for that. As for Netflix, it's difficult for me to believe this is really about research and not about increasing their profitability.
I could be way off. Maybe $1 million over the course of 2+ years is a heck of a lot of money for a team of developers (selected from 35,000 teams), which will be used to generate more profit for Netflix. Who knows. I'm glad that so far, over 2 years, $100k has been given out. That's good to see.
As for Netflix, it's difficult for me to believe this is really about research and not about increasing their profitability.
Are you really more concerned with the purity of their motives than with the results of their actions? I would rather interact with people who can accomplish things for selfish reasons than people who fail hard but feel good for trying.
Sorry, you'll have to excuse that I'm not immediately a huge fan of corporations who give small amounts of money for significant amounts of research that ultimately go back to benefit the corporation.
While the statement that followed was at least reasonable, that line struck me as really obnoxious.
Interesting links (from the winning teams' websites, about their latest research): http://public.research.att.com/~volinsky/netflix/kdd08koren.... http://www.commendo.at/index.php?lang=0&_0=2&_1=3