Categories of deep recommendation systems (jameskle.com)
148 points by le_james94 on Dec 28, 2019 | 37 comments



The recommendation systems for movies and music (e.g. Netflix) are still awful at guessing my tastes.

I still find that human-based recommendations for things like music and movies consistently beat the algorithms at predicting what I'll like and what I won't.

IMHO it will be many years before these recommendation engines consistently pick media I like to watch or listen to.

They are still far inferior to human-based recommendations.

Guessing what someone will want to watch at any given moment is probably a harder problem than getting a car to drive autonomously.

EDIT (source): I work on a mostly human-powered recommendation engine/expert system @ lazyday.tv. It does not use ML to make recommendations. It takes some creative approaches to solving the movie recommendation problem, using an expert system and optimizing search results.


Glad to hear that someone has still found a need for expert systems.

Am I correct in assuming Lazyday has adopted a rule-based approach, using information about the movies (e.g. release date, genre, director, cast, etc.) and users (e.g. age, sex, nationality, current mood, etc.) that collaborative filtering systems don't use, and that you have evaluated the system against a collaborative filtering system which uses the ratings your users provided?

I developed an online collaborative filtering movie recommendation system in the mid 1990s ( http://www.fmjlang.co.uk/morse/MORSE.html ), and analysed its performance, for my MSc project. The only information it used was how each user rated the movies they had seen. It used no other information about the movies or the users themselves.

A surprising result I found was that using the ratings of the person whose tastes most closely match your own to predict your ratings was very inaccurate. (Note that this is not necessarily the same as someone recommending you a movie, as they might know and take into account your tastes, and even if they didn't, they could still improve their recommendations simply by recommending the most popular movies, rather than the movies they liked.) Expanding from a single nearest neighbour to the average of the N nearest neighbours (for N = 25) gave quite good results. It seems that you're more similar to an average of all your friends than to your closest friend. Using the algorithm I developed (described in detail in the paper) resulted in a further big improvement, which was unsurpassed until the Netflix Prize. My data set was significantly smaller, though.
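A minimal sketch of that nearest-neighbour averaging (not the MORSE algorithm itself, which is in the paper; Pearson correlation is just an illustrative choice of similarity):

    import numpy as np

    def predict_rating(ratings, user, movie, n_neighbours=25):
        # ratings: users x movies array, with np.nan where unrated
        sims = []
        for other in range(ratings.shape[0]):
            if other == user or np.isnan(ratings[other, movie]):
                continue
            both = ~np.isnan(ratings[user]) & ~np.isnan(ratings[other])
            if both.sum() < 2:
                continue  # need co-rated movies to compare tastes
            r = np.corrcoef(ratings[user, both], ratings[other, both])[0, 1]
            if not np.isnan(r):
                sims.append((r, ratings[other, movie]))
        top = sorted(sims, reverse=True)[:n_neighbours]
        if not top:
            return np.nan  # no comparable neighbour has rated this movie
        # average the N most similar users' ratings, per the result above
        return float(np.mean([rating for _, rating in top]))

With n_neighbours = 1 this is the single-nearest-neighbour predictor that performed badly; 25 is the averaging that worked well.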


Very insightful. Thank you.

Yes, Lazyday is a rule-based search engine. It makes best guesses based on what's currently popular, the movie's metadata, and the user's search preferences and history. We have not evaluated it against a CF system in an A/B setup. Of course, we have years of user frustration with Netflix/Prime recommendations that have taught us a few valuable lessons about how to solve the problem of what to watch next.

And yes, it seems intuitive to assume that two people with the same tastes will like the same titles. The problem is, I might not be in the mood for movie A even though my virtual twin is; sometimes I would be in the mood for movie A only if I had company that was also in the mood for it. There are so many edge cases when a recommendation engine is trying to guess what you want to watch next.

Using the average of your friends is very interesting. The issue would be building the dataset: either attaching it to a network like FB or building it yourself, which would take even more time.


Is there any evidence that the recommendation systems are actually trying to put something you would like to see in front of you, rather than serving some other incentive?

My YouTube subscriptions are all engineering/programming/space/guns/machining/electronics. I even see the same commenters in many of them. There is clearly a set of people with these interests that could be used to cross-pollinate interesting content. Yet my recommendations are all pop culture and sports.

Spotify is the same way, but I blame my kids for polluting their model of my interests.

I’ve given up on Netflix. I can’t even find stuff I want to watch on there any more.


We had a bunch of people who worked on the recommendation system at Netflix at the stats seminar. Their whole movie catalogue after hashing comfortably fits into a dataframe in R on your laptop. Even the audience-likes matrix is not very big. They simply don’t have that much permanent content due to licensing issues. Parks & Rec, for example, is one of their top 10 most streamed shows - today - even though it aired like a decade ago! On top of which, it’s going to move to Peacock next quarter! So not only are they lagging in content contemporariness, they are losing what little they have because the networks that make the content are pulling the plug.

An unrelated problem with movie recs is that the kinds of movies & TV episodes being made today have homogenized because of international distribution issues & things like MeToo. Your villains can’t be clear-cut Russians & Chinese anymore; we sell to those countries. So it has to be some amorphous bad person with ambiguous intentions, which doesn’t work so well for James Bond type stuff. Many of the older shows I love, I watch on YouTube, & the comments there are mostly along the lines of “this would never ever get made in 2019. this is so bigoted. this is so mean spirited. this is openly sexist racist” etc etc. These are shows from the 70s-80s, so it’s not like we had some genetic mutation and became a different species. We are still the same people, but those kinds of movies & shows aren’t happening anymore.


Your second point is very interesting and something I hadn't really considered. I wonder how all of the DeepFake-type capabilities will allow a sort of 'dial a villain/hero' in future movies.


> Is there any evidence that the recommendation systems are actually trying to put something you would like to see in front of you, rather than serving some other incentive?

Something I've been wondering about recently is how infrastructure costs factor into this. Obviously videos get served from a local CDN node when possible, but I'm guessing sometimes the file you're requesting is uncommon enough that the local CDN node doesn't have it. Maybe this isn't the case with Netflix, but it's probably the case with YouTube, just because they have that much more content. Adding more storage to the CDNs would cost money, and cache misses on the CDN also cost money, so does YouTube have a financial incentive to bias their recommendations towards videos already cached near you?
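Purely hypothetical, but such a bias would be trivial to bolt on as a re-ranking step, something like:

    def rerank(candidates, is_cached_locally, discount=0.9):
        # candidates: (video_id, relevance_score) pairs. Hypothetical:
        # penalize videos the local CDN node would miss on, trading a
        # little relevance for cheaper delivery.
        def adjusted(pair):
            video_id, score = pair
            return score if is_cached_locally(video_id) else score * discount
        return sorted(candidates, key=adjusted, reverse=True)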


Interesting. It would be a very Googly thing to do, so my guess is yes.


The cost of cache misses, even at Google scale, is still trivial compared to their revenue.


The infrastructure costs might be low, but transit performance back to Google is terrible for many ISPs (looking at you CenturyLink), so there might be 'quality' costs to consider as well.


In my experience YouTube recommends based on watched videos (not subscriptions). The recommendations are not great, but for me they fit the topics I usually watch. Maybe you are blocking some script used to send your views to the recommendation system or something?


It's a fair point. Netflix for example has a subpar catalog to choose from (maybe 5000 titles max) and of course they are pushing their own content to the top.

By comparison, the known universe of movies and TV titles ever made globally is upwards of 500K.


Also, taking cost into account, it’s much more reasonable to hire 10 content curators dishing out 10 lists a day than to use a team of experts building ML models.


Netflix is a bad example. After the Netflix Prize they stopped trying to make a better recommendation system. They only used a few of the models from the winning ensemble, and are now using something even weaker instead.

That said, the competition is hardly any better. Most use the same collaborative filtering from the Netflix Prize (Koren et al.), which by design never recommends anything new. Most new "big data, machine learning" personalized recommendations just give you the most popular items plus some random noise. They perform well on some very artificial evaluation functions, but are rarely A/B tested on actual users. When they are tested, they barely beat naive models like the IMDB Top 250.
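For reference, the Netflix Prize-era collaborative filtering in question is essentially matrix factorization trained by SGD. A bare-bones sketch (leaving out the bias terms and temporal effects of the actual Koren et al. models):

    import numpy as np

    def matrix_factorization(triples, n_users, n_items, k=20,
                             lr=0.005, reg=0.02, epochs=20):
        # triples: (user, item, rating); predicts r_ui as P[u] . Q[i]
        rng = np.random.default_rng(0)
        P = rng.normal(scale=0.1, size=(n_users, k))
        Q = rng.normal(scale=0.1, size=(n_items, k))
        for _ in range(epochs):
            for u, i, r in triples:
                err = r - P[u] @ Q[i]
                pu = P[u].copy()
                P[u] += lr * (err * Q[i] - reg * P[u])
                Q[i] += lr * (err * pu - reg * Q[i])
        # note: an item nobody has rated keeps its random init, which is
        # exactly the "never recommends anything new" problem above
        return P, Q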


Do you have evidence that they rarely beat naive models like the IMDB Top 250?

I mean, they have entire divisions of engineers dedicated to this, so I assume they would disagree with you. So I am wondering if you have evidence that all those people are not doing anything to earn their pay.


Netflix does such a bad job with recommendations for me that I can’t understand what they are doing.

For example, I watch a lot of cooking shows. I just scrolled to the bottom of the recommendations on my tablet and they only suggest two, and I've watched every episode of one of those!

They repeatedly fail to tell me about new series of shows I’ve watched, or make really obvious connections, like suggesting “Breakfast, Lunch & Dinner” since I’ve watched “Ugly Delicious”.

The problem seems less of an ML one to me and more of a (lack of) metadata one.

Edit: I plugged “Ugly Delicious” into lazyday.tv and the “Similar Titles” are very random.


Yes, definitely lots of fine-tuning still needs to be done for the similar titles playlist; it should be OK with most titles. I noticed that's a Netflix title, so the algorithm may be constraining the similar titles to the Netflix catalog. I have to look at that code again. Since Netflix's catalog is smaller, there's less to choose from, so the recommendations won't be as good. Some titles with less metadata are harder to get right as well.


I think a lot of the preference for humans is an issue of trust and a measure of your relationship with that person. Usually we get recommendations from friends and give them more thought and weight than some random ad we scroll by on Facebook.

https://opensourceconnections.com/blog/2016/08/21/recommenda...


Most recommendation systems are collaborative filtering + some fine-tuning.

I don't see how collaborative filtering could fail to work, especially in the case of modern media providers, where they have data about whether you actually finished watching something, how many episodes of a series you watched, etc.
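For instance, a toy sketch of how those implicit signals usually enter a CF model, as confidence weights in the spirit of Hu, Koren & Volinsky (2008); the particular signal mix here is made up:

    def confidence_weight(finished, episodes_watched, alpha=40.0):
        # implicit feedback: turn engagement into a preference strength
        # r_ui, then weight that (u, i) pair's error in the CF loss by
        # c_ui = 1 + alpha * r_ui, so strong signals count for more
        r_ui = (1.0 if finished else 0.0) + 0.1 * episodes_watched
        return 1.0 + alpha * r_ui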


Collaborative filtering cannot recommend new stuff. By the time all your friends have watched and liked the new hotness it's already too late to recommend it to you.


Once even a single person with highly similar tastes to mine has engaged heavily with a new item, it can be suggested to me, e.g. by giving new items a boost.

I'm not familiar with the best approaches to the cold-start problem of getting those first few interactions, but in the case of Netflix it's easily handled by a "New on Netflix" section.
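A hypothetical version of such a boost, as a score bonus that decays with item age:

    import math, time

    def freshness_boost(score, published_ts, half_life_days=30.0, weight=0.5):
        # hypothetical: add a bonus that halves every half_life_days,
        # so brand-new items surface before they have many interactions
        age_days = (time.time() - published_ts) / 86400.0
        return score + weight * math.exp(-age_days * math.log(2) / half_life_days)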


The problem of recommending something (especially something genuinely new) is actually quite complex, which you realize by studying inferential/causal statistics.

A statistician would say that you are making "heroic assumptions" about both the tastes of viewers and the dimensions of the product, especially given that the products you have are actually quite sparse. Heck, there's even interesting literature in empirical economics (called empirical industrial organization) trying to estimate prices/demand/costs (which is essentially the same issue as Netflix trying to estimate what you like) and dealing with the inherent curse of dimensionality. This becomes really obvious if you actually write down explicitly what you are doing.

Of course ML engineers are engineers and not really interested in causal inference, so Netflix wants me to watch 12 Hitler documentaries in a row and Amazon recommends that I better buy that second fridge right after the first.

Where _does_ collaborative filtering actually work?


What do you mean "human powered" rather than ML? How are the implicit/explicit judgements of relevance (e.g. likes, clicks, views, star ratings) that are used in rec algorithms any less human than, say, a recommendation from a single trusted expert?

Personally I think it's amazing that even basic recommendation algorithms can instantly select 10 mostly relevant items from corpora of millions of irrelevant items. I think people are too used to receiving excellent recommendations and do not realise what receiving bad recommendations would be like. Hybrid systems such as Spotify's are really impressive when you think about it.
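Even the brute-force version of that selection is just a matrix product plus a top-k; approximate nearest-neighbour indexes make it sublinear at the millions-of-items scale. A sketch of the brute-force baseline:

    import numpy as np

    def top_k_items(user_vec, item_vecs, k=10):
        # score every item by inner product with the user's vector,
        # then take the k best (assumes more than k items)
        scores = item_vecs @ user_vec
        idx = np.argpartition(-scores, k)[:k]  # unordered top k
        return idx[np.argsort(-scores[idx])]   # sorted best-first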


Human-powered in the sense that the expert system we developed uses those human likes, views, and star ratings, and implements recommendation features that let humans recommend titles to other humans in a very straightforward way.

But there is no ML compiling a profile of every genre/title you've ever expressed interest in and then using that profile to guess what you want to see next like Netflix et al.


At some point you presumably have to assess user similarity in order to let similar users recommend items to each other? How's this different from, e.g., user-user CF in terms of compiling everyone's info into a single model? I'm not criticising, just curious.


I haven't found enough of a need for it at this point; the application is useful as it stands without the considerable effort of training the necessary models to assess user similarity, for example.

It is certainly faster to build an expert system that makes a lot of assumptions and iterate from there. There are plenty of use cases for ML, and I hope to integrate it more into the app some day to help automate repetitive tasks.


There are recent works that question whether "deep" learning algorithms, and in particular neural networks, are really advancing the state of the art of recommender systems: https://arxiv.org/pdf/1907.06902.pdf (Best Paper Award, RecSys 2019). It is strange that the blog post fails to mention this.


Yeah, this has been a well-known problem in the field for a while now, and not just with DNN approaches: https://en.wikipedia.org/wiki/Recommender_system#Reproducibi...

These systems almost never work in production. The publications are usually very deceptive.


I recently found a particularly rare/esoteric song on Google Play Music, specifically because I wanted to hear more like it. The song was "Light" by Idyllic.

I clicked "Start radio" and the first song was "Light", of course. Then the next 10 songs were all by Idyllic, no other artists. After that, half the songs were Idyllic and the other half were... songs from my "Thumbs Up" playlist (any song I had previously liked).

I repeated the process several times thinking there was a bug mixing in my thumbs-up playlist.

Eventually I realized I'm the first person to ever listen to this song on Google Play Music, and it used only me as the first Bayesian filter - so I just got my own music suggested back to me.


IMO this article does not deliver on its title. I was expecting a list of systems that we should "pay attention to", i.e. ones that hold promise now or in the coming years.

Instead, this was an index of current literature from pretty much all deep learning methods that have been applied to recommender systems. It's more useful to a lay person than a researcher.


Many film recommendation systems suffer from a strong gravitational pull towards mainstream choices with no clear association to the input (”people who enjoyed The Thin Red Line also enjoyed Pulp Fiction..”). But I wonder what better results deep learning could even accomplish given a limited feature space such as user like/dislikes.

A few years ago I made a recommender based on critics’ inferences in reviews. For the right kind of film (indie or festival circuit stuff and the like) it yielded quite interesting results.
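For flavour, a minimal content-based sketch along those lines (toy data, not my actual system): represent each film by its review text and compare by cosine similarity:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    reviews = {  # film -> concatenated critic review snippets (made up)
        "The Thin Red Line": "meditative lyrical war ensemble nature voiceover",
        "Pulp Fiction": "stylized nonlinear crime dialogue violence pastiche",
        "Days of Heaven": "lyrical pastoral meditative nature voiceover",
    }
    titles = list(reviews)
    tfidf = TfidfVectorizer().fit_transform(reviews.values())
    sims = cosine_similarity(tfidf)
    for i, title in enumerate(titles):
        j = max((m for m in range(len(titles)) if m != i), key=lambda m: sims[i, m])
        print(title, "->", titles[j])  # nearest neighbour by review language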


The article's original title is much more readable than this post's title.


> The article's original title is much more readable than this post's title.

Probably too long to fit; OP did a decent job of compressing, IMO.

The original is:

RECOMMENDATION SYSTEM SERIES PART 2: THE 10 CATEGORIES OF DEEP RECOMMENDATION SYSTEMS THAT ACADEMIC RESEARCHERS SHOULD PAY ATTENTION TO


The “that” being left out makes it tough to parse


The "to" missing at the end totally breaks it for me, I read it as "Categories of Deep Recommendation Systems: Researchers Should Pay Attention"


Omitting the "that" is fully standard English grammar. Omitting the "to" is not, and radically alters the meaning of the title.


Read the title and thought it was going to be a paper about how to use attention-based neural networks for deep recommendation systems...



