One of the very best recommendation engines I've encountered is the "Discover Weekly" playlist from Spotify. It's helped me reconsider my relationship to music, which I basically thought was dead since I had hit a rut in exploring new artists.
The addition of Discover Weekly really confused me. Shouldn't the features that create a radio station from an artist or a playlist fill this need already? Why is it only updated weekly? I haven't tried other services much but it feels like Spotify isn't doing as much as they can with recommendations.
I think it's because people are used to listening to their Playlists. It feels more natural to check your "Discover Weekly" playlist than a whole new section within Spotify. And I think the decision to only update it every Monday was pretty genius. Most people aren't particularly excited when Monday rolls around, but when they think about the fact that it's Monday they are likely to remember they have a brand new Discover Weekly playlist to listen to. It's one of the good things about their Monday and becomes a habit over time.
I have Android Auto in my car, and my favorite part of the terrible, traffic-ridden commute from Redmond to Bellevue is listening to my Discover Weekly playlist. Even though I only really like about 10% of what it picks, usually in that 10% I find an artist I really enjoy and dig into that.
I wonder if there's any data on how common this is. I listen to large shared playlists or the radio feature the vast majority of the time to try to find new music.
I suspect that the reason Discover Weekly is so much better than radio, and is only available once per week, is that it's computationally costly.
Either way, radio-from-artist or radio-from-song can't utilize your listening history. It's quite possible you listen to an artist for different reasons than the reasons most people listen to that artist - in that case, you will get recommendations based on the majority's reasons.
You would expect personalized recommendations to have potential to do much better, and I'd argue that's exactly what we see with DW.
This is one of my very favourite features of Spotify, because they consistently serve up songs that I like listening to. (Although I'm sure there's the psychological factor of me thinking "these songs were picked just for me, therefore they are good"...)
I'm skeptical about recommendation features. My reaction to them, on the rare occasions when I don't ignore them altogether, ranges on every site from 'huh?' to 'wtf?'. Specifically, YouTube gives geographic location too much weight -- just because I live in a certain country doesn't mean I'm interested in all this trending local shit-pop music or these stupid 'fun' videos.
I can confirm this. I saw a lot of region-based, really stereotyping videos when I got a new computer and visited YouTube. While I understand they need to fill up the home page with something, I'd just prefer they were a little more creative about the whole thing.
After glancing through the top few, I quickly went for the search box. To their credit, once I did a few searches, the recommendations had drastically improved by my next visit.
A side effect of this, of course, is that you can study all kinds of stereotyping and biases by repeating my experiment in various regions I suppose.
If they don't have any good information about you other than your location, what else would you recommend other than the most popular videos in that location?
When you lack the information to make a good recommendation, "the best we can guess from really generic or sparse information" tends to be annoying. That's really bad.
Yes, I watched a bunch of Dota replays during a recent tournament. No, I don't normally go on YouTube. Then I watched a Daily Show video clip that was linked. All my "watch next" and "recommended" videos are Dota. That's not smart, that's aggravating. I would have been OK watching a couple more Daily Show clips, but instead exclusively bad recommendations were made based on poor data.
I dislike the idea that my world gets filtered by algorithms, but I really hate when they're obviously bad at it. Although I suppose I should be grateful that it's easily spotted when it's bad?
Author here - happy to answer questions about the techniques in the paper. We're super excited to finally share this work externally. Feedback about YouTube recommendations in general also welcome.
Do you study the phenomenon of information bubbles at Google? Let's say, a German user just happens to watch some right-wing populist video claiming that we need to stop Merkel's refugee politics. The next day the user might receive plenty of recommendations in their feed that confirm the message in the first video. They happen to stumble upon a video of some party convention by an uprising German populist party, and everything makes sense now! Video by video the user gets dragged into a right-wing ideology.
That is an information bubble. The algorithm cannot detect low quality or populism, neither can it recommend opposite standpoints, and at the end of the day it has a real effect on a country's politics and the well-being of many people.
Do you have means of quantifying such effects? What are possible countermeasures?
If you cannot talk about that, then this would be my feedback: Perhaps you could train a language model to find opposing views in video titles and tags and then diversify the video recommendations based on that.
What about the reverse scenario, though? Should someone who watches videos about refugee suffering be given anti-refugee video recommendations, lest they be dragged into a 'left wing ideology'? I don't see how that would be acceptable. Would Holocaust documentaries be 'diversified' with Holocaust denial videos?
'Information bubbles' have existed as long as people have had a choice of newspapers to buy and TV channels to watch. Calling for Youtube to artificially 'balance' videos seems like political interference.
While information bubbles have existed as long as people had a choice, at least with a newspaper or TV channel, you have to read and watch a little to understand whether it fits your liking. You have to put some effort in.
With recommendation engines, your bubble, without effort, ossifies.
There is no such thing as unbiased. So they have to pick how they are going to be biased. My guess is the way that makes the most ad revenue irrespective of ideology.
I think personalized media might bring information bubbles to a whole new level because recommendation systems are more efficient and achieving millions of YouTube views has arguably fewer intellectual hurdles than achieving similar reading rates via print or TV. The information passes through fewer filters (none), e.g. proofreaders, team discussions. If that causes an increase of misinformation, then it is on Google to fix it, i.e. to reduce the amount of misinformation at least to pre-YouTube levels.
If you think relativism is fine, then "As opposed to planting your flag in the ground that your camp is always right and the outgroup is evil?" is also fine.
See how silly and immediately self-contradictory relativism is?
"Trying to think a bit" would be on the side of having more information to think about, I'd imagine. Absolute relativism is a strawman you've brought into it, political tribalism is on the other hand a very real thing.
Why? Leftist people would also get more diverse recommendations. They'd be exposed to arguments such as the importance of maintaining western values etc. I actually did an experiment of wading through videos recommended based on anti-Merkel stuff. There are very few reasonable discussions and mostly it's horribly populist and poorly researched stuff.
What exactly are western values, other than a not-so-subtle way of asserting hostility to Muslims? The people who go on about the importance of opposing immigration in the name of maintaining liberal and tolerant values (a code word for 'gays and feminism') seem to be exactly the same people who hate gays and feminism in the first place.
For example, that no death threats are spoken when a daughter defies her father's will about who she is supposed to spend time with. That caricaturists, satirists and atheists are safe. These are things that western cultures have established, and which could arguably be endangered by letting in refugees by the millions and by prohibiting cultural criticism at the same time. I am myself not convinced of the urgency of this threat, but I think these are some of the more convincing arguments against Merkel's refugee politics. Other arguments are for example second order effects or equilibrium effects, e.g. that conservatives, professionals and business folk mount a counter-reaction that is worse than letting in refugees in a more controlled way (i.e. Brexit and brain drain).
I have no idea about the numbers, but I am pretty certain both groups exist. Those who use these arguments as pretense and those who are honestly concerned about the efficiency, safety and trust our culture has established (which e.g. allow us to focus on education, art and science).
I look at the Recommended section daily, and I find it very disappointing for several reasons:
- the recommendations are very often not interesting to me because
+ they cater to the lowest common denominator (you won't believe these 10 hilarious fails, PewDiePie picks his nose, etc.)
+ I have already watched the video
+ a video has been in the Recommended section for weeks and I haven't clicked on it. What makes you think I'll change my mind after several weeks? If I don't click a video within a couple of days of it appearing in the section, it's a dud. Don't keep showing it
+ the video is from a channel I am already subscribed to. That's not a recommendation, it's trivial and not helpful
+ most or all of the videos in the section are sometimes matching the same key word. I once clicked on an Amy Schumer video, and for many days every video in the Recommended section was a Schumer video. This is terrible. The same thing happened after I clicked a Craig Ferguson video.
- the feedback UI is not streamlined. I have to click through multiple menus to be able to say: not interested in this channel
- there should be a list of key words that I can specify so that if a video matches one of them, it isn't added to the section. Conversely, there should be a list of key words that, when I specify them, makes the recommendation engine go out and look for videos matching them, and then add some of them to the section
I love watching interesting and creative how-to videos (DiResta, Tested, etc.), but even after several years of watching them, the recommendation engine seems to not have caught on to that.
Is the deep learning approach already deployed for regular users? I have not seen a change in the quality of the recommendations.
Sorry to sound so negative, but I think this is a huge wasted opportunity. There is tons of amazing content on youtube, and it's often very hard to find.
Despite the negative connotation, I wholeheartedly agree with this assessment. I usually ignore YT's Recommended videos for the same reasons you describe.
Does this mean something different from feeding the age of the video, relative to when the training example was recorded? Feeding in the age of the video seems like a fairly obvious idea and like it should train the network to favor newer videos. If it actually means how long ago the training example was recorded that is rather strange, as I don't see how that would be needed on top of the video age. Neat graph, there.
I am often annoyed at how overly focused online recommendation systems are on my overly specific recent trends, rather than the broader interests I display over months or years of using a product (looking at you, Amazon). It seems like it should be relatively easy to learn 'this guy likes little video essays about art and science and sometimes fun talk shows' and yet YouTube has been pretty bad at recommending such video-essay style content to me. Perhaps this will improve it, although I wonder how much the recent history features end up overwhelming the years-long data about what interests me broadly and not just yesterday.
As an aside, is it really "Deep Neural Networks for YouTube Recommendations" if you are using 5-ish layers of embedding, ReLU units, and output? A bit humorous, that.
Your intuition is correct - there are other ways to capture the non-stationary nature of this particular problem. We thought that the example age approach is neat because it is a general technique for removing bias inherent to any machine learning system. Since examples always come from the past, you often have to be careful to prevent any system from being overly biased towards historical behavior. You don't need any additional metadata about items (what's the age of a search query?) and it's more resilient to predicting in regions the model has never seen because you fix serving to the very end of the training window.
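To make the example-age idea a bit more concrete, here is a minimal sketch in Python of how such a feature might be built. It assumes a hypothetical log of watch events with timestamps; `WatchEvent`, `training_features`, `serving_features` and the choice of days as the unit are all made up for illustration and are not taken from the paper. During training the feature is the gap between when the example was logged and the end of the training window; at serving it is pinned to zero, i.e. the very end of that window.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400.0


@dataclass
class WatchEvent:
    """Hypothetical logged training example (not the paper's schema)."""
    user_id: str
    video_id: str
    timestamp: float  # seconds since epoch when the watch was logged


def example_age_days(event: WatchEvent, training_window_end: float) -> float:
    """Time between when the example was logged and the end of the training window."""
    return (training_window_end - event.timestamp) / SECONDS_PER_DAY


def training_features(event: WatchEvent, training_window_end: float) -> dict:
    """Features for one training example; older examples get a larger age."""
    return {
        "user_id": event.user_id,
        "example_age_days": example_age_days(event, training_window_end),
    }


def serving_features(user_id: str) -> dict:
    """At serving time the age is fixed to zero: predict as of the end of the window."""
    return {"user_id": user_id, "example_age_days": 0.0}
```

Note that nothing here needs metadata about the item itself (such as an upload date), which is the property the comment above highlights.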
I tend to think the focus on recent behavior is an artifact of underfitting. Research into richer temporal modeling is needed and recurrent networks seem promising.
We debated internally whether to use the "deep" moniker - Alexnet was 8 layers, so maybe the threshold is 8? The depth seems sort of irrelevant since stacking layers is trivial once the basic architecture is in place.
In conjugation with other product areas across Google, YouTube has undergone a fundamental paradigm shift towards using deep learning as a general-purpose solution for nearly all learning problems.
Can you talk about how this works in practice? Is the deep learning group separate from other teams and then tackles problems from different areas as needed, or are there deep learning engineers in each project area that are building nets for each different area? Is the ML team also redesigning product architecture by building products around reinforcement learning?
There are many close collaborations between product and research, as well as direct exchanges between different product areas. Close collaboration is key because those working directly on the product understand best the data, serving system and fundamental constraints.
A recent article [1] revealed how engineers are trained in ML across Google.
What I've heard from Google employees is that if you work there, you are getting training on Deep Learning for sure. It doesn't matter what team you're on, Google is now essentially a deep learning company (which sells ads).
9 out of 10 recommendations for me are terrible. They're either videos I've already watched or they're garbage designed to entice 13 year olds like "Top 10 Boobs In Movies." I assume this is caused by watching a lot of let's play and other game videos.
The recommendation system can't seem to handle outliers but maybe that's asking too much of current technology.
This is probably off-topic, but unfortunately I have to agree with a few others here in saying that my recommendations have never been that great.
The system suggests lots of clickbait and low-quality videos (with massive view counts though). It's very rare that I get a great video that I really enjoy in my recommendations.
My guess is that the system can't really tell whether the video itself is of good quality and brings good, fresh content.
Is that the case? How do you guys rate and measure the intrinsic video quality?
I am very interested in this. Deep NNs are quite an interesting subject and something I'm personally quite curious about.
I also use YouTube recommendations quite a lot for some fairly specialized interests [which I'll keep unstated for now]. My current impression has been that the recommendation system has only gotten worse in the last ten years and is now nearly broken (I get recommendations from third party websites now).
As I recall things, Youtube removed most user recommendation controls 5-10 years ago and the guesses it makes still haven't made up for this loss.
But there are other things I find even harder to understand. I find that when I'm not logged in, after choosing 5-10 videos, youtube will start to recommend good stuff, indeed things that I'd like on my regular recommendation list but which I never do see there.
My impression of my regular recommendations is that they serve nothing but crude averages, videos that I just assume someone pays Google to recommend ("sports", "celebrity fails", etc.).
Which brings me to shock that the cream of the cream of AI somehow deploys this to me. I get that Convnets have made quantum leaps in image recognition competitions. AlphaGo was a clear advance. But where is the progress here? If the recommendation engine is categorizing videos, either the categorizations don't correspond to my experiences or it's using the categorizations incorrectly. Broadly, my impression is the algorithm is swayed by whether a video is broadly popular rather than whether it's in a given category. And I work hard to prune every off-topic suggested video or suggested topic, yet I get what seems like poor to worthless quality recommendations.
Please make it so I can block specific YouTube users from EVER being recommended to me, or showing up in sidebars or whatever.
I feel these features might actually start being useful to me if there was a way to tell YT "Please, please stop showing me this user's videos, I absolutely never want to watch them at all".
Thanks for making your hard work available. It is very interesting from a technical point of view. I'm struck by just how huge a challenge this is given the enormous corpus size.
Are you familiar with Joe Edelman's work?[0] He specifically uses YouTube recommendations as an example of many algorithms designed to use the wrong metrics which leads to undesirable outcomes for users.
Have you ever looked into attributing reasons to users' visits? It seems likely that many users aren't looking for general recommendations that blend their entire use of YouTube together but want specialized recommendations linked to why they visited YouTube this time.
I use Youtube every day and find the recommendation engine extremely predictable. Whatever video(s) I've watched recently all the way through seem to dominate the recommendations. Which on the surface seems logical, but sometimes I watched a video and at the end decided I didn't really like it. I end up having to do a LOT of "Not interested" -> "I don't like this channel" / "I don't like this video" to clear them out.
I wish the recommendation engine had a better idea of what I liked based on the fact that I've been using Youtube for years, and I've thumbs-upped a lot of videos, and told it a lot of channels and videos that I don't like. But maybe that's just asking too much?
Thank you for sharing this paper, it's the first attempt I've seen at using Neural Nets for the candidate generation portion, which was cool to see.
How do you decide what the N in Top N should be?
I see you guys scaled features yourselves, why not use BatchNorm?
Do you think you could have eliminated the manual feature engineering with some learnable feature engineering? I'm mostly thinking of some sort of parametric activation functions, but I'm curious if you've thought about it.
Any thoughts on the Wide & Deep paper, did you try incorporating similar ideas?
Did you experiment with LSTMs for turning watch/search histories into fixed vectors?
You guys trained a regression model, whereas the common wisdom is that neural nets aren't so hot at regression, did you try training this as a bucketized classification problem?
Again, thanks for the paper and taking time to answer questions :)
First question: Are your innovations being used on Youtube today? At what date did you "pull the switch" and go over to the DNN-based recommender systems?
I'd say recommendations have gotten somewhat better, if a little too clickbaity still (you saw one video with a squirrel? here are ten squirrel video compilations!).
Second question: Did humans at some point assign names to DNN-established clusters or vector elements or what they're called? Sometimes I get OK recommendations, but with a really bad label (for instance a 100% minecraft LPer recommended as an example of a "shooter").
Could you elaborate on "we learn high dimensional embeddings for each video in a fixed vocabulary and feed these embeddings into a feedforward neural network."
So, each video is mapped to fixed size vector of floats? A user's history is now a matrix of size [number of videos, embedding size]? What are the other parameters in this sentence "Importantly, the embeddings are learned jointly with all other model parameters through normal gradient descent back propagation updates."? And how do you concatenate all these into a "wide layer" when users would have histories of different length?
Figure 3 illustrates that the variable sized watch history is combined with an average operation. This is partially why the embeddings need to be so large - in order to retain information after averaging, you need lots of dimensions to spread out disparate items.
This is of course not optimal, as the network should be able to learn how best to summarize the sequence. In the paper, however, we emphasize the importance of withholding certain sequential information from the classifier.
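For anyone wondering what "combined with an average operation" looks like mechanically, here is a minimal pure-Python sketch. The tiny embedding table and the function name `average_embeddings` are invented for illustration; the real embeddings are learned and much larger. The point is only that histories of any length collapse to one fixed-size vector, and that the result does not depend on watch order.

```python
from typing import Dict, List

# Toy, hand-written embedding table (an assumption for illustration only).
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "a": [1.0, 0.0, 2.0],
    "b": [0.0, 1.0, 0.0],
    "c": [2.0, 2.0, 1.0],
}


def average_embeddings(history: List[str],
                       embeddings: Dict[str, List[float]],
                       dim: int) -> List[float]:
    """Collapse a variable-length watch history into one fixed-size vector."""
    # Videos outside the fixed vocabulary are simply skipped in this sketch.
    known = [embeddings[video_id] for video_id in history if video_id in embeddings]
    if not known:
        # Empty history: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


short_user = average_embeddings(["a", "b"], TOY_EMBEDDINGS, dim=3)
long_user = average_embeddings(["a", "b", "c"], TOY_EMBEDDINGS, dim=3)
```

Both `short_user` and `long_user` have length 3 regardless of how many videos were watched, so they can be concatenated with other features before the first dense layer. Because averaging is order-invariant, a sequence model such as an RNN would be needed to keep ordering information.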
Have you experimented with replacing the averaging operation on the vectors with a recurrent network such as an LSTM? That way you don't ignore the temporal nature of the feedback (I have had success improving metrics doing this on implicit streaming video feedback).
My experience is that YouTube's recommendations have been consistently near-sighted for as long as I've known them. They always recommend the most recent topic/theme in general.
Would love to see some more details on how you represent videos as feature vectors. Do you only use metadata provided by the uploaders (e.g. title and tags), or do you also analyze the raw video/audio somehow to augment the metadata?
The video embeddings in the paper are learned purely based on observing what users co-watch in sessions. In this sense, they can be thought of as latent factors in more traditional collaborative filtering approaches. When we inspect them, nearby vectors have a surprising amount of semantic similarity.
Features about the videos such as titles and tags, as well as features derived from audio and video, are introduced in the ranking phase.
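One way to "inspect" embeddings like this is to look up a video's nearest neighbours. The sketch below is only an illustration: the two-dimensional vectors are hand-written rather than learned, the video names are made up, and cosine similarity is my assumption for the distance measure, not something stated in the thread.

```python
import math
from typing import Dict, List

# Hand-written toy vectors standing in for learned video embeddings.
TOY_VIDEO_EMBEDDINGS: Dict[str, List[float]] = {
    "dota_replay_1": [0.9, 0.1],
    "dota_replay_2": [0.8, 0.2],
    "cooking_show": [0.1, 0.9],
}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_videos(query_id: str,
                   embeddings: Dict[str, List[float]],
                   k: int = 2) -> List[str]:
    """Return the k videos whose vectors are most similar to the query video."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), video_id)
        for video_id, vec in embeddings.items()
        if video_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [video_id for _, video_id in scored[:k]]


neighbours = nearest_videos("dota_replay_1", TOY_VIDEO_EMBEDDINGS, k=2)
```

With co-watch-trained vectors, the expectation described above is that the top neighbours of a video turn out to be semantically related even though no titles or tags were used to learn them.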
word2vec did inspire earlier iterations of the model, but the key insight is that embeddings are learned jointly with all other model parameters. There is no separate source of embeddings. This way, embeddings are specialized for the specific task.
In general, what could be a separate source of embeddings? Also, how do these embeddings compare against traditional CF-based latent factors? (I ask this in terms of a recommender metric and not complexity.)
Great paper! How do you guys deal with new users and new items with little or zero historical data? Seems like the model wouldn't have good latent factors for them
Thanks! This model handles new users gracefully because it can fall back to demographic/geographic priors and gradually specialize as the user watches videos. New items are difficult because of the fixed output vocabulary and batch training. In practice, this model is best suited for the head of the distribution and other specialized recommenders handle extremely fresh/low viewcount items. Feature engineering is key for new content during the ranking phase.
Yes, it's a reasonable proxy. It was challenging to set up similar experiments with the old system because it was trained to approximate a different "surrogate" problem. We've also found that recommendation systems are very difficult to evaluate offline.
I noticed that YouTube's recommendations had suddenly gotten better! I wondered if they were using a new statistical approach, or had just started really optimizing at all because the old recommendations were extremely naive. I'm actually a little disappointed to find out that it might just be another deep learning thing. (Yes, it works, but I feel like you learn a little less about problem structure when what you read is mostly "we threw a generic function approximator and 30,000 hours of GPU time at it".)
That's curious, because in my opinion YouTube recommendations haven't been any good since 2009. Instead of getting interesting, strange and niche content, I'm bombarded with videos that have >100k views, feature clickbait titles and thumbnails and are generally incredibly low effort content.
Methods that work better for a population as a whole might not work better for a large subset of that population, and might even cause users to stop using features entirely. The lack of transparency in recommendation algorithms combined with the homogenizing effect of distributing low-quality content this way is something I find somewhat depressing.
Precisely. But that might be a local minimum. "Show him boobs and action trailers" is guaranteed to make him stay another 40min.
But perhaps there is a more risky strategy that takes longer to craft and actually delivers hours and hours of content to the user (but needs to fail longer before getting there).
It seems like reinforcement learning would be useful, i.e. at a high level, forming a policy for recommendations would require balancing exploration (experimenting with more risky recommendations) vs. exploitation (showing you recommendations that it knows will likely lead to clicks) and using the click-throughs, time spent watching the video, etc. as reward signals.
Does anyone know whether RL is used for recommendation in practical settings, and if so what is the current state of the art?
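As a rough illustration of the exploration/exploitation trade-off described above, here is a minimal epsilon-greedy sketch. It is a simple multi-armed bandit rather than full reinforcement learning, and it says nothing about what any real service does; the class name, the reward definition (say, 1.0 for a click or a normalized watch time) and the default `epsilon` are all assumptions.

```python
import random
from typing import Iterable, Optional


class EpsilonGreedyRecommender:
    """Toy bandit: mostly recommend the best-known item, sometimes a random one."""

    def __init__(self, item_ids: Iterable[str], epsilon: float = 0.1,
                 seed: Optional[int] = None) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {item_id: 0 for item_id in item_ids}
        self.mean_reward = {item_id: 0.0 for item_id in self.counts}

    def recommend(self) -> str:
        # Explore: with probability epsilon, try a random (riskier) item.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        # Exploit: otherwise show the item with the best observed mean reward.
        return max(self.mean_reward, key=self.mean_reward.get)

    def update(self, item_id: str, reward: float) -> None:
        # Incremental running mean of the reward observed for this item.
        self.counts[item_id] += 1
        n = self.counts[item_id]
        self.mean_reward[item_id] += (reward - self.mean_reward[item_id]) / n


recommender = EpsilonGreedyRecommender(["clickbait", "niche_essay"], epsilon=0.1, seed=0)
recommender.update("clickbait", 0.2)
recommender.update("niche_essay", 0.9)
choice = recommender.recommend()
```

A larger `epsilon` means more "risky" recommendations that may fail for a while before paying off, which is the trade-off raised a few comments up.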
If people fall for the clickbait (they do), then they get more views from optimizing for it. It would be more surprising if an optimization algorithm for views didn't favor this.
Please. Deep learning is cool and all and has its applications, but "state of the art" implies that it's the best for everything, when in reality it is just very good for a very small subset of problems.
During the last few months, YouTube has consistently recommended me videos I wasn't interested in (to put it in polite terms), in spite of the fact that by now Google knows enough about me to answer quite reliably what I'm likely to be interested in. The only explanation that I can find is that their need to show me specific videos (what do they call it nowadays? “sponsored content”?) prevails over other considerations.
I don't feel like anyone has gotten recommendations right, even though one seemingly obvious approach has not been tried by anyone: allow ratings of favorite works across all media: movies, tv shows, books, music, radio programs, youtube videos. Make a very easy, efficient UI to add ratings. This way you will avoid superficial matches: if I just watched an excellent steampunk cartoon, let's offer a zillion of throwaway crap steampunk. It's not the steampunk part that I liked, it's that it was amazingly done.
If I was a huge fan of books, movies, music, youtube picks of another user, it may be there is a deeper connection of the kind of quality we are both looking for, and so his or her recommendations would be highly relevant.
Recommender systems moved from explicit feedback (like ratings) to implicit feedback precisely because users are less likely to actually rate stuff and also because ratings are subjective; by which I mean your interpretation of 3 stars (good) may not agree with mine (average). I have watched tons of movies/shows on Netflix or videos on YT for that matter but have not rated a single video.
To address the other part of your suggestion, i.e. collapsing ratings/feedback across media like movies, books, etc.: usually it is very difficult to have a dataset that spans multiple media across the same set of users. Even if it exists, it would be too sparse (more sparse than usual for a site like YT with a continuously changing content library) to actually help. Though I agree that if anybody can get the recommender right, it is Google, with the sheer amount of info it has on each user.
You're kind of suggesting people go in reverse. Ratings were the initial way these things worked but then they moved to more implicit signals. Netflix used to be all about star ratings back in the day; now they want to measure what you're actually watching.
I think the issue of a system determining whether you like the steampunk genre vs the quality of only that particular steampunk video is separate from the issue of ratings.
But also view-time or view-count don't tell the full story of how much you liked that video.
I am not happy with YT recommendations because they suggest crap videos to me and not the finest one available for that topic, just as he said.
The system should rather suggest me a different topic but with the best quality/content available, rather than a super similar video with crappier quality/content.
Recommendation systems are a really interesting topic to study/engineer on. I think there's a lot of unexplored/undiscovered techniques still remaining.
Oddly... I have grown rather weary of them. I have yet to get a surprise recommendation that I cared for. At best, I have seen people surprised that some good recommendations came out of a system.
Which usually just leads to curation systems being key. And they work well, until they are gamed. And they will be gamed.
That's what makes the problem so interesting! Most recommendation systems are terrible, and those that aren't, are good only for the first 1-2 recommendations.
And then there's Google Search, which so thoroughly demolished existing search result recommendation systems (remember Altavista?) that they now own the market and are one of the most valuable companies in history.
When you finally solve a recommendation system problem in a way that actually works, it's a huge freaking deal!
I felt the same way until I tried Spotify's Discover Weekly recommendation system. I don't know what they're doing, but I've found many songs I really enjoy that way.
Heh. Me too. I suffer from joint/ligament issues and buy supplements for those. Now, Amazon thinks I am a geriatric and recommends me 'helping hand' sticks and incontinence products :\
I am currently looking for a decent comprehensive study on them. So far I have found this slide deck from Netflix to be pretty good [1]. Do you have any suggestions?
Ahh... This happens to me too. One day my recommendations were flooded with this CandianMum, and I hadn't really watched many, if any, relationship videos on YouTube.
I hate that one day I happened to click on one video and watch it, and then YouTube started recommending videos on the same topic day after day. Even when I mark them as not interested, they still show up from time to time...
I'd be more interested if YouTube recommendations took into account users I've blocked.
Sometimes YouTube recommends me videos with clickbaity gross thumbnails or from YouTubers I dislike or have no desire to watch, but there is nothing I can do to stop it recommending these to me. Why can't I just go to these users' profiles and block them and have them removed from my YT experience?
Block just seems to stop people from messaging you, not from you being shown their videos by an algorithm.
I use YT mostly for new music, interesting documentaries and the occasional fun. Besides that, there are one-off searches for random topics. For the latter, recommendations won't help, because they're not fast enough to tell me which aspect I'm missing. For the former three, I'd love to know whether there are users who liked the same videos. So please, recommend users, not videos, and let me do the rest.
Your recommender seems to be trying to predict (from logs) which video the user will watch next/soon, and how much watch time it will lead to.
If you used to have a bad recommendation system, and then you switch over to this system, then it will still be trained with data generated by users who saw the old recommendations, leading it to have a bias towards the same bad predictions.
Like others here I am also disappointed by the youtube recommended videos. So I was investigating building a better recommender myself. I was actually searching for how the youtube recommender works yesterday but could only find the 2010 paper. Now I am starting to believe that it is not the recommender that is the problem. It is that youtube consists of 99% low quality videos.
Is this the source of all those Recommendations that I look at Ben and Holly cartoons in the middle of the day when my daughter is at school? Or more Italian daytime television when I've just watched a video my wife never would? In short, is this the reason why there's never anything interesting for me when I go to the frontpage of youtube?!?!
You seem to be doing all the right things in this paper, yet user sentiment seems to still be negative. Do you think that's because maximizing watch time and impressing users are conflicting goals, or are there perfect recommendations out there which both impress users and maximize watch time, yet they haven't yet been found?
