In that subthread, I posted a heat map data visualization I made of IMDB rating vs. release year, which is suspiciously similar to the one OP made: https://news.ycombinator.com/item?id=20099344
It's not plagiarism, but if the OP got the idea for checking for recency bias from the HN thread, they should explicitly credit that.
Hey - not OP, but I wrote the article. There was a lot of discussion about recency bias in that thread, including the graph you shared, and I decided to expand on the analysis. Note there are a few other graphs in my post; I did just draw this up quickly yesterday. But you're right, sorry - I should have cited your comment in my post. Adding it now.
Hi, there are what seem to me (not a statistician) to be a number of serious errors in the article. The most egregious is the first conclusion, which threatens to undermine the whole thing.
> There does not seem to be any strong connection between number of votes and a movie's IMDB rating.
Well, this just isn't true. I happen to know this because I've looked at this data before myself. The issue is that you're cutting off the graph at a maximum of 60k votes without any explanation - without even noting that you're doing it! I'm sure this is just an honest mistake, but it cuts out basically every movie that is actually popular!
This slowly ate at me until it was enough to get me out of bed to redo this graph myself. I've uploaded my quick and dirty result in R here: https://i.imgur.com/TTuCFEL.png
As you can see from the graph, there's a direct and obvious link between the number of votes and the average rating. The reason I say this threatens to undermine the whole thing is that I also know from previous experience that more recent films tend to have a lot more ratings on average. This (naively) suggests to me that we should see some recency bias if only because more recent blockbusters get a much larger number of votes than others and the sort of people who vote on blockbusters have different standards for what makes a great movie. (Avengers: Endgame was briefly the #1 movie of all time on the top 250 list.)
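The computation behind a graph like this is simple enough to sketch. This is not the commenter's actual R code - just a minimal Python illustration of binning movies by vote count (with no 60k cap) and averaging the rating in each bin; the synthetic `(votes, rating)` pairs and their built-in upward trend are assumptions standing in for the real IMDB dump.

```python
import math

# Synthetic stand-in for (votes, rating) pairs; real code would load the
# IMDB ratings dataset here instead. The trend is baked in purely to
# illustrate the shape of the computation.
movies = [(10 ** (1 + i % 6), 5.0 + 0.4 * (i % 6)) for i in range(600)]

def binned_means(movies, n_bins=6):
    """Average rating per order-of-magnitude (log10) vote-count bin."""
    bins = {}
    for votes, rating in movies:
        b = int(math.log10(votes))  # bin by order of magnitude of votes
        bins.setdefault(b, []).append(rating)
    return {b: sum(r) / len(r) for b, r in sorted(bins.items())}

means = binned_means(movies)
```

Binning by log10(votes) matters here because vote counts span several orders of magnitude: a linear x-axis (or a hard cap) hides exactly the high-vote blockbusters in question.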
I don't have time to plot all the different graphs you did, but perhaps you can recheck your results after fixing at least this issue, and get back to us?
Follow up: I redid one of the graphs to satisfy my own interest in the question. Instead of averaging over all the movies ever released like the article did, I averaged over all the votes. This was an attempt to answer the question of whether the average viewing experience of a film from year x is better or worse than that of a film from year y.
I found that there has been a noticeable decline of over half a point (out of 10) in the average rating since ~1930. https://i.imgur.com/bY9vPvk.png
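The "average over all the votes" idea above amounts to a vote-weighted mean per year. A minimal sketch of that weighting, with placeholder rows rather than real IMDB data (the field layout is an assumption):

```python
from collections import defaultdict

films = [
    # (year, rating, votes) -- placeholder rows, not real IMDB data
    (1947, 8.1, 50_000),
    (1947, 6.0, 1_000),
    (2018, 8.4, 700_000),
    (2018, 5.5, 300_000),
]

def weighted_mean_by_year(films):
    """Weight each movie's rating by its vote count, so the result
    approximates 'the average vote cast' for films of that year."""
    totals = defaultdict(lambda: [0.0, 0])
    for year, rating, votes in films:
        totals[year][0] += rating * votes
        totals[year][1] += votes
    return {y: s / n for y, (s, n) in sorted(totals.items())}

by_year = weighted_mean_by_year(films)
```

Note how the 1947 average barely moves despite the 6.0-rated film, because almost nobody voted on it - which is precisely the "history acts as a filter" effect.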
I think this should be explained in a combination of two ways:
* History acts as a filter. If I choose to watch a movie from 1947 it's probably because a lot of people over the years have said it's good.
* To some extent, older movies may really be better than more recent ones.
I still suspect that the blockbuster effect means that the movies people are most likely to watch are going to receive higher ratings than the average film overall. And this is mostly because people rating blockbusters are less critical than the film-viewing community as a whole. So while there might be no "recency bias" of the kind this article was looking for, there might be blockbuster bias, whereby over the last several decades studios have figured out how to capitalize on a less critical / cynical audience. (That hockey stick graph of 8-10 ratings in the article is certainly suggestive.)
Actually -- one more follow up for the zero people who will read this -- there's a simpler explanation, which other people who do these kinds of statistics should probably note.
The average rating between 1920 and 1980 fluctuated around 7.5. Then the number of movies exploded, and the variability in their quality exploded even more. Movies better than the 1980 average had their ratings compressed (you can only do a little better than 7.5 on IMDB). Movies that were worse were much less compressed (you can easily do much worse than 7.5). So the average gets dragged down: the new 0-4s have far more room below 7.5 than the new 8-10s have above it.
I realized this after plotting the median, not the mean, and observing that it stays remarkably constant, and possibly even shows a little recency bias for the last 2 years.
Part of the reason I make data vizzes is so people can expand on the ideas, so there's no issue there! (as long as proper credit is given to the original work)