I love that people still build beautiful things on top of RSS.
The author writes "I supplement this with content-based filtering, which involves analyzing the text of each essay". How is this implemented? Does he harvest the site for every RSS item featured, or use the shallow description provided?
I am asking because I am heavily into the RSS topic as well [0] [1]
First I get the text of each article using Newspaper[1] (no RSS involved). Then I use tf-idf and k-means to cluster the articles (I followed this tutorial[2]). Then I combine the clusters with my collaborative filtering model using feature augmentation: for each cluster, I generate N "fake users" who like every item in that cluster, and I add those ratings to the rest of the rating data.
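The clustering step above can be sketched roughly like this, using scikit-learn as a stand-in (the author doesn't say which library he used; the toy corpus here stands in for the full article texts that Newspaper would extract):

```python
# Sketch of the clustering step: tf-idf vectors + k-means.
# In the real pipeline the texts would come from Newspaper's
# article extraction; a toy corpus stands in here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

texts = [
    "startups fundraising venture capital",
    "seed round venture capital investors",
    "python programming language tutorial",
    "learning programming with python",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(texts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Group article indices by cluster label for the augmentation step.
clusters = {}
for idx, label in enumerate(labels):
    clusters.setdefault(int(label), []).append(idx)
print(clusters)
```

With this toy corpus the two venture-capital texts and the two Python texts end up in separate clusters, since their tf-idf vectors share almost no terms.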
So in effect, content-based filtering handles the cold start problem (when a new article is submitted and doesn't have many ratings yet, the fake-user ratings dominate), and as real user ratings accumulate, the model gradually shifts to relying on collaborative filtering alone.
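The feature-augmentation step can be sketched as follows. The function name, `N_FAKE`, and the rating value of 1.0 are all illustrative assumptions, not the author's actual code; the idea is just that each cluster contributes N synthetic "like" rows that sit alongside the real rating triples:

```python
# Sketch of feature augmentation: for each content cluster, N
# hypothetical "fake users" rate every article in that cluster,
# and those rows are appended to the real rating data.
# N_FAKE and make_augmented_ratings are illustrative names.
N_FAKE = 3

def make_augmented_ratings(real_ratings, clusters):
    """real_ratings: list of (user_id, article_id, rating) triples.
    clusters: dict mapping cluster label -> list of article ids."""
    ratings = list(real_ratings)
    for label, articles in clusters.items():
        for k in range(N_FAKE):
            fake_user = f"fake:{label}:{k}"  # synthetic user id
            for article in articles:
                ratings.append((fake_user, article, 1.0))
    return ratings

ratings = make_augmented_ratings(
    [("alice", 0, 1.0)],                 # one real rating
    {0: [0, 1], 1: [2, 3]},              # two clusters of two articles
)
# 1 real row + 3 fake users x 2 articles x 2 clusters = 13 rows
print(len(ratings))
```

A new article with no real ratings is then represented only by its cluster's fake-user rows, which is what lets the collaborative filtering model place it near its cluster-mates until real ratings arrive and outweigh them.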
[0] https://github.com/damoeb/rss-proxy [1] https://github.com/damoeb/rich-rss