I love that people still build beautiful things on top of RSS.
The author writes "I supplement this with content-based filtering, which involves analyzing the text of each essay". How is this implemented? Does he harvest the site for every RSS item featured, or use the shallow description provided?
I am asking because I am heavily into the RSS topic as well [0] [1]
First I get the text of each article using Newspaper[1] (no RSS involved). Then I use tf-idf and k-means to cluster the articles (I followed this tutorial[2]). Then I combine the clusters with my collaborative filtering model using feature augmentation: for each cluster, I generate N "fake users" who like every item in that cluster, and I add those ratings to the rest of the rating data.
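The clustering step above can be sketched roughly like this, using scikit-learn as a stand-in (the author doesn't say which library he used; the toy corpus here stands in for the full article texts that Newspaper would extract):

```python
# Sketch of the clustering step: tf-idf vectors + k-means.
# In the real pipeline the texts would come from Newspaper's
# article extraction; a toy corpus stands in here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

texts = [
    "startups fundraising venture capital",
    "seed round venture capital investors",
    "python programming language tutorial",
    "learning programming with python",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(texts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Group article indices by cluster label for the augmentation step.
clusters = {}
for idx, label in enumerate(labels):
    clusters.setdefault(int(label), []).append(idx)
print(clusters)
```

With this toy corpus the two venture-capital texts and the two Python texts end up in separate clusters, since their tf-idf vectors share almost no terms.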
So in effect, content-based filtering handles the cold start problem (when a new article is submitted and doesn't have many ratings yet, the fake-user ratings dominate), and as real user ratings accumulate, the model gradually shifts to relying on collaborative filtering alone.
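The feature-augmentation step can be sketched as follows. The function name, `N_FAKE`, and the rating value of 1.0 are all illustrative assumptions, not the author's actual code; the idea is just that each cluster contributes N synthetic "like" rows that sit alongside the real rating triples:

```python
# Sketch of feature augmentation: for each content cluster, N
# hypothetical "fake users" rate every article in that cluster,
# and those rows are appended to the real rating data.
# N_FAKE and make_augmented_ratings are illustrative names.
N_FAKE = 3

def make_augmented_ratings(real_ratings, clusters):
    """real_ratings: list of (user_id, article_id, rating) triples.
    clusters: dict mapping cluster label -> list of article ids."""
    ratings = list(real_ratings)
    for label, articles in clusters.items():
        for k in range(N_FAKE):
            fake_user = f"fake:{label}:{k}"  # synthetic user id
            for article in articles:
                ratings.append((fake_user, article, 1.0))
    return ratings

ratings = make_augmented_ratings(
    [("alice", 0, 1.0)],                 # one real rating
    {0: [0, 1], 1: [2, 3]},              # two clusters of two articles
)
# 1 real row + 3 fake users x 2 articles x 2 clusters = 13 rows
print(len(ratings))
```

A new article with no real ratings is then represented only by its cluster's fake-user rows, which is what lets the collaborative filtering model place it near its cluster-mates until real ratings arrive and outweigh them.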
[0] https://github.com/damoeb/rss-proxy [1] https://github.com/damoeb/rich-rss