Show HN: Gorse – An Out-of-the-box open-source Recommender System (gorse.io)
212 points by zhenghaoz on July 17, 2021 | 41 comments



I wonder whether a really good experience can be built by abstracting out recommendations to this degree. I think in the context of an actual product, there would be other considerations specific to the particular domain, which is partly why organizations often eventually build dedicated teams around recommendations or discovery.

For example:

- Perhaps there are relationships or similarities among items which are known prior to any feedback (e.g. different options of the "same" product are represented as distinct items in your catalog). Recommending to the user many closely related items may be a poor experience in some contexts.

- Perhaps you have other relevant actions, and care about more than maximizing the probability of a click. E.g. purchases, shares, etc. In an ecommerce context, you may want to recommend items at a range of price points; showing only very expensive items may get you clicks but not purchases.

- This has a concept of whether an item was "read" (which I interpret as an impression, or an opportunity to be clicked). But not all impressions are equal. Perhaps you have other knowledge about the context in which something was displayed. Was it "read" in the context of a search? Was it 8th in a list of 10 items?


The Universal Recommender (part of the Harness project) allows contextual and multimodal behavioral information to be used in recs. https://github.com/actionml/harness


Your concern is absolutely right. The abstractions in Gorse do lose a lot of information. Prior knowledge and contextual information are important for further improving recommender systems.


Could these issues be solved by providing connectors to different data stores so Gorse can use multiple signals in a configurable way?


The problem is how to utilize these signals. The annoying thing is that we can't solve these issues unless we face these situations ourselves. So feedback from Gorse users is important.


So are there any plans for context aware recommenders?


Once I collect a context-aware dataset, context-aware recommenders could be implemented :)


Really interesting project, thank you for sharing! Looking at the documentation [0], it would appear that negative feedback for an item can only be given by giving feedback that the user has read the item, and not then providing positive feedback. Is that the case? Or is there a way to provide explicitly negative feedback that I'm not seeing?

0: https://docs.gorse.io/ch01-02-recommend.html


There is no explicit negative feedback yet. When an item is seen by a user, a read event is recorded. If the user likes the item, positive feedback is recorded. It seems a natural way to track a user's preferences. I will try to add explicit negative feedback if someone really needs it.
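
The read-then-positive scheme described above can be sketched as follows. This is a minimal illustration, not Gorse's API: the feedback type names ("read", "like", "star") and the function `split_feedback` are assumptions made up for the example.

```python
from collections import defaultdict

def split_feedback(events, positive_types=("like", "star")):
    """Partition (user, item, feedback_type) events into implicit positives
    and implicit negatives.

    An item is an implicit negative for a user when a "read" event exists
    for it but no positive event does. The type names are hypothetical.
    """
    read = defaultdict(set)
    positive = defaultdict(set)
    for user, item, feedback_type in events:
        if feedback_type == "read":
            read[user].add(item)
        elif feedback_type in positive_types:
            positive[user].add(item)
    users = set(read) | set(positive)
    positives = {u: set(positive[u]) for u in users}
    negatives = {u: read[u] - positive[u] for u in users}
    return positives, negatives

events = [
    ("alice", "item1", "read"),
    ("alice", "item1", "like"),
    ("alice", "item2", "read"),
    ("bob", "item1", "read"),
]
positives, negatives = split_feedback(events)
print(positives)
print(negatives)
```

Here "item2" ends up as an implicit negative for alice because it was read but never liked, which is the only kind of negative signal this scheme can express.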


"ignored by 84% of visitors" is a metric


If you start with an algorithm that accepts any type of user behavior (is multimodal) then using user-dislikes is easy.

We can easily do this with the Universal Recommender as shown in this article (reprinted from the IBM dev blog) https://actionml.com/blog/making_dislikes_predict_likes


I think this looks very good, and I love seeing that it has the capability to automatically select the recommendation algorithm with the best performance.

I think there are two major areas where this can be improved. A quick read through the docs doesn't show much info about the algorithms themselves, and from what I can see, it doesn't look like you can customize the hyperparameters of the algorithms (correct me if I'm wrong). You have to just wait for the automated hyperparameter search to finish. I think this is mostly very good but the ability to freeze them would be even better to save compute time in subsequent retrainings. The other thing I notice is that there's no system to detect when the offline model is too out of date and trigger retraining. It would be nice to have a way of automating the retraining based on performance evaluations rather than on a clock schedule.
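
A performance-based retraining trigger of the kind suggested above could look roughly like this. It is a sketch under stated assumptions: the function name `should_retrain`, the relative tolerance, and the evaluation window are hypothetical and not part of Gorse.

```python
def should_retrain(baseline_metric, recent_metrics, tolerance=0.1, window=3):
    """Return True when the mean of the last `window` evaluations is more than
    `tolerance` (relative) below the metric recorded right after training.

    baseline_metric: e.g. precision@10 measured when the model was trained.
    recent_metrics: chronological list of the same metric on fresh feedback.
    """
    if len(recent_metrics) < window:
        return False  # not enough evidence yet
    recent_mean = sum(recent_metrics[-window:]) / window
    return recent_mean < baseline_metric * (1 - tolerance)

print(should_retrain(0.30, [0.29, 0.28, 0.29]))
print(should_retrain(0.30, [0.29, 0.25, 0.24, 0.23]))
```

Averaging over a window rather than reacting to a single evaluation avoids retraining on one noisy measurement; the right window and tolerance depend on how volatile the metric is.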


Thanks for your advice


Congratulations & Great work on the project!

I just wrote on my problem validation platform about the need for a personal recommendation system that can build a model from our activity data stored locally and distribute the model to services when needed [1].

Then I came across your project. Although it's meant to be integrated into products as a replacement for proprietary recommendation systems, I wonder whether it could be used to build a local recommendation system?

[1] https://needgap.com/problems/269-keeping-the-things-i-like-w...


I think NLP technology should be used for your project :) I plan to integrate Gorse with NLP, but it might take years to implement it.


NLP is what I was thinking too! To be frank right now I'm just fixated on the problem and the approach to solve it is still in nascent stages.

I think NLP in Gorse would be a very valuable addition, and I agree that it would be a huge effort to do it properly.

You're welcome to comment about Gorse or anything else in the aforementioned thread as well if you have suggestions for addressing the problem.


Great project!

Are you thinking about offering this as a service as well? It feels like you could have a lot of interest with a run-it-yourself or we-do-it-for-you-as-a-service model. I couldn't see that after a quick look :)


I use recombee[0] and it seems pretty nice, albeit not open source. Not affiliated in any way. Not even a paying customer, just using the free plan for a personal project.

[0] https://www.recombee.com/


Thanks. I will take a look at it :)


Good idea! But before that, we have to take a look at GDPR :D


I feel you :D

It might be worth sharing the project with some privacy-oriented forums to get some feedback and ideas on what a good privacy model would look like for it. Good luck! :)


I will try it :)


A challenge with naive recommender systems in an ecommerce setting is that they tend to recommend the most popular items on your site rather than surfacing the long tail of SKUs, which doesn't generally add much value (you could achieve a similar effect by showing users recommendations from a hardcoded list of top items rather than leaning on a model). If Gorse is generic and unaware of the domain, how does it avoid falling into this trap?


When I worked for a place with an e-commerce recommender, about the first feature customers requested was editorial control over the recommendations, so they could actually sell the things they made money on (or had in stock), rather than what the algorithm thought was best…


Now you've made me curious. If the point is to recommend stuff that people are most likely to buy next then recommending the most popular items is likely correct. So what metric do you have in mind that leads you to the conclusion that recommender engines shouldn't recommend the most popular articles?


Seller satisfaction will be the highest if the user buys the item. Basing the recommender on sales will make it a seller-focused recommender.

If the recommender somehow can be based on buyer satisfaction - it would be a system that focuses on the buyer. It would then perhaps not please the seller.

Taking it one step further, if a buyer is VERY satisfied, the buyer is likely to return for MORE purchases, ensuring the satisfaction of BOTH seller and buyer with the first recommendation. This is the ultimate recommender, measuring not first-purchase buyer/seller satisfaction, but rather measuring "so satisfied with the first purchase that the buyer returned for a second purchase".


To be honest, popular items have a higher chance of being recommended, and that is hard to avoid. Popular items have a higher probability of being liked. In addition, more data is collected for popular items, which helps to locate potential users.


Not sure this is true but in any case, the Universal Recommender uses a technique based on the TF-IDF algorithm of search to get long-tail recs. This de-weights popularity so that relevancy is more important.
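
The general idea of de-weighting popularity with an IDF-style factor can be sketched as below. This is not the Universal Recommender's code; the function `score_candidates`, the toy data, and the plain log(N/df) weighting are assumptions chosen for illustration.

```python
import math
from collections import Counter

def score_candidates(user_items, histories, deweight=True):
    """Score items that co-occur with `user_items` in other users' histories.

    With deweight=True each co-occurrence count is multiplied by
    log(n_users / df), where df is the number of users who touched the item,
    so items that nearly everyone has interacted with contribute little.
    """
    n_users = len(histories)
    df = Counter(item for items in histories.values() for item in items)
    cooccur = Counter()
    for items in histories.values():
        if items & user_items:
            cooccur.update(items - user_items)
    scores = {}
    for item, count in cooccur.items():
        idf = math.log(n_users / df[item]) if deweight else 1.0
        scores[item] = count * idf
    return scores

# Hypothetical histories: "hit" is in every history, "niche" in only two.
histories = {
    "u1": {"a", "hit", "niche"},
    "u2": {"a", "hit", "niche"},
    "u3": {"b", "hit"},
    "u4": {"c", "hit"},
    "u5": {"a", "hit"},
}
raw = score_candidates({"a"}, histories, deweight=False)
weighted = score_candidates({"a"}, histories)
print(raw)
print(weighted)
```

With raw counts the universally popular item ranks first; with the IDF factor its score drops to zero because it carries no information about this particular user, and the long-tail item moves to the top.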


Great job! Gorse reminds me of the now-retired Apache PredictionIO recommendation engine: https://predictionio.apache.org/templates/recommendation/qui.... Have you evaluated why it took off and why it was later shut down, so that your project can avoid its mistakes?


Apache PredictionIO was designed for data scientists (I was a committer so am quite familiar with design decisions). It was built to allow new "engines" to be implemented for any arbitrary ML/AI algorithm. These came in "templates" of various flavors.

However, it turned out that only the "Universal Recommender" template (that I wrote) had real demand, and virtually no one was developing or using other "engines".

As designed it was hard to deploy and was not multi-tenant. I could go on about the shortcomings but suffice it to say that after I did the Universal Recommender and saw the PIO shortcomings, we (ActionML, my consulting company) decided to write a from-scratch ML server, called Harness, to solve the PIO issues and act as the host for continuing work on the Universal Recommender.

https://actionml.com/docs https://github.com/actionml/harness

So the best part of PIO (The Universal Recommender) lives on with a clean new design and modern architecture.


Hey, it's nice to e-meet you! We used the Universal Recommender together with the SimilarProducts template in PIO for a while.

PIO is/was a great product, it's sad Apache decided to retire it.


cool project, i was curious which algorithms are used for recommendations/ranking, couldn't find it in the docs, so had to go through the code, maybe add a section on algorithms and models?

it seems that one algo used is "Bayesian Personalized Ranking" based on a paper from 2012 https://arxiv.org/pdf/1205.2618.pdf, here is a related blog post https://towardsdatascience.com/recommender-system-using-baye...

another one is Weighted Regularized Matrix Factorization/ALS (by the Netflix Prize winner team): http://yifanhu.net/PUB/cf.pdf , also see https://www.ethanrosenthal.com/2016/10/19/implicit-mf-part-1...
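
For readers unfamiliar with Bayesian Personalized Ranking, the core update can be sketched in a few lines of plain Python. This is a toy matrix-factorization version written for illustration, not Gorse's implementation; the function names, hyperparameters, and data are assumptions.

```python
import math
import random

def train_bpr(positives, n_items, n_factors=8, lr=0.05, reg=0.01,
              n_steps=20000, seed=0):
    """Fit user/item factors with BPR-style stochastic gradient ascent.

    positives: dict user -> set of item indices with positive feedback.
    Each user needs at least one positive and one non-positive item.
    """
    rng = random.Random(seed)
    users = sorted(positives)
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(n_steps):
        # Sample a (user, positive item, non-positive item) triple.
        u = rng.choice(users)
        i = rng.choice(sorted(positives[u]))
        j = rng.randrange(n_items)
        while j in positives[u]:
            j = rng.randrange(n_items)
        p, qi, qj = P[u], Q[i], Q[j]
        x_uij = sum(pf * (a - b) for pf, a, b in zip(p, qi, qj))
        g = 1.0 / (1.0 + math.exp(x_uij))  # equals 1 - sigmoid(x_uij)
        for f in range(n_factors):
            pf, a, b = p[f], qi[f], qj[f]
            p[f] += lr * (g * (a - b) - reg * pf)
            qi[f] += lr * (g * pf - reg * a)
            qj[f] += lr * (-g * pf - reg * b)
    return P, Q

def score(P, Q, u, i):
    """Predicted preference of user u for item i (dot product of factors)."""
    return sum(a * b for a, b in zip(P[u], Q[i]))

# Toy data: users 0 and 1 like items 0 and 1, users 2 and 3 like items 2 and 3.
positives = {0: {0, 1}, 1: {0, 1}, 2: {2, 3}, 3: {2, 3}}
P, Q = train_bpr(positives, n_items=4)
print(sorted(range(4), key=lambda i: -score(P, Q, 0, i)))
```

The key point is that BPR never predicts an absolute rating: it only pushes each observed item above a sampled unobserved one for the same user, which suits implicit feedback such as clicks or likes.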


I list papers here: https://docs.gorse.io/ch01-02-recommend.html#online-evaluati...

I will briefly introduce the algorithms in the documentation in the future :)


Does this handle real time recommendations as described here: https://eugeneyan.com/writing/real-time-recommendations/


The reason to use real-time recommendations is to exploit contextual information. Since Gorse doesn't use contextual information, it doesn't handle real-time recommendations yet.


Contextual information is not the only reason. New content/new users is the primary reason my current work place wants more real time recommendation especially when we mostly want to recommend content created in the last day or two. Items that are more than a week old should rarely be recommended. This is common for social media platforms.
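
One simple way to favor recent content, as described above, is to multiply a model's relevance score by an exponential time decay. The sketch below is an assumption-laden illustration: the function `freshness_score` and the 24-hour half-life are hypothetical and would need tuning per platform.

```python
def freshness_score(relevance, age_hours, half_life_hours=24.0):
    """Scale a relevance score by a decay that halves every `half_life_hours`."""
    return relevance * 0.5 ** (age_hours / half_life_hours)

# A highly relevant week-old item versus a moderately relevant 6-hour-old item.
old_item = freshness_score(0.9, age_hours=168)
new_item = freshness_score(0.3, age_hours=6)
print(old_item, new_item)
```

With a one-day half-life, a week-old item keeps less than 1% of its relevance, so it is rarely recommended unless nothing fresher is available.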


Thanks for your insight. In recommendation scenarios where freshness is important, real-time recommendation is in greater demand.


What do you think about using a graph DB like neo4j, redis-graph or the Chinese competitor Nebula Graph?


Graph databases are good at locating related items/users, which is useful for recommender systems :)


I misread the title as "Gource".

If you're not familiar, it's a fun history visualizer for git repos: https://gource.io/


Yeah, I know "Gource", another cool project. :D



