I wonder whether a really good experience can be built by abstracting recommendations out to this degree. I think in the context of an actual product, there would be other considerations specific to the particular domain, which is partly why organizations often end up growing dedicated teams around recommendations or discovery.
For example:
- Perhaps there are relationships or similarities among items which are known prior to any feedback (e.g. different options of the "same" product are represented as distinct items in your catalog). Recommending to the user many closely related items may be a poor experience in some contexts.
- Perhaps you have other relevant actions, and care about more than maximizing the probability of a click. E.g. purchases, shares, etc. In an ecommerce context, you may want to recommend items at a range of price points; showing only very expensive items may get you clicks but not purchases.
- This has a concept of whether an item was "read" (which I interpret as an impression, or an opportunity to be clicked). But not all impressions are equal. Perhaps you have other knowledge about the context in which something was displayed. Was it "read" in the context of a search? Was it 8th in a list of 10 items? (See the sketch just below for what a richer impression record might capture.)
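To make that last point concrete, here is a minimal sketch (in Go, since Gorse is written in Go) of what a richer impression record could capture. Every field name here is hypothetical, not Gorse's actual schema:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Impression is a hypothetical record of a single opportunity-to-click,
// carrying the display context that a bare "read" event discards.
type Impression struct {
	UserID    string    `json:"user_id"`
	ItemID    string    `json:"item_id"`
	Timestamp time.Time `json:"timestamp"`
	Surface   string    `json:"surface"`         // e.g. "search", "home_feed", "related_items"
	Position  int       `json:"position"`        // e.g. 8 for the 8th slot shown
	ListSize  int       `json:"list_size"`       // e.g. 10 items shown in total
	Query     string    `json:"query,omitempty"` // set when Surface == "search"
}

func main() {
	imp := Impression{
		UserID:    "u42",
		ItemID:    "sku-123",
		Timestamp: time.Now(),
		Surface:   "search",
		Position:  8,
		ListSize:  10,
		Query:     "wireless headphones",
	}
	b, _ := json.MarshalIndent(imp, "", "  ")
	fmt.Println(string(b))
}
```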
The Universal Recommender (part of the Harness project) allows contextual and multimodal behavioral information to be used in recs. https://github.com/actionml/harness
You're absolutely right. The abstractions in Gorse do lose a lot of information, and prior knowledge and contextual information are important for improving recommender systems further.
The problem is how to utilize these signals. The awkward part is that we can't solve these issues without being in those situations ourselves, so feedback from Gorse users is important.
Really interesting project, thank you for sharing! Looking at the documentation [0], it would appear that negative feedback for an item can only be expressed implicitly, by recording that the user has read the item and then never providing positive feedback. Is that the case? Or is there a way to provide explicitly negative feedback that I'm not seeing?
There is no explicit negative feedback yet. When an item is seen by a user, a read event is recorded. If the user then likes the item, positive feedback is recorded. It seems a natural way to track a user's preference. I will try to add explicit negative feedback if someone really needs it.
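For reference, here is roughly what that flow looks like against Gorse's REST API as I read the docs: insert a "read" when the item is shown, then a positive type (e.g. "star") if the user acts on it. Treat the endpoint path, field names, and port below as my assumptions from a quick read of the docs rather than a definitive client:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Feedback mirrors the shape the Gorse docs show for POST /api/feedback.
// Verify the field names and endpoint against your Gorse version.
type Feedback struct {
	FeedbackType string `json:"FeedbackType"`
	UserId       string `json:"UserId"`
	ItemId       string `json:"ItemId"`
	Timestamp    string `json:"Timestamp"`
}

func insertFeedback(server string, fb []Feedback) error {
	body, err := json.Marshal(fb)
	if err != nil {
		return err
	}
	resp, err := http.Post(server+"/api/feedback", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("gorse returned %s", resp.Status)
	}
	return nil
}

func main() {
	now := time.Now().Format(time.RFC3339)
	// 1. Item shown to the user: record a "read".
	// 2. User likes it: record the configured positive type, e.g. "star".
	// The absence of a positive event after the read is what currently
	// stands in for negative feedback.
	_ = insertFeedback("http://127.0.0.1:8087", []Feedback{ // adjust address to your deployment
		{FeedbackType: "read", UserId: "u1", ItemId: "i1", Timestamp: now},
		{FeedbackType: "star", UserId: "u1", ItemId: "i1", Timestamp: now},
	})
}
```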
I think this looks very good, and I love seeing that it has the capability to automatically select the recommendation algorithm with the best performance.
I think there are two major areas where this could be improved. First, a quick read through the docs doesn't show much info about the algorithms themselves, and from what I can see you can't customize the hyperparameters of the algorithms (correct me if I'm wrong); you have to wait for the automated hyperparameter search to finish. The automation is mostly a good thing, but the ability to freeze hyperparameters would save compute time in subsequent retrainings. Second, there's no system to detect when the offline model is too out of date and trigger retraining. It would be nice to automate retraining based on performance evaluations rather than on a clock schedule.
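To sketch that second suggestion: keep scoring the deployed model against recent feedback and retrain only when the metric drifts below the value recorded at training time. Nothing below is a Gorse API; `evaluate` and `retrain` are hypothetical stubs:

```go
package main

import (
	"fmt"
	"time"
)

// evaluate would score the deployed model (e.g. NDCG@10 or Precision@10)
// against feedback collected since the last training run. Stubbed here.
func evaluate() float64 { return 0.30 }

// retrain would launch the offline training job. Stubbed here.
func retrain() { fmt.Println("triggering retraining...") }

func main() {
	baseline := 0.35  // metric recorded when the model was trained
	tolerance := 0.90 // retrain once we drop below 90% of the baseline
	ticker := time.NewTicker(15 * time.Minute)
	defer ticker.Stop()

	// Evaluate on a cadence, but retrain only on measured drift,
	// not on a fixed clock schedule.
	for range ticker.C {
		if current := evaluate(); current < baseline*tolerance {
			retrain()
			baseline = evaluate() // reset the baseline after retraining
		}
	}
}
```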
I just wrote on my problem-validation platform about the need for a personal recommendation system that can build a model from activity data stored locally and distribute that model to services when needed [1].
Then I came across your project. Although it's meant to be integrated into products as a replacement for proprietary recommendation systems, I wonder whether it could be used to build a local recommendation system?
Are you thinking about offering this as a service as well? It feels like you could have a lot of interest with a run-it-yourself or we-do-it-for-you-as-a-service model. I couldn't see that after a quick look :)
I use recombee[0] and it seems pretty nice, albeit not open source. Not affiliated in any way. Not even a paying customer, just using the free plan for a personal project.
It might be worth sharing the project with some privacy-oriented forums to get some feedback and ideas on what a good privacy model would look like for it. Good luck! :)
A challenge with naive recommender systems in an ecommerce setting is that they tend to recommend the most popular items on your site rather than surfacing the long tail of SKUs, which doesn't generally add much value (you could achieve a similar effect by showing users recommendations from a hardcoded list of top items rather than leaning on a model). If Gorse is generic and unaware of the domain, how does it avoid falling into this trap?
When I worked at a place with an e-commerce recommender, just about the first feature customers requested was editorial control over the recommendations, so they could actually sell the things they made money on (or had in stock) rather than what the algorithm thought was best…
Now you've made me curious. If the point is to recommend stuff that people are most likely to buy next then recommending the most popular items is likely correct. So what metric do you have in mind that leads you to the conclusion that recommender engines shouldn't recommend the most popular articles?
Seller satisfaction will be the highest if the user buys the item. Basing the recommender on sales will make it a seller-focused recommender.
If the recommender somehow can be based on buyer satisfaction - it would be a system that focuses on the buyer. It would then perhaps not please the seller.
Taking it one step further: if a buyer is VERY satisfied, the buyer is likely to return for MORE purchases, ensuring the satisfaction of BOTH seller and buyer with the first recommendation. This is the ultimate recommender: it measures not first-purchase buyer/seller satisfaction, but whether the buyer was so satisfied with the first purchase that they returned for a second.
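In objective terms, that amounts to ranking by expected downstream purchases rather than by immediate purchase probability alone. A toy illustration, with all numbers and names invented:

```go
package main

import "fmt"

// score ranks an item by expected total purchases including return
// visits, not just the immediate sale: pBuy * (1 + expectedRepeats).
// expectedRepeats would come from historical repeat-purchase rates
// among buyers of the item; both inputs here are made up.
func score(pBuy, expectedRepeats float64) float64 {
	return pBuy * (1 + expectedRepeats)
}

func main() {
	// A cheap item that buyers love beats a likelier one-off sale.
	fmt.Println(score(0.10, 2.0)) // 0.30
	fmt.Println(score(0.20, 0.1)) // 0.22
}
```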
To be honest, popular items do have a better chance of being recommended, and that is hard to avoid. Popular items have a higher probability of being liked, and more data has been collected about them, which helps locate potential users.
Not sure that's true, but in any case the Universal Recommender uses a technique based on search's TF-IDF algorithm to get long-tail recs. This de-weights popularity so that relevancy is more important.
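To illustrate the intuition (this is the general IDF idea, not the Universal Recommender's actual implementation): damp each candidate's raw co-occurrence count by how popular the item is overall, so blockbusters stop dominating:

```go
package main

import (
	"fmt"
	"math"
)

// idfWeight damps a raw co-occurrence count by the candidate item's
// overall popularity, analogous to IDF in search:
// weighted = cooccur * log(totalUsers / itemPopularity).
func idfWeight(cooccur, itemPopularity, totalUsers float64) float64 {
	return cooccur * math.Log(totalUsers/itemPopularity)
}

func main() {
	// Same co-occurrence count, very different popularity.
	fmt.Println(idfWeight(50, 90000, 100000)) // blockbuster: ~5.3, heavily damped
	fmt.Println(idfWeight(50, 500, 100000))   // long-tail item: ~265, boosted
}
```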
Great job! Gorse reminds me of the now-retired Apache PredictionIO recommendation engine: https://predictionio.apache.org/templates/recommendation/qui.... Have you evaluated why it took off and why it was later shut down, so that your project can avoid its mistakes?
Apache PredictionIO was designed for data scientists (I was a committer so am quite familiar with design decisions). It was built to allow new "engines" to be implemented for any arbitrary ML/AI algorithm. These came in "templates" of various flavors.
However, it turned out that only the "Universal Recommender" template (which I wrote) had real demand, and virtually no one was developing or using the other "engines".
As designed it was hard to deploy and was not multi-tenant. I could go on about the shortcomings but suffice it to say that after I did the Universal Recommender and saw the PIO shortcomings, we (ActionML, my consulting company) decided to write a from-scratch ML server, called Harness, to solve the PIO issues and act as the host for continuing work on the Universal Recommender.
Cool project! I was curious which algorithms are used for recommendations/ranking. I couldn't find it in the docs, so I had to go through the code. Maybe add a section on algorithms and models?
The reason to use real-time recommendations is that contextual information is available at request time. However, Gorse doesn't use contextual information, so it doesn't handle real-time recommendations yet.
Contextual information is not the only reason. New content and new users are the primary reason my current workplace wants more real-time recommendation, especially since we mostly want to recommend content created in the last day or two. Items that are more than a week old should rarely be recommended. This is common for social media platforms.
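For that kind of platform, one simple approach is to fold recency into the final score with an exponential time decay, so week-old items keep almost none of their weight. The half-life below is illustrative, not something Gorse exposes:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// decayedScore multiplies a model score by exp(-ln2 * age / halfLife),
// so an item loses half its score every halfLife. With a 36h half-life,
// a week-old item keeps under 5% of its original score.
func decayedScore(base float64, createdAt time.Time, halfLife time.Duration) float64 {
	age := time.Since(createdAt)
	return base * math.Exp(-math.Ln2*age.Hours()/halfLife.Hours())
}

func main() {
	halfLife := 36 * time.Hour
	fmt.Println(decayedScore(1.0, time.Now().Add(-2*time.Hour), halfLife))    // fresh: ~0.96
	fmt.Println(decayedScore(1.0, time.Now().Add(-7*24*time.Hour), halfLife)) // week old: ~0.04
}
```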