Directed Edge sees a post-search web where recommendations rule (techcrunch.com)
108 points by GVRV on Aug 6, 2009 | 30 comments



"We can take data sets with millions and millions of data points and figure out what’s related to a given item in a few milliseconds. Most recommendations engines pre-compute stuff rather than generating the recommendations in real-time like we do"

I've been looking into recommendation algorithms recently (started with the excellent book "Programming Collective Intelligence"), and this sounds light-years ahead of the way we currently do things. I suppose you could take one of the algorithms that requires pre-computing and throw resources at it, but it sounds like they're talking about something new.

Since I'm just getting started, I'd like to find some academic (or blog-faux-academic) articles on whatever recent advances make recommendations possible without precomputing. Anyone know where to look?


There is no magic. I don't know what Directed Edge are doing, but simpler Amazon-style recommendations (people who bought X also bought...) don't need to precompute anything if you choose your data structures properly: get a list of things people bought along with X, count them (or actually store them counted), optionally normalize for general popularity, sort in decreasing order, and show the top results.
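
Roughly, in Python (a minimal sketch of the idea above; the inverted index and the popularity normalization are illustrative, not anyone's production code):

    from collections import Counter, defaultdict

    def build_index(orders):
        # orders: iterable of sets, each the items in one basket
        index = defaultdict(list)   # item -> baskets containing it
        popularity = Counter()      # item -> how often it's bought at all
        for basket in orders:
            popularity.update(basket)
            for item in basket:
                index[item].append(basket)
        return index, popularity

    def also_bought(item, index, popularity, top_n=10):
        counts = Counter()
        for basket in index[item]:          # only baskets containing item
            counts.update(basket - {item})  # count co-purchased items
        # normalize for general popularity so ubiquitous items don't win
        scored = {i: c / popularity[i] for i, c in counts.items()}
        return sorted(scored, key=scored.get, reverse=True)[:top_n]

The per-query work is proportional to the number of baskets containing the item, which is why no global precomputation pass is needed.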

I'd love to hear something from wheels though.


Amazon's algorithms are simpler than our own (at least as far as I can tell), and most recommendation engines use some sort of embedding to reduce the dimensionality of the problem.

Amazon's related products do in fact seem to be, or be very near to, a simple counting structure. Our ranking algorithm builds a large subgraph around an item, does a few passes over it with a couple of different ranking schemes to figure out the hot items within that subgraph, prunes "noisy" connections (i.e. links that are "hot" but don't actually carry much semantic meaning), and then scales things so that the results returned aren't simply those with the largest overlap, but those that are most relevant within that subgraph relative to the larger graph. In that sense, it has some similarities to web-search algorithms.
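
To give a feel for that general shape (this is emphatically a toy sketch, not our actual code; the hub-pruning threshold and the degree normalization here are simplifications for illustration):

    from collections import Counter

    def related(item, graph, top_n=10, max_degree=5000):
        # graph: dict mapping every node to the set of its neighbors
        neighborhood = graph[item]
        scores = Counter()
        for n in neighborhood:
            # prune "noisy" hubs: hot nodes carrying little meaning
            if len(graph[n]) > max_degree:
                continue
            for m in graph[n]:
                if m != item:
                    scores[m] += 1      # overlap with the neighborhood
        # scale by global degree so the winners are nodes unusually
        # well-connected to this subgraph, not just popular everywhere
        ranked = {m: s / len(graph[m]) for m, s in scores.items()}
        return sorted(ranked, key=ranked.get, reverse=True)[:top_n]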

In user-visible terms, that means that our results are often less obvious than Amazon's recommendations -- for a long time we called that the "tell me something I don't know" problem. It's no good to do a search for "Miles Davis" and have "Jazz" come back as a related item. If you know about Miles Davis, you already know about Jazz.


Do you have any kind of results that demonstrate how accurate your recommendations are? I checked your site but couldn't find anything.


You mean examples or metrics?

If you mean examples, you can see our algorithm applied to link structure analysis on the related pages here:

http://pedia.directededge.com/

If you mean metrics, the only one that I find really meaningful is a feedback loop to see what users are in fact interacting with, and we'll have something in place for that shortly. Synthetic metrics on recommendation quality don't really impress me because they ignore that recommendation algorithms are solving a human-computer interaction problem as much as they're solving a k-nearest-neighbors problem. I've got another article in the pipe on some of the interesting problems of ranking on real data, but it keeps getting pushed back since there's, you know, a lot to do at the moment. :-)
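
When that loop is in place, the core measurement is just click-through rate on what we recommend; a toy sketch (the event format is made up for illustration):

    from collections import Counter

    # events: ("shown", item) when a recommendation is displayed,
    #         ("clicked", item) when a user follows one
    def click_through_rates(events):
        shown, clicked = Counter(), Counter()
        for kind, item in events:
            (shown if kind == "shown" else clicked)[item] += 1
        return {i: clicked[i] / shown[i] for i in shown}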


How would you compare your offering, both in terms of technology and business application, to someone like CleverSet (now ATG Recommendations)?


So, you presume doing all this does not take any time for millions of objects?


All I'm saying is that it's possible with the right data structures. It's not more than what a typical search engine does.


They have a couple of articles on their site: http://directededge.com/tech.html


Thanks, you replied right before I was going to edit in the answer from http://developer.directededge.com/article/Introduction_to_Re...:

"Graph-based recommendation systems".

Something new to learn!

edit: the linked page mentions the book I'm reading! Looks like the answer was a few pages further in :)


"Directed Edge truly believes that we’re about to see a shift on the web away from search and towards recommendations."

The difference is somewhat arbitrary when you think about it. When I Google, I'm asking it to recommend me stuff related to what I'm looking for. Google is nothing but the world's best recommendation engine.

There are about 1,000 sites that could use good recommendation technology to enhance their profits though, so I like this company's monetization chances. Easy elevator pitch too: recommendations as a service.


It's a continuum, as hinted below, but the results are pretty different in the polar cases. I'll fall back to my Miles Davis example.

Searching for "Miles Davis" returns 10 pages about Miles Davis. Hitting our engine with Wikipedia data for Miles Davis gives you John Coltrane, Herbie Hancock, Wayne Shorter, Thelonious Monk, Sonny Rollins, Canonball Aderly, and so on.

http://www.directededge.com/?Miles%20Davis

The search end of the spectrum is about finding something you're looking for -- recommendations are about discovering things you didn't know about.

The two meet in the middle with "personalized search". If I type "python" into a search engine, do I want snakes or code? You can probably figure that out based on what I've done in the past.
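
The simplest version of that is biasing base relevance scores by the user's topical history; a hypothetical sketch (all names and structures here are invented for illustration):

    from collections import Counter

    def personalize(results, history, weight=0.5):
        # results: list of (url, base_score, topic) from the engine
        # history: list of topics from the user's past activity
        affinity = Counter(history)
        total = sum(affinity.values()) or 1
        return sorted(
            results,
            key=lambda r: r[1] * (1 + weight * affinity[r[2]] / total),
            reverse=True,
        )

    # for a "python" query, pages tagged "programming" float up for a
    # user whose history is mostly programming topics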


Yeah, and that's awesome, and I think it will totally converge with the type of search we have today. Imagine you type "Miles Davis" into Bing (which I'm using because they'd be more likely than Google to experiment with altering the dominant paradigm) and they show you a column of pages about Miles on the left and the stuff you're showing on the right. That'd be badass.


You are right, there is a continuum of possibilities between pure search and pure recommendation systems.

The difference is in the input data that the search/recommendation engine considers: the input in classic search is only the few words in the query, while a recommendation engine considers some history of your interactions with a site or the internet (and nothing that indicates your intent in this particular instance).

Google, of course, keeps search history for people logged in. But it doesn't seem to affect their relevancy algorithm much, unlike, say, your location and language.

I'd speculate Google Checkout is mostly about gathering data that would allow for meaningful recommendations.


I was about to post this too. Surely all the possible nodes include all possible search terms, as well as all web pages. Then search and relations are synonymous. Any thoughts on how your system would be different from a search engine in that sense, wheels?


Wow, congrats to wheels & co for the TC hit and YC - I don't think anyone knew about that!


Hacker News user "Wheels" is a founder - congrats on the write up.

Didn't realize you were YC funded


Question:

Greg Linden, who worked on Amazon.com's recommendation engine, has referred to what he calls the "harry potter problem". To quote from his blog:

'...this calculation would seem to suffer from what we used to call the "Harry Potter problem", so-called because everyone who buys any book, even books like Applied Cryptography, probably also has bought Harry Potter. Not compensating for that issue almost certainly would reduce the effectiveness of the recommendations, especially since the recommendations from the two clustering methods likely also would have a tendency toward popular items.'

How did you compensate for this problem? Do you simply ignore vertices in the graph that have a large degree?

Or, are you using non-linear weighting functions, such as a perceptron's sigmoid function?

With regard to Wikipedia, almost everyone who has edited an article has also edited the article on Bill Clinton. So, if you are using the edit-history metadata to compute recommendations, you would have to compensate for the "Bill Clinton problem".
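
(One standard compensation, for what it's worth, is to damp edges to high-degree vertices, IDF-style, rather than dropping them entirely; whether Directed Edge does anything like this I don't know:)

    import math

    def edge_weight(degree, total_nodes):
        # an edge to "Harry Potter" or "Bill Clinton", which nearly
        # everything links to, contributes almost nothing; an edge to
        # a rare item counts nearly fully
        return math.log(total_nodes / (1 + degree))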


I have been following "recommendation algos as a service" for about 2 years now. This definitely seems interesting, but a side opportunity could be to build an aggregator/optimizer of recommendation algos for merchants/publishers, similar to what Rubicon Project/PubMatic does in aggregating/optimizing ad networks.

The aggregator would take all the various recommendation-algo services such as DirectedEdge, Aggregate Knowledge, Loomia, Minekey, Persai (now dead) and many others and keep running tests (similar to the Netflix Prize), and whichever is better is given more airtime suggesting related products/pages for retailers/publishers. The compensation model would work on a percentage of revenue for additional clicks/purchases on the suggestions.
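
Mechanically, that airtime allocation is a multi-armed bandit problem; an epsilon-greedy sketch (the provider list and the reward signal are placeholders):

    import random

    providers = ["DirectedEdge", "Aggregate Knowledge", "Loomia", "Minekey"]
    shows = {p: 1 for p in providers}   # start at 1 to avoid dividing by 0
    clicks = {p: 0 for p in providers}

    def pick_provider(epsilon=0.1):
        if random.random() < epsilon:   # explore: try a random engine
            return random.choice(providers)
        # exploit: the engine with the best observed click rate so far
        return max(providers, key=lambda p: clicks[p] / shows[p])

    def record(provider, clicked):
        shows[provider] += 1
        clicks[provider] += 1 if clicked else 0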


"....we’ve gone from having a graph-store to having a proper graph database..."

A graph database and a "triple store" in semantic technologies are essentially the same thing. This company makes some very aggressive claims that AllegroGraph, Jena, Oracle (with Spatial), Sesame and others (including the Korean arm of my current company) have also made. Typically, such claims fail to live up to the marketing. I wonder how this solution compares to these traditional triple stores?


Are there any objective measurements of the quality of recommendations across various recommendation systems?


There is more than one, depending on the problem and what exactly you want to improve. For example, RMSE is the measurement used for the Netflix Prize.
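
For reference, RMSE is just the square root of the mean squared error between predicted and actual ratings:

    import math

    def rmse(predicted, actual):
        return math.sqrt(
            sum((p - a) ** 2 for p, a in zip(predicted, actual))
            / len(actual)
        )

    # rmse([3.5, 4.0], [4, 5]) -> ~0.79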


I really liked Greg's commentary on RMSE -- summed up my thoughts better than I could:

http://glinden.blogspot.com/2009/03/what-is-good-recommendat...


Also, congrats to DirectedEdge for doing the animation without Flash on the front page. Nice!


You can actually search in the little bar there for any Wikipedia article to show related stuff, and as kind of a little easter egg, you can specify a starting page via the query string, e.g.

http://www.directededge.com/?Y%20Combinator


This little Wikipedia search bar is hiding nodes from the graph most of the time. It bothers me a little.


We tried a few combinations -- and may try some yet. If it was further right, it threw off the balance of the layout. If it was up at the top, it drew too much attention to itself. We tried messing with the z-index on hover too, but that looked funky.


I thought it would be nice if the user could drag it around.


Haha Paul was disconnected from Jessica for a bit.

A few more clicks and they were back together. Awww.


Cool. Have you thought about entering it in any online recommendation contests (like Netflix or the GitHub one)?



