
Does Google still use PageRank? The search results are so bad these days that I doubt many other websites actually link to the top results they show.



PageRank was never as important as people thought it was.

Think of it this way: a search engine needs a relevance score that connects a query to a document. If the number of documents is vast (e.g. billions and billions) a search engine also benefits from a document-dependent quality score.

The first is more important than the second. You'd rather get a poor quality document that is relevant to the topic than a high quality document which isn't relevant.
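To make the "relevance first, quality second" point concrete, here is a toy sketch. Everything in it is an assumption for illustration: the `Doc` fields, the scores, and the linear blend with a small `quality_weight` are made up and are not how any real search engine is known to combine signals.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    relevance: float  # query-dependent score in [0, 1] (hypothetical)
    quality: float    # query-independent score in [0, 1] (hypothetical)


def rank(docs, quality_weight=0.2):
    """Sort docs by relevance plus a small quality bonus.

    A small quality_weight means quality mostly breaks ties between
    documents that are about equally relevant.
    """
    return sorted(
        docs,
        key=lambda d: d.relevance + quality_weight * d.quality,
        reverse=True,
    )


docs = [
    Doc("on_topic_but_rough", relevance=0.9, quality=0.2),
    Doc("polished_but_off_topic", relevance=0.3, quality=1.0),
    Doc("on_topic_and_polished", relevance=0.9, quality=0.8),
]
order = [d.name for d in rank(docs)]
print(order)
```

With these made-up numbers the rough but on-topic page still beats the polished off-topic one, and quality only decides between the two on-topic pages.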

It took several years before papers came out in the literature that found PageRank useful in search results. The key thing is that you need a real excess of documents. With millions of documents you are better off without it (being more effective at finding relevant documents improves performance). You really need 100 million+ to reach the point where you have so many relevant documents for typical queries that filtering on quality doesn't get in the way of relevance.

PageRank can be thought of as simulating a Markov process where a user clicks a random link on a page most of the time but with some probability jumps to an entirely random page. A page's PageRank is proportional to the long-run probability that this user is on the page, or equivalently to how much traffic the page would get under that model.
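A minimal sketch of that random-surfer process, computed by power iteration. The function name, the toy link graph, and the 0.85 damping value (the figure usually quoted from the original paper) are assumptions for illustration, not what Google runs.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> probability; the values sum to 1.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random jump: with probability (1 - damping) land on any page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Follow a random outgoing link.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # Page with no links: treat it as a jump to a random page.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

In this toy graph "c" comes out on top because three pages link to it, and "d" gets only the random-jump share because nothing links to it.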

Google very quickly developed a few ways to sample this directly, such as (1) making Google Analytics almost ubiquitous, (2) making Google Ads almost ubiquitous, (3) analytics from the Chrome browser.

Google denies using the above for ranking, but they've been known to lie about their relevance factors before. Even a small sample from the above three could be used to calibrate models based on other info.


There are known link structure metrics besides PageRank that both scale better and are harder to game, though I'm not sure how their effectiveness in ranking compares to PageRank in the un-gamed case.

I used to work on Google web search indexing, almost 20 years ago, and (1) it has been public knowledge since around the time I left that PageRank was just one of a plethora of ranking signals going into the learning-to-rank ML model that ranks pages and (2) I strongly suspect that PageRank itself has been replaced by a somewhat similar reverse link weighting algorithm.

I strongly suspect that something roughly PageRank-like is still there deep in the bowels of ranking, and due to the non-linear nature of ML, its importance probably varies greatly page to page.


>There are known link structure metrics besides PageRank that both scale better and are harder to game

Any examples?


For one, the TrustRank paper discusses some variations on link structure ranking, some of which are less computationally intensive than PageRank to update incrementally. Twenty years ago, Google News and the more frequently updated web pages would have an incremental estimated PageRank patched in for updates that came faster than PageRank was recalculated.

https://en.wikipedia.org/wiki/TrustRank
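For anyone who wants the gist without reading the paper: the core idea of TrustRank is a PageRank-style iteration where the random jump goes only to a hand-picked set of trusted seed pages instead of to any page. A rough sketch under that reading; the function name, toy graph, and damping value are assumptions, and seed selection (a large part of the paper) is skipped entirely.

```python
def trustrank(links, trusted_seeds, damping=0.85, iterations=100):
    """Propagate trust from seed pages along outgoing links.

    links maps each page to the list of pages it links to.
    trusted_seeds is a non-empty collection of manually vetted pages.
    """
    seeds = set(trusted_seeds)
    pages = sorted(
        set(links) | {q for outs in links.values() for q in outs} | seeds
    )
    # The jump distribution is concentrated on the seeds only.
    jump = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(jump)
    for _ in range(iterations):
        new_trust = {p: (1.0 - damping) * jump[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * trust[p] / len(outs)
                for q in outs:
                    new_trust[q] += share
            # Simplification: pages with no out-links pass no trust on.
        trust = new_trust
    return trust


toy_web = {
    "seed": ["good1", "good2"],
    "good1": ["good2"],
    "good2": ["seed"],
    "spam1": ["spam2", "good1"],  # spam may link out to good pages
    "spam2": ["spam1"],           # and to itself, in a small link farm
}
trust = trustrank(toy_web, trusted_seeds=["seed"])
for page, score in sorted(trust.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

The two spam pages link to each other, which would earn them some plain PageRank, but since no trusted page links to them they end up with zero trust here. That is the "harder to game" part.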


Thanks, this is new to me.

paper link:

http://ilpubs.stanford.edu:8090/770/1/2004-52.pdf


Did you see the news published today about Yandex? There might be something in it that interests you: https://searchengineland.com/yandex-search-ranking-factors-l...



