
One idea that comes to mind to deal with the Wikipedia / Stack Overflow problem is result clustering. With Google News, they have done a pretty good job of clustering articles on a single story, and they are getting better at identifying the original source in many cases. The simple act of duplicate detection should enable them to identify sites that scrape content and flag them as duplicate results.

In the interest of result diversity, you don't want the same content repeated ten times on the first page, although collapsing duplicates has the side effect of pushing the original source onto the second page if you guess the wrong copy as canonical.
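
Something like shingling plus MinHash would get most of the way to the duplicate detection step. Here's a rough sketch; the 5-word shingles, 64 hash functions, and 0.8 threshold are illustrative guesses on my part, not anything Google has disclosed:

    import hashlib

    def shingles(text, k=5):
        # Overlapping k-word windows; near-duplicates share most of these.
        words = text.lower().split()
        return {" ".join(words[i:i+k]) for i in range(max(1, len(words) - k + 1))}

    def minhash_signature(shingle_set, num_hashes=64):
        # For each seeded hash, keep the minimum over the set.
        # Two sets agree on a slot with probability equal to their Jaccard similarity.
        return [
            min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
                for s in shingle_set)
            for seed in range(num_hashes)
        ]

    def estimated_jaccard(sig_a, sig_b):
        # Fraction of matching slots estimates set overlap.
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

    original = "the quick brown fox jumps over the lazy dog near the river bank"
    scraped  = "the quick brown fox jumps over the lazy dog near the river bank today"
    sig_a = minhash_signature(shingles(original))
    sig_b = minhash_signature(shingles(scraped))
    if estimated_jaccard(sig_a, sig_b) > 0.8:  # assumed tuning knob
        print("likely duplicates: cluster them and demote the later-crawled copy")

The nice property is that signatures are small and comparable pairwise without re-reading either page, so the clustering can run at index time. The hard part, as noted above, is deciding which copy in a cluster is the original.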



