Searching without PageRank (blekko.com)
45 points by wumpus on May 11, 2012 | 16 comments



This reminds me of an aggregation site idea I had a couple of years ago: something similar to HN or Reddit, but with blocks up for all the most popular sites. That way trend voting wouldn't be such a problem. Once a site or an author got popular enough, they "graduated" and would no longer be available.

I still think there is a lot of merit to this idea. Too many things are decided simply on the momentum of popularity instead of actual value to the consumer.


Ooo, I like it.


I found searching minus the top 10k websites to be most effective for finding new stuff. It seems that after that point, a lot of the 'crap SEO' stuff is gone. One problem, the sort of baby with the bath water issue, is that the quality of content also goes down. The search engines still do a great job for the most part in surfacing relevant content, mostly because the barrier to entry for 'gaming' them is high enough that most people don't really notice spam. I don't know if searching this way will be an effective strategy for me going forward, or whether the spammers will start a new SEO campaign for the most popular categories ("let's optimize these queries for after the top X domains"), but it was definitely an interesting thought/search experiment. And only in an hour? Impressive.


I bet there are types of searches for which excluding popular sites works well -- for example, trying to find less popular stuff, which is something that popular websites don't do that good of a job of covering. "Indie" topics, if you will. Would you try to discuss an indie movie on a website chock full of discussions about the latest blockbuster? Then when you search, don't you want to look at less popular movie websites to find the good discussion about your indie movie?


Already showed up here:

http://millionshort.com/


You're right -- I mention that right at the top of this blog posting as what inspired me to do this 1 hour hack.


'doh!


Aw come on, if HN's UI would be "advanced"[0] enough that it would collapse low-scoring comments[1], it would actually make sense to downvote this. But now it just seems petty.

[0] you know, like Reddit.

[1] or if its HTML wouldn't be so horrid to make me give up in despair every time I think to script myself such a thing (as none of the existing 3rd party solutions seem to have gone through the trouble of testing on Opera)


> I found searching minus the top 10k websites to be most effective for finding new stuff. It seems that after that point, a lot of the 'crap SEO' stuff is gone. One problem, the sort of baby with the bath water issue, is that the quality of content also goes down.

What if some search company curated the top 10k websites, blocking those with crap SEO stuff? This might work for a niche player.

> I don't know if searching this way will be an effective strategy for me going forward, or even if for the most popular categories the spammers start a new SEO campaign, ( let's optimize these queries for after the top X domains. )

What if the top cutoff varied randomly over time?
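
A minimal sketch of what a time-varying cutoff could look like, purely as an illustration. Everything here is an assumption: the function name `cutoff_for_period`, the range of cutoffs, the period label, and the secret string are hypothetical and not taken from blekko or any other engine.

```python
import random


def cutoff_for_period(period, low=1_000, high=100_000, secret="hypothetical-secret"):
    """Pick how many top sites to drop during a given time period.

    Seeding a private generator with a secret plus the period label keeps
    the cutoff stable within one period but hard to predict from outside.
    All parameters are illustrative assumptions.
    """
    rng = random.Random(f"{secret}:{period}")
    return rng.randint(low, high)


# Same period gives the same cutoff; a new period draws a fresh one.
today = cutoff_for_period("2012-05-11")
tomorrow = cutoff_for_period("2012-05-12")
```

The point of the design is only that a spammer cannot target a fixed "just below the top X" position if X keeps moving.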


One challenge of getting rid of spam this way is that spam sites are both big and small.

blekko essentially already curates the top 10,000 websites for spam and terrible content. That's not good enough to get rid of marginal content on an otherwise acceptable website.

I suspect this technique is most useful for finding niche stuff.


I find the really strange / off-topic / ranty stuff, the dust bunnies of the Internet, fascinating in its own way.


This isn't searching without pagerank. (As far as I know, blekko uses pagerank)

It's just telling the search engine to ignore the first X results, which still relies on the pagerank algorithm.

Searching without pagerank would be similar to using a search engine pre-google.


First off, I agree that it's not quite "searching without pagerank".

> It's just telling the search engine to ignore the first X results

Nope, a lot of people seem to be confused about this:

It doesn't drop the first X results, but it removes the top X most popular websites from its index--the same ones for every query.
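
To make the distinction concrete, here is a toy sketch of the two behaviours. It is only an illustration: the popularity list, the result URLs, and the function names are all made up, and it says nothing about how blekko actually implements this.

```python
from urllib.parse import urlparse

# Hypothetical popularity ranking, most popular first (illustrative only).
POPULAR_DOMAINS = ["wikipedia.org", "amazon.com", "about.com", "dictionary.com"]

# Hypothetical ranked results for a single query, best first.
RESULTS = [
    "https://wikipedia.org/wiki/Indie_film",
    "https://amazon.com/indie-dvds",
    "https://smallfilmblog.example/review",
    "https://about.com/indie-movies",
    "https://indieforum.example/thread/42",
]


def domain(url):
    """Return the host part of a URL."""
    return urlparse(url).netloc


def skip_first_results(results, x):
    """What people assume: drop the first x results of this one query."""
    return results[x:]


def exclude_top_sites(results, popular, x):
    """What is described above: drop every result hosted on one of the
    x most popular sites, the same sites for every query."""
    blocked = set(popular[:x])
    return [url for url in results if domain(url) not in blocked]
```

With `x = 3`, `skip_first_results` throws away the small film blog and keeps the about.com page, while `exclude_top_sites` keeps both small sites and drops all three popular ones.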

That's why this technique works so well even if you just drop the top 1000 most popular websites: you have probably already seen most of these sites many times and/or know about them. So by dropping them, you're sure to get results that are fresh and new.

Of course, how well this works for you really depends on what kind of seeker you are. If you're the kind of person that searches Google only to click on the first Wikipedia link instead of searching Wikipedia straight away (assuming you've got a shortcut for that), these shortslashed searches will turn out disappointing.

"But before I did the search, I didn't know yet that Wikipedia would be the most useful result!", some might say. But they probably knew whether they were looking for a download page, company address or a definition/explanation, right? In the last case, how often did your top 10 Google results contain a useful result that was not Wikipedia? Not that often, in my experience. Usually you get some other links from About.com, Dictionary.com, and some stores like Apple or Amazon. Usually there's just one obvious correct result among the 10, or it's just not there and your search is going to be a frustrating one because tweaking keywords to improve your results has become impossibly unintuitive with Google's recent "I know what you really mean" approach.

So that's another thing, as long as you know when you want to head straight for a tried-and-true reference (wikipedia, dictionary, HN search--I find DDG's !bang searches useful for these when I don't have them configured as shortcuts yet), not only does removing the top X sites give you fresh and new results, also your first page of results will be a varied palette out of the possible interpretations of your keywords (which you can use to refine), instead of a unity sausage (aka, a UNOX).


My search did bring up some interesting stuff.


and now it's between blekko and the duck


Loving it, great to be able to exclude annoying pay-for-content sites. Keep up the good work.



