Google May Let Users Personally Blacklist Domains To Fight Spam (searchengineland.com)
61 points by w1ntermute on Jan 23, 2011 | 42 comments



This article is speculative at best, but I can't say my heart didn't skip a beat at the prospect of never seeing experts-exchange.com in my search results ever again.


I hate how experts-exchange does their pages, trying to pretend you need to sign up to see the content. They even had an ad on the side that said "Tired of scrolling? Sign up!". One would hope such misleading tactics would be penalized in search results...

But on more than a few occasions, I've gotten some pretty high quality answers off their site.


If I recall correctly, they used to show the answer only to Googlebot, so if you wanted it, you actually did have to pay (or perhaps forge your user agent).
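For the curious, "forging" a user agent is just a matter of sending a different User-Agent header. A minimal sketch in Python (the URL is a placeholder, and sites can verify real crawlers by reverse DNS, so the header alone may not be enough):

    # Fetch a page while presenting Googlebot's user-agent string.
    # The URL is a placeholder; this is illustrative only.
    import urllib.request

    url = "http://www.experts-exchange.com/example-question"  # placeholder
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                               "+http://www.google.com/bot.html)"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read(500).decode("utf-8", errors="replace"))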


I think Google added to their policy that if your site returned different content specifically to Googlebot, it would affect whether that site appeared in search results.

Experts-exchange got around that by making the answer appear at the bottom of the page, which required a lot of scrolling.


It used to be the case that if you weren't Googlebot, you'd see some blurry scrambled text where the answer ought to be. The blurriness was created by superimposing a partially transparent image over the text, and the scrambled text was really the rot13'd answer, so it was possible to get the answer by grabbing the relevant chunk of HTML source and running it through a rot13 tool.
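rot13 just rotates each letter 13 places, so applying it twice is a no-op and decoding is one line in most languages. A sketch in Python, with the scrambled string invented for illustration (in practice you'd paste in the chunk of HTML source):

    # rot13 shifts each ASCII letter 13 places; applying it twice round-trips.
    # The ciphertext below is made up for illustration.
    import codecs

    scrambled = "Gur nafjre vf va gur ertvfgel."
    print(codecs.decode(scrambled, "rot_13"))  # -> The answer is in the registry.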


Even worse are some of the sites that have popped up duplicating content from stackoverflow. They often come up above the actual stackoverflow page for a given question and then don't even have the entire set of answers and comments that are present on the real site, or they just have a terrible layout that is hard to use. I dislike experts-exchange a lot, but at least they have original content and aren't just leeching off other sites.


This userscript can do the job in the meantime: http://userscripts.org/scripts/show/33156



Chrome users can also just use the userscript, thanks to Chrome's Greasemonkey-devouring magics :)


Unfortunately, as much as I'd like to just ban them, I've totally gotten answers to questions from there before (at first using Google cache, until someone showed me the now-stupidly-obvious-in-retrospect scroll-down-to-the-bottom technique), so I know banning them from my listings will make my life slightly worse, and I can always ignore them.

Which, I guess, is a mindset that I think generally makes this concept not terribly useful: you can always personally ignore some domains that suck. The real issue is how many places have duplicate content, or even stupidly "reblogged"/"syndicated" content, that show up when you search for something: the domains for that are often very search-specific.


Same here. I hate that every click I make ends up at experts-exchange. I didn't even know you could find your answer by scrolling to the bottom until I read someone mention it on Hacker News.


Of course, this might just make the spammers buy up more domains ...


That's just fine. Today many spammers are building up domain reputation to exploit, so if we force them out of their domains it will slow them down quite a bit.


And then the GBL gets deployed by Spamhaus.


Better than making them not buy up more domains.


An alternative that you can implement right now is to create your own custom search engine powered by Google that gives you very deep control over which sites are excluded from the results. Google has made it easy and surprisingly powerful: http://www.google.com/cse/


Sorry, but how do you add all websites to the whitelist?


I think you are confusing Google's Custom Search Engine (CSE) and their Site Search. "Google Site Search" is for specific sites, but a CSE starts with the whole web, and you can tweak the weightings for certain sites or blacklist them if you wish.


Thanks for being helpful. Here's where I run into a problem: http://www.google.com/cse/manage/create

What do I put in the "Sites to search" box?


If I recall correctly, you used to be able to start with the full web and refine by just blacklisting... now I see that if you try to make it past the step you linked to without entering any websites in the "Sites to search" field, it gives the following message: "You must enter 1 or more websites."

I haven't read the docs recently, but it does appear that they have restricted this product since it first launched.

Sorry if I have led you astray.


Sounds like it would be a great way for Google to experiment with crowdsourcing results quality. If there's a ton of people blocking, say, experts-exchange.com – that should feed back into their search result quality ranking.

Of course scammers would attempt to game this system too (by, for example, blacklisting stackoverflow.com) :)


I think this would help with domain squatters, in some situations. Who'd buy a 'used' domain that's already built up a reputation for being filled with garbage?


For those who might have missed it, the Blekko (http://blekko.com/) search engine already allows one to do this in a quite powerful way, and Google's plans might have been a response to it.

(the beauty of improvement through competition...)


Can't you already just use -site:example.com if they're spamming you? I usually only get spam results when I'm searching for something that doesn't actually exist, in which case spam is the only thing around for it to show me.


Sure, but that doesn't let you block dozens of sites, and it's annoying to have your queries cluttered up with -site:ehow.com. It's not just classical spam that I want blocked in my SERPs, but the broader tide of low-quality content like eHow, Yahoo! Answers, Cha Cha, Mahalo, HubPages, Experts Exchange, Associated Content, Examiner, etc etc.
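For illustration, here is roughly what maintaining a personal blocklist by hand looks like: prepend a pile of -site: operators to every query. The domain list is drawn from the sites named above, and note that Google has historically capped query length (reportedly around 32 terms at this time), which is one reason this doesn't scale to dozens of sites:

    # Sketch: suffix every search with -site: exclusions from a personal
    # blocklist. Domains taken from the comment above; illustrative only.
    from urllib.parse import urlencode

    BLOCKLIST = ["ehow.com", "answers.yahoo.com", "chacha.com",
                 "mahalo.com", "hubpages.com", "experts-exchange.com"]

    def blocked_query(terms: str) -> str:
        exclusions = " ".join("-site:" + d for d in BLOCKLIST)
        return ("https://www.google.com/search?"
                + urlencode({"q": terms + " " + exclusions}))

    print(blocked_query("jvm garbage collection tuning"))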


That only works for a small number of domains.


That would be wonderful. After the third or fourth spam farm I usually decide the information I was looking for wasn't really that important and give up. The Instant Search feature was kind of the breaking point for me. It showed me how useless most of my carefully crafted search terms really were. You can sit there and type it in 30 different ways, but you're going to keep getting the same spam results over and over again.


Prediction: this will not happen. But I would love to have to eat my words.


If they allow it for individual users, they will have to prevent users from trading lists/scripts, or they risk seriously cannibalising AdWords revenue.


Users who are smart enough to trade those lists (and care) are probably already using ad filters.


I'm still holding to my argument that blacklisting is a fool's errand. Spam is in unlimited supply. What we REALLY need is a whitelisting system, which is probably easier said than done...


Once I block sites, I should be able to export my list of blocked sites so that anyone can import it. And there could be several personal search profiles. Does that make sense to anyone?


For some queries, not necessarily all, I'd also like to be able to block all sites containing AdSense.

An obscure advanced search operator would be fine. I don't mind having to type it, or having it clutter my query-text. Options for blocking other ad networks, or all, would also be appreciated.

If Google offered this, it would be a strong indicator that search functionality is offered without any contamination by revenue concerns.


I've seen lots of high quality blogs that have AdSense just to pay the traffic costs -- you would potentially be filtering out a lot of good content. (Not to mention the fact that since Google makes money off AdSense, you'd be asking Google to essentially cut off its own foot).


One of my own blogs has AdSense! So I'm not against it in all cases.

I just want the option of retrying some heavily-SEO'd-by-MadeForAdSense-sites queries without all those sites in the results.


> For some queries, not necessarily all, I'd also like to be able to block all sites containing AdSense.

What kind of queries would you want to do that for? I can't imagine any.


Because when was the last time you saw a spam site that didn't use AdSense?


I don't currently have exact examples in mind. (For the last few months, NoScript has blocked me from seeing lots of the most annoying ads, and I've been trying to avoid clicking on some of the worst domains, like ehow.com.)

I just recall that on some queries – especially related to diet/health, or product/household how-tos – after clicking several of the top results, I'd see lots of AdSense, and very little useful content.

Other ads can be just as bad; I mentioned AdSense specifically because it's so widely used, and has at times seemed the first choice of low-quality filler sites (though that may be changing). Also, implementing an AdSense filter would be the acid test of Google's willingness to offer search functionality with no concern for revenue effects.

I don't need such a setting to be persistent; I certainly would not suggest it should be the default. So that the functionality is not spun as a specifically anti-AdSense step, I would be happy if it worked against any <SCRIPT>-included ad-codes, based on originating domain(s). For example, [-adsfrom:doubleclick.net].

That is, if as this article speculates, I may soon be able to block by content-serving domain, I'd also like to be able to block by content-subsidizing domain, as evidenced by the page's ad inserts.
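To make the idea concrete, here is a sketch of how such a filter could classify a page, assuming detection by the originating domain of <SCRIPT> ad includes as described above. The ad-network domain list and the sample HTML are illustrative; a search engine would presumably do this at crawl time:

    # Flag a page as ad-subsidized if it pulls <script> tags from known
    # ad-network domains. Domain list and sample HTML are illustrative.
    from html.parser import HTMLParser
    from urllib.parse import urlparse

    AD_DOMAINS = {"googlesyndication.com", "doubleclick.net"}

    class AdScriptFinder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.hits = []

        def handle_starttag(self, tag, attrs):
            if tag != "script":
                return
            host = urlparse(dict(attrs).get("src") or "").netloc
            if any(host == d or host.endswith("." + d) for d in AD_DOMAINS):
                self.hits.append(host)

    finder = AdScriptFinder()
    finder.feed('<script src="http://pagead2.googlesyndication.com/'
                'pagead/show_ads.js"></script>')
    print(finder.hits)  # non-empty -> page would match an -adsfrom: style filter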


Block sites with AdSense? A large part of Google's revenue comes from AdSense.

Advertising is also the only monetization for most sites.

What can you contribute to Google and other sites if you want to cut off their revenue?


Not everyone is out to make a buck; some people are just having a discussion or have a point, and they might appreciate being found.


Being found means traffic, and traffic means expenses. Someone who is able to provide a useful (and findable) resource won't necessarily be able to figure out non-AdSense ways of making money.

It's not always about profit.


You can already remove sites from your search results when you are logged into Gmail. This would be an extension of that feature: if many people are removing a site from their searches, then maybe put it at the bottom or give it a lower rank... interesting.



