A guide to the latest Mahalo spam techniques (blogsblogsblogs.com)
81 points by dazz on June 17, 2010 | 44 comments



Disclaimer: I used to work for mahalo.

You mentioned these sites having no unique or original content, but after a few quick searches for questions on the Star Wars answers sites it was pretty clear that all the UGC (user-generated content) is original. True, there is not much of it, but that's because the sites are only a few months old at most.

You've proven that the category structure across the sites is similar and that new sites often don't have much content, but Google already knows not to grant PageRank to new, undeveloped sites. The backlinks from these domains will only carry weight if users engage with them and significant content is built... and if that happens, your entire argument goes away.

If there are actual users building niche content, I don't really see the problem here. If you don't agree with that then you're making the case against all niche forums and communities just because you don't like the guy.


Google, for whatever reason, thinks that a very light touch with regard to spam sites is appropriate. (Although Cutts is right that an inurl: or site: query doesn't show anything - he's saying, "Show me a real-world query that has nothing but spammy results and then I'll do something").

I think and have thought that a Google competitor could come along and beat them by having a heavy hand against spam sites. If it's designed to attract search engines, NewGoog doesn't index it, period. Any business generating datafog for search engine consumption gets deindexed entirely and permanently.

Go ahead, type any question you want into Google. ANY question at all. You get a list of 500 search engine question spam sites, most of which are written in broken English and contain more or less nonsense strings of words thrown together. NewGoog would wipe all that shit out.

Attention Google: offer an option (like SafeSearch) that applies a HEAVY HAND to spam sites. We will love you for it.


What's the best way to pass a site over to Matt Cutts?

This site has GOT TO GO! http://www.codeweblog.com/ It is polluting my Google search results.

As an example, search for "android intent activity pass". The search result titled "android intent pass variable activity" looks promising, but clicking on it brings you to this site, which contains machine generated nonsense content!


I've also been seeing it regularly in technical search results, and it seems to have come from nowhere as I don't remember ever seeing it before (the domain's associated with a Chinese address, created 2006).


I submitted a spam report about that site the other day. I've been getting up to scratch on Ruby, and a lot of searches for gem names return that site. I have gotten so used to trusting the top results in Google that I never think to check the domain - I need to stop that.


My favorite is bigresource.com and others that are scraping forums ... I fucking hate those sites so much. I reported them, but of course those emails go nowhere.


Are those Markov chains, or is someone actually writing that stuff?
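For anyone wondering what the Markov-chain option would look like: a word-level Markov chain generator records which words follow each short run of words in some source text, then strings words together with a random walk over those transitions. Each local run of words looks plausible while the whole reads as nonsense. The Python below is a minimal sketch of that general technique; the toy corpus and the `build_chain`/`generate` names are illustrative only, and nothing here establishes how codeweblog.com actually produces its pages.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Chain = Dict[Tuple[str, ...], List[str]]


def build_chain(text: str, order: int = 2) -> Chain:
    """Map each `order`-word prefix to every word that follows it in `text`."""
    words = text.split()
    chain: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        chain[prefix].append(words[i + order])
    return dict(chain)


def generate(chain: Chain, length: int = 20, seed: Optional[int] = None) -> str:
    """Random walk over the chain, stopping early at a prefix with no successors."""
    rng = random.Random(seed)
    start = rng.choice(list(chain))   # pick a random starting prefix
    order = len(start)
    output = list(start)
    for _ in range(length - order):
        followers = chain.get(tuple(output[-order:]))
        if not followers:             # dead end: this prefix only ended the corpus
            break
        output.append(rng.choice(followers))
    return " ".join(output)


# Toy corpus (hypothetical), loosely themed on the Android search above.
corpus = ("pass a variable to an activity with an intent "
          "pass a value to an activity with a bundle "
          "start an activity with an intent and pass extras")
print(generate(build_chain(corpus), length=12, seed=1))
```

Raising `order` makes the output copy longer verbatim runs from the source; lowering it makes the output less coherent.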


This site was killing me just last night! (I was looking for info on CKEditor plugins)


I do this at DuckDuckGo: that is, crawl, identify, and delist over 60M domains. I think it works pretty well and is certainly a value-add, but it is not immediately obvious to new or even many old users.


Just a heads-up for you, and I'm sure you get this all the time, but I'm a new DDG fan.

The first time I tried it some months back, it was eh-OK, but not compelling enough to pull me away from Google. In the meantime, though, I've gotten more and more frustrated with Google's results -- due in part to link-spamming sites -- so DDG has become a lot more compelling. It's my default search provider now.

One question though: I commented to a good old friend of mine that search was starting to feel like the old AltaVista days again. I spent a lot of time trying to find a site that I had seen just a couple of weeks previously, and no amount of Google-fu (or Duck-fu) could locate it. I kept getting completely generic results that weren't helpful at all. I finally found it by remembering where I had seen the link, and using a custom search feature on that site in particular.

So ... is there a way for a search engine to rank pages by "freshness"? i.e., some combination of recently-updated and recently-mentioned?

Anyway, thanks for DDG.
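On the freshness question: one common way to express "recently updated and recently mentioned" is an exponential time decay on each signal, blended with an ordinary relevance score. The Python below is a minimal sketch under loudly stated assumptions: the 14-day half-life, the weights, the `Page` fields, and the example URLs are all hypothetical, and this is not a description of how Google or DuckDuckGo actually rank results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    url: str
    relevance: float           # query-match score from some other ranker, assumed in [0, 1]
    days_since_update: float   # age of the page's last modification (hypothetical signal)
    days_since_mention: float  # age of the newest inbound link or mention (hypothetical signal)


def decay(age_days: float, half_life_days: float) -> float:
    """Exponential decay: 1.0 for brand-new, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def freshness(page: Page, half_life_days: float = 14.0,
              w_update: float = 0.5, w_mention: float = 0.5) -> float:
    """Blend the 'recently updated' and 'recently mentioned' signals."""
    return (w_update * decay(page.days_since_update, half_life_days)
            + w_mention * decay(page.days_since_mention, half_life_days))


def rank(pages: List[Page], freshness_weight: float = 0.3) -> List[Page]:
    """Order pages by a linear mix of relevance and freshness, highest first."""
    def score(p: Page) -> float:
        return (1 - freshness_weight) * p.relevance + freshness_weight * freshness(p)
    return sorted(pages, key=score, reverse=True)


pages = [
    Page("https://example.com/old-but-relevant", 0.9, 400, 300),
    Page("https://example.org/fresh", 0.7, 2, 1),
]
for p in rank(pages):
    print(p.url)
```

With `freshness_weight=0.0` the ranking falls back to relevance alone; raising it lets a moderately relevant but recently updated and recently linked page overtake an older, more relevant one.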


Google exposed some sort of recency-based filtering UI with their latest update, although I'm not sure how it gets its results.


Thx for the heads-up. This is very helpful feedback.


Remedy for this is to bookmark a site you like :)


That isn't really fair to Matt. What he's saying is, show them ranking for a query they shouldn't be, and then he'll start to be concerned.


The essential question I ask is whether Mahalo is getting the same treatment as any other site. So, if you and I launch a site that uses the same "thin affiliate" sites for crosslinking and disregards the other Google rules that Mahalo disregards, will we get a pass or will we get blacklisted?

If Cutts ignores our site, then he's being consistent. However, the author of the post contends that Mahalo is getting away with things that get other sites penalized. That's a very different question from how well Mahalo ranks.


I don't think Google would give Mahalo preferential treatment. Although sites like TechCrunch sometimes give us the impression that Jason Calacanis, Google brass, Kevin Rose, etc. are all part of some sort of untouchable Silicon Valley illuminati, that's really not the case.

Google cares more about the quality of its search results than any personal relationship - I don't think they'd hesitate to burn some of Mahalo's domains if they thought it was negatively impacting users.


It seems, from the later demonstrations in the article, that the content is at least somewhat unique on these sites. I think that alone would keep it from being penalized on Google.


That's been done though. Pretty much everything they rank for is from self-voting, manipulation that is contrary to the "democratic nature of the web" that Google wants to leverage. There are plenty of queries that demonstrate that, ones that Matt has indeed been shown.


I'm all for stopping Google spam, and I don't doubt it is happening in this case, but I'm not entirely certain that the case in point is Google spam.

If I am only interested in Star Wars questions a secondary portal is going to be far more interesting to me. I have no problems with the concept.

Yes, the execution itself, with all of the links, can be questioned, but you can't blame a guy for trying, can you?


I am having a hard time understanding what he's complaining about. Is there supposed to be something wrong with having a general information site (mahalo) and a bunch of specialized sites (starwarsanswers.com, for example) that contain information on specific topics with links back to the general site?


What purpose does this serve? Maybe you could explain to me why someone would do this? What is the benefit of using several different domains instead of subdomains or subfolders on the general site?


"Hey Bob! I found the answer to that Star Wars question we were discussing at lunch yesterday on this great site, generalsite.com/topics/starwars".

A couple days later, Bob wants to go there to check out another Star Wars question. All Bob remembers is "starwars", and fails to find the site.

Compare to "Hey Bob! I found the answer to that Star Wars question we were discussing at lunch yesterday on this great site, starwarsanswers.com".

A couple days later, Bob wants to go there to check out another Star Wars question. Bob has a good chance of remembering "starwarsanswers".

Separate domains for different topics are more friendly to users. That's probably why Google has no problem with it: as the Google guy said, it's the impact on users they care about.



These are all empty pages. They look characteristic of pages that were not designed to be there or linked to. A bug rather than a feature.

How do you navigate to these pages from the starwarsanswers.com homepage? I don't see any of them listed in the right-hand margin. They may well have been exposed by a deployment error early on that has since been fixed.

It looks more like a basic website error than something attributable to malice.


For the same reason StackExchange breaks up their Q/A sites into niches. It's a better experience for the user. It creates a community around the niche. A brand. It builds momentum faster. What if StackOverflow, MathOverflow and Startups were all part of the same general site? It would get messy. It would lose a lot of its value from being domain specific.

Also it's better for advertisers.

It's a good idea. There's nothing wrong with what Mahalo is doing here.


Would HN be better or worse if it were just a category on Digg?

The specialized domain/community allows users to get into much more detailed discussions about their topics and it helps filter out contributors with just a passing interest. This increases the reliability of the information shared and the quality of the discussions.


Hackaday is not a site that was created in the last year.


I'm sorry, what do you mean?


The list of URLs in the search results includes Hackaday. It's not the only one. I was actually curious about how some of those sites had the Mahalo content on them. Some of them are older than Mahalo.


I'm gonna go out on a limb here: Who gives a shit?

Everyone treats Google like this super police agent whose job is to keep our minds free from digital pollution. Yet 97% of Google's income comes from their ad network, which thrives on having publishers and SERP pages to serve up their advertising customers' creatives. Ultimately, I'm sure Google is fine with this Mahalo stuff, because it means they have more ad space to publish on.

What Mahalo has going on is not much different from the major article directory sites like EZine and Associated Content. Search one of those sites for corn on the cob and I bet you'll find a bunch of similar articles, all plastered with AdSense. In fact, I have tools right now that can rewrite an article and allow me to submit more corn-on-the-cob articles in under a minute. But you never hear anyone bitch about those sites, because it's user-generated content. Ultimately, how are they any different?

Honestly, knowing what I know about black hat SEO tactics (I know a good amount), I'd rather see someone building out massive crappy content sites than spamming their backlinks everywhere. At least they aren't polluting everyone else's sites with XRumer blasts and blog comment spam.


Look: either they're going to control spam or they're not. But they can't reasonably own all of Internet search and navigation and be capricious about enforcing their intentionally obscure ranking rules. Google demands the benefit of the doubt from the entire Internet. They set this high standard for themselves, and they profit e n o r m o u s l y from the perception that they live up to it.


Not specific to Mahalo, but I certainly care about search result quality. My recent experience searching for answers to my Android programming questions (doing my first app) has led to frustration: the Google results appear promising, but upon clicking through, I find garbage aggregator websites. This may be a problem on all search engines, but it's driving my domain-specific searches to sites like Stack Overflow.

I really wish Google would address this issue.


I agree with ryanhuff. Recently I spent time searching for camera reviews - smaller, entry-level ones where there's less press. About 70% of the sites I got were these dumb contentless websites plastered with camera ads and affiliate links (to stores like Amazon and Adorama). That's all well and good, but after a couple of tries I began to be really pissed off at the lack of useful information.

I see the same issues while searching for certain bits of news. Earlier today I ran a search for Spain's loss to Switzerland (soccer, in the World Cup). Amongst the top results was a small section dedicated to Google News. Two of the three on display led me to ugly blogs, plastered with ads, that had essentially copied content from other news organizations and reposted them.

My point: somebody searching for actual angel cake recipes will be greatly annoyed every time they come across Mahalo's (content-less) pages. Sure, some of their pages have content, and are probably useful - but at what cost? I wish Google would address this, and soon.


> I wish Google would address this, and soon.

I wish anybody would address it, and if they did, I would switch all my searching to them in a heartbeat.

There is no doubt in my mind that search quality has gone downhill in the years since Google first came on air, and these pseudo-sites have a lot to do with it.

A lot of people switched from AltaVista (remember them?) to Google overnight - because of better search results. They can do it again.


> I wish anybody would address it, and if they did, I would switch all my searching to them in a heartbeat.

Honest question, because I'm a recent DDG convert (and I'm not exactly an "early adopter" of anything) ... what do you think of its search results for this?

A DDG search of "camera reviews" gave http://www.dpreview.com/reviews/ for its top hit, and the site actually seems pretty useful to me.


Dpreview is a great site. So - yes, I'm going to go check DDG out soon. Thanks, thaumaturgy.

PS: Whether or not DDG returns the specific Dpreview site for a specific camera model is still up in the air, I suppose. But I'll remember DDG as an alternative the next time Google results are getting in my way.


You know, I keep forgetting about those guys.

I have just switched my Firefox search to them so I can give them an honest try.


"There is no doubt in my mind that search quality has gone downhill in the years since Google first came on air, and these pseudo-sites have a lot to do with it."

Wait...what? Come on, you don't honestly believe that, do you? Google's algo is a huge step up from the days when you could stuff your meta keywords to get to the top of the previous SEs.

It's in Google's best interest to find the best content to return for given keywords. Now, "best" to techies implies high-quality, unbiased, trusted content. "Good" to an SEO implies highly optimized, well-targeted content. "Good" to Google is high-converting, ad-serving, click-worthy content.

It's not as simple as serving perfect, good quality content for Google. They have an ad network to cater to first, searchers come second.


What point are you actually arguing?

That MFA sites are wrecking Google's search quality or that they're not? You seem to push both points in the one post which is even more confusing given "Wait...what? Come on, you don't honestly believe that, do you?"


> They have an ad network to cater to first, searchers come second.

I'd like to think that isn't true. But if it were, my point stands - as a searcher, I am willing to go somewhere where the searchers come first. Search is not a lock-in. Changing search engines is almost painless.


Everyone wants high-quality search results, of course. But like I said in my OP, Google is a for-profit company that depends on high-traffic keyword searches and high-volume publishers to maintain their livelihood. It's not much of a stretch to think they may not be dying for "quality" search results as much as we are. If the lower-quality results are making them money, squelching them would go against their own business model.

So it really is a question of who this issue is affecting. For us, it's annoying. For Google, it's a profit source.


You are absolutely correct. However, at that point it becomes an ethics issue. If these are financially motivated search results without disclosure then it becomes a whole different ball of wax. If they are deliberately screwing up their relevancy algorithm in the name of a bottom line then people need to know that.


The reason people trust and believe the spam search results that come up and blindly make purchases on those sites is the same reason they can't realize that Google isn't some non-profit public service but is in reality an ad-littered money machine: people are stupid.


I don't think you're right here - Google's stated aim has long been to provide the best search experience for its users. Nothing it has done in the past, empirically, has shown that it wants anything otherwise. Even Cutts's response in the article above exposes such thinking.

To Google, what's best for the company is best for its users. Bing (and now DDG) are hot on their heels, and so it doesn't make sense when you claim that there's a split between what Google wants and what we want, which are better search results, etc etc.



