
Sadly, quite true.

"True" - because my current understanding (which Matt_Cutts can elaborate on if he chooses to) is that Google has looked into - but does not currently incorporate - the presence of advertising as a spam signal.

"Sadly" - because my independent research has shown that advertising - most notably the presence of Google AdSense - is a reliable predictive variable of a page being spam.

All else being equal, a page with AdSense blocks on it is far more likely to be spam. Yet as of a few months ago, that did not appear to weigh very heavily in the equation.
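
For anyone who wants to check a claim like this against their own labeled sample, here is a rough sketch of how one could measure how predictive a binary page feature (say, "has AdSense blocks") is of spam. The function name and the counts are made up purely for illustration; they are not my research data and say nothing about what Google does.

```python
def spam_lift(spam_with, spam_without, ham_with, ham_without):
    """Summarize how strongly a binary page feature predicts spam.

    Arguments are page counts from a hand-labeled sample:
    spam/ham pages with and without the feature.
    Returns (P(spam | feature), P(spam | no feature), odds ratio).
    """
    p_with = spam_with / (spam_with + ham_with)
    p_without = spam_without / (spam_without + ham_without)
    odds_ratio = (spam_with * ham_without) / (spam_without * ham_with)
    return p_with, p_without, odds_ratio


# Invented counts, for illustration only.
p_with, p_without, odds = spam_lift(
    spam_with=300, spam_without=100, ham_with=200, ham_without=400
)
print(f"P(spam | feature)    = {p_with:.2f}")
print(f"P(spam | no feature) = {p_without:.2f}")
print(f"odds ratio           = {odds:.1f}")
```

An odds ratio well above 1 on a representative sample is what "reliable predictive variable" would look like in practice. It still says nothing about causation, and a real classifier would combine many such signals.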




I agree but don't you think that from an algorithmic point of view Google would be better looking at what the user wants and what monetization models they prefer to see versus the averages in terms of monetization models on spam sites.

That way they're focusing less on removing spammers and more on user quality, and thus removing spam.


This really made me think. But I took exception to your comment:

I agree but don't you think that from an algorithmic point of view Google would be better looking at what the user wants and what monetization models they prefer[...]

No. In fact, I am a rather loudmouthed opponent of Google's somewhat clumsy attempts to measure this à la "Quality Score".

In addition to webspam detection and machine learning, I have spent way too much time in marketing (I have a master's degree in marketing, in fact.)

A neat thing I learned along the way was the value of market research.

There are so many nuances in every line of business. Segments, preferences, pricing, even down to minutiae (now well studied) such as fonts, gutter widths, copy styles, and so on.

You can learn a lot by combining large amounts of data and well-chosen machine learning algos. But even with a few thousand businesses in most categories in a particular country (far fewer outside of the US), that doesn't give an outsider enough data to truly distinguish what can be a winning formula from a spammy one. This knowledge is hard-won through carefully executed experiments and research.

A few years ago I was researching the topic of landing page formulas by category. One example that stuck out most in my mind was mortgages. There were a few tried and true "formulas" that significantly outperformed the rest. Two stuck out:

1) Man, woman, and sometimes child standing on a green lawn in front of home. Arrow pointing down from top left of landing page to mid/lower right positioned form. Form limited to three fields.

Edit: http://imgur.com/90VmB

2) Picture of home/s docked to bottom of lead gen page. No people. Light/white background. Arrow pointing down from top left of landing page to centrally located form.

Edit: http://imgur.com/JkLlH

These sites were incredibly successful. More than a few of them had to contend with quality score issues over the years. Can algorithms capture nuances such as the ones I mentioned? In theory, they could. But today, they don't. All of Google's QS algorithms to date have been failed attempts and have caused an incredible amount of harm and distrust.

You finished that sentence with:

to see versus the averages in terms of monetization models on spam sites.

I'm not at all sure what this means. Could you explain? Is it even possible to directly model the monetization model of a site without having direct access to their metrics?


Well we know Google can classify what spam sites are.

And assuming they could (despite your arguments against) determine the method of monetization on a site, then they could simply compare the models they see on spam sites against another metric that tracks users' reactions to certain types of monetization models.
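
To make that concrete, here is a very rough sketch of the comparison I have in mind. Everything in it is hypothetical: the monetization labels, the user-reaction scores, the thresholds, and the function names are invented for illustration, and it assumes the monetization model of each site is already known, which is the contested part.

```python
from collections import Counter


def model_shares(labels):
    """Return the fraction of sites using each monetization model."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {model: count / total for model, count in counts.items()}


def flag_models(spam_share, user_reaction, min_share=0.5, max_reaction=0.5):
    """Flag models that are common on spam sites AND poorly received by users."""
    return sorted(
        model
        for model, share in spam_share.items()
        if share >= min_share and user_reaction.get(model, 1.0) < max_reaction
    )


# Hypothetical inputs: one label per known spam site, and a 0..1 score
# standing in for "how users react" to each monetization model.
spam_site_models = ["display_ads", "display_ads", "display_ads", "affiliate"]
user_reaction = {"display_ads": 0.4, "affiliate": 0.7, "subscription": 0.9}

shares = model_shares(spam_site_models)
flagged = flag_models(shares, user_reaction)
print(shares)
print(flagged)
```

The point is only that the two signals get combined: a model is penalized when it is both over-represented on spam sites and disliked by users, not for either reason alone.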


And assuming they could (despite your arguments against) determine the method of monetization on a site

I can assure you that they cannot do this accurately. You would be amazed at the scummy business models that openly advertise on Google and are not caught. It would be incredibly difficult to do so, as some of them are downright ingenious (one example: free software that updates your drivers).

It sounds to me like you are positing some type of magical technology that doesn't exist, predicated on Google's seeming omniscience. Of course, I am eager to stand corrected...?


I'm not claiming it is a technology they are capable of developing. I'm just wondering, and still a bit surprised, that with all their top engineers they couldn't (if they wanted to) come up with a solution to this.



