You have an autogenerated web site that consists of practically nothing other than affiliate links to Amazon. You can make an infinite number of autogenerated pages on your site, e.g.
http://www.filleritem.com/index.html?q=hacker+news
http://www.filleritem.com/index.html?q=31.69
http://www.filleritem.com/index.html?q=teen+sex
and each autogenerated page consists of literally hundreds of affiliate links stuffed with keywords for unrelated products.
When Google's webspam team takes action on websites in our websearch index, we can pass that information on to the ads group so they can check for violations. But it's a one-way street: we can send the ads team signals or information about spammers or other violations of our quality guidelines, but the ads team doesn't send information over to the quality/webspam team.
Still, when used as intended (with the query a dollar amount, like your 31.69 example), isn't his site useful? (I've not noticed a similarly straightforward tool for this same purpose at Amazon.com itself.)
The non-dollar-amount queries seem like a bug -- triggering a default of listing up from 0.00, unrelated to the query. If this were an intentional manipulative tactic, wouldn't the outlinks be related to the query?
Still, tons of sites have infinite paths when fed unexpected queries. At least in a cursory look over the filleritem site, there's no obvious self-linking to abusive keyword-based queries -- like your example <?q=teen+sex>.
If he just fixes it so that non-numeric queries get an error page, doesn't his homepage still deserve to be found by people searching for 'filleritem'?
It's not obviously worse than the current top hit for that (and similar) queries, which behaves almost exactly the same, but might not trigger your same heuristics with its AJAX-results loads. (In such cases I also have to wonder if the initial penalty may have been triggered by complaints, or maliciously-created links, from similar competitive sites that thought themselves immune from the same enforcement actions...)
I'm not saying that the issues aren't fixable. But when the site came up for a review, we saw things that violate our quality guidelines: autogenerated pages with hundreds of affiliate links that consist of lots of keywords, and the links/keywords are duplicate content. If he fixes the issues by e.g. blocking out the autogenerated pages, then I expect his homepage will be found in Google. The autogenerated pages are also less useful to someone landing on a page because the pages quickly get stale: prices change, things go in/out of stock, etc.
This is moronic. The pages are "autogenerated" because it's a web app. They have affiliate links because that MAKES TOTAL SENSE given the nature of the app ("show me as many products as possible that cost $X").
The difference is Google uses robots.txt to exclude these autogenerated pages from search engine crawlers. filleritem.com didn't; now they do[1]. Quite simple really.
No one should be banned from Google for failing to fill out robots.txt in the manner Google prefers. This is a horrible abuse of power -- and a competitor -- by Google. "Quite simple really."
I disagree; Google should have strong rules to filter out crap and give me great search results.
In this case we have something of an edge case; the site is useful, but the auto-generated pages cause potential problems for search results.
If you were talking about how Google communicates these issues, or how it goes about resolving these edge cases, then I'd be agreeing. Much to be improved.
But getting rid of the potential issue.. thumbs up.
Of course it should. But a rule that says you're banned because you're too open with search engines is insane. robots.txt should have nothing to do with it.
"We do need a competitor"
I think that's the core issue here. I run a portfolio of sites and did a series of experiments on Google's quality guidelines. Frustratingly, they are nowhere close to identifying quality content the way humans do, but they think they know how and keep implementing it.
My experimental sites, whose content I deliberately made crappy in the eyes of humans (cheap writing at $3/100 words, no editorial control), were marked as quality sites and perform well in searches. Meanwhile, the sites we spent thousands of dollars on, with strict editorial control, were punished for some reason.
At one point a scraper site that picks up one of our sites' partial feeds, with a 150-word excerpt, outranked us in the search results. What kind of quality guideline is that, where a 750-word post is bad quality while a 150-word excerpt of it is good?
Google's motives may be good, but having observed it for the past 7 months, I've found their approach is wrong.
Like you, I use Bing/DuckDuckGo these days. Google has so much market share (around 85% by my observation); if it dropped to 30%, we wouldn't need to worry about pleasing Google and could focus on pleasing our customers/visitors like normal businesses do.
I first heard about duckduckgo just last week. So far I really like it. Non-techie wife switched over to it, mainly due to sheer frustration with typical Google search results...
- looking for messages in our webmaster console at google.com/webmasters
- asking in our webmaster forum, also linked to from google.com/webmasters
- doing a reconsideration request (also helps with confirmation of manual action)
- talking to search engine reps at various search conferences, e.g. I'll be at PubCon next month.
- calling AdWords support (only for AdWords customers, and this won't give you SEO advice)
We've also been experimenting with 1:1 support over email, by way of a link in our webmaster console. The tension there is finding a solution that scales. We do try to keep an eye on tweets, blog posts, Google+, Hacker News, and similar places around the web, but that's also hard to scale.
Matt,
Would it be possible/financially practical for Google to offer a premium support option that would pay for itself?
Figure out how much it costs to respond to someone, and offer that as a flat fee with a clear indication that paying only gets you a consult on what's wrong with your site, not any promise that it will get you reinstated. Basically what you did in your comments above.
I would personally make it expensive to make sure people don't try continually working the system by paying for the consult to get inside information about how to rank higher on Google.
I work in a support center, so I definitely recognize that it's expensive and difficult to do well at scale, so I appreciate that you're looking into how to do it!
It would be possible. I think Google has a bit of a historical aversion to it because it reminds us of pay-for-inclusion, in which someone would pay for their web page to be crawled/indexed.
Larry Page was especially critical of pay-for-inclusion because it skews your incentives: if you don't crawl the web well, then people pay you to fix your own shortcomings, which in turn encourages you to have more shortcomings.
I think Google also comes from the perspective of self-service AdWords being successful, so the idea of self-service (free) diagnostics really appeals to us. That's why we've put a lot of effort into our free webmaster tools.
I wouldn't be philosophically opposed to a pay-for-support system if it were done well, but it would be a tricky thing to get right. Normally when we consider it, we end up saying things like "Why don't we just try to make it so that people don't need that option?"
>Larry Page was especially critical of pay-for-inclusion because it skews your incentives: if you don't crawl the web well, then people pay you to fix your own shortcomings, which in turn encourages you to have more shortcomings.
It would be fine provided you explicitly didn't turn it into a profit center, and just made it pay for the people's time.
Torching people's websites and shrouding the reasons why in mystery skews webmasters' incentives - away from creating high quality content and towards figuring out how to circumvent the latest change to the search algo.
>I wouldn't be philosophically opposed to a pay-for-support system if it were done well, but it would be a tricky thing to get right. Normally when we consider it, we end up saying things like "Why don't we just try to make it so that people don't need that option?"
I'm sure you could do that tomorrow if you wanted, but explaining in perfect detail exactly why somebody's (legitimate) website got torched would open the details of your algorithm right up, which would not only open it up to gaming, but would open it up to being copied.
You really could use a team of humans who can explain in human terms (as opposed to algorithmic) exactly why webmasters' sites got torched for violating the spirit of your "high quality content rule". Those same humans could equally feed back data to the search team where in their opinion an algorithm accidentally torched something it probably shouldn't have.
> I wouldn't be philosophically opposed to a pay-for-support system if it were done well, but it would be a tricky thing to get right. Normally when we consider it, we end up saying things like "Why don't we just try to make it so that people don't need that option?"
This is going on the wall here at the office. Thank you.
We don't know each other but I think we know of each other. I'm rather immersed in webspam detection and found this incredibly interesting.
You imply that the challenge is finding a solution that scales. Yet it sounds to me from your response that this site was flagged via manual review. Did I misunderstand?
If I heard you correctly, then is manual review a significant part of the equation in the webspam detection methodology? You guys are boiling the ocean, so I find that rather hard to swallow.
The more likely conclusion I can draw is that he had a significant number of (auto-generated) pages on his site flagged as spam and that in turn raised some eyebrows.
BTW, you and your team are doing some amazing work. I wish the paid side was up to the standards you set.
The site was flagged both algorithmically and also escalated to a member of the manual webspam team.
The basic philosophy is to do as much as we can algorithmically, but there will always be a residual of hard cases that computers might not do as well at (e.g. spotting hacked sites and identifying the parts of a site that have been hacked). That's where the manual webspam team really adds a lot of value.
In addition to things like removing sites, the data from the manual webspam team is also used to train the next generations of our algorithms. For example, the hacked site data that our manual team produced not only helped webmasters directly, we also used that data to produce an automatic hacked site detector.
I don't think it is really possible here to have a solution that provides a good outcome for all users and also scales. For every person who does a proper write-up and gets some attention in places like here, there would no doubt be many others who have had the same issues but got no attention.
What would really help is clear reasoning given for the ban and a set of steps a site can take to get back in. From what I see, webmasters are often more than willing to make changes; they just don't know which ones to make. Often, too, the bans are said to be final, denying someone the chance to make things better.
I guess it is tempting to provide minimal support for organic search, as people aren't paying and generally there are plenty of other sites to cover one's absence. One thing that would be good is, say, a yearly fee which guarantees that you can get one-on-one support if something happens.
From your reply, I understand that your primary issue is indeed with scaling any manner of support. As I understand it, the current thinking at Google is to prefer automated solutions over manual ones to reduce costs and improve issue-volume handling. With that in mind, is there a system of triggers that can flag issues for manual review or intervention instead of being handled completely automatically?
I think the majority of issues that have people up in arms aren't really that big, but there are the exceptions. Instead of monitoring G+/Twitter/etc., there could instead be a series of internal checks that look for particular criteria:
1. Age of campaign/ad - was it super old and not really applicable or not even running much budget? It may be worth a lighter hand than a complete ban because it isn't indicative of a trend or pattern on the part of the advertiser.
2. Amount of impressions/clicks/spend over time - what is the overall severity based on actual impact? Are users actively clicking on the ad, returning to the site and continuing to use it? As each click and therefore organic link is logged, it would be good to cross reference them and go "well, this site raises flags but has high return-user rates and decent sentiment in organic results"
3. Traffic vs. Content ratio - does the site have fairly thin content that still gets dramatic numbers of users? Tying in with #2, it basically can help tell if a site is offering something thin on content but highly unique and/or valuable. If people are using it, they may be on to something that the QS algorithm misses.
The one thing I'd love to see is just a little bit more verbose notification or warning for the people that get hit. While many only have themselves to blame, many want to make sure they stay on the right side, and any chance they have to do so is of great benefit. Then again, it is also important not to give too much information to the black-hatters, who would just use it to get even better at gaming the system.
These are just thoughts, but it has long been on my mind considering the metric ass-ton I've run through AdWords for myself and clients over the years. Hope this helps spark a few ideas :)
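Those criteria could be mocked up as a simple triage gate along the lines below; every threshold and field name is invented for illustration, not anything Google actually uses:

    # Illustrative triage sketch: route borderline flags to a human instead of auto-banning.
    def triage(campaign_age_days, monthly_spend, return_user_rate,
               words_of_content, monthly_visits):
        stale_low_impact = campaign_age_days > 365 and monthly_spend < 50
        loyal_audience = return_user_rate > 0.30
        thin_but_popular = words_of_content < 300 and monthly_visits > 10_000
        if stale_low_impact or loyal_audience or thin_but_popular:
            return "manual_review"      # lighter hand: a human looks before any ban
        return "automatic_action"

    print(triage(campaign_age_days=900, monthly_spend=20,
                 return_user_rate=0.45, words_of_content=150,
                 monthly_visits=25_000))  # -> manual_review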
"With that in mind, is there a system of triggers that can flag issues for manual review or intervention instead of being handled completely automatically?"
I can't speak for the ads folks at all, unfortunately, but on the websearch side we certainly strive to do as much as we can algorithmically, but also to use our manual cycles effectively. The site in question was flagged algorithmically, but also sent to a member of the manual webspam team who concluded that it violated our quality guidelines. When a reconsideration request came in, it went back to a manual person for review again.
To the extent we can figure out ways to do it without compromising our systems, I think both websearch and ads would like to be as transparent as we can.
Matt, I think the main problem (having seen what it looks and feels like on the receiving end) is that Google's solution and treatment of the site owner is, for lack of a better term, rather violent.
Put yourself in the shoes of someone who is not a scammer and is trying to do a good job. Perhaps someone who is learning the ropes. Getting hit with a unilateral, violent and --for all intents and purposes-- permanent action such as seems common in these cases can be devastating.
I'd like to relate a case that I witnessed that was truly perplexing. It involved about 200 domains that were registered with GoDaddy and placed in their "cash parking" program.
They were there for months with no issues whatsoever. One day, the domain owner realized that this cash parking service was Google AdSense with GoDaddy taking a bite of the minimal action. So...he moved all the domains to a product called "AdSense for Sites" (I don't remember the exact name). This service was marketed by Google as a place to park your domains and earn some money through advertising that Google would automagically place on these domains. Same service that had been on these domains for ages through GoDaddy.
The transfer went well and all domains were accepted. No problems. The domains go "live", if you will, and ads start showing as predicted.
Two days later the account is cancelled and the domains are taken off the program citing "suspicious activity". The irony is that the sites --and their content-- were handled entirely by Google through this "AdSense for Domains" product. No self-clicking activity took place whatsoever. No nefarious activity of any kind. This person was far too busy with real business to go around clicking on ads across 200 domains to make $0.25 at the end of the day.
This was simply a transfer to Google for the same service that Google had been providing through GoDaddy for many months, years in some cases.
There was no recourse. No way to speak to anyone. No way to even try to understand what this "suspicious activity" was all about. The account was banned, closed, done...forever.
Now, here's a person who had plans for legitimate and valuable real sites to be launched on some of these domains later on. The whole experience scared him to a point of simply rejecting the idea of doing anything with Google if he could avoid it.
We had no way to provide any kind of an argument to the contrary because of the violent and totalitarian nature of the cutoff. Who would want to do business under those conditions?
The fact of the matter is that AdWords/AdSense generated revenue could evaporate overnight and with no recourse whatsoever. That's a tough pill to swallow for anyone who is a legitimate entrepreneur looking to build value and make some money or earn a living through their efforts.
Scammers are a different matter. However, you seem to treat both groups with the same hammer which, in my humble opinion, is not right.
I now have to advise anyone we work with that any income that relies on Google for either lead generation or direct income (AdSense) has to be treated as though it could evaporate at any time and for any reason without any real opportunity given to restore it in a timely fashion. Without that caveat on the table I couldn't personally advise anyone to use your products.
In a normal business one would engage with one's vendors in order to resolve issues in mutually beneficial ways. In this case it is a one-way street with violent and severe consequences for your customers and partners. That's what you have to fix.
I take your points, and in my group we've worked hard in the last year to improve our transparency and our appeals process, e.g. see http://googlewebmastercentral.blogspot.com/2011/09/reconside... from a few weeks ago. Likewise, AdWords now has phone support for all AdWords advertisers.
I can't speak for the AdSense (for Domains) team other than to say that when they shut down an account, they think that they have good reason for it. And unfortunately, that's typically a situation where they can't give many details--if the team sees abuse, providing information about how the abuse was flagged would help spammers quite a bit.
I know that Google can seem abrupt sometimes, and I dislike that, but part of the issue is also scale. See https://plus.google.com/117377434815709898403/posts/1hRWj489... that notes that if each Google user had a single 10 minute issue every three years, that would need 20,000+ support people to handle that load. Or consider that there's 200M domain names, and all those webmasters want to talk to Google and ask questions.
Even this link is discouraging: http://www.theatlantic.com/magazine/archive/2011/11/hacked/8... It mentions that several thousand people get their Gmail account hijacked every day. Trying to support all the people who want to interact with Google in a scalable way is a really hard problem.
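The 20,000+ figure above roughly checks out. Assuming on the order of a billion users (my assumption, not a number from the post) and roughly 2,000 working hours per support person per year:

    users = 1_000_000_000                       # assumed order of magnitude
    issues_per_year = users / 3                 # one 10-minute issue per user every 3 years
    minutes_per_person = 2_000 * 60             # ~2,000 working hours a year
    print(round(issues_per_year * 10 / minutes_per_person))  # ~27778 support people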
Support for all Google users would obviously be painful.
I think it would be fair if there were some personal support for people who either spend a lot of money on AdWords (I think there already is?) or make a lot of money on AdSense. These people probably have the highest levels of stress due to "insta-evaporation probability", and no doubt the number of support people needed would be several orders of magnitude smaller if you used a threshold like that.
Sorry it took a few days to reply to this. Too busy writing code...
With regard to the argument about the resources required to provide customer support to even a small percentage of your audience, I can only say this: it's your chosen business model. I am paraphrasing one of my favorite answers to someone complaining about their job ("It's your chosen profession").
The point is that Google's business is about doing what it does for a huge number of people. If supporting them is overwhelming either get out of that business or figure out how to do it correctly. I can't really accept the "it's too many people" argument as a valid reason for not doing it well or for applying the "criminal algorithm" to everyone.
There's a side thread there that says that "providing information about how the abuse was flagged would help spammers quite a bit". While true, I again find myself not agreeing with the idea of punishing legitimate customers for this reason. I would like to think that the vast majority of Google customers fall under the "legitimate" category. If spammers get better because you are providing detailed flagging information, you will simply have to get better at detecting and blocking spammers. This would trigger an evolutionary phase which, at some point, should make it very difficult for a spammer to game the system, even with "full source" if you will. Much like security algorithms become more secure when the source is released and tested with full knowledge of the internals, yours should do the same.
Conversely, honest and legitimate customers would gain the huge benefit of now understanding how to behave or how to do things and why a certain approach might not do well in Google's ecosystem.
One of the most frustrating things I have seen is someone full of drive to launch an internet business only to be shot down by a Google shutdown. And, when no reason or actionable information for the punishment is given, this entrepreneur simply had to throw their hands up and give up on that tack. Needless to say, their next attempt ignored Google products completely and they are doing OK. My guess is that they could have done very well and much grief could have been prevented had Google said "Your site has the following problems ..." and then, rather than cutting them off, scheduled them for a review in, say, thirty days (or a variable length based on the seriousness of the issue). That would have been far more civilized and far more conducive to helping your community grow and evolve in the right direction.
New and inexperienced internet entrepreneurs (and some experienced ones) need a way to learn how to behave. What works and what does not. What is acceptable and what isn't. It is only reasonable to assume that they will make many mistakes in their zeal to get an idea off the ground. Penalizing them with a permanent hammer blow to the head is not conducive to growing better netizens. Guiding them with actionable feedback is.
The current process can only be characterized as violent. From the perspective of an honest business person it is tantamount to getting hit with a bullet while walking your dog. The reasons could have been many. Maybe the shooter objected to your walking your dog in front of their home. Had the shooter at least attempted to communicate with the dog-walker it is far more likely that violence could have been averted.
Above all, if your "Do no evil" is sincere, then you have to change the way this works right away. The way this hits honest entrepreneurs is nothing less than pure evil. Again, you take a bullet and you don't know why.
I do appreciate your visibility here in HN. In the past I have simply given up trying to raise these and other points with anyone at Google that might remotely have the ability to at least elevate the conversation internally. I hope you might be that person. I mean all of the above in the vein of constructive criticism. We all want to see the ecosystem becoming more conducive to the exploration of new ideas. Google, at this time, has taken a rather totalitarian position of being the "moral authority", if you will. With that, and so long as you want to be a benevolent dictator, I think, you inherit the responsibility to not cause harm through your actions.
Having said that, until things change I have no choice but to treat your offerings as something that one simply cannot rely on to build a business. The "Google love" can disappear from your site overnight and you'll have no practical way to fix it. That's not a business, that's going to Vegas.
The point of the site is to find items of a particular price that qualify for free shipping on Amazon.com. If you want I will give you access to my google analytics to show you that this is a site people want and use.
I guess I should not let people link to the search results page?
My advice would be to use robots.txt to block out the autogenerated pages; when users search for e.g. a long-tail phrase and then land on a page that's nothing but affiliate links with a lot of keywords, they tend to complain to us.
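For reference, a minimal robots.txt along those lines might look like the following (the path is an assumption based on the example URLs above, with all the autogenerated results hanging off index.html?q=; only URLs starting with that prefix are disallowed, so the homepage itself stays crawlable):

    User-agent: *
    Disallow: /index.html?q=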
Rather than "banning" an entire site that seems to have some spammy pages, why not index only the home page, and flag it in the Google Webmaster Tools dashboard?
Also: are any specific notices or warnings given in Google Webmaster Tools, to inform the webmaster and give some clues about the findings and decisions of your team?
Thanks Matt for taking time to explain things here.
A search for "hacker news" on filleritemfinder.com results in a page full of affiliate links, just as in the example you gave above. The only difference is that they didn't rewrite the URL.
filleritemfinder.com has no robots.txt that I was able to pull up.
So, filleritem.com, a google customer, was blocked, but filleritemfinder.com, doing the same thing, is the number one result.
Further, shouldn't this kind of advice be given to people who appeal being excluded from the index? Or should we all post to Hacker News when it happens to us so that you can come explain directly?
I think 50% of the problem is the arbitrary picking of sites to block (and it's not working, btw[1]) and 50% of it is that Google seems uninterested in explaining or advising people when it happens to them.
[1] Been buying gear for a project lately, and so doing a lot of google searches in the form of product-model-number review or product-name review. Overwhelmed with spam sites, and mindless human generated spam sites like dpreview.com, etc.
When I do the search [filler item], one of the top results I see is http://www.makeuseof.com/tag/top-5-amazon-filler-item-finder... which shows five different services to fill this information need, and that also has pictures. I do think that's a pretty helpful result.
I mentioned filleritemfinder.com as a random example (there are many of these services), but filleritemfinder.com appears to use AJAX to keep results on the same page rather than making a new url.
"filleritem.com, a google customer, was blocked, but filleritemfinder.com, doing the same thing, is the number one result."
The filleritemfinder.com site is not doing the same thing, because it's not generating fresh urls for every possible search. But you're not really suggesting that we should treat advertising customers somehow differently in our search results, are you? The webspam team takes action regardless of whether someone is an advertising customer or not.
"shouldn't this kind of advice be given to people who appeal being excluded from the index?"
This advice is available to everyone in our quality guidelines. It sounds like the site owner reached out to the AdWords team, which gave him clear guidance that the site violated the ads policy on bridge pages. It sounds like the site owner also filed a reconsideration request, and we replied to let the site owner know that the reconsideration request was rejected because it was still in violation of our policies. It doesn't look like the site owner stopped by our webmaster support forum, at least that I could see in a quick look. At that point, the site owner did a blog post and submitted the post to Hacker News, where I was happy to reply.
Would the site be less likely to be penalized if it used another monetization plan, e.g. advertising instead of affiliate links?
I recently talked for about a minute about the topic of "too much advertising" that sometimes drowns out the content of a page. It was in this question and answer session that we streamed live on YouTube: http://www.youtube.com/watch?v=R7Yv6DzHBvE#t=19m25s . The short answer is that we have been looking more at this. We care when a page creates a low-quality or frustrating user experience, regardless of whether it's because of affiliate links, advertising, or something else.
Thanks Matt, took a look at the video, that answers my original question.
Also, is there a preferred monetization model? E.g. does Google think advertisements are more or less harmful to the user experience than affiliate links, sponsored posts, etc.?
Obviously across different models you can't just track the space taken up, so is there some kind of metric that tracks the rate of content dilution via monetization?
I know you were looking for an answer from Matt but I thought I would offer up my opinion here (as you might have noticed, I love talking about this stuff).
Our models currently suggest that the presence of contextual advertising is a significant predictive factor of webspam.
We use 10-fold bagging and classification trees, so it's not all that easy to generalize. But I pulled one model out at random for fun.
The top predictive factor in this particular model is the probability outcome of the bigrams (word pairs) extracted from the visible text on the page. Here are a few significant bigrams:
Next, this model looks for tokens extracted from the URL and particular meta tags from the page. Similar to above, but I believe unigrams only. A few examples follow. Please keep in mind that none of these phrases are used individually... they are each weighted and combined with all other known factors on the page:
offer
review
book
more
Management
into
Web
Library
blog
Joomla
forums
The model then looks at the outdegree of the page (number of unique domains pointed to).
From there, it breaks down into TLD (.biz, .ru, .gov, etc)
The file gets pretty hard to decipher at this point (it's a huge XML file) but contextual advertising is used as a predictive variable throughout.
Just from eyeballing it, it appears to be more or less as significant as the precision and recall rate of high value commercial terms, average word length (western languages only), and visible text length.
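For readers curious what this sort of pipeline looks like, here is a minimal sketch in Python, assuming scikit-learn: bagged decision trees over visible-text bigram counts plus a couple of page-level signals such as outdegree and a contextual-ads flag. The features, data, and output are purely illustrative, not the model described above.

    # Toy sketch: bagged decision trees over text bigrams + simple page features.
    from scipy.sparse import hstack, csr_matrix
    from sklearn.ensemble import BaggingClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical training data: visible text, (outdegree, contextual-ads flag), label.
    pages = ["buy cheap pills online now", "recipe for sourdough bread at home"]
    extras = [[250, 1], [12, 0]]
    labels = [1, 0]  # 1 = spam, 0 = not spam

    bigrams = CountVectorizer(ngram_range=(2, 2))
    X = hstack([bigrams.fit_transform(pages), csr_matrix(extras)])

    model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10)
    model.fit(X, labels)

    test = hstack([bigrams.transform(["cheap pills for sale"]), csr_matrix([[300, 1]])])
    print(model.predict(test))  # e.g. [1]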
Based on what I'm looking at right now, my answer would be that sponsored posts are going to be far more harmful to the user experience than advertising.
Can't answer the rest of your question which I assume relates to the number of ad blocks or amount of space taken up by ads... we don't measure it.
Edit: Just realized that Google will probably delist this page within 24 hours. Should've used a gif for those bigrams. Oh well ;-)
"True" - because my current understanding (which Matt_Cutts can elucidate on if he chooses to) is that Google has looked into - but does not currently incorporate - the presence of advertising as a spam signal.
"Sadly" - because my independent research has shown that advertising - most notably the presence of Google AdSense - is a reliable predictive variable of a page being spam.
All things being equal, a page with AdSense blocks on it is far more likely to be spam. Yet as of a few months ago, that does not appear to weigh very heavily into the equation.
I agree, but don't you think that from an algorithmic point of view Google would be better off looking at what the user wants and what monetization models they prefer to see, versus the averages in terms of monetization models on spam sites?
That way they're focusing less on removing spammers and more on user quality, and thus removing spam.
This really made me think. But I took exception to your comment:
I agree but don't you think that from an algorithmic point of view Google would be better looking at what the user wants and what monetization models they prefer[...]
No. In fact, I am a rather loudmouthed opponent of Google's somewhat clumsy attempts to measure this a la "Quality Score".
In addition to webspam detection and machine learning, I have spent way too much time in marketing (I have a master's degree in marketing, in fact.)
A neat thing I learned along the way was the value of market research.
There are so many nuances in every line of business. Segments, preferences, pricing, even down to minutia (now well studied) such as fonts, gutter widths, copy styles, and so on.
You can learn a lot by combining large amounts of data and well chosen machine learning algos. But even with a few thousand businesses in most categories in a particular country (far less outside of the US), that doesn't give an outsider enough data to truly distinguish what can be a winning formula from a spammy one. This knowledge is hard won through carefully executed experiments and research.
A few years ago I was researching the topic of landing page formulas by category. One example that stuck out most in my mind was mortgages. There were a few tried and true "formulas" that significantly outperformed the rest. Two stuck out:
1) Man, woman, and sometimes child standing on a green lawn in front of home. Arrow pointing down from top left of landing page to mid/lower right positioned form. Form limited to three fields.
2) Picture of home/s docked to bottom of lead gen page. No people. Light/white background. Arrow pointing down from top left of landing page to centrally located form.
These sites were incredibly successful. More than a few of them had to contend with quality score issues over the years. Can an algorithm capture nuances such as the ones I mentioned? In theory... they could. But today, they don't. All of Google's QS algorithms to date have been failed attempts and have caused an incredible amount of harm and distrust.
You finished that sentence with:
to see versus the averages in terms of monetization models on spam sites.
I'm not at all sure what this means. Could you explain? Is it even possible to directly model the monetization model of a site without having direct access to their metrics?
Well we know Google can classify what spam sites are.
And assuming they could (despite your arguments against) determine the method of monetization on a site, they could then simply compare the models they see on spam sites against another metric that tracks users' reactions to certain types of monetization models.
And assuming they could (despite your arguments against) determine the method of monetization on a site
I can assure you that they cannot do this accurately. You would be amazed at the scummy business models that openly advertise on Google and are not caught. It would be incredibly difficult to do so, as some of them are downright ingenious (one example: free software that updates your drivers).
It sounds to me that you are positing some type of magical technology that doesn't exist predicated on Google's seeming omniscience. Of course, I am eager to stand corrected...?
I'm not declaring it's a technology they are capable of developing, just wondering. I'm still a bit surprised that, with all their top engineers, they couldn't (if they wanted to) come up with a solution to this.
I think 50% of the problem is the arbitrary picking of sites to block (and it's not working, btw[1]) and 50% of it is that Google seems uninterested in explaining or advising people when it happens to them.
This. Assume you work on the webspam team and you have a 92% spam detection rate but a 99.9% accuracy rate on what you do detect.
There are around 40,000,000 active domains in a given month listed on Google. That means 40,000 sites on average are being penalized without reason.
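Back-of-the-envelope, that figure follows if you read "99.9% accuracy" as 0.1% of all active domains ending up misflagged (a simplification; strictly, a precision figure would need the count of flagged sites, not total domains):

    active_domains = 40_000_000
    error_rate = 1 - 0.999                      # the 0.1% implied by "99.9% accuracy"
    print(int(active_domains * error_rate))     # 40000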
If I'm not mistaken, this type of SEO white noise is exactly what Google works to stop. It's just search engine spam for profit; no hands-on work, just a script that targets search engine results and creates 'dumb content' to farm visits.
Nope, I can't. I can see how often the site shows up in e.g. our search results, but Analytics is a separate property and they don't send data to the search team either.
Oh c'mon, now you're just submitting links blindly -- that's definitely not an official Amazon offering. That list is generated & curated by Amazon user "fargowifeandmom" and was last updated at the beginning of this year.
That's an Amazon user maintained list of low price items. The purpose of the filler item site is to answer the question: "What can I buy for $0.59?" That list doesn't answer that question.
I have no association with this person and just heard about this. But obviously those are search term result pages and not "autogenerated pages" in the usual sense. If Google is going to apply such a standard uniformly, they'd have to ban all sites that have search functions.
Autogenerated in the bad sense is link farms like fakesite.com/buy_drugwiththisname_now.html, with drugwiththisname replaced with 500,000 different possibilities, and all generating pages linked to each other, which also have incoming links from ten billion forum comments across the web where a bot has signed up en masse and posted spam.
Now, perhaps this guy is posting spam to his site in places which makes it valid to declare this a link farm and kill it.
Looking at the site now, being able to search amazon by exact price is a pretty neat function and is totally different from a link farm.
I think killing his site for having a search function is pretty unreasonable.
However, you are a private for-profit company, so you can do as you please obviously.
> Autogenerated in the bad sense is link farms like fakesite.com/buy_drugwiththisname_now.html, with drugwiththisname replaced with 500,000 different possibilities
Actually there is no difference between search-result-page sites like the OP's and what you describe. Having a URL that ends in .html does not mean there is a static HTML file on the server. For example, Nextag and PriceGrabber have URLs like blah.com/digital/Canon-EOS-7D-SLR-Digital-Body/m739295014.html. Scroll around these sites and you'll see they're anything but static.
Each page is simply the result of a query. Whether the database is local (like Nextag) or remote (like the OP's Amazon API query) is inconsequential. Personally, I would be VERY happy if Google hid/ignored these kinds of search-query sites. I have not once found any of them useful. The problem is the large gray area in which these sites operate: what I find useless, someone else might find useful.
It would be easy for the Google bot to work this out too: replace the URL part that looks like a dynamic variable with different values and see if it returns a page for them all, like a search engine would. I'm surprised it doesn't appear to do this.
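Purely as an illustration of that idea (not a claim about how Googlebot actually works), such a probe might substitute a few random tokens into the suspected query slot and check whether every one of them still returns a page; the URL template and function below are hypothetical:

    # Illustrative probe: does a URL pattern answer arbitrary query values?
    import random, string, urllib.parse, urllib.request

    def looks_like_infinite_space(url_template, samples=5):
        hits = 0
        for _ in range(samples):
            token = "".join(random.choices(string.ascii_lowercase, k=8))
            url = url_template.format(q=urllib.parse.quote(token))
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    if resp.status == 200 and resp.read():
                        hits += 1
            except Exception:
                pass
        return hits == samples  # every nonsense query produced a page

    # Hypothetical usage:
    # looks_like_infinite_space("http://www.example.com/index.html?q={q}")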
You might say that technically they could have the same back-end, but the web is flexible - just about any URL scheme could resolve to the same back-end. The difference is that the first is honest and the second one goes out of its way to hide what it does.
Yup. Search for [quality guidelines] and they're at http://www.google.com/support/webmasters/bin/answer.py?answe... . The relevant part is "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
Matt, I tried that recently with one of my own sites, and after a few days I got a warning in my Google Webmaster Tools dashboard that the bot could not access my /search URL because it was blocked in my robots.txt, and that I should take action to correct it. So I then unblocked my /search URL, which is in violation of the above guideline, but it made the error in Google Webmaster Tools go away.
These conflicting messages from Google are very confusing!
In that case, Google Webmaster tools is not actually reporting an error. That's a report to show you what URLs Google tried to crawl but couldn't (due to being blocked) so you can review it and ensure that you are not accidentally blocking URLs that you want to have indexed.
I agree that it's confusing in that the report is in the "crawl errors" section.
(I built Google webmaster tools so this confusion is entirely my fault; but I don't work at Google anymore so sadly I can't fix this.)
The Google Webmaster Guidelines literally say:
"Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
So a couple of questions:
- What is of value for a user?
- Who is determining the value for the users in these cases?
As always, it's not quite clear what treatment you should use for search pages!
It's hard to draw the line on what is a search page and what isn't. For example, you might use a system where the last part of the URL is treated as a search, but only certain ones are ever linked to, so they effectively act as product pages.
The same goes if you have some kind of recent-searches list and those searches are linked.
The same could be said of Google's search engine itself: an autogenerated web site consisting of essentially nothing other than links to other places, many of them paid. But no one can deny that it's a useful site.
I think that is the issue here -- probably the vast majority of such sites are undesirable, but you can't discount them all.
But Google had to spend resources to crawl those other sites, collect the data those sites created, evaluate and order the results excerpting those sites' data based on a user's query, clearly adding value for the user performing a search... Oh, wait.
Well, since this is a site people have a need for, look for, and use, that must mean the process behind your "quality guidelines" is wrong, doesn't it?
However, you seem to imply there's something wrong with the site.
Let me ask you a question, who decides whether a site is "right" or not? Google or the people that use the web? What was the original purpose of PageRank?
You can make excuses all you want, but fact of the matter is that Google's Search Results quality has been plummeting over the past couple of months and it all stems from Google pretending to better know what people want than the people themselves.
As we have heard lately how Google has manipulated the Yelp search results and others for its benefit, one has to wonder:
Since Google has acquired some big websites like Zagat, theoretically, if one day they just start to screw Zagat's competitors, what can actually be done about it?
I mean, how can one prove that Google is just attacking you because they don't like you as a competitor?
It's relative: you can always say that your site is not original enough, or that you have bad links, etc.
It's a fair question, but our quality guidelines have been quite consistent for the last decade or so. Our team takes action without regard to whether a website is an advertiser, partner, or competitor. The long-term loyalty/trust of our users is worth much more than any sort of short-term revenue that a sneaky deal would provide. And Google's culture is such that anyone inside or outside of the company can claim that a particular practice isn't in line with "Don't be evil" and kick off a fair amount of self-scrutiny.
Ultimately though if you don't trust Google, we don't lock your data in, so you can use a different search engine or service just by entering it in the address bar. That's the ultimate check on Google: if we start to act too abusive or "evil" we know that people can desert us. So it's in our enlightened self-interest to try to act in our users' long-term interests.
That's the ultimate check on Google: if we start to act too abusive or "evil" we know that people can desert us. So it's in our enlightened self-interest to try to act in our users' long-term interests.
I hear this line thrown about quite a bit. And while it's true with regular users, it's certainly not true for webmasters or advertisers. Google controls around 67% of all US search share. If an advertiser doesn't play by your rules, they forfeit a significant amount of natural search traffic.
All good for most of your policies but there are some real gray areas. I had a site years ago that got hit with a javascript hack on an obscure page. StopBadware found it within a day or two and suddenly we were blacklisted... virtually all organic traffic disappeared overnight. It took weeks to get the warning lifted and that was only as a result of a significant viral PR campaign (such as this).
Maybe this has been addressed, but there are other areas. Affiliate sites are also penalized quite heavily by Google. It's one thing to take a stand due to the supposed quality of many of these sites (which frankly has little correlation with the presence of affiliate links... most sites suck). It's quite another when Google has a large affiliate advertising practice in house and a significant investment in an affiliate link tracking/cloaking company.
I mention this with all due respect and I hope you take it as constructive criticism. I think the organic side of the house does a great job overall. The paid side is another story IMHO. Part of this is organizational stupidity... I struggle with this every day and I have a much smaller organization.
Like Google, webmasters like us also try to improve the quality of our web pages, but sometimes we fail to understand what "quality" actually means to Google. Maybe the angle you see it from is not easy for us to catch. It would be better if Google could publish a list of dos and don'ts for Panda.
For a keyword like "dating", only top companies are being given space on the first page. Well, if you look at dating.com, they have nothing like dating on their site and the articles are nothing but crap, but they managed to stay on top and survived Panda, while small fish like us are nowhere. Having a far better site, useful content, and more than 200,000 active users counts for nothing.
You would not expect a dating website to hire 10 content writers and keep posting fresh articles just because those are the "contents" Google considers content. Not every niche can be put under one "quality" guideline.
Matt: This is a lie. I have seen you talk about quality at SMX and it is laughable. These quality guidelines you speak of might pertain to others, but they do not pertain to Google's own content.
1. When I search for a restaurant, 99% of the time Google places shows up before Yelp. So you are telling me that Google places always has better content than Yelp? Look at these reviews for Gary Danko - http://maps.google.com/maps/place?num=100&hl=en&biw=...
"love it" and "our favorite restaurant" Does Google Places have much better quality content 99% of the time?
2. Look at http://www.seobook.com/images/google-google-google-google-go.... A. Look at all of Google's sites in the results. B. What are those YouTube videos doing in these results? Every other result in this set has the words hollywood and cauldron right together (except the leaky-cauldron domain result), but since YouTube is a Google property it shows up. It's funny. Do this search on your own and you will see that the top 40 results have "Hollywood Cauldron" in their titles, but the ones on YouTube do not.
So your results are all about quality unless it comes down to your own content. Then, it doesn't matter.
By the way, I can't wait for Google Plus to start hogging space in your results too. We all know how important "social" is to you guys. Even more important than "local." So we shall assume the results will be even more crowded with Google content. If we are lucky, maybe Robert Scoble will mention the words hollywood and cauldron in one of his wordy posts and we will see it at the top of the results! And by the way, no one buys that "don't be evil" thing any more... long gone.
Not acceptable:
"Websites that feature links to other websites while providing minimal or no added functionality or unique content for the user
Added functionality includes, but isn't limited to, searching, sorting, comparing, ranking, and filtering"
The page in question provides the added functionality listed (searching/filtering at least)
The filleritem.com website is a very simple tool to search for items of a certain price on Amazon. The purpose is to help select an item to fill the gap in the shopping cart to qualify for free shipping. There is no reason for Google to ban this site. What we MAY have here is an abuse of power by Google.
I can't judge whether you broke the rules or not, but not showing up in Google's SERPs is terrifying and a stark reminder of how dependent we've become on Google traffic.
I included an image of my adwords account. The ad had been running for years(albeit infrequently due to very low budget), and I had not touched it since creating it.
I checked after this all happened, and there were several warnings from AdWords, but nothing from Webmaster Tools. I simply ignored the AdWords warnings as I was not actively using the account and figured they were just disabling ads. If I had known these were the consequences, I would have just deleted my AdWords account.
Unfortunately, deleting an AdWords account won't remove the violation; it may actually make it worse. Google will look back several years and disable ads that haven't run for a very long time. Sometimes they even ban accounts that are inactive or deleted!
It is very important to remember: when dealing with Google AdWords, the definition of a violation changes with time. If you violated the current policies in the past, you may still be banned. You take a great risk advertising there, and any and all warnings you get should be dealt with immediately.
That said, seeing an exclusion from organic based on an AdWords suspension is extremely alarming. That may be exactly what the DoJ is looking for in terms of violations by Google.
That said, seeing an exclusion from organic based on an AdWords suspension is extremely alarming. That may be exactly what the DoJ is looking for in terms of violations by Google.
According to Matt Cutts, it was the exact opposite: he got the organic suspension then the AdWords suspension.
Google should really get some healthy competition.
filleritemfinder.com is reachable from Google and, as far as I can tell, has the same functionality with a better design. Maybe some Google guru could comment on whether a redesign would save you.
Another great example of Google randomly removing sites based on flimsy or no evidence. The fact they don't give you a chance to plead your case is pretty scary.
My hunch is that one of your competitors got the site taken down. I've had a few clients whose competitors filed complaints with Google, or informed Google that they were using black-hat SEO to get on the first page of the SERPs. In one case they were successful, but I had the site back up in less than 24 hours, so it wasn't a huge deal.
It sounds like it might be an ongoing issue and they are not going to put your site back in the SERPs for a while. I feel for you, brother.
But often there is a reason why a site vanishes from the Google index. We would have to exclude all possible violations of the Google guidelines before we could safely claim that this is a great example of random removal by Google.
Edit: In a comment above Matt Cutts from Google gives an explanation.
My best guess is that you'll need to include some additional 'unique content' on the site to get it back in the listings. At the moment the Google crawler will see not a whole lot of text, quite a few Javascript buttons and some links, and will probably flag it as low quality.
It may be a bit of a hassle, but may be worth adding a kind of 'interesting/unique filler items' blog to the site to increase the content to links/js ratio.
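As a crude way to eyeball that content-to-links ratio, one could count visible text against anchor tags with nothing more than the standard library; this only illustrates the kind of metric the comment has in mind, not anything Google is known to compute:

    # Crude visible-text vs. link counter for a single HTML page.
    from html.parser import HTMLParser

    class TextLinkCounter(HTMLParser):
        def __init__(self):
            super().__init__()
            self.in_skipped = False
            self.text_chars = 0
            self.links = 0
        def handle_starttag(self, tag, attrs):
            if tag in ("script", "style"):
                self.in_skipped = True
            elif tag == "a":
                self.links += 1
        def handle_endtag(self, tag):
            if tag in ("script", "style"):
                self.in_skipped = False
        def handle_data(self, data):
            if not self.in_skipped:
                self.text_chars += len(data.strip())

    counter = TextLinkCounter()
    counter.feed("<p>Some unique filler-item writeup.</p><a href='/x'>buy</a>")
    print(counter.text_chars, counter.links)  # 35 characters of text, 1 link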
If you look around, almost everything you click on the web is set up to make someone money. It's none of Google's business how a website is making their money as long as it's legal and follows FTC policies. Browsers make money every time you use that little search box in the top right corner of Firefox. People have a right to be compensated for their work. If the site provides a good service, I'll use it. If you don't want people to make money off your clicks. It would be best to close your browser and never open it again.
My site got hit by Panda (no manual penalty). After relaunching my site (which I had luckily been working on for over a year), my bounce rate is 30% and my direct link rate is 50%. Still, Google has me by the throat. My traffic is almost dead. Panda doesn't care if you're solving a problem for users and giving them the eye candy they want. It apparently just wants your site to be a "unique, well-written" news site. People come to my site to browse vintage items and find stuff they didn't know existed in the niche. They usually visit during work hours and late in the evening. Reading ease for these items is at a college level (which Panda doesn't like). They don't want to read a 1000-word article; they usually just want to hear it, see it, maybe ask a question or two about the item, and see who may be selling one. I offer that.
Panda's obvious Bayesian nature doesn't understand nuance. It only knows napalm.
Niche markets are the ones getting the crap end of the stick. And it's funny, niche markets used to be what made the web so great! Not anymore. Now it's 100% Googlized Walmart-ization.
And yeah, I'll be using www.filleritem.com when shopping Amazon. I didn't know it existed until I read this post. I don't mind, transparently, giving someone a few pennies for a good service. You do it all the time.
I don't think that it is as simple as that. There is some form of ecosystem on the web: there are the content creators and website providers, there are the searchers, and there are the search engine providers. Each of these has its own interests and guidelines.
Searchers have quality guidelines too, though they are not explicitly published by them on the web. If a search engine delivers low-quality results, the searchers move on to the next search engine. A search engine without searchers is hardly profitable.
Most of the quality guidelines of Google are just common sense, following them will in many cases increase the usability and findability of a site.
If this has happened to you many times, you are likely doing something wrong. I have never had one of my sites banned from Google.
lol, poor Matt, always under fire from everybody who hasn't got the rankings they want. It's simple really: Google tries to make it fair for everybody (and granted it doesn't always work perfectly, which is why they continue to make changes; it's evolution), and if some sites give themselves an unfair advantage and are caught, then they're punished. How else could a search engine work? If it were any different then all of our results would be greed pages with no useful info and everybody would stop Googling stuff.
As somebody once said, "You can please some of the people some of the time, but you can't please all of the people all of the time!"
This is a fair comment but I believe the issues stem more from the process Google follows.
It is obvious that Google cannot communicate exact reasons why a site was penalized as that would help spammers. However, there is nothing that prevents them from adding a step to warn the offending website and give them a heads up before the ban/penalty takes place, along with an explanation of the policy that is/was being violated.
Most of these heads-ups would go ignored, some would not, and yes, it would incur a support cost. However, the number of websites that are significantly penalized isn't onerous... I believe fewer than 1,000 each year?
When a company has become the defacto gateway to the internet, I believe they have a responsibility to webmasters. Google has lost a lot of goodwill over the years because of these seemingly arbitrary penalties... Instituting such a practice would be a worthwhile investment.
option1138, I'm afraid you need to recalibrate your expectations of spam on the web. Blekko made a site called Spam Clock that estimates 1 million spam pages are created every hour: http://www.spamclock.com/ .
There's 200+ million websites out there. 1,000 spam sites would be a spam rate of 5.0 × 10^-6. If you remember the days of Altavista before Google, the actual rate of spam on the web is much higher. Here's one stat: I once heard a search engine rep (not from Google) say that they had to crawl 20 billion pages to find 1 billion non-spam pages.
So yes, we do tackle more than 1,000 websites a year. There's a ton of spam on the web, and Google has to operate on the scale of the web (e.g. in 40 different languages) to tackle all that spam.
You are of course correct. The fault is mine for miscommunicating... I find myself becoming less self-editorial these days when I write on the web and tend to think everyone is on the same page as I am.
I was actually referring to an informal study I did earlier this year. I measured sites which were receiving an average of 50,000 or more visitors from Google US search (organic) per month over a six month period. Then I compared those with a similar set from a subsequent six month period to see which had significantly dropped off in traffic and rankings. The purpose of this was to estimate the number of significant sites which were penalized over that period of time. The final estimate came to about 700 sites/year which were penalized. There are lots of uncontrolled variables here of course... but I was looking for an "order of magnitude" answer simply for curiosity's sake.
The 1 million spam pages created per hour were of course excluded from consideration, as they never received much traffic from Google in the first place.
So just to clarify my earlier response, I am advocating for a policy that would apply to websites exceeding a certain threshold of organic traffic for a significant period of time.
lol, poor Matt, always under fire from everybody who hasn't got the rankings they want.
This isn't about desired ranking. If your highly useful, popular site gets blacklisted from appearing on Google at all, there better be a damned good reason.
For better or worse Google is the gateway to the internet. If your site isn't on Google it is virtually invisible to most of the world.
Since Google has such an enormous position of power, they are really going to have to do better with customer service (and by customer service, I mean providing a way to talk to a human on their end).
I understand it's expensive, but if it doesn't get any better, eventually some government somewhere (that has jurisdiction) is going to regulate them.
Just out of curiosity, if only 30% of your traffic comes from organic search, why did your hits drop by more than 30% after you stopped getting organic search hits?
University should offer courses on Google policy. A thorough understanding of such policies will be the primary determining factor for your success on the web.
I disagree with you there general-marty, I think the primary factor determining your success on the web is a decent business plan and marketing strategy. Without that you rely on free traffic from Google. Any company that puts all its eggs in one basket (e.g. Google Organic) is skating on thin ice.
They usually change policies, or start enforcing existing ones, before an algo update. That doesn't prove any connection in your case, and it's not like they are dependent on your ad expenditure.
Amazon probably complained to Google, and Google, despite already having the rules on the books, decided to enforce them on this one site while leaving many other low-traffic clone sites offering the same service alone. Call me paranoid, or cynical, but I've seen much worse from Google on the AdWords side of their offerings.
If that were the case Amazon would have a much simpler solution by stopping the affiliate account altogether, instead of having to ask Google to stop some of the traffic.
This has been an amazing thread, but there really is a pattern that is hard for webmasters to follow. In the case of the original site, couldn't Google have just de-indexed the "extra" pages but left the index page alone, since it is indeed useful?
Also, how did his site get canned while others like sportslinkup.com, an eBay affiliate spam site cloaked as a link directory, have over 7 million indexed pages and 8 million indexed images (all hosted by eBay, not the sports site) and get over half a million Google visitors per month (according to compete.com)?!? I think the sports site is even scraping Google for keywords; it's full of examples of what not to do, but it's been sailing along for years with Google's approval, or at least no automated detection.
The fine line between sites getting canned and sites getting MASSIVE traffic for essentially the same thing is very confusing.