Google penalizes original content site because of scrapers (seobook.com)
136 points by raphaelb on May 12, 2011 | 70 comments



Google's latest algorithmic changes seem to be either horribly wrong or not fully baked.

Our site went from ranking #8 for our target search "artist websites" to PAGE 440 of the results. Our listing for "how to sell art" just went away. There's been nothing but original content on our site for 10 years, and, among artists, we're considered one of the best sources of art marketing information, given that I owned an art gallery for 20 years and all of our other writers are professional artists. (and yet Google still has ehow ranked for "how to sell art".....yeah, I'm sure ehow knows a whole lot more than we do).

I'm saying this not to vent, but to concur with Aaron and others that there is something wrong. It may not hurt Google's business....the latest algorithms probably improve adsense revenue, and that's fine, it's their business. Fortunately I've read HN long enough to know not to build my entire company on top of someone else's platform and, as much as it upsets me, we don't need Google. Bing (and Yahoo) have us at #3 for that same search ("artist websites"). We don't depend on search engines as our only source of marketing leads....nor even our main source.

The most frustrating thing is not even that it happens, but that they do not communicate. There's no way to find out WHAT happened. Nothing in Webmaster Tools. No way to pay for search support. I read the Google blog post with guidelines on how to structure content after Panda and, none of that applied to us, at least not that I could tell.

They say "just focus on users" and that's what we do, but I guess, that's BS.

I, frankly, think Google's gotten too big for their britches and, as unlikely as it is to happen, I hope Bing, Blekko and yes, DuckDuckGo take some market share away. Windows is better for having OSX and Linux to compete with. Maybe Google would be a bit less "evil" with more competition too.

Sorry for the bit of a rant, I'm usually only a lurker here, but this article of Aaron's really hit close to home this week. At least there are a couple of relevant points buried in my little rant....I hope ;-)


>the latest algorithms probably improve adsense revenue, and that's fine, it's their business

There's no reason to jump to that conclusion. We don't make ranking changes to improve adsense revenue, and don't use it as a metric to evaluate ranking algorithms. We don't even have a mechanism to collect the data.


Thank you for replying, and thank you for clarifying that. My question is this: since we've followed Google's guidelines for 10 years, never intentionally engaged in any practices against Google's recommendations, and are already doing what's suggested in the most recent Google blog post on this subject, yet have for some reason lost all of our rankings completely and are being outranked by known content farms and sites with AdSense, what should we do?

I realize you can't guarantee any results, but honestly, we really have absolutely no clue what to do next. Every change we've made to our site in the past 2 years has been to try to do everything we read that Google wants from official Google channels. We truly just don't know what else to do.

Edit to add: we did 3 months ago change from a very long domain name to a short one (faso.com) because it is shorter and we own a federal trademark on the word "FASO" and thought that Google wanted to place more emphasis on brands - our rankings stayed the same and even improved....until Monday.


Just like the anti-content farm blog posts previously, I'm half hoping that the internet community turns its attention to the problem of content scrapers (in the hope that Google takes more action against the problem). I am a fan of Google - and the biggest source of traffic to my websites is Google search traffic - although the issue of scrapers does seem to be growing (even despite the attempted anti-scraper Google algorithm update earlier in the year).

A couple of days ago I did some searching online and found that a fair number of websites had copied some of my articles in their entirety. And sadly, a lot of these 'websites' were actually Google Blogger (Blogspot) blogs. And whilst some of these copied articles weren't appearing in Google search (I guess since the entire site contained copied/scraped content, thus giving them a Google SERP penalty?), some of the copied articles were appearing in the SERPs. And a couple of these websites even had Google AdSense on them.

So there was the crazy situation whereby my content had been stolen/scraped illegally, and put on a Google Blogger blog with Google Ads on it, and (in some cases) that blog then received traffic from Google Search. Hrmph.

In the interest of balance, I will point out that I filed Google DMCA requests after finding these scraped articles, and Google did promptly reply (a non-automated reply around 30 hours later, which is quick considering how many DMCAs Google must get).

They only removed the individual blog post (and not the blogs overall, even though they were clearly spam blogs), but nonetheless I am happy with Google's quick response.

I just wish that content scraping weren't (in some cases) a profitable endeavor.


The headline is factually inaccurate. It looks like it was a mistake that he was denied access to AdWords over the originality of his content, but that has no connection to his ranking in search. It looks like the AdWords representative may not have access to fine-grained enough tools to assess the site accurately, which is an organizational failing, but there's no bad intent there. In some sense it reflects how disconnected search and ads are from each other that they're using crude tools to assess original content.

Google cares a great deal about putting the original source of a piece of content first. If we're doing that incorrectly, it's because we screwed up, not because that's how things are designed. It's a hard problem and an area we are still working on intensely. It would be great if someone involved could post the queries on which we are screwing up so we can debug what's causing it.


You don't seem to have read the article.

The search:

["a superb app for iPad and iPhone that lets you quickly and easily transfer photos and videos between iOS devices and computers – has been updated this week, to Version 2.3."]

returned results from content scrapers above the original content.

For me, the original content doesn't even show up in search results, even though it's in Google's index:

http://ipadinsight.com/ipad-apps/photo-transfer-app-updateda...

Further, Google wouldn't let the site owner buy AdWords to drive traffic to their site.

Google owns both search and AdWords. This makes the headline here:

  Google penalizes original content site because of scrapers
accurate, as far as I'm concerned.


The results for that query are horrible, but his complaint about losing traffic certainly isn't due to his ranking on 30-word quoted queries. I'm hoping to get an example of a normal query where he's losing out to scrapers so we can debug what's going on.

The headline implies that he's penalized in search due to scrapers, which isn't happening.


> The headline implies that he's penalized in search due to scrapers, which isn't happening.

In this case the phrase "Google penalizes" = "Google denies access to adwords". The word penalize doesn't always refer to site penalties in the Google search index.


The reason it's on top of HN is almost certainly because that's the conclusion everyone jumped to. That seems to be how the word "penalize" is consistently used with respect to Google.


I agree, I am a bit surprised this is on the frontpage. What is ironic, though, is that now when I search for the 30-word phrase, two scraper sites appear with the article scraped from seobook.com. ...nicely backdated by 37 seconds.


I am not sure what you mean by the headline is inaccurate. It would be more clear if it said, "Google penalizes an original content site because of scrapers", but the meaning is the same. You seem to agree that an original content site can get penalized if Google thinks the scrapers are the original source of the content.

I personally don't see what is news about this. It has been known for a long time that newer or less frequently updated sites can get beaten by scrapers, though it usually resolves itself later unless the scraper is a decently reputable source like the Huffington Post.


He was denied access to adwords due to a mistaken impression that he's a scraper. He did not lose search traffic due to that.

The headline conflates the two things, so is inaccurate.

I've never seen an actual instance of a site with original content being penalized because it is getting scraped (though it is theoretically possible.) Our systems for this are robust and quite conservative. When a scraper outranks the original site it's because we weren't aggressive enough in demoting the scraper, or don't have enough data about it, not because the original was penalized.


...and how is being denied access to AdWords over mistaken identity not penalization? If you get thrown into jail because of mistaken identity, that doesn't make the situation any fairer or just for you. You're still being punished.


Here is one that ranks the original content at the bottom of the first page. The first two results are exact replicas of each other and a copy of the original article. The query is the title of his most recent post.

https://encrypted.google.com/search?q=Quick+Look+%E2%80%93+W...


I admit I struggled to come up with a headline for it and my choice of wording was not very good. However, he did lose search traffic as a scraper site took over his top ranking for certain searches.

It appears he has more details about specific queries on this page: http://www.google.com/support/forum/p/AdWords/thread?tid=0bb...


A good headline would have been "Google rejects an original content site from Adwords due to scrapers." He may have a quite legitimate complaint, but it doesn't seem to involve search.


I sent the site owner an email to see if he can comment on more general search queries he may have lost ranking on. If it is just that long quoted one I agree that your suggested headline would be better.


Is Google doing anything to solve the content duplication problem?

It seems like a solvable problem. Why don't they let webmasters implement some kind of time-based cryptographic signature?

It seems so lame that his problem has gone on so long, especially when there must be some kind of technical solution.

For real businesses, spending a few days implementing some authentication protocol would not be particularly burdensome.
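
Even a crude version would illustrate the idea. A rough sketch in Python (purely hypothetical - the key registration and any verification endpoint are my own assumptions, not anything Google actually offers):

  import hashlib, hmac, json, time

  # Assumption: the webmaster has registered this key with the search engine beforehand.
  SECRET_KEY = b"webmaster-private-key"

  def sign_article(body):
      """Produce a timestamped, keyed digest a crawler could later verify."""
      published_at = int(time.time())
      digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
      sig = hmac.new(SECRET_KEY, (digest + ":" + str(published_at)).encode(), hashlib.sha256).hexdigest()
      return json.dumps({"digest": digest, "published_at": published_at, "signature": sig})

  # Serve the resulting JSON at a well-known URL next to the article, so the crawler
  # can check which site (and which key) held the content first.

Of course a scraper could sign stolen content too, so by itself this only proves who held the content at a given time - but that's already more than the crawler knows today.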


I'm guessing a lot of it has to do with determining where the content originated. Google will crawl sites at different times of the day. If site B stole content from site A but Google came to site B first, how are they supposed to know that the content originated on site A? I'm sure there are certain things they look at like PR, trust signals, etc. to determine if the content on site B could have been copied, but it can't possibly be perfect. The time-based signature sounds good in theory but implementing it across billions of pages would be very difficult. Not only that, but what if site A didn't create a signature but site B did when taking site A's content? I don't think there's an easy solution to this problem.


I don't know if it's an easy fix, but it's certainly not difficult to eliminate 90% of scrapers. My sites get scraped all the time, and if you look at these scraper sites, they usually are not scraping just one site. To simplify the issue, let's look at a small data set:

  Site 1:
    * Content ABC
    * Content DEF
    * Content GHI

  Site 2:
    * Content JKL
    * Content MNO
    * Content PQR
    
  Site 3:
    * Content STU
    * Content VWX
    * Content YZ0

  Site 4:
    * Content ABC
    * Content DEF
    * Content MNO
    * Content PQR
    * Content STU
Which of these is a scraper?
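
To make the heuristic concrete, here's a toy sketch (purely illustrative - a real system would need fuzzy matching on article text rather than exact labels):

  # Toy heuristic: a scraper's content overlaps with *many* distinct other sites,
  # while an original site overlaps only with the scrapers copying it.
  sites = {
      "site1": {"ABC", "DEF", "GHI"},
      "site2": {"JKL", "MNO", "PQR"},
      "site3": {"STU", "VWX", "YZ0"},
      "site4": {"ABC", "DEF", "MNO", "PQR", "STU"},
  }

  def distinct_sources(name):
      """Count how many *different* other sites share content with this one."""
      own = sites[name]
      return sum(1 for other, content in sites.items() if other != name and own & content)

  for name in sorted(sites):
      print(name, distinct_sources(name))   # site1-3 -> 1, site4 -> 3: site4 looks like the scraper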


When you add site 5 in the mix that has:

* Content ABC

* Content DEF

It makes it much harder to identify.

If you are a blackhat SEO, then you keep track of the last time Google indexes you (including the anonymous crawlers, which is tricky), and backdate scraped content to just ahead of the time you were last indexed. Then you can send a complaint through Google's tools about the site that wrote the original content.

Blackhats make content duplication a really challenging problem, and having a complaint form isn't going to solve much. The blackhats can take advantage of that as well.


Google does eliminate more than 90% of scrapers. It's just really easy to create scrapers; they can outnumber the original content by more than 1000-1. So you need many, many systems to remove scrapers.


The scraper is the one where the content appeared last.


How do you perform that measurement using practically bounded computing and networking resources?


It seems to me that there should be some way for Google to see the original creators content before it goes live to the internet; some sort of pre-publishing protocol. Essentially tell the Googlebot "This is the content I am planning to publish; have a look so you will know who created it first."


(I used to work on Google search quality.) In general, the problem with such systems is that scraper-writers use them much more diligently than original-content-writers.


Has this been integrated into Blogspot?

The trick would be getting the feature hooked into the major publishing tools.


A human can figure it out. That's the real problem. You have a guy send in a support ticket / request, and then you pawn them off on someone who really cannot do anything about it. It really seems like support is just there to deflect rather than solve. Heck, look at the actual forum thread the article links to http://www.google.com/support/forum/p/AdWords/thread?tid=0bb...

It just seems like a purely algorithmic solution doesn't really scale and some human intervention is really necessary.


Humans don't scale either. That's why google doesn't use humans.

Sounds like the easy solution here is to have people submit their content to some google authentication system at the same time they submit to their blog. Problem solved.


They don't scale but Google can afford to hire enough to make the problem go away if they made it a focus. But not going to happen any time soon when they don't believe in customer service.


It doesn't need to be done across billions of pages. As soon as they detect duplicates, there is a high chance it will happen again, so monitoring the two or more sites with the same content is doable, if not very efficient. Ideally, the author subject to scraping should be able to register their page with Google to prove precedence. The scraper wouldn't be able to do that.


Google has a lot of people working on improving search quality, and solving content duplication is a big part of that. It just isn't a trivial problem, because there are far more SEO black hats in the world working to outwit the Google algorithms.


Yes, the time-based signature seems like part of the solution. But what amazes me is that if a cryptographic signature is available, some scammers will generate articles and sign them to gain reputation.


It would help with dup detection if Google had a "push" mechanism for websites to upload their content to Google.


Not necessarily. That would only help if everyone used it. Otherwise it'd be another tool for the scrapers to claim legitimacy.


Not if you told them first - if popular CMSs/blog platforms notified Google when you clicked publish, they'd know just before the page even existed on the internet. Other content producers would rush to integrate it as well.

I believe they already ping Google to inform them of new content, it would be a natural extension of that.
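
Something like this hook in the CMS, for instance (a rough sketch; the ping endpoint is a placeholder, not a real Google URL):

  import urllib.parse, urllib.request

  # Placeholder endpoint -- an assumption, not a documented Google URL.
  PING_URL = "http://search-engine.example.com/ping?sitemap="

  def on_publish(sitemap_url):
      """Called by the CMS the instant a post goes live, before any scraper can see it."""
      urllib.request.urlopen(PING_URL + urllib.parse.quote(sitemap_url, safe=""))

  # e.g. on_publish("http://myblog.example.com/sitemap.xml")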


So how about this - the content creator pushes content to Google, plus a list of people they think will scrape it. If a sufficiently large, sufficiently close copy of the content shows up on a scraper page, they're penalized (AdWords, search, banned from Blogger, whatever).

Edit: I guess that only works if they're fixed IPs.


You don't need the list of potential scrapers, it could apply to anyone who published the same content after.

The issue though is if you write a blog and don't know to do this, the scrapers do know and nick your content and inform Google of "their" new article. At that point you're now seen as a scraper for an article you actually wrote because the real scraper plays the game better.

And that's the real problem - the scrapers play the game better than the rest of us because they can commit larger amounts of time to it (not having to waste time writing actual content and all).


What I was thinking is: if you have a list of scrapers supplied and a publish notification, Google could then index the alleged scraper sites directly (instead of waiting for a general index to find them), and then again at some interval later. For a true scraper, this should result in no match followed by a match. Then Google can establish the order of publishing, assuming Google's push-notification-to-index time is shorter than the scraper's polling interval. And it seems people often know of their regular scrapers.


They might be able to claim it, but at least it'll also give the content creator a tool to deny that claim.


Here's a simple idea that could fix a lot of this problem: copy Twitter's idea of verified accounts.

Google could issue verified sites. If someone copied a verified site the content would be automatically removed from the index.

Now they would have to hire some staffers to research the applications and handle complaints. But this problem is beginning to cost them far more than adding a few more staffers would.


While certainly a potential fix, Google seems very averse to non-algorithmic solutions (see their issues with customer support).


I'm sure this is a gross oversimplification, but Google is in the business of monetizing people's interest in content it doesn't create. Who will it perceive is the better partner for that monetization, the scraper who understands how to apply Google's tools to maximize monetization, or the original content author?


Umm... long term, no original content authors = no original content. No content = search volume goes down. Search volume down = holy crap for Google. I'm just saying.


Sorry I didn't reply earlier. I don't think that's necessarily the case. The question here is Google's ranking of original content. If they rank it lower than scraper sites, that would only drive original content off the net if the primary motivation of the content authors is to get traffic through Google.

As long as there are chumps like me that write content for our own personal reasons, or who drive traffic through sites like HN and twitter referrals, scrapers can piggyback on my work and Google can do whatever it likes. It won't affect my motivation to create content.


Umm, wasn't Google supposed to "Don't be evil" and care about their search result quality above all else? I find your defense of and apathy to hypocrisy disturbing.

And oh, your post pushed a hot button of mine: comments by people who think they're being incredibly insightful by saying "the world is not fair" in different ways. On HN, I think we can assume that people are adults and understand such things. Sorry about the flame.


I think you're putting words in my pen when you describe my comment as defending Google. The article describes Google's behaviour. My comment describes a motivation for that behaviour. If you tilt your head sideways, you will see that my comment is helping to build a case against them, not a defence of them.

For an analogy, consider a murder trial. Someone stands up and says, "The accused stands to profit from the victim's death." Isn't that suggestion more likely to come from the prosecution than from the defense?


"Umm, wasn't Google supposed to "Don't be evil" and care about their search result quality about all else?"

Mmmm... no, unfortunately Google is supposed to make money, and they make money mainly from AdSense. I'm not prepared to accept that random algorithm modifications with a big impact on their revenue are made without any concern for it.

And when you are at the same time the company driving people to web sites (search) and the company profiting from ads shown on those web pages (AdSense), something bad can happen, as it is not a free-market setup.


>I'm not prepared to accept that random algo modifications having big impact on their revenue are done without concerns.

The effect on adsense revenue isn't even measured let alone used. The people who make decisions about changes to ranking algorithms don't even see the data on adsense revenue because we don't even have a system to collect it for ranking changes.

I work in search quality, and there are many metrics I have to collect to launch changes. None of them involve ads.


I'm glad to hear that, and I trust you. If it is this way, I expect to find fewer spam sites and less copied content in the coming months when I search on Google :)


I'm the guy whose site this is all about - iPadInsight.com. I only used that specific search in the forum thread because I discovered that was the single point on which the Adwords reviewer had judged that my site didn't produce original content. I sent back the results of the same search showing that links were either from legit aggregator sites (like alltop.com) linking back to my original review, or from a number of scraper sites that rip my content. Even when the review was overturned for Adwords I was told it would leave a black mark against my site because it had already been marked that way. Great system.

Soon after all that hassle, my site suddenly lost 60% of its traffic. From what I can gather, mine is one of the quality sites that produce original content that has been mistakenly penalized in the Panda / Farmer / Whichever Other updates.

Among the reasons I say my site is a quality site that produces original content, in accordance with this post at the Google Webmaster Central Blog (http://googlewebmastercentral.blogspot.com/2011/05/more-guid...) and with all the logic I can apply to the subject, are:

-- The site contains over 1,700 posts published in the last 15 months. I wrote around 1,550 of them myself. The remainder are written by three other occasional authors, who are colleagues and friends of mine. There's no 'outsourcing' of content creation or anything of that ilk.

-- I spend tons of hours every day researching and writing the content that appears on my site. Every app review on the site is 100% original content (http://ipadinsight.com/category/ipad-app-reviews), as are all posts published.

-- I do consider myself an expert on the subject my site covers - the iPad. I have been writing app reviews, accessory reviews, tips, and how-to posts on it ever since it launched. I've appeared on ABC World News and numerous radio programs as an iPad and Apple expert. I've been a contributing author for iPhone and iPad Life magazine (printed publication) since their debut issue - writing expert tips and tricks posts, buyer's guide articles, and more. I'm listed in Robert Scoble's Twitter list of best tech people to follow. Blue-chip app publishers and accessory vendors approach me to write about their products. The Daily (the first iPad only newspaper) contacted me before their app even hit the App Store, as do many leading publishers. I've been a beta tester for many top iOS apps for years. I participate regularly at several leading iPad and iOS forums. I'm not saying any of this to boast, but in an effort to establish that I'm a blogger who is enormously passionate about the subject I cover, and someone who is respected in the area (mobile tech) that I write on.

-- My site is a long-standing member of the Got-OATS group of sites (http://www.gotoats.org/) that seek to uphold and promote the highest ethics in app reviews. We never accept money for reviews or coverage, and add disclosure statements to our reviews to indicate whether we received a promo code for an app reviewed, or a sample unit of an accessory reviewed.

-- I spend a lot of time on every single post, on researching, on testing apps and whatever else I'm covering, on ensuring that spelling and grammar are spot-on, on providing good screencaps of apps in action, and every other detail I can think of.

-- I use a great caching plugin on my site and do my best, with help from a few WordPress experts, to keep the site fast and clean.

-- I currently have close to 4,500 RSS subscribers and over 3,000 Twitter followers for the site's account.

-- Before my recent sudden traffic fall off a cliff due to Panda, my site had around 80,000-100,000 unique visitors per month.

As for search results and scraper sites, I am still often seeing horrendous spam sites ranking above me for recent posts. Here is just one quick example on a recent post I wrote about iPad rivals, where several scraper sites rank above mine, including one (ipads101.com) which I have submitted 3 spam reports on via Google Webmaster Tools over the last two months, and had zero response:

http://www.google.com/search?sourceid=chrome&ie=UTF-8...

I run a good site. I pour hours of effort and my heart and soul into it. And I think it has been very wrongly assessed by whichever new algorithm.


But do you produce original research? Or, like the AirPrint article, do you scour the big blogs for information and distill that on your blog? Do you give the complete picture? Do you often link out to your sources?

You say you wrote 100 articles a month for over a year. Were these all quality posts? Even a site like smashingmagazine, with a very general subject and a team of writers, can't output 3 articles a day. If you were to put the time and effort of 3 articles into 1 article, does that help in the number of links the piece attracts and the times it is shared?

I understand that you spend time each day to update your blog, but if that is not original research, do you believe you deserve to be rewarded with more than "amateur" status? How many people do you think write about Justin Bieber? How many of those blogs should rank in the top 100 for Justin Bieber terms?

Next up: Could you become an affiliate of Apple for your Apple product site? Really unlikely as your domain name violates their TOS and trademark. Why should Google allow a site to advertise for Apple products, when Apple wouldn't allow that same site to do the same? Apple doesn't consider you an expert, they consider you a legal hassle. And are you really an Apple expert? Or do you just own some Apple products and keep up-to-date?

Also, your shareasale footer link should disqualify you from AdSense, as dofollowing a non-editorial affiliate link is against the Quality Guidelines.


Yes I do original research all the time, every day. I beta test apps, I assess pre-release versions of apps from many leading publishers, I explore and test new functionality, or getting more out of existing functionality, I jailbreak my devices at times to see what more they're capable of, and so on. The AirPrint subject is a good example. I did not just 'scour the big blogs'. Before AirPrint was ever released, when many people believed it was impossible to print from the iPad, I did an article that proved quite popular that explained that you absolutely could print from the iPad, via several good 3rd party apps. When AirPrint was released along with iOS 4.2 I wrote several pieces criticizing it for being lame and worse than half-assed compared to what Apple had touted it to be. I wrote articles on how to use AirPrint, how to get more and better features via 3rd party apps, and how the list of supported printers has grown at a snail's pace. And yes, I also posted on the tweak that eventually came out to extend AirPrint's own functionality. Not because I was following the big blogs, or anyone else, not because I jumped on the AirPrint bandwagon late in the day, but because I had been covering the subject extensively all along.

My 100 articles a month, and were they all quality posts: yes, they were and are. Is every single one of them a new in-depth app review or how-to post? No. I sometimes write shorter pieces offering my opinion on a major bit of iPad news or similar. I sometimes write lighter pieces, about anything from iPad-related humor to how my big goofy Labrador is my work colleague. I think that provides a nice mix of content for readers.

If Apple announces the date that a new iPad is going to be released, or that they are finally going to support subscription plans for iPad magazines and newspapers, that's of interest to me and to my readers, so I mix in a small percentage of original news coverage as well. The fact that many sites may cover the same story does not mean that mine is not original or professionally written content. When there is a major news story, it is covered by The New York Times, The Washington Post, and other quality broadsheet papers. When there are major political and economic stories, they are all covered by Time, by Newsweek, by The Economist, and so on. Does that mean that none of these titles are producing original or professional content? When any of these big titles pick up stories from the AP, Reuters, and smaller local news outlets, does that mean they are suddenly low-quality titles? I'd say absolutely not. And when a small percentage of my posts cover major news, it doesn't mean I'm a low quality site either. I always have my own take on any iPad-related news or rumors that I choose to write on. I don't 'borrow' anybody else's take - I write my own thoughts on the very few news or rumor items that interest me.

On the how many people write about Justin Bieber question, there are not a lot of sites that focus solely on the iPad, as mine does. And very few indeed that do so and are quality sites - the majority are scrapers that just continually rip off content from sites like mine. Mine is one of the very few sites that covers only the iPad and does so with 100% original content.

Could I become an affiliate for Apple? I have no desire to be one. That's not at all what my site is about and it's not at all relevant to this discussion. "Why should Google allow a site to advertise for Apple products?" I don't do that at all. I am often critical of Apple, of App Store policies, of iPad related decisions etc. When I post App Store links for apps that I cover, I don't even use the affiliate links that many sites do, I just use straight-up links.

Am I really an Apple expert? Yes. And particularly an iPad and iOS expert. Again, I've written for the leading print title (iPhone and iPad Life Magazine) in this space since its debut issue, I've appeared on ABC World News and various radio programs and podcasts as an iPad and iPhone expert, I've been working with mobile devices since back in the days of the Palm Pilot, many leading publishers come to me to beta test and assess pre-release builds of their apps, Robert Scoble lists me among his most influential tech writers. I've also worked in tech support, network management, and IT consulting for over 15 years. So yes, I'm very confident in saying I am an expert on the very tightly focused subject I cover.

On the footer links, I honestly didn't even realize what kind of links they were. I have used the Thesis theme for years and Rackspace for hosting for years. I think very highly of both of them, so I was happy having a small link to them. I've made exactly zero dollars via those links. I took them down when I saw it mentioned here that they are not a good idea.


I get your site #1 for that search. I suspect the scrapers are ranking above your site only when you search due to personalization (you've clicked on them in the past.) Try adding &pws=0 to your search to see if they still rank higher and I'll debug further.

This is a quite common problem actually when webmasters have reason to hate certain sites. They click on those sites in search results a lot and often see them promoted above their own, even though they're the only one who sees them promoted.


Searching for "ipad rivals the year of the clueless" with or without &pws=0 I get 5 scraper sights before ipadinsight.com/category/ipad-rivals-2:

1. usedipadforsale.net/ipad-rivals-this-year-the-year-of-the-copycats-or-the-clueless.htm

2. www.ipads101.com/ipad-rivals-this-year-the-year-of-the-copycats-or-the-clueless/

3. ipads2nd.com/.../ipad-rivals-this-year-the-year-of-the-copycats-or-the-clueless/

4. catsmakemebats.micasaessucasagermania.com/.../ipad-rivals-this-year-the-year-of-the-copycats-or-the-clueless/

5. ipad.thedailyglobe.com/.../ipad-rivals-this-year-the-year-of-the-copycats-or-the-clueless/

6. ipadinsight.com/category/ipad-rivals-2

That looks pretty bad to me.


For that query I get ipadinsight #1 in bing. The only other result from that list that shows up is the "usedipadforsale.net" as #5.


I get ipadinsight.com in 7th position, behind 5 scrapers too (searching from the UK).


The only one of them I've clicked on before is ipads101.com, because they so flagrantly rip my content every single day. None of the others are sites I've clicked on before.

My main point here is that my site doesn't match any of the criteria for getting slapped by Panda. It's a site that has its content ripped off a ton, and every time I report the offending spam sites to Google there is no response at all.


Unfortunately panda is something I can't help with, other than passing the site along as an example of where it may not be giving good results.

Based on what I've been able to debug so far though, I'm pretty certain that those scraper sites aren't hurting your ranking, as annoying as they are. I'll keep digging though.

Also, it looks like you switched domain names recently from http://justanotheripadblog.com to http://ipadinsight.com/? Are all the pages 301-ing correctly?
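
A quick way to spot-check a few old URLs (a rough sketch; the example URL in the last comment line is just illustrative):

  import urllib.error, urllib.request

  class NoRedirect(urllib.request.HTTPRedirectHandler):
      def redirect_request(self, *args, **kwargs):
          return None   # don't follow redirects, so we can see the raw status

  def check_301(old_url):
      """Print the status code and Location header the old URL returns."""
      opener = urllib.request.build_opener(NoRedirect)
      try:
          resp = opener.open(urllib.request.Request(old_url, method="HEAD"))
          print(old_url, "->", resp.status, "(no redirect!)")
      except urllib.error.HTTPError as e:
          print(old_url, "->", e.code, e.headers.get("Location"))

  # check_301("http://justanotheripadblog.com/some-old-post/")   # expect 301 -> ipadinsight.com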


Thank you, much appreciated! Is there a process for asking to have a site re-reviewed following Panda? Other than the one epically long forum thread (http://www.google.com/support/forum/p/Webmasters/thread?tid=...)?


Someone wrote about a honeypot trap for robots. They made a new directory and added it to robots.txt as do-not-enter. All IPs that hit the directory were banned.

If the sites that scrape your content do it from servers with fixed IPs, then you could go hunting and try to block them.
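
Roughly like this (a sketch of the idea; the /trap/ path and the in-memory ban list are placeholders for whatever your server actually uses):

  # robots.txt tells well-behaved crawlers to stay out:
  #   User-agent: *
  #   Disallow: /trap/
  #
  # Anything that fetches /trap/ anyway is ignoring robots.txt, so flag its IP.

  BANNED_IPS = set()

  def record_request(client_ip, path):
      """Call from the web app (or a log parser) for every request."""
      if path.startswith("/trap/"):
          BANNED_IPS.add(client_ip)   # feed this into a firewall rule or deny list

  def is_banned(client_ip):
      return client_ip in BANNED_IPS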


I believe Panda also looks at originality, content freshness, document authority, trust factors, usability factors, site authority etc.

I'd agree that the denying of Adsense was (obviously?) wrong, if this is all there is to the picture. As for looking at ipad information on the internet, after a manual inspection of that site, I, as a user that cares for quality and relevance, don't need any of the results on justanotheripadblog.com in my top 100.

The order of relevance, discovery and editorial quality seems to flow from:

http://reviews.cnet.com/8301-19512_7-20023976-233.html

>

http://ipadinsight.com/ipad-tips-tricks/how-to-make-airprint...

>

http://www.info4arab.com/how-to-make-airprint-work-with-just...

With a lot of intermediate steps.

iPadInsight.com is not a cheap scraper site, but is it a site that does original research, beyond rehashing what is hot in the industry? I think Panda might have judged correctly in not assigning higher rankings to this site.

The site seems to have had a canonical problem with the comments in 2010, inflating the site's size in the index by 10x. The depth of these comments is usually not much more than: "Great! Interesting Article! Love this! Thank you!" and might just as well have been auto-generated.

Also, the shareasale footer link "Thesis Theme for WordPress" alone might disqualify you from running AdSense, as you dofollow an affiliate link (and this is not allowed in the webmaster quality guidelines).

The trademark inside domain name might be another issue.


The same thing happened to us as well. When I did an event last month our page rank dropped from 4 to 2 despite getting 12+ new links, which I strongly suspect is because all of the bloggers who link to our conference site are having their blogs duplicated by content farms. Overall our page rank has dropped from 7 to 2 in little over a year, despite having 10x more inbound links. (And zero SEO or anything else that would violate Google's best practices.)

I did ask a Google employee, who said it was because we weren't using canonical tags, but this doesn't make much sense and fixing this doesn't seem to have done anything to improve the situation.
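
(For reference, the canonical tag they meant is just a single link element in each page's head; the URL here is only an example:)

  <link rel="canonical" href="http://conference.example.com/2011/schedule/" />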


The article above is about his problems getting an AdWords campaign approved, not about his ranking in search results.


Those problems are one and the same. He can't get an adwords campaign approved for the same reason his search results rank poorly, he's being scraped.


In the original post, he used a very specific query and his site was listed 2nd. Naturally the scrapers were also all over that results page, since they were all exact matches to the text. I didn't see anything about him complaining about poor search ranking (http://www.google.com/support/forum/p/AdWords/thread?tid=0bb...)


I don't understand the incentives for Google to deny AdWords to someone. AdWords is what ultimately gives Google the profits, not AdSense. Google already has plenty of places for you to see ads, and AdWords is the product that actually takes money acquired through other industries and funnels it into Google.

I really don't think it's being done out of malicious intent. I think it's very likely that it's just being done because of negligence, since service/app reviews happen to be frequently scraped.


The complaint is indeed about negligence, not malice.

Google puts some effort into making sure that (i) people aren't annoyed by ads directing them to content they didn't want, and (ii) the top ads are good quality and likely to have a good click-through rate. Ads that have a high quality score cost less to place for that reason, ads with a low quality score don't get placed, or get placed on the second page of ads.


Easy: if anyone were allowed to put any ad on AdWords, the quality would quickly go down. Google has made a lot of money on having ads that users perceive to be useful. If they lose this perception and people ignore the ads, then they stand to lose big.


Hell with Facebook’s replacements; it’s Google who needs a replacement! (Note: I am currently thinking about possible alternatives for PageRank.)



