Help Google just nerfed us out of the blue (safeshepherd.com)
90 points by useflyer on May 17, 2012 | 77 comments



Hi

I work at Google helping webmasters like this.

As far as I can tell, there are a couple of interacting issues on the site that could be causing what you're seeing. It's a bit technical, but the fix is easy to implement.

Firstly, our algorithms have recently been picking one of the following URLs as the canonical URL for the homepage:

http://www.safeshepherd.com/
https://www.safeshepherd.com/
https://safeshepherd.com/

For example, I see that the non-HTTPS pages redirect to the HTTPS pages (e.g. http://www.safeshepherd.com/ to https://www.safeshepherd.com/), but the non-www pages do not redirect to the www pages (both https://www.safeshepherd.com/ and https://safeshepherd.com/ return content). When we find the same content on multiple URLs like this, our algorithms pick one representative URL, and over the past few weeks the choice has been changing. As of 3 days ago, the current choice is https://safeshepherd.com/ .

As it stands, our algorithms are trying to figure out the right canonical URL format, but it's difficult in this kind of situation. You can help by redirecting to your preferred URL format (say https://www.safeshepherd.com/*); our systems will pick up this signal, and it will be reflected in the search results and in the reporting in Webmaster Tools.
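
For illustration, here is a minimal nginx sketch of that kind of consolidation (hypothetical - nginx is assumed only because a later comment mentions the OP uses it, and the www/https preference is just an example):

    # Sketch: funnel http://, http://www., and the bare https:// host
    # to a single canonical https://www. host with one 301.
    server {
        listen 80;
        server_name safeshepherd.com www.safeshepherd.com;
        return 301 https://www.safeshepherd.com$request_uri;
    }

    server {
        listen 443 ssl;
        server_name safeshepherd.com;
        # ssl_certificate / ssl_certificate_key directives omitted in this sketch
        return 301 https://www.safeshepherd.com$request_uri;
    }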

Secondly, Webmaster Tools treats these as different sites. For example, you would need to verify and check the statistics of both https://www.safeshepherd.com/ and https://safeshepherd.com/ (as well as the HTTP versions) since they're separate sites. It may be that you're checking (say) the stats for http://www.safeshepherd.com/, but if our algos have picked the https://www.safeshepherd.com/ URLs as canonical, the search queries of the former will suddenly drop to near zero, while the latter will be a more accurate reflection of the site's traffic.

Hope this helps,
Pierre


Pierre, thanks for stopping by to confirm these issues. I often see sites be inconsistent between www and non-www, or between http and https. It looks like safeshepherd.com was doing both. More consistent redirects and adding rel=canonical should definitely help us figure out which url you prefer.
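
For reference, the tag being suggested would look something like this in the page <head> (illustrative only - the href shown assumes the www/https preference from above):

    <link rel="canonical" href="https://www.safeshepherd.com/" />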

Just to confirm what I said elsewhere, this site doesn't have any manual spam actions or anything like that. It's just a matter of Google trying to pick the correct canonical url when you have a lot of different (www, non-www, http, https) urls you're showing. If you make things more consistent, I think Google will stabilize on your preferred url pretty quickly.


Matt & Pierre, thanks for your thoughts, and sorry that this ended up being a rookie mistake. I have a rel=canonical good to go. Thanks again for your time.


No worries at all--glad it turned out to be easily fixable.

And now I know what "nerfed" means. :)


This is why I love HackerNews. Guy asks for SEO help, frigg'n Matt Cutts answers!


FYI - Another fairly common usage of "nerfed" these days (especially in the gaming community) refers to something being toned-down. E.g. if many players are complaining about a character ability being too powerful, the developers may consider "nerfing" that character.


In Internet Marketing, we refer to being delisted as getting "sandboxed".


As you're replying to Google's Web Spam lead, I'm sure he's familiar with the terms for obliterating your splog farms. :)


I have also seen a messed-up (404-erroring) robots.txt file cause a site to get deindexed out of the blue.


That's a misconception. A 404 on robots.txt will not have any effect on crawling as it's treated the same as an empty robots.txt file allowing all crawling.

But it's different for 5xx HTTP errors on the robots.txt file. As Googlebot is currently configured, it will halt all crawling of the site if robots.txt returns a 5xx status code. This crawling block will continue until Googlebot sees an acceptable status code for robots.txt fetches (HTTP 200 or 404).
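
As a guard against that failure mode, one possible nginx-level safeguard (a sketch under assumptions, not something prescribed in this thread) is to pin robots.txt so it can only ever return the file itself or a clean 404:

    location = /robots.txt {
        # Serve the static file if present; otherwise fall back to a plain 404,
        # which Googlebot treats like an empty robots.txt (allow all crawling)
        # rather than a crawl-halting 5xx.
        try_files $uri =404;
    }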


Interesting - that needs to go into the webmaster guidelines. I was not seeing 500s, or having them reported as errors in GWT, on the site this happened to.


It doesn't belong in the guidelines but it is described in the relevant section of the Help Center:

http://support.google.com/webmasters/bin/answer.py?hl=en&...

In summary: if for any reason we cannot reach the robots.txt file due to an error (e.g. a firewall blocking Googlebot or a 5xx error code when fetching it), Googlebot stops its crawling, and this is reported in Webmaster Tools as a crawl error. That Help Center article above is about the error message shown in Webmaster Tools.

Given that you said you did not see errors being reported, that suggests there was something else going on. If you need more help, our forums are a great place to ask.


Cheers. I am off on leave for a week; I'll get this put into our best practice guide for our devs and IS guys when I am back.

Funny thing was, I tried resubmitting the main page in GWT and all the traffic came back almost instantly.


I have a question: Why are these considered different sites by your algos? If we were talking ".com" vs ".net", OK, I get it. But this is about "www.domain.com" vs "domain.com" and their http and https variants. I'm sure there's something I don't understand.

Would "http:www.apple.com", "https:www.apple.com", "http:apple.com" and "https:apple.com" be treated by Google as four completely different and separate sites also to be ranked in isolation of each other? Why?


"www" might be a very special case, but there are lots of times where "this.domain.com" is completely unrelated to "that.domain.com".

Many sites, for example, give users their own "name.whatever.com" subdomain. In those cases, treating the sites as the same doesn't make any sense.


True, but I think that is just the special case being asked about. Would it make sense to have an exception for "www"?


That's true, but the www/no-www/http/https cases are very likely to refer to exactly the same site. Besides, Google's algos, through crawling, should know that it is the same site. It seems unfair to punish the site for this.


Do we still have to do the redirects if we have this set up properly in Webmaster Tools? That is, under Configuration --> Settings --> Preferred domain.

Second, in Webmaster Tools, should we always have the www and non-www versions set up so we can do the "change of address"? For example, if www.mysite.com is my preferred URL, do I need to also make sure mysite.com is in Webmaster Tools and change the address to go to www.mysite.com?


My SOP is: first make sure that all non-canonical versions of the home page (usually the page with the most value) are redirected, then set your preferred address in WMT, then add the rel=canonical tag to capture all the possible versions of the home page that you can't think of.

As a side note, I also noticed (the hard way) that Google treats capitalization in URIs the same way as these examples, so www.Example.com is treated differently (at least for link value) than www.example.com. Basically, if there is a single different character in the URI, it's considered a different URI.


Wow! Kudos to Google for putting someone out in the wild answering these kinds of questions. Is there a normal place to ask for web/index help, or does HN serve that purpose?


The best places are our forums ( http://productforums.google.com/forum/m/#!forum/webmasters ) and our regular webmaster office hours ( http://sites.google.com/site/webmasterhelpforum/en/office-ho... ) where a webmaster support Googler (me and a few others) have a hangout on Google+ that anyone can join and ask about their site. We do them in many languages and time zones to cover the world as much as possible.

Obviously we can't be everywhere and we can't answer every question, but we try as much as possible to help when we can.


http://productforums.google.com/forum/#!forum/webmasters

Those folks would have spotted these issues pretty quickly.


Why doesn't Google look at identical URL paths on the https, http, www, and no-www variants of the URL and, if they look similar, use some default Google policy to select which of them is canonical?

For example, if http://mydomain.com/path and https://www.mydomain.com/path have 95% content correlation, repeated requests to http://mydomain.com/path correlate 95% with each other, and the server headers look the same, why would it not be safe to decide those are duplicates of a single canonical URL?

It's not safe to merge www.domain1.com and www.domain2.com. It's not safe to merge subdomain.domain.com and www.domain.com. However, for the limited cases of www vs. no-www and https vs. http, if they look similar, I think it's harmful not to treat them as the same site. You can't expect every website owner to be aware of this issue.

If it's a matter of not being able to be 100% sure, is there a single site that cares about google ranking that runs different sites on different combinations of www/no-www and https/http, but has similar content that would confuse a simple heuristic looking at page similarity? In what sort of circumstance could that happen other than with placeholder pages?

GWT allows selecting a preference between www and no-www, but I don't see a preference between https and http. I think Google should add a notice that using GWT to select between www and no-www is deprecated and the recommended way to handle www, no-www, http, and https selection is to use 301 redirects or rel="canonical" tags.


I don't mean to derail the conversation, but I just noticed today that one of the sites I work with (aptcorner.com) has dropped off of the first page of results for the company name (it was previously at around position 4–5). Is it not enough that I've set the preferred domain in Webmaster Tools? Will setting a rel=canonical tag make a difference?


And this is why many people like me refuse to allow https connections to their sites... Who wants to confuse Googlebot?


Yeah... this is actually a surprising (at least to me; certainly irritating) liability of allowing users to access your website via https.


Hopefully I can help some. :)

I see a couple of problems with your redirects:

1) http://www.MelonCard.com uses a 302 redirect to the https version. You have inbound links to that domain pointing to the http version, so the 302 negates the following 301.

2) Furthermore, your 301s seem to be implemented incorrectly. They work for me in Safari/Chrome, but if I use curl on the command line, or any crawling software, I see an infinite redirect:

www.MelonCard.com/?from=shadow&from=shadow&from=shadow.....

This is likely interrupting Googlebot's crawl too. Certainly worth a fix!
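
One hypothetical way to avoid that kind of accumulating-parameter loop in nginx is to redirect to a fixed host without re-appending the query string (hostnames here are just examples):

    server {
        server_name meloncard.com www.meloncard.com;
        # $uri carries only the path - any "?from=shadow" args the app adds
        # are dropped instead of being folded back into the redirect target.
        return 301 https://www.safeshepherd.com$uri;
    }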

A couple of other bits to note:

1) The redirect in Chrome sends me to:

www.safeshepherd.com/?from=shadow

which Google believes could be different to:

www.safeshepherd.com/

You should add a rel=canonical meta tag to the page to help Google out.

2) Your internal links point to safeshepherd.com without the www., but your MelonCard redirect goes to the www. version. You should probably make this consistent, and also 301 one version to the other.

These latter points may seem picky, but Google can be troublesome with this.

Hope this helps! :)


Since the OP is using nginx, here is how you properly do a 301 redirect straight from nginx.conf (in the 'server' section for meloncard.com):

        rewrite  ^/(.*)$  http://safeshepherd.com/$1  permanent;


That's actually more complicated than it needs to be; see http://wiki.nginx.org/Pitfalls#Taxing_Rewrites

Bad:

    # Needlessly captures the path with a regex just to re-append it.
    rewrite ^/(.*)$ http://domain.com/$1 permanent;

Good:

    # No capture: $request_uri already holds the full path (the trailing ? stops nginx re-appending the query string).
    rewrite ^ http://domain.com$request_uri? permanent;

Better:

    # Skips the rewrite regex machinery entirely and states the 301 explicitly.
    return 301 http://domain.com$request_uri;


Awesome - thanks!


Nice tip. Perhaps with https instead of http in the redirect URL, though? :)


True! It does look like Google will accept https as a canonical homepage link. They do it for Twitter.
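
So, assuming the https non-www form is the preference, the "Better" line above would presumably become:

    return 301 https://safeshepherd.com$request_uri;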


> We just completed the transition [301-ing from MelonCard.com to SafeShepherd.com] yesterday

Clearly, your definition of 'out of the blue' is not the same as mine... good luck figuring out if the problem was on your end or theirs (hitting the front page of HN usually helps).

Edit: clarified transition per comment below


Yeah, this hardly got a mention, despite being the most obvious and influential factor. It's certainly not out of the blue. Now, as to why Google failed to connect the dots for them, who knows.


Yup. It took us several months to get our page rank back. We did everything "by the book" too.

Also: did you request a change of address in Webmaster Tools? You have to control both domains and request a "Change of Address" from old to new.


I should clarify. We've been ranking for "Safe Shepherd" and SafeShepherd.com. A 301 redirect from MelonCard.com shouldn't bring DOWN our rank for "Safe Shepherd" on SafeShepherd.com. Unless I'm missing how the algorithm works.


If MelonCard was the primary source of "google juice" for SafeShepherd.com, then the change would certainly affect your rank.


301 redirects are supposed to transfer the link juice, so that couldn't have been the issue.


I hit this issue when I owned a .us site. After my site started making money, I arranged to buy the .com.

When I made the switch, I did everything by the book (my background is in SEO) including properly 301 redirecting the .us to the .com and alerting Google in Webmaster Tools.

The site completely disappeared from the Google search results--same symptoms as yours, with site: returning valid info but the site not showing up in search queries at all for the site's name.

This went on for 3 weeks.

Finally, after 22 days, it came back up in the rankings. Where previously it had been #4 for its keyword, it came back at #1.

No explanation from Google or in Webmaster Tools.

(If I may do a brief plug--our rank tracker will help you see if your site is ranking somewhere lower than the first few pages now, and will send you daily email updates so you'll know right away when you come back: http://whooshtraffic.com/rank-tracker/ )

Anyway, I'd have to say that this is par for the course for Google. It will likely come back in a few days or weeks. Time to play the waiting game, and develop some good links from your blog to your main site!


>Finally, after 22 days, it came back up in the rankings

That is EXACTLY what is supposed to happen. Google runs a search engine with over 5 billion URLs - if anyone tells you, or told you, that a transition was seamless and instantaneous, they lied.

To explain why, Google has this flow:

1. Google crawls the old site - once it finds 301 redirects, it kills the old site and has no data on the new site yet.

2. Google crawls the new site and starts to apply the old site's criteria - this is NOT, repeat NOT, instantaneous.

3. All the "pre processed" signals are applied from the old to the new site.

22 days is a pretty short time to see things come back better, and I'd thank my lucky stars!


Not my site but this may help you: http://www.sxw.org.uk/computing/robots/check.html

------------
Results for https://safeshepherd.com/robots.txt

Error at line number 1: "User-Agent: *"
Capitalization. Field names are case sensitive - the User-agent field should be written with that capitalization.

Error at line number 2: "Allow: /"
No User Agent. An Allow line must have a User-agent line before it. As records are delimited by newlines, there cannot be newlines between the User-agent and Allow lines.

Warning at line number 2: "Allow: /"
Allow is not widely supported. The Allow field was a late addition to the robots.txt standard, and is not currently widely supported by crawlers. You should consider alternative ways of constructing your robots.txt file.

Error at line number 3: "Disallow: /login/auth"
No User Agent. A Disallow line must have a User-agent line before it. As records are delimited by newlines, there cannot be newlines between the User-agent and Disallow lines.

Error at line number 4: "Disallow: /users"
No User Agent. A Disallow line must have a User-agent line before it. As records are delimited by newlines, there cannot be newlines between the User-agent and Disallow lines.

Error at line number 5: "Disallow: /signin"
No User Agent. A Disallow line must have a User-agent line before it. As records are delimited by newlines, there cannot be newlines between the User-agent and Disallow lines.

Error at line number 6: "Disallow: /upgrade/submit"
No User Agent. A Disallow line must have a User-agent line before it. As records are delimited by newlines, there cannot be newlines between the User-agent and Disallow lines.


Also, your redirection is weird. You have a 302 on http://meloncard.com that forwards to https://meloncard.com, which then 301 redirects to safeshepherd.com.

If I were moving a domain, I would have 301'd http://meloncard.com to safeshepherd.com right out of the gate.

To add to everyone's anecdotal experiences, I 301'd a domain last week and the new domain only took a couple of days to show up in search. I was a happy customer.


You're still number one when I search for you. Proof: http://i.imgur.com/HnTCn.png


Number two in the UK - number one is a film called 'Safe' showing tonight at a cinema in Shepherd's Bush, London.

Number one in AU, BR, JP,...


Number three in Canada... (your blog is number one and your article on the BeenVerified scam is number two).


Same for me using ze German Google


Same for me in France.


<evangelizing> Still number one on Duck Duck Go as well. http://duckduckgo.com/?q=safe%20shepherd </evangelizing>


Google should consider auctioning support time. Make it into a profit center rather than a cost center. No guarantee of a successful outcome, but it takes the situation out of the press.


"Build for users, and don't worry about search engines".

I guess situations like this show the absurdity and lunacy of people who say things like that.

But I do see your blog as the first result. As for why your main landing page isn't #1, I suggest you just stay calm, and wait a few days. Google has a knack for bouncing results, especially these past few weeks with the Penguin update and all. And if you just recently did the 301-redirect, those things take time to get sorted out. (Since MelonCard was still a relatively young brand, a 301-redirect was harmless, but if you had an old brand/domain, a 301-redirect would be SEO suicide)

You're still in the index. Your blog is ranking for your brand. You haven't been nerfed out of the blue, so stop worrying and be patient.


Great to see Google responding here. Looks like the confusing 301/302 redirections are the problem. I love HN for this type of community stuff.


Agreed. It was very impressive seeing Googlers respond, and respond helpfully, which is not always easy in tech, especially to outside parties, and to have forums and regular G+ hangouts so it isn't a one-time thing. Not to mention other people with similar problems pointing out the delay in reapplying the signals, etc.


I get their blog as the first search result, but at least their blog links to their home page. I'm surprised by how often I see company blogs that don't link to their main site, or make me hunt around the sidebar for a link.


I just signed up for your service. Hope that helps. Your service will certainly help me.

The infographic about BeenVerified was alarming (https://www.safeshepherd.com/beenverified), and I have actually tried to get my stuff off of there before and failed.

I'm not even worried about my own information. What I'm worried about is the accuracy of it! How can I trust a third party who doesn't even know me to provide an accurate background check when it references PRIVATE databases that I can't even verify the integrity of? My biggest worry is that some day I will be screwed out of an opportunity because of a company like this that simply provides inaccurate data because they confused me with some other John Doe.

It would be one thing to provide public records as a service, but BeenVerified seriously ticks me off. To think they have the authority to 'verify' people irks me.

Then again, it may be a good thing, because I sure as hell wouldn't want to work with anyone stupid enough to use a service like beenverified.



I'm in the US, also (Rochester, NY)


For me, all but one result on the first page of "Safe Shepherd" is them.


Are you in the US? We were told that changes roll out in the US first and then elsewhere. A friend in Japan said he sees us as #1 in Japan, and nowhere from a US search.


Yes.

To be fair, their landing page is nowhere to be found. The first result is their blog. The second result is safeshepherd.com/beenverified. The rest are their blog or the press talking about them. But all in all, they've hardly disappeared from teh intarwebs :)


Ok, so I thought, let me check on ours. Boom. Went from #1 when searching for our company name to not even present in the results.

What gives? We had perfect placement, built not through any SEO but just through a lot of people talking about us - we had the proper config and everything, recently had a site overhaul that was handled properly using Webmaster Tools, and everything was updated in Google's indexes within a day.


I have to add my kudos to Matt and Pierre for taking the time to respond to these threads. They could clearly just refer users to the Google Webmaster forums rather than answering in detail here. For all the flak Google takes, I have to say that I'm encouraged that two senior engineers take the time to answer questions in a third-party discussion forum.


Sadly, stories like this are getting more and more common.


The black box is getting more complex... this is a problem I hate to see ourselves or any other startup have...


I'd be interested to know how Google protects trustworthy sites from the SEO equivalent of joe-jobbing.


If nothing else works, remove the domain from Google Webmaster Tools - completely. I had two sites that lost top SERP positioning, and once I removed the domains from Google Webmaster Tools, they went right back up to the first 3 results on page 1.

Google Webmaster Tools is more or less designed to help Google, not you.


Still number one in the UK too.


You are number one on DuckDuckGo. Use social marketing if you really need the traffic.


Buy AdWords or quit complaining about it. You act like you are entitled to search ranking. You aren't.


Sounds like you got slapped?


I think you've been hit by the "Penguin" update, which is probably the biggest SEO embarrassment from Google in a long time.

Unlike other updates, which attempted to remove "web spam" from the search results by tweaking some of the parameters, this update (according to many in the SEO community) is an active attempt to catch people doing black-hat SEO or over-optimization. Unfortunately, many legit sites have gotten caught in its net.

You can read about it in the WSJ among other places:

http://online.wsj.com/article/SB1000142405270230350550457740...


No, safeshepherd.com has not been hit with the Penguin algorithm, nor with any kind of manual spam action.

If you look at the first website mentioned in that WSJ article, they even admit that they paid for hundreds of backlinks to their site, which is a violation of our quality guidelines.


IMO any SEO besides writing good content should be penalized, and that's what Penguin does, so personally I think it's great.


I've been following the Penguin update and this doesn't sound like the cause. The OP said they did not do any crappy link building.


I've even gone into Alexa to start verifying all of our backlinks. All of our links are from blogs, news, Twitter, et al. That said, it doesn't mean Penguin didn't hurt us. Algorithms have corner cases.


Use Open Site Explorer to start on your bad link search:

http://www.opensiteexplorer.org/domains?site=safeshepherd.co...


The Open Site Explorer and Majestic SEO databases are not really going to find the spammy links that Penguin is targeting:

http://www.branded3.com/seo/penguin-update-renders-seomoz-ma...


It's come out that for a few thousand, you can buy black hats who will do crappy link building for your competitors. So, a possible scenario: Safe Shepherd stepped on somebody's toes and they retaliated by hiring (or creating) a ton of crappy backlinks to the website. Unlikely, but possible.



