
There have been techniques for dealing with this for a very long time: https://developers.google.com/webmasters/ajax-crawling/docs/...

And this is only an issue for pages where you actually want to expose your content to search engines.
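For reference, the core of that spec is just a URL mapping: the crawler rewrites a #! URL into a query it can fetch statelessly. A minimal sketch in Python (the example URL is made up):

    from urllib.parse import quote

    def escaped_fragment_url(url):
        # Per the AJAX crawling scheme, everything after #! is
        # percent-encoded and moved into an _escaped_fragment_
        # query parameter that the server can answer statically.
        base, _, fragment = url.partition("#!")
        sep = "&" if "?" in base else "?"
        return base + sep + "_escaped_fragment_=" + quote(fragment, safe="=")

    # http://example.com/page#!key=value
    # -> http://example.com/page?_escaped_fragment_=key=value
    print(escaped_fragment_url("http://example.com/page#!key=value"))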




Sigh, Google-only websites.

_escaped_fragment_ is a stupid hack that requires the hardest bit of the work needed for progressive enhancement, but provides none of the benefit.


_escaped_fragment_ is a bit of a hack, but it also provides an enormous benefit. It allows you to gain all the benefits this article is talking about (which I won't repeat here) and still make your site crawlable without duplicating code.

You are right about one thing: it is a pain to set up. Much harder than it sounds at first. After doing it a few times, I finally got sick of it and built http://www.BromBone.com. It's a service that generates the HTML snapshots for site owners and keeps them up to date. I hope it will let site owners keep all the positives and eliminate most of the negatives.
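The serving side is the easy half once the snapshots exist; something like this Flask sketch (not BromBone's actual code; snapshot_path and the file layout are invented for illustration):

    import os
    from flask import Flask, request, send_file

    app = Flask(__name__)
    SNAPSHOT_DIR = "snapshots"  # wherever the prerendered HTML lives

    def snapshot_path(path, fragment):
        # Hypothetical layout: one prerendered HTML file per page.
        name = (path or "index") + ("_" + fragment if fragment else "")
        return os.path.join(SNAPSHOT_DIR, name.replace("/", "_") + ".html")

    @app.route("/")
    @app.route("/<path:path>")
    def page(path=""):
        fragment = request.args.get("_escaped_fragment_")
        if fragment is not None:
            # Crawler speaking the protocol: serve the snapshot
            # instead of the empty JavaScript shell.
            return send_file(snapshot_path(path, fragment))
        # Ordinary browser: serve the JavaScript app as usual.
        return send_file("index.html")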


Not only Google! Bing (and therefore Yahoo), Yandex, and Facebook (only the #! form for the latter) support this protocol too.

And if you use HTML5 pushState you can even serve the HTML captures to any bot. You could also easily add graceful degradation to your site by serving the snapshots to non-JS users. As a proof of concept for SEO4Ajax (http://www.seo4ajax.com), I built this application (http://www.appscharts.me) illustrating the approach.
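For the pushState case there is no #! to rewrite, so the decision has to be made server-side; a rough sketch of that check (the bot list and helper are placeholders, not a real API):

    # Decide per request whether to serve the prerendered snapshot
    # instead of the JavaScript app.
    BOT_SIGNATURES = ("googlebot", "bingbot", "yandex", "facebookexternalhit")

    def wants_snapshot(user_agent, query_args):
        if "_escaped_fragment_" in query_args:
            return True  # a crawler speaking the protocol
        ua = (user_agent or "").lower()
        # Fallback for bots (or no-JS clients) that don't speak it.
        return any(sig in ua for sig in BOT_SIGNATURES)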


Google has been crawling AJAX'd websites, bypassing the _escaped_fragment_ mechanism, for at least a year now.


Has it really? Do you have an example?

I've never seen an example where Google crawled a page and included content that was loaded via ajax or an external javascript file. I have seen Google index content that is loaded from a script tag included in the html page.



Certainly cool that Google is working on this. I suspect someday it will all work perfectly.

However, from working with customers of BromBone, I know this: people had JavaScript-powered web pages that were not getting indexed. Google would crawl a page, see no content, and leave it out of its index. After creating prerendered snapshots and serving them when the _escaped_fragment_ parameter was present, the pages showed up in Google.


It's not like other search engines can't step up their game and build the same functionality.


The problem is that now everyone who wants to build a decent crawler/scraper has to do it as well. Used to be any kid in his parents' basement could put wget and sed together and explore the internet. Now Timmy has to bolt a whole damn browser runtime onto his script to make sure he's getting the content text from blog posts and twitter feeds.

Asking Timmy to "step up his game" is heartless, and it'd be unnecessary if we were better citizens.
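For scale, the "whole damn browser runtime" Timmy now needs looks roughly like this (a Selenium sketch; any headless browser would do, and the URL is made up):

    from selenium import webdriver

    # wget can't see JavaScript-rendered content; a driven browser can.
    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get("http://example.com/page#!key=value")
        html = driver.page_source  # the DOM after JavaScript ran
        print(html)
    finally:
        driver.quit()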


Maybe Timmy should just have a look at the Google specification! Then he would see that he can keep crawling the web with his favorite tools: he just has to replace #! with an _escaped_fragment_ query parameter. That's not so heartless ;)
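Concretely, sticking to tools no heavier than wget (a standard-library sketch; real code should percent-encode the fragment and handle URLs that already have a query string):

    from urllib.request import urlopen

    # Ask the server for its prerendered snapshot, per the scheme:
    # #!fragment -> ?_escaped_fragment_=fragment
    url = "http://example.com/page#!key=value"
    snapshot_url = url.replace("#!", "?_escaped_fragment_=", 1)
    print(urlopen(snapshot_url).read().decode("utf-8"))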



