I've been thinking for a while that the larger spidering community (google/bots)...

s_fischer · on Nov 5, 2020

Wouldn't separating the indexable content from the actual DOM representation be a huge opportunity for abuse? It seems like it would make it much easier to game SEO or mislead users for even more malicious reasons.

thomasfromcdnjs · on Nov 5, 2020

Yeah, I agree. I don't doubt it will have problems.

Search engines have been dealing with black hat SEO for a while now. I'm fairly persuaded that Google knows what's up.

As in, I think it's just one of those inevitable technical challenges of castle defence.

Build better walls, they build better trebuchets, ad infinitum.

sdfhbdf · on Nov 5, 2020

Interesting that you mention that.

I think it's already happening in a slightly different form with Structured Data [1] and schema.org, which use JSON-LD [2], for example.
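To make that concrete, here is a minimal sketch of how a page might embed a schema.org description as JSON-LD. The helper name `json_ld_script` and the sample article fields are made up for illustration; only the `application/ld+json` script type and the `@context`/`@type` keys come from the JSON-LD and schema.org conventions.

```python
import json


def json_ld_script(data: dict) -> str:
    """Serialize a schema.org description as a JSON-LD <script> tag.

    Hypothetical helper, not part of any library.
    """
    payload = json.dumps(data, indent=2)
    # Escape "</" so a string value cannot close the <script> tag early.
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Sample values are invented for this example.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "datePublished": "2020-11-05",
}

if __name__ == "__main__":
    print(json_ld_script(article))
```

The machine-readable description travels inside the same HTML document as the visible content, which is the "close to the DOM" variant of the idea rather than a separate endpoint.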

But a separate endpoint might also be an evolution. Right now, sitemaps and robots.txt could serve as the entry point for spiders.

The one drawback that comes to mind with a separate endpoint is that it would have to be kept in sync with the normal page, which might greatly increase the cost of maintaining it. So keeping it close to the DOM seems reasonable.
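As a rough illustration of the sync problem, the sketch below checks whether each value in a structured description still appears in the page's visible text. Everything here is an assumption for the sake of the example: the names `visible_text` and `out_of_sync_fields` are hypothetical, and a plain substring match is far cruder than whatever a real crawler would do.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html: str) -> str:
    """Return the page text with runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def out_of_sync_fields(structured: dict, html: str) -> list:
    """List keys whose values no longer appear in the visible text.

    Keys starting with "@" (JSON-LD keywords) are skipped.
    """
    text = visible_text(html)
    return [
        key
        for key, value in structured.items()
        if not key.startswith("@") and str(value) not in text
    ]
```

A check like this could run in CI for a separate endpoint, and the same comparison is roughly what a search engine would need in order to notice indexable content that diverges from what users actually see.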

[1]: https://developers.google.com/search/docs/guides/intro-struc...

[2]: https://json-ld.org

thomasfromcdnjs · on Nov 5, 2020

Sounds like we come from the same experience.

I think for anyone who wants to pioneer this idea, the costs will be noticeable, but that's what bandwagons and the youth are for =D

Ultimately, I think there will be a reduction in costs (bits, process, maintenance).

To steal from React lingo: UI represented as state is more or less what any web page is.

Wrapping it in a meta layer like the DOM just adds overhead which I think can be optimized.
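A toy version of the "UI as state" idea, just to show the shape of it: one state object, with the HTML for browsers and the payload for spiders both derived from it. The function names and the state layout are invented for this sketch, and it says nothing about how a real spider-facing format would be negotiated or trusted.

```python
import json
from html import escape


def render_html(state: dict) -> str:
    """Render the page state as HTML for browsers."""
    items = "".join(f"<li>{escape(c)}</li>" for c in state["comments"])
    return f"<h1>{escape(state['title'])}</h1><ul>{items}</ul>"


def render_for_spiders(state: dict) -> str:
    """Serialize the same state as JSON, with no DOM wrapper around it."""
    return json.dumps(state, sort_keys=True)


# Invented sample state for illustration.
state = {"title": "Spiders & state", "comments": ["first", "second"]}

if __name__ == "__main__":
    print(render_html(state))
    print(render_for_spiders(state))
```

Because both outputs come from the same state, they cannot drift apart the way a hand-maintained second endpoint could.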

I'm undecided but still entertaining the idea.