
Famous last words.

I mean, I get it; it feels that way to me intuitively too. But I'd still recommend against trying it, because I've learned the hard way that the intuition here, if not outright wrong, at the very least badly underestimates the cost, especially in the "unknown unknown" department.




I'm not saying that isn't true for some things. I don't think it's true here, given that this is a nice, narrowly scoped library that does a single thing and has well-defined semantics.

Adding a cgo dependency is generally not something teams do lightly. A port to Go, instead of a cgo wrapper around the C++ library, would be much more likely to see widespread adoption.
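To make the friction concrete, here's a hedged sketch of what the cgo route looks like. The C function is a toy stand-in: the real library is C++ and would need an extern "C" shim that doesn't ship with it, so none of this API is real.

    package main

    /*
    #include <stdlib.h>
    #include <string.h>

    // Toy stand-in for a hypothetical extern "C" shim over the C++
    // parser; this API does not exist in the real library.
    static int is_allowed(const char* robots_txt, const char* path) {
        return strstr(robots_txt, path) == NULL; // placeholder logic only
    }
    */
    import "C"

    import (
        "fmt"
        "unsafe"
    )

    func main() {
        robots := C.CString("User-agent: *\nDisallow: /private")
        path := C.CString("/private")
        defer C.free(unsafe.Pointer(robots))
        defer C.free(unsafe.Pointer(path))

        // Even this trivial call drags in a C toolchain, CString
        // marshaling, and manual frees -- the friction teams avoid.
        fmt.Println("allowed:", C.is_allowed(robots, path) == 1)
    }

And cross-compilation gets substantially harder the moment cgo enters the build, which is a large part of why teams hesitate.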


Do you even need to match Google's robots.txt parsing behavior? At less than 1,000 lines, you can be pretty sure they are not doing it "right" and are breaking plenty of people's assumptions about it. Either way, you have to test it on real-world data.


The point of this code release seems to be to publish Google's precise logic. That you may incorporate it into something else is, IMHO, less interesting; we've got plenty of other solutions that "do robots.txt" well enough. If it were just about that, Google's release of this would not be worth anything. The point is that non-Google parties can see exactly what Google sees in your robots.txt.

That's why I'm saying there's no point trying to re-implement this. If you were going to re-implement this, there's probably already a library that will work well enough for you. The value here is solely in being exactly what Google uses; anything that is a "re-implementation" of this code but isn't exactly what Google uses is missing the point.

If they formalize it into a spec, others may then implement the spec, but they can and should do that by implementing the spec, not porting this code.


As I understand it, the point of the Go complaint is parsing actual real-world robots.txt files, for which you don't need to behave exactly as this library does.


> Do you even need to match Google's robots.txt parsing behavior? At less than 1,000 lines, you can be pretty sure they are not doing it "right" and are breaking plenty of people's assumptions about it.

This seems like a weird assertion. The specification isn't particularly complex (ignoring the implicit complexities of Unicode). There are ~5 keywords and about 3 control characters. Why would you expect to need much more than that?


Very few people follow the specification or even know it exists.


I'm not talking about the formal specification, but the implicit specification of what people have been using for decades. That only has 5 keywords and a couple of control characters. The formal spec is based on that informal spec, which, again, isn't that complicated (see the sketch below for a sense of scale).

To be more direct: what are all of these assumptions you expect Google's parser to mishandle?
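For a sense of scale, here's a minimal sketch of that informal spec in Go. It's a hypothetical illustration, not Google's logic, and it deliberately skips real-world matching details like wildcards, longest-match precedence, and percent-encoding:

    package main

    import (
        "bufio"
        "fmt"
        "strings"
    )

    // Rule is one Allow/Disallow line scoped to the preceding User-agent.
    type Rule struct {
        Agent string
        Allow bool
        Path  string
    }

    // parse handles the informal spec: a handful of keywords, '#'
    // comments, and colon-separated key/value lines.
    func parse(input string) (rules []Rule, sitemaps []string) {
        agent := "*"
        sc := bufio.NewScanner(strings.NewReader(input))
        for sc.Scan() {
            line := sc.Text()
            if i := strings.IndexByte(line, '#'); i >= 0 {
                line = line[:i] // strip comment
            }
            key, val, ok := strings.Cut(line, ":")
            if !ok {
                continue
            }
            key = strings.ToLower(strings.TrimSpace(key))
            val = strings.TrimSpace(val)
            switch key {
            case "user-agent":
                agent = val
            case "allow":
                rules = append(rules, Rule{agent, true, val})
            case "disallow":
                rules = append(rules, Rule{agent, false, val})
            case "sitemap":
                sitemaps = append(sitemaps, val)
            case "crawl-delay":
                // recognized but unused in this sketch
            }
        }
        return rules, sitemaps
    }

    func main() {
        rules, _ := parse("User-agent: *\nDisallow: /private # example")
        fmt.Printf("%+v\n", rules)
    }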


The top comment [1] mentions the noindex directive, for example. Some people definitely expect it to work (example below).

[1] https://news.ycombinator.com/item?id=20326098
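For instance, a robots.txt along these lines (hypothetical paths) leans on the nonstandard noindex directive, which was never part of the informal spec but which some site owners rely on anyway:

    User-agent: *
    Disallow: /private/
    Noindex: /drafts/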


It definitely feels excessively risky for a third party to port it, but Google can either canary it or run both parsers in production and compare results to accurately assess confidence in the port's correctness.
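A sketch of that comparison approach in Go, assuming a made-up Parser interface; neither the interface nor the implementations come from the release:

    package main

    import "log"

    // Parser abstracts any robots.txt matcher. It is an assumed
    // interface for this sketch, not an API from the release.
    type Parser interface {
        Allowed(robotsTxt, userAgent, url string) bool
    }

    // diffCheck serves the trusted parser's answer while logging any
    // disagreement with the candidate port, canary-style.
    func diffCheck(trusted, candidate Parser, robotsTxt, agent, url string) bool {
        want := trusted.Allowed(robotsTxt, agent, url)
        if got := candidate.Allowed(robotsTxt, agent, url); got != want {
            log.Printf("mismatch: agent=%q url=%q trusted=%v port=%v",
                agent, url, want, got)
        }
        return want
    }

    // allowAll is a dummy matcher so the sketch compiles standalone.
    type allowAll struct{}

    func (allowAll) Allowed(_, _, _ string) bool { return true }

    func main() {
        ok := diffCheck(allowAll{}, allowAll{}, "User-agent: *", "bot", "/")
        log.Println("allowed:", ok)
    }

Serving the trusted answer while only logging mismatches keeps the canary zero-risk for production traffic.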



