I have no idea what Google does, but expect their parsers to be quite robust. I ...

I have no idea what Google does, but expect their parsers to be quite robust. I tried doing some web scraping, and so many pages are not even valid HTML (most often invalid nested tags, like a table inside span, missing closing tags even when required, random unopened closing tags, ...). Not closing <p> and <td> tags is quite common, I have not seen omitted <html> <head> and <body> yet.