To me, the most important part of HTML5 is the parsing algorithm. Of the eleventy-billion web-pages on the Internet, approximately zero percent of them are properly-validating, standards-compliant HTML. It takes person-years of labour, bug-reports and fixes to create a system that can reliably parse a web-page as the original author expected, and that barrier-to-entry was a pretty significant part of the original IE/Netscape duopoly.
Now there's a plainly-described parsing algorithm in the HTML5 spec that handles conformant documents correctly and non-conformant documents sensibly, along with third-party implementations like html5lib[1]. Opening up a corpus the size of the Internet to interested individuals and researchers has to be a pretty important achievement, I'd think.
[1]: http://code.google.com/p/html5lib/