
I'm surprised that this pays attention to '\r' (CR) specifically, and not '\n' (LF), ' ' (space), or '\t' (tab). It doesn't seem like HTML assigns any special meaning to carriage return compared to other whitespace. What is the parser doing with the locations of carriage returns?



I also have no idea why he needs the newline mask for \r. Only <pre> blocks would need that, and only on Windows.


<pre> blocks don't depend on the OS; that would be ridiculous.


Is it something to do with HTTP headers? They have CR LF pairs terminating the lines.


I wondered about that, but that wouldn't be described as parsing HTML, and it shouldn't involve parsing '<' and '&'.


It probably needs to do a pass to find all \r chars and check whether the next char is \n. If so, discard the \r; otherwise convert the \r to \n.

edit: Yeah, does exactly that: https://chromium-review.googlesource.com/c/chromium/src/+/55...
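
For anyone who wants to see that pass spelled out, here is a rough sketch in Python (not Chromium's C++, and not the code from the linked change). The function name normalize_newlines is made up for illustration; it just implements the rule described above: a \r followed by \n is dropped, and a lone \r becomes \n.

```python
def normalize_newlines(text: str) -> str:
    """Collapse CR LF pairs to LF and turn lone CRs into LF (illustrative sketch)."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "\r":
            if i + 1 < n and text[i + 1] == "\n":
                # CR LF pair: drop the CR; the LF is copied on the next iteration.
                i += 1
                continue
            # Lone CR: emit an LF in its place.
            out.append("\n")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(repr(normalize_newlines("a\r\nb\rc\nd")))
```

A fast parser would presumably want the positions of every \r up front so it can skip straight to them instead of checking each character like this loop does, which would explain why \r gets its own mask next to '<' and '&'.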



