This is such an important lesson, but it's a difficult one to convince people of: telling people NOT to sanitize their input goes against so much existing thinking and teaching about web application security.
It's worth emphasizing that there's still plenty of scope for sensible input validation. If a field is a number, or one of a known list of items (US States for example) then obviously you should reject invalid data.
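A minimal Python sketch of that kind of strict validation. The function names `parse_quantity` and `parse_state` are hypothetical, the `US_STATES` set is deliberately truncated, and requiring a positive number is an assumption. The point is that values with a known shape are rejected outright when they don't fit, rather than being rewritten.

```python
# Hypothetical allowlist, truncated for brevity; a real app would list every state.
US_STATES = {"AL", "AK", "AZ", "CA", "NY", "TX"}

def parse_quantity(raw: str) -> int:
    """Reject anything that is not a positive whole number."""
    value = int(raw.strip())  # int() raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"quantity must be positive, got {value}")
    return value

def parse_state(raw: str) -> str:
    """Accept only codes from the known list; reject everything else."""
    code = raw.strip().upper()
    if code not in US_STATES:
        raise ValueError(f"unknown state code: {raw!r}")
    return code
```

Nothing here touches free-form text; this kind of check only makes sense for fields whose valid values you can actually enumerate or describe.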
But... most web apps end up with some level of free-form text. A comment on Hacker News. A user's bio field. A feedback form.
Filtering those is where things go wrong. You don't want to accidentally create a web development discussion forum where people can't talk about HTML because it gets stripped out of their comments!
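The alternative to stripping is to store the comment exactly as typed and escape it at the moment it is written into HTML. A rough sketch using Python's standard `html.escape`; `render_comment` and the `comment` CSS class are made up for illustration, and in practice a templating engine with auto-escaping usually does this step for you.

```python
import html

def render_comment(body: str) -> str:
    # The stored comment is untouched; escaping happens only on output,
    # so HTML shows up as readable text instead of being stripped or executed.
    return f'<div class="comment">{html.escape(body)}</div>'

comment = "Use <b>bold</b> sparingly & never trust <script> tags"
print(render_comment(comment))
# <div class="comment">Use &lt;b&gt;bold&lt;/b&gt; sparingly &amp; never trust &lt;script&gt; tags</div>
```

The reader sees the literal `<b>bold</b>` they typed, which is exactly what a forum about web development needs.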
An anecdote: I haven't checked recently, but at least until 2015 there was a Belgian government website for the register of companies that would return zero companies in the city of Aalter.
The reason is not hard to find. Out of fear of SQL injection, they filtered out all SQL keywords, in this case "alter".
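A small reproduction of that failure, assuming a regex-based keyword filter (the actual site's code is unknown) and an in-memory SQLite table with one made-up row. The filter turns "Aalter" into "A", so the search comes back empty, while a parameterized query finds the row without any filtering at all.

```python
import re
import sqlite3

# Hypothetical blocklist in the style the anecdote describes.
SQL_KEYWORDS = ["select", "drop", "alter", "insert", "delete"]

def naive_filter(text: str) -> str:
    # Anti-pattern: delete anything that looks like an SQL keyword.
    pattern = "|".join(SQL_KEYWORDS)
    return re.sub(pattern, "", text, flags=re.IGNORECASE)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, city TEXT)")
conn.execute("INSERT INTO companies VALUES (?, ?)", ("Example BV", "Aalter"))  # made-up row

city = "Aalter"
query = "SELECT name FROM companies WHERE city = ?"

# Filtering first mangles the city name, so nothing matches.
filtered_rows = conn.execute(query, (naive_filter(city),)).fetchall()
print(naive_filter(city), filtered_rows)  # A []

# Binding the raw value as a parameter is safe and finds the company.
rows = conn.execute(query, (city,)).fetchall()
print(rows)  # [('Example BV',)]
```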
You beat me to it. I couldn't remember the exact term. I just knew the "pattern", and that it wasn't something that would have made the issue as obvious as e.g. the pee-ass-word.
I have never sanitized any input; I keep full respect for whatever text there is.
It's simple: HTML/XML/JavaScript/JSON/URLs are not text. You render them with whatever tools are appropriate, and those tools happen not to be string concatenation. XML: use the DOM, XSLT, etc. HTML: same story, use whatever templating engine you wish. JSON: build your own model and serialize it to JSON. SQL: prepared statements.
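That list maps fairly directly onto Python's standard library. A sketch, assuming an illustrative `text` value, a placeholder `https://example.com/search` URL and a throwaway SQLite table: each format gets its own encoder instead of string concatenation, and decoding each result gives back the original text unchanged. HTML is left out here because that is the templating engine's job.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode, urlsplit

text = "O'Brien says \"5 < 6\" & <b>hi</b>"  # illustrative user input

# JSON: build a model and serialize it; json.dumps handles quoting.
as_json = json.dumps({"comment": text})

# URL: let urlencode percent-encode the query string.
as_url = "https://example.com/search?" + urlencode({"q": text})

# XML: build a tree; ElementTree escapes the text content on output.
root = ET.Element("comment")
root.text = text
as_xml = ET.tostring(root, encoding="unicode")

# SQL: a prepared statement with a bound parameter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")
conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))
stored = conn.execute("SELECT body FROM comments").fetchone()[0]

# Decoding each output returns the exact original text.
assert json.loads(as_json)["comment"] == text
assert parse_qs(urlsplit(as_url).query)["q"][0] == text
assert ET.fromstring(as_xml).text == text
assert stored == text
```

None of these steps removes or alters a single character the user typed; the encoding is specific to the output format and happens only at the point of output.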