
Example HTML output in a user's profile:

Would you like to contact ${NAME}?

Where ${NAME} is a user-supplied parameter (you ask users what their name is).

Let's say I entered my name as: <script>/* evil code */</script>

Now, if the output isn't escaped, the page reads:

Would you like to contact <script>/* evil code */</script>?

You've just injected evil code into the website, and it will execute every time another user visits my profile page.

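A minimal sketch of escaping at output time, using Python's standard `html.escape`. The helper name `render_contact_prompt` is hypothetical; the point is that the name is escaped at the moment it enters the HTML, whatever was or wasn't filtered when it was stored.

```python
import html


def render_contact_prompt(name: str) -> str:
    # Hypothetical helper: escape the user-supplied name right where it is
    # placed into HTML text, instead of trusting an earlier input filter.
    # html.escape turns &, <, > (and, by default, both quote characters)
    # into HTML entities, so the browser shows them as text.
    return f"Would you like to contact {html.escape(name)}?"


print(render_contact_prompt("<script>/* evil code */</script>"))
# Would you like to contact &lt;script&gt;/* evil code */&lt;/script&gt;?
```

The browser renders the escaped name as literal text, so the script tag is displayed instead of executed.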




But that script tag would be taken care of in the input sanitation step. You normally remove all hints of HTML tags on input sanitation, which renders output sanitation a moot point.


Unless your application has a static mapping of input -> output that never changes, you can't properly sanitize input for every potential output context. A string like ';alert(1);// is perfectly safe to drop in between HTML tags, but it can be very dangerous in JavaScript if it ends up inside a single-quoted string.

You can try to filter out anything that might be dangerous, but that produces a very long list of invalid inputs, and once again you're playing whack-a-mole, hoping you've sanitized your input correctly for every potential output context (unless you go back and re-sanitize all your user data whenever you add a new output context, which is a bit absurd).

From a programming perspective, it's akin to a function not checking that the input it has received is valid (because the caller is always going to do that...).
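A rough sketch of the context problem, assuming a hypothetical tag-stripping input filter (`strip_tags`) and a hypothetical page that drops the value into a single-quoted JavaScript string. The payload contains no tags, so the filter passes it through untouched, and only encoding for the JavaScript context at output time keeps it inert. Here `json.dumps` is used to build the string literal, and `<`, `>`, `&` are additionally escaped so a value like `</script>` can't close the script element early.

```python
import json
import re


def strip_tags(s: str) -> str:
    # Hypothetical naive input filter: removes anything that looks like a tag.
    return re.sub(r"<[^>]*>", "", s)


def js_string_literal(s: str) -> str:
    # Encode s as a double-quoted JavaScript string literal for use inside a
    # <script> block. json.dumps escapes quotes, backslashes and control
    # characters; the replacements stop "</script>" or "<!--" from being
    # interpreted by the HTML parser.
    return (
        json.dumps(s)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


payload = "';alert(1);//"
cleaned = strip_tags(payload)  # unchanged: the payload contains no tags

naive = "var name = '" + cleaned + "';"
safe = "var name = " + js_string_literal(cleaned) + ";"

print(naive)  # var name = '';alert(1);//';   <- breaks out and calls alert
print(safe)   # var name = "';alert(1);//";   <- just a string
```

The same stored value would need different treatment again in an HTML attribute or a URL, which is why the encoding belongs to the output step, not the input step.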


> You normally remove all hints of HTML tags on input sanitation

Then what happens when you want to use that input in an Excel export? PDF export? CSV file? Text file? What if you want to use it in an HTML attribute? In a URL? Export the database elsewhere (such as a credit card company reporting to the CSAs)? You can't assume your data will always end up inside an HTML page between tags, and stripping tags on input mucks up your data. Data should be usable in many different ways, because it will be, and it should not be tied to HTML.
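To make the "many outputs" point concrete, here is a small sketch using only the Python standard library: the raw value is stored exactly as typed, and each output context applies its own encoding. The function names and the `name` query parameter are illustrative, not from the thread.

```python
import csv
import html
import io
from urllib.parse import urlencode

raw_name = 'Tom & "Jerry" <b>'  # stored exactly as the user typed it


def for_html_text(s: str) -> str:
    # Between tags only &, < and > need escaping.
    return html.escape(s, quote=False)


def for_html_attr(s: str) -> str:
    # Inside a quoted attribute, quote characters must be escaped too.
    return html.escape(s, quote=True)


def for_url_query(s: str) -> str:
    # Percent-encode as a query string; spaces become "+".
    return urlencode({"name": s})


def for_csv_row(values: list[str]) -> str:
    # The csv module quotes fields and doubles embedded quote characters.
    buf = io.StringIO()
    csv.writer(buf).writerow(values)
    return buf.getvalue()


print(for_html_text(raw_name))        # Tom &amp; "Jerry" &lt;b&gt;
print(for_html_attr(raw_name))        # Tom &amp; &quot;Jerry&quot; &lt;b&gt;
print(for_url_query(raw_name))        # name=Tom+%26+%22Jerry%22+%3Cb%3E
print(for_csv_row([raw_name, "42"]))  # "Tom & ""Jerry"" <b>",42
```

None of these encodings would be right for the others, and none of them required changing what was stored.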


Ok, this is the comment that best explained it to me: you want to sanitize (escape, etc., whatever) output because, even if you strip all HTML/CSS/JS on input, users might have inserted malicious Excel scripts or PDF exploits, etc., that eventually do get executed in some other output context.
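As an Excel-flavored example of an output-context fix, a commonly cited mitigation for CSV/spreadsheet formula injection is to prefix a cell with a single quote when it starts with a character a spreadsheet may treat as the start of a formula. This is a sketch under that assumption; the exact prefix list and the function names are illustrative, and the check runs only in the CSV export path, leaving the stored data alone.

```python
import csv
import io

# Leading characters that spreadsheet applications may interpret as the
# start of a formula (assumed list; adjust to the guidance you follow).
FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")


def neutralize_cell(value: str) -> str:
    # Prepend a single quote so the cell is treated as text, not a formula.
    if value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value


def export_csv(rows: list[list[str]]) -> str:
    # Apply the spreadsheet-specific escaping only when producing the CSV.
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([neutralize_cell(v) for v in row])
    return buf.getvalue()


print(export_csv([["=1+1", "Bob"]]))  # '=1+1,Bob
```

One trade-off: the "-" rule also prefixes legitimate negative numbers, so a real export might skip it for columns known to be numeric.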



