We were contacted by a bug hunter once stating he has access to our database and...

HHad3 · 2024-07-14T11:31:29

Not storing raw HTML might be a last resort to avoid these kinds of bugs in other software, but quite a few things have to go wrong for them to happen in the first place. The issue is that your data is being rendered outside your software and its known-good environment, so all bets are off.

You could just as well have triggered a bug in some LaTeX engine that happened to be configured to allow arbitrary shell command execution.

Another strategy to defend against the issues you describe would be to never let developers access raw production data in the first place: always anonymize it first, or remove internet access from machines that access production data. (How sensitive is the data in your users table? Could a developer's test script accidentally send emails to your live users?)
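A minimal sketch of the anonymize-first idea, assuming a users table with hypothetical `email`, `name` and `bio` columns. Each email is replaced by a deterministic pseudonym on the reserved `.invalid` domain, so a test script that sends mail cannot reach a real user, while the same address always maps to the same pseudonym. Free-text fields that might hold user-supplied HTML are dropped entirely. The column names, the `SALT` value and the `anonymize_user` helper are illustrative assumptions, not anything described in the thread.

```python
import hashlib
import hmac

# Hypothetical secret kept outside the developer copy of the data; without it,
# pseudonyms cannot be recomputed from a list of known email addresses.
SALT = b"replace-with-a-secret-kept-out-of-dev"


def pseudonym(value: str) -> str:
    """Deterministic, non-reversible token derived from the normalized value."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(SALT, normalized, hashlib.sha256).hexdigest()[:16]


def anonymize_user(row: dict) -> dict:
    """Return a copy of a (hypothetical) users row that is safe to hand to developers."""
    out = dict(row)
    token = pseudonym(row["email"])
    # .invalid is a reserved TLD (RFC 2606), so mail sent here is never delivered.
    out["email"] = f"user-{token}@example.invalid"
    out["name"] = f"User {token[:6]}"
    if "bio" in out:
        # Free-text fields may contain user-supplied HTML; don't ship them at all.
        out["bio"] = "[redacted]"
    return out


if __name__ == "__main__":
    prod_rows = [
        {"id": 1, "email": "alice@real-company.com", "name": "Alice", "bio": "<img src=x>"},
        {"id": 2, "email": "Alice@Real-Company.com ", "name": "Alice again"},
    ]
    for r in map(anonymize_user, prod_rows):
        print(r)
```

Because the pseudonym is keyed on the normalized email, rows that referred to the same person still line up after anonymization, which keeps most debugging queries meaningful.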

orangepanda · 2024-07-14T09:56:08

If you don't know where it will be rendered, how do you know what to sanitize for?

KronisLV · 2024-07-14T12:15:24

Sanitize it against all XSS. Or better yet, avoid HTML or anything else that can contain executable instructions when all you actually need is a regular markup language.

I’ve seen HTML used for rich-text user input and it was an absolute mess: old data that was never properly sanitized, the sanitization library itself going out of date, someone pasting potentially unsafe content in from another system, and so on. Meanwhile people would sometimes bikeshed and worry about breaking old style classes or how the data displays across multiple systems, instead of addressing just how serious the potential risks are.

Not all of the details here might be accurate, but honestly, just use Markdown or something like it for user input, disallow HTML altogether, and never use the raw input.
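As a rough sketch of "disallow HTML altogether", assuming a tiny hypothetical markup subset (only `**bold**` and `*italic*`) instead of a real Markdown library: every character of user input is HTML-escaped first, and only afterwards are the markup patterns turned into tags that the renderer itself emits. No tag typed by the user can survive that order of operations. A real deployment would more likely use a maintained Markdown renderer with raw HTML disabled; `render_user_markup` is illustrative only.

```python
import html
import re

# Patterns for the (hypothetical) restricted markup: **bold** and *italic*.
BOLD = re.compile(r"\*\*(.+?)\*\*")
ITALIC = re.compile(r"\*(.+?)\*")


def render_user_markup(text: str) -> str:
    # 1. Escape everything first: <, >, & and quotes become entities,
    #    so any HTML the user typed is displayed as text, never parsed.
    escaped = html.escape(text, quote=True)
    # 2. Only then add the few tags the renderer itself controls.
    #    The captured groups are already escaped, so they cannot inject markup.
    escaped = BOLD.sub(r"<strong>\1</strong>", escaped)
    escaped = ITALIC.sub(r"<em>\1</em>", escaped)
    # 3. Preserve the user's line breaks.
    return escaped.replace("\n", "<br>\n")


if __name__ == "__main__":
    print(render_user_markup("Hi <b>there</b>, **welcome**"))
    # -> Hi &lt;b&gt;there&lt;/b&gt;, <strong>welcome</strong>
```

The design choice is the ordering: escaping before applying markup means the allow-list is "tags we generate", not "tags we failed to strip", which avoids depending on a sanitizer staying up to date.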

jefftk · 2024-07-14T11:29:47

That sounds like a pretty serious bug in the sql development software!

kazinator · 2024-07-14T12:24:58

If you don't know where it will be rendered, you have no idea what escape syntax to use. If the field can end up in JSON, CSV, SQL, HTML, ... are you going to try to escape it for all of them at once?

This idea of escaping input is worse than sanitizing input (which the article says not to do).
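To illustrate the point about escape syntax depending on the destination, here is a minimal standard-library sketch that stores one raw value and escapes it only at the moment it leaves for a specific context: HTML via `html.escape`, JSON via `json.dumps`, CSV via the `csv` module, and SQL via a parameterized query instead of any escaping. The sample value and the in-memory `comments` table are made up for illustration.

```python
import csv
import html
import io
import json
import sqlite3

# One raw value, stored exactly as the user typed it.
raw = 'He said "hi", then <b>left</b>; DROP TABLE users; --'


def for_html(value: str) -> str:
    # HTML: angle brackets, ampersands and quotes become entities.
    return html.escape(value, quote=True)


def for_json(value: str) -> str:
    # JSON: quotes and control characters are backslash-escaped.
    # Note: json.dumps leaves < and > alone, so JSON embedded in an
    # HTML <script> element would need additional handling.
    return json.dumps({"comment": value})


def for_csv(value: str) -> str:
    # CSV: the writer quotes the field and doubles embedded quotes.
    buf = io.StringIO()
    csv.writer(buf).writerow([value])
    return buf.getvalue()


def roundtrip_sql(value: str) -> str:
    # SQL: no escaping at all; the driver binds the value as a parameter.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (body TEXT)")
    conn.execute("INSERT INTO comments (body) VALUES (?)", (value,))
    (stored,) = conn.execute("SELECT body FROM comments").fetchone()
    conn.close()
    return stored


if __name__ == "__main__":
    print(for_html(raw))
    print(for_json(raw))
    print(for_csv(raw), end="")
    print(roundtrip_sql(raw) == raw)
```

Each output is only correct for the context it was produced for: the HTML-escaped string would be wrong inside JSON or CSV, and the JSON string still contains a literal `<b>`. That is why a single escape applied at input time cannot cover every destination.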