You should still use input validation for many other reasons besides log injecti...

chmod775 · on Dec 11, 2021

"You cannot use this character in your name because it trips up our logging library".

rank0 · on Dec 11, 2021

Are you seriously arguing that you don’t think input validation is required for untrusted input?

There’s a myriad of security vulnerabilities based on failing to escape special characters. Use output encoding if you need usernames to have special chars. There’s really no excuse to not sanitize input; it’s a basic security principle.

chmod775 · on Dec 12, 2021

My original comment is a joke to security-minded people, because if pentesters/crooks see you're handling your inputs in that manner, they know to keep looking.

Nobody in their right mind will sanitize (and specifically not encode), on receipt, something like a name to be safe for every logging library, query language, or output in HTML/terminal/etc their backend may use. Such an undertaking is even provably impossible for combinations where one component requires escape sequences that again would need to be escaped for something else - in a circular manner. And that's just one thing that's objectively wrong about the idea.

Beyond the rules applicable to the specific type of input (for example: it should be a correctly UTF-8 encoded string with a maximum length), you treat all user inputs as opaque binary/character/whatever blobs within your system. That means your system certainly never parses such a blob looking for 'magic characters'. Only once you're actually going to do something with it do you apply sanitization as required for the target, and you make sure it always happens automatically as a matter of process. For example: your SQL client library will automatically build queries in a safe manner, your HTML template library will escape all provided strings by default, and your logging library will not look for magic characters in format string arguments - that's what the damn format string itself is for!
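To make "sanitize for the target, at the point of use" concrete, here is a minimal Python sketch using only the standard library. The `users` table and the function names are made up for illustration; the point is only that the same untouched string goes to three different sinks, and each sink's own mechanism handles it.

```python
import html
import logging
import sqlite3


def make_db():
    # Hypothetical schema, just enough to demonstrate a parameterized insert.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    return conn


def store_name(conn, name):
    # The driver binds `name` as data; it is never spliced into the SQL text.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def render_greeting(name):
    # Escape for HTML only at the moment HTML is produced.
    return "<p>Hello, {}!</p>".format(html.escape(name))


def log_signup(logger, name):
    # The format string is fixed; `name` is only ever an argument to it.
    logger.info("new signup: %s", name)
```

The stored value stays byte-for-byte what the user sent, so a new output format later only needs its own encoder rather than a re-think of what was done on receipt.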

By all means use whitelists of characters for user-provided inputs (if you know what you're doing and are not going to prevent 2 billion people from using your software because you just deleted their alphabets). But don't even try to accommodate your random assortment of current and future backend technology at that point.
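As a rough sketch of what "rules applicable to the type of input" could look like for a name without deleting anyone's alphabet (Python, standard library only; `MAX_NAME_LEN` and `validate_name` are hypothetical, and rejecting only control characters is an assumption, not a recommendation for every system):

```python
import unicodedata

MAX_NAME_LEN = 100  # hypothetical limit, pick whatever your system needs


def validate_name(raw: bytes) -> str:
    """Check type-level rules only: valid UTF-8, bounded length, no control chars."""
    try:
        name = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("name is not valid UTF-8") from exc
    if not 1 <= len(name) <= MAX_NAME_LEN:
        raise ValueError("name length out of range")
    for ch in name:
        # "Cc" is the Unicode category for control characters (newline, NUL, ...).
        if unicodedata.category(ch) == "Cc":
            raise ValueError("name contains a control character")
    return name
```

Note that a string like `${jndi:ldap://x}` passes this check on purpose: nothing here knows or cares which characters some downstream library considers special.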

rank0 · on Dec 12, 2021

> Nobody in their right mind will sanitize (and specifically not encode), on receipt, something like a name to be safe for every logging library, query language, or output in HTML/terminal/etc their backend may use. Such an undertaking is even provably impossible for combinations where one component requires escape sequences that again would need to be escaped for something else - in a circular manner. And that's just one thing that's objectively wrong about the idea.

You aren't understanding. You should only need one regex per input. It's super easy. Developers should understand what data their applications expect to receive from a client.

From OWASP:

"Input validation is performed to ensure only properly formed data is entering the workflow in an information system, preventing malformed data from persisting in the database and triggering malfunction of various downstream components. *Input validation should happen as early as possible in the data flow*, preferably as soon as the data is received from the external party."

See https://owasp.org/www-community/Injection_Flaws for more details.

> For example: your SQL client library will automatically build queries in a safe manner, your HTML template library will escape all provided strings by default, and your logging library will not look for magic characters in format string arguments - that's what the damn format string itself is for!

Except when those libraries fail. Just like in the headline for TFA. Libraries can't always fix insecure application logic.

I don't understand how you think additional security checks are somehow detrimental. If I know some URL parameter should be a 16-character alphanumeric string, it should take you about 10 seconds to make a regex for that.
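For reference, a check like that might look as follows in Python. The name `is_valid_token` is made up, and "alphanumeric" is assumed here to mean ASCII letters and digits only.

```python
import re

# Explicit ASCII classes: \w would also accept "_" and non-ASCII letters.
TOKEN_RE = re.compile(r"[A-Za-z0-9]{16}")


def is_valid_token(value: str) -> bool:
    # fullmatch anchors both ends, so a trailing newline cannot sneak through
    # the way it can with a pattern that ends in "$".
    return TOKEN_RE.fullmatch(value) is not None
```

This only works because the format of the parameter is fully known up front; it says nothing about free-form fields such as names.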

chmod775 · on Dec 13, 2021

First off, how nice of you to quote only the first and last part of my comment while paraphrasing the middle part as if you're telling me something new:

> Beyond the rules applicable to the specific type of input (for example: it should be a correctly UTF-8 encoded string with a maximum length)

versus

> You aren't understanding. You should only need one regex per input.

Also you're hopping between sanitization and validation as if you believe they're the same thing. My original example was a case of doing specifically validation badly and you specifically spoke about validation in your original comment. You then replied with a comment suggesting one should apply encoding, specifically escaping, to inputs. That is not called validation.

Apparently with this most recent comment we're back to validation.

At this point I don't know how to talk to you because you seem to make this conversation about something new with each comment and I'm past humoring it.

rank0 · on Dec 13, 2021

> Also you're hopping between sanitization and validation as if you believe they're the same thing. My original example was a case of doing specifically validation badly and you specifically spoke about validation in your original comment. You then replied with a comment suggesting one should apply encoding, specifically escaping, to inputs. That is not called validation.

You know what, I admit my writing isn't excellent. I'm fully aware of the difference between sanitization and validation, and I frequently lump them together in technical conversations. Input validation/sanitization, along with OUTPUT encoding, are major security controls that should be present in your application, and proper use of these techniques will protect you most of the time when your dependencies have a security flaw.

> At this point I don't know how to talk to you because you seem to make this conversation about something new with each comment and I'm past humoring it.

You replied to me saying that regex whitelists on untrusted input were "a horrible idea" and "objectively wrong". You argued that libraries should safely handle the untrusted input for you. You argued that validation/sanitization should not happen immediately upon receipt of the input. These points are just wrong. I'm not trying to be a dick, and I think it's possible to have a constructive conversation here.

Full disclosure: My day job is as a web application pentester. I've tested/reviewed hundreds of applications.

I don't expect devs to be experts on security, but what triggered my ORIGINAL comment was all the software engineers ITT bashing open source libraries without considering security in their own code.