> A general premise that you should block anything you don't recognize is flawed...

zrm · on Sept 14, 2016

> Which is why trying to maintaining a blacklist of bad things ("enumerating badness") is destined to fail.

Unfortunately it's also why "enumerating goodness" is destined to fail. It's like the instructions for securing a computer: Dig a deep hole, put the computer in the hole, throw a live grenade in the hole, and now the computer is secure.

It's not enough to be secure, it also has to do the thing the user wants it to do. If it doesn't then the users (and developers who are after the users' patronage) will figure out how to make it happen anyway, which means bypassing your validation one way or another.

The flaw is in the assumption that immediately denying something the user wants is more secure than not immediately denying something the user doesn't want, which is flawed because of the second order effects.

If the thing you put in front of Alice to prevent Mallory from sending bad things also prevents Bob from sending good things, Alice and Bob are going regard your validation as adversarial and get together to devise an alternate encoding for arbitrary data that will pass your validation. Which information theory says they can always do at the cost of some space inefficiency. But as soon as Alice starts accepting unvalidated data using that encoding method, it allows Mallory to send malicious data to Alice that will pass your validation.

The solution is to do validation as soon as possible but no sooner. If you don't know what something is but it could be valid, you have to let it go so that the thing downstream which actually does know can make that determination itself.

I mean I get how we got here. Some things that should be doing validation don't do it well or at all, and then people try to put validation in front of them to make up for it. But if you do that and reject something the endpoint wants (or you're e.g. stubbornly enforcing an older protocol version) then new endpoint code is going to pay the cost of encoding around you, which is expensive in efficiency and complexity and deprives you of the ability to do the validation you are equipped to do.

If the downstream code isn't doing validation correctly then it has to be fixed where it is.

> If you haven't, you should really watch the talk I linked to initially.

I don't think anything I'm saying is strongly in conflict with it. You can validate against a grammar and still treat part of the data as a black box. Obvious example is validating an IP packet without assuming anything about the payload structure.