
> Parsers for file formats?

Yes. As a trivial example, consider a UDP packet format { u16 type, size; u8 data[size]; } fed the 4-byte packet {ECHO, 0xFFFF}. A 'shotgun parser' that assumes its input is well-formed incompetently parses this as {ECHO, "<65535 bytes of stack memory>"}, whereas a 'recognizer' (i.e. a non-buggy parser) would reject that packet as 65535 bytes too short.
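To make that concrete, here is a rough sketch of what the 'recognizer' might look like in Python. The big-endian byte order, the numeric value of ECHO, and the names `recognize` and `MalformedPacket` are all my assumptions for illustration; none of them come from a real protocol.

```python
import struct
from typing import Tuple

# u16 type, u16 size. Network (big-endian) byte order is an assumption.
HEADER = struct.Struct("!HH")
ECHO = 1  # hypothetical type code


class MalformedPacket(ValueError):
    """Raised when the input is not a well-formed packet."""


def recognize(packet: bytes) -> Tuple[int, bytes]:
    """Return (type, data) only if the whole packet is well-formed."""
    if len(packet) < HEADER.size:
        raise MalformedPacket(
            f"header needs {HEADER.size} bytes, got {len(packet)}"
        )
    ptype, size = HEADER.unpack_from(packet)
    # Compare the declared size against what actually arrived
    # before touching any of the data bytes.
    missing = HEADER.size + size - len(packet)
    if missing > 0:
        raise MalformedPacket(f"packet is {missing} bytes too short")
    if missing < 0:
        raise MalformedPacket(f"packet has {-missing} trailing bytes")
    return ptype, packet[HEADER.size:]
```

Feeding it `struct.pack("!HH", ECHO, 0xFFFF)` raises `MalformedPacket` instead of returning anything. Rejecting trailing bytes is a design choice on my part; a looser format could allow them.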

Good programming language design can make it harder to write the buggy parser and easier to write the 'recognizer', especially if the language/standard library provides built-in parsing tools.




See, I get the idea, but to me it's just like saying "decent code is better than shitty code". Well, I mean, no shit.

I'm not being snarky. Compare "LangSec" to memory safety, which also kills this class of bug dead. Which approach is more powerful and forecloses on more bug classes? Which approach requires more developer effort? Introduces more jargon?

I know multiple very smart, capable people who work under the rubric of "LangSec". But I just don't get it. Is it a real thing?


Actually I was just answering the "parsers for what" question.

FWIW, I think LangSec is saying "Code that doesn't have remote code execution vulnerabilities[0], or limits them to a weak computational model[1], is better than code with RCE vulnerabilities." - which is also "Well, no shit." - and "Parsing a nontrivial data format is the same thing as executing a (not-necessarily-)very constrained programming language."[2] - which seems obvious to me, but could plausibly be a "superpositions don't collapse"-level epiphany for someone who doesn't think about parsing the right way.

0: such as JavaScript or stack execution

1: like FSMs or pushdown automata

2: with the implication that you had better make sure it actually is very constrained


It's a real thing: http://langsec.org/papers/langsec-cwes-secdev2016.pdf

There are whole classes of errors in programs that parse input first and validate it only afterwards, when it's already too late. Often the validation is scattered through the source code as a cloud of checks that run at run time, which makes these programs rather difficult to verify.

It is much easier to verify a parser that only produces valid values at the edge of a program, isolated from the main program.
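A minimal sketch of that shape in Python, assuming a 4-byte header of two big-endian u16 fields (type, size) followed by the payload. The names `EchoRequest`, `parse_at_edge` and `handle`, and the value of `ECHO`, are hypothetical and only there for illustration.

```python
import struct
from dataclasses import dataclass

ECHO = 1  # hypothetical type code


@dataclass(frozen=True)
class EchoRequest:
    """By convention, created only by parse_at_edge, so it is always valid."""
    payload: bytes


def parse_at_edge(raw: bytes) -> EchoRequest:
    """The only place where untrusted bytes are inspected."""
    if len(raw) < 4:
        raise ValueError("truncated header")
    ptype, size = struct.unpack_from("!HH", raw)  # big-endian is assumed
    if ptype != ECHO:
        raise ValueError(f"unsupported type {ptype}")
    if len(raw) != 4 + size:
        raise ValueError("declared size does not match packet length")
    return EchoRequest(payload=raw[4:])


def handle(request: EchoRequest) -> bytes:
    """Main-program logic: takes the parsed type, never raw bytes."""
    return request.payload
```

Everything you'd need to audit for input handling is in `parse_at_edge`; `handle` contains no length checks at all. Python won't stop other code from constructing an `EchoRequest` directly, so here the isolation is a convention rather than something the language enforces.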



