This trick can be thought of as a simplification of the systematic approach to parsing, that is, the lexer-parser division of responsibilities.

The lexer uses regexes, but only for splitting the input stream of characters into tokens. Identifiers, integers, operators, strings, keywords, opening brackets and whatnot - each type of token is defined by a regex. This part is hopefully deterministic and simple, although the lexer has to match the regexes for all token types at once, which is why lexer generators are commonly used.
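
To make that concrete, here is a minimal lexer sketch in Python (the token set, names, and sample input are my own, not from the article):

    import re

    # Each alternative defines one token type; m.lastgroup names the one
    # that matched. Whitespace matches SKIP and is simply dropped.
    TOKEN_RE = re.compile(r"""
        (?P<NUMBER> \d+)
      | (?P<IDENT>  [A-Za-z_]\w*)
      | (?P<OP>     [+\-*/=])
      | (?P<LPAREN> \()
      | (?P<RPAREN> \))
      | (?P<SKIP>   \s+)
    """, re.VERBOSE)

    def lex(source):
        for m in TOKEN_RE.finditer(source):
            if m.lastgroup != "SKIP":
                yield (m.lastgroup, m.group())

    print(list(lex("x = 2 * (y + 10)")))
    # [('IDENT', 'x'), ('OP', '='), ('NUMBER', '2'), ('OP', '*'),
    #  ('LPAREN', '('), ('IDENT', 'y'), ('OP', '+'), ('NUMBER', '10'),
    #  ('RPAREN', ')')]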

The heavy lifting is done by the actual parser, which tries to combine the tokens into something that makes sense from the point of view of the grammar.
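
A toy illustration of that part (the grammar and tree representation are assumptions of mine): a recursive-descent parser that folds a flat token list into a tree:

    # Toy grammar (assumed): expr -> term ('+' term)*, term -> NUMBER | IDENT.
    # Tokens are (kind, text) pairs, e.g. from a lexer like the one above.
    def parse_term(tokens):
        kind, text = tokens.pop(0)
        assert kind in ("NUMBER", "IDENT"), f"unexpected token {text!r}"
        return text

    def parse_expr(tokens):
        node = parse_term(tokens)
        while tokens and tokens[0] == ("OP", "+"):
            tokens.pop(0)                       # consume the '+'
            node = ("+", node, parse_term(tokens))
        return node

    tokens = [("IDENT", "y"), ("OP", "+"), ("NUMBER", "10"),
              ("OP", "+"), ("IDENT", "z")]
    print(parse_expr(tokens))   # ('+', ('+', 'y', '10'), 'z')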

So in this trick, the sub-regexes between the |'s define the tokens (the lexer part), while the capture-group mechanism selects the single token we want to keep (a very, very simple parser).
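
For example (pattern and sample data are mine): to pull out the numbers that sit outside quoted strings, let an ungrouped alternative eat the strings, and collect only the captured alternative:

    import re

    # The ungrouped alternative consumes quoted strings so the engine
    # never looks inside them; only the captured alternative is kept.
    pattern = r'"[^"]*"|(\d+)'
    text = 'width 100 label "top 40" height 25'
    numbers = [m.group(1) for m in re.finditer(pattern, text) if m.group(1)]
    print(numbers)  # ['100', '25']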



