> Can you provide some intuition on which languages can be parsed by this algorithm, and which can't?
The restriction is that any recursion in the grammar has to be inside explicit begin and end tokens. For example, you can't write a rule like
stmt = 'if' expr 'then' stmt* | expr
because `stmt` refers to itself directly, but
stmt = [ 'if' expr 'then' stmt* 'end' ] | expr
is OK -- the [ and ] symbols indicate explicit recursion using 'if' and 'end' as the begin and end tokens.[1]
> Btw, why did you choose the name Owl?
I wanted a bird name, and Owl was short and easy to type!
. . .
[1] Though in this case, the language is still visibly pushdown, since you can expand it manually to:
stmt = (('if' expr 'then')+ expr*)+ | expr
This is because the grammar only has right recursion. Middle recursion (like you get if you add an 'else' clause) can't be expanded like this, so explicit recursion is necessary to parse languages which use it.
To automate this expansion process (and re-association into a sensible parse tree), Owl also has syntax for operators with precedence. Here's an example: https://ianh.github.io/owl/try/#expr
This seems similar to grammars described by DTTs (predecessor to XML Schema), in that it's regular expressions, that must be deterministic, that can recurse, but only when isolated by a tag. That is, recursion always results in nested elements, never a sequence of iterated elements:
yes: S -> <a> S </a> | e
no: S -> <a></a> S | e
I've seen these called "tree grammars". Perhaps it's identical with "visibly pushdown grammars"? I skimmed the wiki link (from your readme.md), but it requires further study. I think it would be great if your readme.md included a brief idea, along the lines of what your comment here.
I played around with tree grammars. I think there's a formal language reason for them being incapable of parsing expressions properly. I tried anyway, to find a subset or workaround that would work well enough, but failed. (IIRC) At best, I needed parentheses around one of the operator operations e.g. always around +, like (a+b)c or (a+bc)
Yeah, this is a fundamental problem. Luckily, arithmetic expressions are always left- or right-recursive, never middle-recursive. That means they're recognizable by a regular grammar, even if the parse tree doesn't match what you'd expect:
expr = number (('*' | '+') number)*
You can replace number by (number | [ '(' expr ')' ]) to add support for parentheses as well.
My solution was to add special syntax which makes rules like this automatically:
expr = number : num
.operators infix left '*' : times
.operators infix left '+' : plus
Thanks, the eg I tried before works now, I must have misinterpreted the output.
Nice, a language can be recognized, even if a different parse tree.
I vaguely recall the shunting yard algorithm... am I right in characterising your solution as a second level of parsing? It's not "pure", but from a practical point of view, it's no worse than back-references in PCRE. The practical issue is just of ui usability.
Hi, I see that this is your tool, just wanted to say it looks amazing! Also I did not realize how much of a difference syntax highlighting makes for this kind of thing until I clicked through to your Try Owl link, and, wow! Have you used this in anything? Do you know of anyone else using it yet? It looks like you've only been working on this for half a year, so it's still young, but it looks thoroughly built and well thought it. Very impressive!
Thanks! The two biggest uses of Owl right now are Owl itself (see grammar.owl in the root directory) and the "egg" programming language which is included as an example: https://github.com/ianh/owl/tree/master/example/egg-lang. I'm hoping to use it in some future projects as well!
Handling arithmetic expressions in a simple parser is difficult. The approach here seems limited in the usual way. e.g the following is parsed as two statements (separated by the +)
The restriction is that any recursion in the grammar has to be inside explicit begin and end tokens. For example, you can't write a rule like
because `stmt` refers to itself directly, but is OK -- the [ and ] symbols indicate explicit recursion using 'if' and 'end' as the begin and end tokens.[1]> Btw, why did you choose the name Owl?
I wanted a bird name, and Owl was short and easy to type!
. . .
[1] Though in this case, the language is still visibly pushdown, since you can expand it manually to:
This is because the grammar only has right recursion. Middle recursion (like you get if you add an 'else' clause) can't be expanded like this, so explicit recursion is necessary to parse languages which use it.To automate this expansion process (and re-association into a sensible parse tree), Owl also has syntax for operators with precedence. Here's an example: https://ianh.github.io/owl/try/#expr