Hacker News new | past | comments | ask | show | jobs | submit login

> Can you provide some intuition on which languages can be parsed by this algorithm, and which can't?

The restriction is that any recursion in the grammar has to be inside explicit begin and end tokens. For example, you can't write a rule like

    stmt = 'if' expr 'then' stmt* | expr
because `stmt` refers to itself directly, but

    stmt = [ 'if' expr 'then' stmt* 'end' ] | expr
is OK -- the [ and ] symbols indicate explicit recursion using 'if' and 'end' as the begin and end tokens.[1]

> Btw, why did you choose the name Owl?

I wanted a bird name, and Owl was short and easy to type!

. . .

[1] Though in this case, the language is still visibly pushdown, since you can expand it manually to:

    stmt = (('if' expr 'then')+ expr*)+ | expr
This is because the grammar only has right recursion. Middle recursion (like you get if you add an 'else' clause) can't be expanded like this, so explicit recursion is necessary to parse languages which use it.

To automate this expansion process (and re-association into a sensible parse tree), Owl also has syntax for operators with precedence. Here's an example: https://ianh.github.io/owl/try/#expr




This seems similar to grammars described by DTTs (predecessor to XML Schema), in that it's regular expressions, that must be deterministic, that can recurse, but only when isolated by a tag. That is, recursion always results in nested elements, never a sequence of iterated elements:

    yes:  S -> <a> S </a> | e
    no:   S -> <a></a> S  | e
I've seen these called "tree grammars". Perhaps it's identical with "visibly pushdown grammars"? I skimmed the wiki link (from your readme.md), but it requires further study. I think it would be great if your readme.md included a brief idea, along the lines of what your comment here.


I think they're the same thing, though IIRC tree grammars are usually defined to operate on actual trees rather than strings of text.


I played around with tree grammars. I think there's a formal language reason for them being incapable of parsing expressions properly. I tried anyway, to find a subset or workaround that would work well enough, but failed. (IIRC) At best, I needed parentheses around one of the operator operations e.g. always around +, like (a+b)c or (a+bc)

I'm interested that you have this too, but it seems to have the same problems I found: https://news.ycombinator.com/item?id=17650770

It would be very exciting if you have found a way around this!


Yeah, this is a fundamental problem. Luckily, arithmetic expressions are always left- or right-recursive, never middle-recursive. That means they're recognizable by a regular grammar, even if the parse tree doesn't match what you'd expect:

    expr = number (('*' | '+') number)*
You can replace number by (number | [ '(' expr ')' ]) to add support for parentheses as well.

My solution was to add special syntax which makes rules like this automatically:

    expr = number : num
     .operators infix left '*' : times
     .operators infix left '+' : plus
While the parse tree is built, the operators and operands are rearranged into the expected parse tree using https://en.wikipedia.org/wiki/Shunting-yard_algorithm. Here's an interactive example with some more operators: https://ianh.github.io/owl/try/#expr


Thanks, the eg I tried before works now, I must have misinterpreted the output.

Nice, a language can be recognized, even if a different parse tree.

I vaguely recall the shunting yard algorithm... am I right in characterising your solution as a second level of parsing? It's not "pure", but from a practical point of view, it's no worse than back-references in PCRE. The practical issue is just of ui usability.


Hi, I see that this is your tool, just wanted to say it looks amazing! Also I did not realize how much of a difference syntax highlighting makes for this kind of thing until I clicked through to your Try Owl link, and, wow! Have you used this in anything? Do you know of anyone else using it yet? It looks like you've only been working on this for half a year, so it's still young, but it looks thoroughly built and well thought it. Very impressive!


Thanks! The two biggest uses of Owl right now are Owl itself (see grammar.owl in the root directory) and the "egg" programming language which is included as an example: https://github.com/ianh/owl/tree/master/example/egg-lang. I'm hoping to use it in some future projects as well!


This is actually a full fledged programming language? With two syntaxes (s-expressions and non)?? You are very serious about this! I am astounded.


Handling arithmetic expressions in a simple parser is difficult. The approach here seems limited in the usual way. e.g the following is parsed as two statements (separated by the +)

    print 2*3 + 4


Did you want a bird name because of "nested" words?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: