I played around with tree grammars. I think there's a formal language reason for them being incapable of parsing expressions properly. I tried anyway, to find a subset or workaround that would work well enough, but failed. (IIRC) At best, I needed parentheses around one of the operator operations e.g. always around +, like (a+b)c or (a+bc)
Yeah, this is a fundamental problem. Luckily, arithmetic expressions are always left- or right-recursive, never middle-recursive. That means they're recognizable by a regular grammar, even if the parse tree doesn't match what you'd expect:
expr = number (('*' | '+') number)*
You can replace number by (number | [ '(' expr ')' ]) to add support for parentheses as well.
My solution was to add special syntax which makes rules like this automatically:
expr = number : num
.operators infix left '*' : times
.operators infix left '+' : plus
Thanks, the eg I tried before works now, I must have misinterpreted the output.
Nice, a language can be recognized, even if a different parse tree.
I vaguely recall the shunting yard algorithm... am I right in characterising your solution as a second level of parsing? It's not "pure", but from a practical point of view, it's no worse than back-references in PCRE. The practical issue is just of ui usability.
I'm interested that you have this too, but it seems to have the same problems I found: https://news.ycombinator.com/item?id=17650770
It would be very exciting if you have found a way around this!