Hacker News new | past | comments | ask | show | jobs | submit login

I feel like this is very underspecified, The very first example:

    $ echo 'cat' | trre 'c:da:ot:g'
    dog
Feels strange. What is happening here; the grammar says

    TRRE    <- TRRE* TRRE|TRRE TRRE.TRRE
    TRRE    <- REGEX REGEX:REGEX
What is the parse tree here? Why is "c" not being replaced with "da"? Or why isn't c being removed and "da" being replaced by "ot"?

I do like the idea of having a search/replace semantic that is more intuitive than grouping operators; back in MS-DOS days you could do "ren .log .txt" and this would work which feels bananas to my modern bash-minded way of thinking, but it's very obvious looking at this what it is supposed to do.






This is a matter of operator precedence and tokenization. Tokens are single characters in this language, and there is an invisible operator between them.

If the operator were explicit (let’s call it ~), the example would look like this:

    $ echo 'cat' | trre 'c:d~a:o~t:g'
    dog
With unnecessary parentheses:

    $ echo 'cat' | trre '(c:d)~(a:o)~(t:g)'
    dog

That's true. Thank you for elaborating.

There is a hidden operator of concatenation as for usual regular expressions. In the code I denote it as lower dot '.' (as in the old Thompson's implementation).


The grammar is underspecified. The full grammar is more complex. I guess I need just remove the current version from docs. Now it is confusing indeed.

> Why is "c" not being replaced with "da"?

It is all about precedence. According to the discussion I think I've chosen a wrong one and it raises confusion. Current version of precedence table is this:

| 1 | Escaped characters | \<special character> | | 2 | Bracket expression | [] | | 3 | Grouping | () | | 4 | Single-character-ERE duplication | * + ? {m,n} | | 5 | Transduction | : | | 6 | Concatenation | . (implicit) | | 8 | Alternation | | |

So the ':' is stronger then '.' (implicit concatenation).


Yes it is underspecified. The deletion example shows that an empty string is possibly a REGEX. So you can essentially treat any position as containing as many empty string regexes as you want. So there are indeed infinite number of parses.

If we instead require regex to be non-empty (breaking the deletion examples), then the ambiguity becomes that of concatenation: whether it's '(((c:d)(a:o))(t:g))' or '((c:d)((a:o)(d:g)))'. Assuming associativity, this would not matter.


From what it feels as to how it works, it seems that c:d and a: (nothing) and ot:g

but yes now that I read it , it also makes confusion , theoretically your point makes valid , I also believe that c should be replaced by da after I read the repo , I am not sure ...




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: