
Once you have an intuition, you can use yacc.



Once you have an intuition, you just write the parser yourself.

I'm not a fan of parser generators. I find them restrictive. It is also notable that most mainstream languages do not use tools like YACC; they have hand-written parsers and compilers.

This may seem like I'm suggesting a pie-in-the-sky, ground-up approach for something that may never need such control. Could be. But I don't feel that learning YACC is easier than learning how to parse in general. I think YACC and its kin hold people back from truly expressing themselves with custom languages, and wasn't that the whole point of writing a language in the first place?


So let's say you decide to create your own programming language. Does that automatically mean you can afford to do as much work as the organizations behind mainstream programming languages? Organizations with thousands of contributors, funding, etc.?

Now, let's set that aside for a moment. Let's say that after years of work implementing your language, you somehow get one prospective user.

That person will ask you about: syntax highlighting, linting, code navigation, testing, packaging, documentation, operating systems/architecture support, etc. Who is going to contribute all that if you are busy writing everything by hand?

Unless you have as much time as Terry Davis, you are better off starting by piggybacking on something that exists. Then, once you have an idea that scales, you can convince other people to help you and have a successful project. Then you can have a viable ecosystem that people can comfortably join.


The real problem with parsing is not parsing the formal grammar. It's deciding what to store about the parsed input (CST to AST), designing the ancillary data structures for further processing, integrating the parser with the rest of the application, etc.

The real problem with building a compiler is everything except parsing. Parsing is the easiest part. Once you've written a few recursive descent parsers you will probably agree.

The parser for syntax highlighting is probably different from the parser for AST generation and compilation. Likewise, a parser for batch compilation differs from one for incremental compilation. Even the accepted syntax might differ.


Using YACC doesn't give you magical integration with IDEs, operating systems, testing, documentation and so on. I feel as if you took my specific comment about parsers (which are not hard to write at all; a few hundred lines of code for a moderately complex language) in an inappropriately general way, as in "don't use anything for anything when doing anything".


> "syntax highlighting, linting, code navigation, testing, packaging, documentation, operating systems/architecture support"

This has nothing to do with using a parser generator or not.


It has to do with having the time to spend creating a convenient development experience, and picking your battles.


That's agreeable. We seem to disagree on how to achieve that ideal, specifically on whether to use a parser generator or not. The correct answer is not to :)


What you consider correct or not is simply an opinion.


I'm not well versed in this space. Are you suggesting designing a language without specifying a formal grammar?


One does not preclude the other. Most languages that lend themselves to hand-parsing via recursive descent are probably LL(1).

Syntaxes that are easy to parse via recursive descent are probably also the ones that are easiest to parse for humans.
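To make that concrete, here is a hedged sketch in Python (the statement grammar and names are illustrative, not from this thread): in an LL(1) language, a hand-written parser can pick the right production from a single token of lookahead, which is also roughly what keeps the syntax skimmable for a human.

```python
# Illustrative LL(1) dispatch: one token of lookahead is enough to
# choose a statement production, so each parse function can be a
# direct transcription of the grammar rule it handles.

def stmt_kind(tokens, pos=0):
    """Return which (hypothetical) Stmt production applies at pos."""
    first = tokens[pos]
    if first == 'if':
        return 'if-stmt'      # Stmt ::= 'if' '(' Expr ')' Stmt
    if first == 'while':
        return 'while-stmt'   # Stmt ::= 'while' '(' Expr ')' Stmt
    if first == 'return':
        return 'return-stmt'  # Stmt ::= 'return' Expr ';'
    return 'expr-stmt'        # Stmt ::= Expr ';'
```

For example, `stmt_kind(['while', '(', 'x', ')'])` returns `'while-stmt'` without looking past the first token.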


Knowing how to build a parser manually will make you much much better at generating, and most importantly debugging, automatically generated parsers.


Yes. And once you are done learning every aspect of it, one day you will look at yourself in the mirror and you will be 60 years old.


I'm not sure why this intuition is so strong; I had it myself before learning this stuff in my early 20s on shipping projects (both existing and from scratch), and at the end of that I was still in my early 20s.

Despite this intuition it's all actually very learnable and my experience is that using a parser generator makes no sense in a professional context and very little sense in a hobby/toy project context.


With any luck that will happen to you sooner or later.

Better to have it happen knowing you've mastered some subject.


Why do you think people are writing toy programming languages?

It's a learning exercise. Sure, if you want to learn to use yacc, or whatever, use yacc. I don't think time efficiency is a useful metric when the overwhelming majority of people in this space aren't doing it for their job, or even because they 'need' to create a new language.

Ultimately you could just use python, it'll be faster and more fully featured.


And lose touch with your code and have to deal with problems through a heavily abstracted layer. Great!


You will likely not get a perfect syntax right away. So you will need to iterate on it until you are happy with how it looks and feels, and doing it while developing the compiler yourself will naturally lead to a lot of unnecessary bugs and tech debt.

Alternatively, you iterate using a cheaper method and once you reach a point in which you are content with the syntax, you can consider writing a compiler for it.

You can also avoid iterating on the syntax so you can ship a compiler as fast as possible, but users won't like the syntax and won't use your language.


I have a feeling you haven't worked on parsers or compilers that much if you think you design all of the syntax before building anything.


I don't really think of yacc as having much more abstraction than a regex engine -- like in an (NFA-based) regex engine, you're building a big data structure a human wouldn't want to hand-engineer from a declarative specification. (And, like in regexes, you've got good theory backing it.)


You've got good theory backing hand-written parsers too, and the code is right there in your face. I don't know how a regex engine works.


By "good theory," I mean being able to know properties of the parser as a result of the formalism.

For example, one can prove that any yacc-generated parser never infinite loops and runs in O(n) (at least for the actual parsing; you can write custom code in actions that runs when a construct has been parsed, and of course you can write whatever you want there).

If you hand-write a parser for a grammar like the following, you'll end up with an infinite loop unless you restructure the grammar; such restructuring is of course much easier if you have written the grammar out, rather than having it exist implicitly in code.

    Expr ::= Expr '[' Expr ']'
    Expr ::= Name
    Expr ::= Int
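As a hedged sketch in Python (all names illustrative), the standard restructuring replaces the left-recursive rule with a primary followed by a loop over index suffixes:

```python
# A naive translation of
#   Expr ::= Expr '[' Expr ']'
# would call parse_expr() as its own first step and recurse forever.
# The usual rewrite parses a primary, then loops over suffixes:
#   Expr ::= Primary ( '[' Expr ']' )*

def parse_expr(tokens, pos=0):
    """Parse an Expr from a token list like ['a', '[', '1', ']'].
    Returns (ast, next_pos)."""
    node, pos = parse_primary(tokens, pos)
    while pos < len(tokens) and tokens[pos] == '[':
        index, pos = parse_expr(tokens, pos + 1)
        assert pos < len(tokens) and tokens[pos] == ']', "expected ']'"
        node, pos = ('index', node, index), pos + 1
    return node, pos

def parse_primary(tokens, pos):
    tok = tokens[pos]
    if tok.isdigit():
        return ('int', int(tok)), pos + 1
    return ('name', tok), pos + 1
```

So `parse_expr(['a', '[', '1', ']'])` terminates and yields `('index', ('name', 'a'), ('int', 1))`, where the naive translation would never return.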
If you try to naively parse the following, you'll never reach the third Stmt production (indexed assignment) either. Something like yacc will warn you about this, while an ordinary compiler compiling a hand-written parser won't.

    Stmt ::= Expr ';'
    Stmt ::= Name '=' Expr ';'
    Stmt ::= Expr '[' Expr ']' '=' Expr ';'
    Expr ::= Expr '[' Expr ']'
    Expr ::= Name
    Expr ::= Int
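For hand-written parsers, a common workaround for that grammar (a sketch, all names illustrative) is to avoid committing to a Stmt production up front: parse an Expr first, then decide from the next token whether it was an assignment target.

```python
# Sketch: no fixed lookahead distinguishes 'a;' from 'a = 1;' from
# 'a[0] = 1;' at the first token, so parse an Expr first and branch
# on what follows. Tokens are plain strings; parse_expr uses the
# usual loop rewrite of the left-recursive Expr rule.

def parse_expr(tokens, pos):
    tok = tokens[pos]
    node = ('int', int(tok)) if tok.isdigit() else ('name', tok)
    pos += 1
    while pos < len(tokens) and tokens[pos] == '[':  # Expr '[' Expr ']'
        index, pos = parse_expr(tokens, pos + 1)
        assert tokens[pos] == ']', "expected ']'"
        node, pos = ('index', node, index), pos + 1
    return node, pos

def parse_stmt(tokens, pos=0):
    target, pos = parse_expr(tokens, pos)
    if tokens[pos] == '=':            # covers both assignment forms
        value, pos = parse_expr(tokens, pos + 1)
        node = ('assign', target, value)
    else:                             # Stmt ::= Expr ';'
        node = ('expr-stmt', target)
    assert tokens[pos] == ';', "expected ';'"
    return node, pos + 1
```

This merges the two assignment productions into one code path, which is exactly the kind of implicit grammar restructuring the parent comment is pointing at.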
[0] is a good introduction to regex matching, [1] is a good introduction to the algorithm yacc uses, if you want to learn either.

[0]: https://swtch.com/~rsc/regexp/regexp1.html [1]: https://web.archive.org/web/20210507215636/http://web.cs.dal...


> For example, one can prove that any yacc-generated parser never infinite loops and runs in O(n)

Trivial when actually writing the code. And your example of infinite left recursion is basic parser theory. No one in their right mind would write a hanging program like that, at least not for long.


Left recursion is basic, sure, but it's not a problem for an (LA)LR parser generator like yacc. (Depending on your parse actions, you might even get slightly better memory consumption when using left recursion rather than right recursion.)
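For what it's worth, the left-recursive rule from above goes into a yacc grammar essentially verbatim (a minimal fragment; the token names NAME and INT are assumed to be declared elsewhere):

```yacc
/* yacc/bison fragment: left recursion is fine for an LALR
   generator, and is the idiomatic way to write chains and lists. */
expr : expr '[' expr ']'
     | NAME
     | INT
     ;
```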



