
> makes it difficult to share parser logic between the two

It’s difficult to reuse traditional compiler logic in an IDE, but there are good examples showing that the reverse isn’t true. IDE validation of language semantics is a strictly more complex problem, but if you start by solving that, it’s not as hard to add a compiler backend. A compiler’s job is to either take correct code and translate it to its compiled form, or take incorrect code and report errors. There’s no reason an IDE-focused parser/compiler can’t do both.

IIRC, Microsoft talked publicly about how they built the C# compiler as IDE-first and found that it simplified things greatly. And I think there have been substantive discussions within the Rust community about bringing parts of rust-analyzer into the official compiler, whereas the RLS approach of reusing compiler APIs wasn’t able to provide a reasonable IDE experience.




My perspective comes from writing an XPath and XQuery lexer and parser for IntelliJ, which has its own lexer, parser, and AST APIs.

The XPath lexer and parser are designed to be overridden where needed to implement the XQuery lexer and parser.

The lexer is stateful: it keeps a stack of states so that it can tokenize the different structures (string literals, comments, embedded XML) correctly. A compiler could instead use the parser's state as the context that drives the tokenizer, with no need for a state/stack-based lexer.
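
To illustrate the stack-based idea, here is a minimal Python sketch. It is not the plugin's code: the `State` enum, the token kind names, and `tokenize` are all made up for this example, and it only handles double-quoted strings (without escapes) and nestable `(: ... :)` comments.

```python
from enum import Enum, auto


class State(Enum):
    DEFAULT = auto()
    COMMENT = auto()  # inside (: ... :), which may nest
    STRING = auto()   # inside "..." (escapes are ignored in this sketch)


def tokenize(text):
    """Return (kind, lexeme) pairs using a state stack, with no parser context."""
    tokens = []
    stack = [State.DEFAULT]
    i, n = 0, len(text)
    while i < n:
        state = stack[-1]
        two = text[i:i + 2]
        if state is State.COMMENT:
            if two == "(:":            # nested comment: push another level
                stack.append(State.COMMENT)
                tokens.append(("CommentStart", two))
                i += 2
            elif two == ":)":          # leave the innermost comment
                stack.pop()
                tokens.append(("CommentEnd", two))
                i += 2
            else:                      # run of comment text up to the next delimiter
                j = i
                while j < n and text[j:j + 2] not in ("(:", ":)"):
                    j += 1
                tokens.append(("CommentText", text[i:j]))
                i = j
        elif state is State.STRING:
            if text[i] == '"':
                stack.pop()
                tokens.append(("StringEnd", '"'))
                i += 1
            else:                      # "(:" is plain text inside a string
                j = text.find('"', i)
                j = n if j == -1 else j
                tokens.append(("StringText", text[i:j]))
                i = j
        else:  # State.DEFAULT
            if two == "(:":
                stack.append(State.COMMENT)
                tokens.append(("CommentStart", two))
                i += 2
            elif text[i] == '"':
                stack.append(State.STRING)
                tokens.append(("StringStart", '"'))
                i += 1
            elif text[i].isspace():
                j = i
                while j < n and text[j].isspace():
                    j += 1
                tokens.append(("WhiteSpace", text[i:j]))
                i = j
            else:                      # everything else is lumped into one kind here
                j = i
                while (j < n and not text[j].isspace()
                       and text[j] != '"' and text[j:j + 2] != "(:"):
                    j += 1
                tokens.append(("Other", text[i:j]))
                i = j
    return tokens
```

The point of the stack is that the same characters (`(:` here) mean different things depending on where the lexer is, and the lexer has to work that out by itself because no parser is telling it what to expect.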

The lexer also treats keyword tokens as a kind of identifier, because keywords can be used as identifiers. A compiler doesn't need this, since its parser knows when it is reading or expecting a keyword.
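
A small Python sketch of what that looks like in practice, under the assumption that the lexer emits every word as an `NCName` and the parser decides from position whether it is a keyword. The names `lex` and `parse_instance_of` and the token kinds are invented for illustration, and the grammar is cut down to a single operand token.

```python
import re

# One alternative per token kind; leading whitespace is skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<Integer>\d+)|(?P<NCName>[A-Za-z_][\w.-]*)|(?P<Token>[:?]))"
)


def lex(text):
    """Lex into (kind, lexeme) pairs. Keywords are NOT a separate kind."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse_instance_of(tokens):
    """Parse `operand [instance of type]`, where the operand is a single token.

    `instance` and `of` are only treated as keywords in the position where
    the grammar expects them; anywhere else they are ordinary names.
    """
    operand = tokens[0][1]
    if tokens[1:3] == [("NCName", "instance"), ("NCName", "of")]:
        return {"operand": operand, "type": "".join(t[1] for t in tokens[3:])}
    return {"operand": operand, "type": None}
```

With this, `parse_instance_of(lex("instance instance of of"))` treats the first `instance` as a name, the middle two words as the keyword pair, and the final `of` as the type name, which a lexer with dedicated keyword tokens could not do without help from the parser.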

My parser handles the different versions of XPath/XQuery, the different extensions, and vendor-specific extensions all in a unified lexer/parser. A compiler could ignore the bits it does not support and simplify some of the logic.

My QName parser is very complex because it provides error recovery and reporting for things like spaces inside the name. -- Other parsers (e.g. Saxon) treat the QName as a single token.
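
As a rough idea of why this gets complicated, here is a Python sketch of a QName parser that works on separate tokens and recovers from whitespace around the colon. It assumes a hypothetical token stream of `(kind, text)` pairs with kinds `NCName`, `Colon`, and `WhiteSpace`; the function name and the error messages are invented for this example.

```python
def parse_qname(tokens):
    """Parse a QName from the front of `tokens`.

    Returns (prefix, local_name, errors). Errors are reported, but parsing
    continues so that the rest of the file can still be analysed.
    """
    errors = []
    if not tokens or tokens[0][0] != "NCName":
        return None, None, ["expected an NCName"]
    first, pos = tokens[0][1], 1

    def kind(i):
        return tokens[i][0] if i < len(tokens) else None

    # Whitespace only belongs to the QName if a ':' follows it;
    # otherwise it just separates this name from the next token.
    if kind(pos) == "WhiteSpace" and kind(pos + 1) == "Colon":
        errors.append("whitespace before ':' in QName")
        pos += 1
    if kind(pos) != "Colon":
        return None, first, errors  # unprefixed name
    pos += 1
    if kind(pos) == "WhiteSpace":
        errors.append("whitespace after ':' in QName")
        pos += 1
    if kind(pos) != "NCName":
        errors.append("missing local name after ':'")
        return first, None, errors
    return first, tokens[pos][1], errors
```

For `xs : string` this still yields the prefix `xs` and the local name `string`, plus two errors to display in the editor. A parser that lexes the QName as one token would simply fail to match.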

I'm also generating a full AST with single nodes removed, e.g.:

    XPath
       InstanceofExpr
          IntegerLiteral       "5"
          XmlNCName            "instance"
          XmlNCName            "of"
          SequenceType
             AtomicOrUnionType
                QName
                   XmlNCName   "xs"
                   Token       ":"
                   XmlNCName   "string"
             Token             "?"

I'm traversing this AST to do things like variable and namespace resolution. For the modules, I'm using the IDE's mechanisms to search the project files. -- In a compiler, these would be collated and built as the file is parsed, which does not work with incremental/partial parsing.
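
A minimal Python sketch of resolving namespaces by walking the finished tree, rather than collecting bindings during parsing. The `Node` class, `walk`, and `resolve_qnames` are hypothetical stand-ins, not IntelliJ's PSI API, and the in-scope namespaces are passed in as a plain dict instead of being looked up in the project.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Yield `node` and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def resolve_qnames(root, namespaces):
    """Resolve each prefixed QName under `root` against `namespaces`.

    Returns (resolved, errors). Unprefixed names are skipped in this sketch.
    """
    resolved, errors = [], []
    for node in walk(root):
        if node.kind != "QName":
            continue
        names = [c.text for c in node.children if c.kind == "XmlNCName"]
        if len(names) != 2:
            continue
        prefix, local = names
        uri = namespaces.get(prefix)
        if uri is None:
            errors.append(f"unbound prefix '{prefix}'")
        else:
            resolved.append((uri, local))
    return resolved, errors


# The tree for `5 instance of xs:string?` shown above.
EXAMPLE = Node("XPath", children=[
    Node("InstanceofExpr", children=[
        Node("IntegerLiteral", "5"),
        Node("XmlNCName", "instance"),
        Node("XmlNCName", "of"),
        Node("SequenceType", children=[
            Node("AtomicOrUnionType", children=[
                Node("QName", children=[
                    Node("XmlNCName", "xs"),
                    Node("Token", ":"),
                    Node("XmlNCName", "string"),
                ]),
            ]),
            Node("Token", "?"),
        ]),
    ]),
])
```

Because resolution is a separate pass over whatever tree currently exists, it can be re-run after a partial reparse, which is the property the parse-time approach lacks.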

I'm getting to the stage where I can evaluate several static programs, because implementing IDE features and providing static analysis require it.



