Rust-sitter: Define your entire tree-sitter grammar in Rust code

hardwaregeek · on Feb 22, 2023

I'm super excited about the ability to get good quality bindings to tree-sitter! Tree-sitter is a very cool project but it needs a little love in a few regards. A better WebAssembly story, good bindings, and the ability to construct trees synthetically (to facilitate code generation) would make it a really remarkable tool.

As for rust-sitter, it's a very promising direction. The error story needs a little work. I found it pretty hard to debug issues with my grammar. And it is slightly annoying having all those extra () fields in the struct, but that's really a minor complaint.

shadaj · on Feb 23, 2023

Yeah, definitely agree on WebAssembly! We have https://github.com/shadaj/tree-sitter-c2rust for running Tree Sitter on WASM via Rust, but there's definitely more potential in that direction. And very much agree on the error story needing work: right now it's mostly `panic`s everywhere, and it could definitely be improved with richer diagnostics.

shilangyu · on Feb 22, 2023

Do users get any benefit from tree-sitter being used under the hood? Can we benefit from the killer features of tree-sitter, namely incremental parsing, fallible parsing, a lossless syntax tree, or being embeddable into editors that support tree-sitter syntax highlighting?

shadaj · on Feb 22, 2023

Yes! Right now, the main benefits are the ability to write grammar definitions that are quite close to the ideal AST structure (made possible by Tree Sitter's grammar format), and being able to embed the parser in many different applications (including WASM via https://github.com/shadaj/tree-sitter-c2rust). Rust Sitter also gives quite nice error diagnostics with spans thanks to Tree Sitter's recovery logic.

Fallible parsing is something I plan to implement in the very near future, by letting users wrap types in `Result` to mark them as an error boundary. Incremental parsing is a bit more difficult, since we'll need to add logic to know when an existing AST struct can be reused, but is on the roadmap.

shilangyu · on Feb 22, 2023

I would like to delve into the compatibility with tree-sitter, since in other features tree-sitter being under the hood is mostly an implementation detail:

If I were to write my parser using rust-sitter, would I be able to still generate the final standalone tree-sitter parser as a `.so`? That way I could integrate with tools supporting tree-sitter parsers (for instance https://github.com/nvim-treesitter/nvim-treesitter#language-...) without having to write the `.js` grammar?

shadaj · on Feb 23, 2023

In principle, yes: you can use the `rust-sitter-tool` crate to generate the Tree Sitter JSON definition and then compile it to a standalone parser. The grammar is auto-generated, though, so it may be a bit trickier to integrate into other tooling. The general problem of exporting just the grammar has been on my radar, but I haven't had a chance to think it through deeply yet.

shadaj · on Feb 22, 2023

Hi! Rust Sitter creator here, happy to answer any questions about the project and where it's going!

mcluck · on Feb 22, 2023

Great project!

I'm curious about the set of available parser annotations. I went through the list but I didn't see anything which allows for optional elements. One example would be dangling commas. Did I miss it or are there plans to support such constructs?

shadaj · on Feb 22, 2023

We support optional elements by wrapping them in `Option<T>` (other annotations are applied to the contents of the option)! So you can define

  struct ... {
    ...
    #[rust_sitter::leaf(text = ",")]
    _dangling_comma: Option<()>
  }
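
For intuition about what that `Option<()>` field holds after a parse, here is a hand-written sketch in plain Rust with no rust-sitter dependency. `NumberList` and `parse_number_list` are hypothetical names made up for illustration; with rust-sitter the parsing logic would come from the annotations rather than being written by hand like this.

```rust
/// Hypothetical AST for a bracketed list such as `[1, 2, 3,]`.
#[derive(Debug, PartialEq)]
struct NumberList {
    items: Vec<u32>,
    /// `Some(())` when the optional trailing comma was present, `None` otherwise.
    /// The unit type carries no data; only presence matters.
    dangling_comma: Option<()>,
}

/// Hand-rolled parser standing in for what a generated parser would do.
/// Returns `None` on malformed input.
fn parse_number_list(input: &str) -> Option<NumberList> {
    let inner = input.trim().strip_prefix('[')?.strip_suffix(']')?.trim();
    if inner.is_empty() {
        return Some(NumberList { items: Vec::new(), dangling_comma: None });
    }
    // The optional trailing comma maps onto `Option<()>`.
    let (body, dangling_comma) = match inner.strip_suffix(',') {
        Some(rest) => (rest, Some(())),
        None => (inner, None),
    };
    // Any element that fails to parse makes the whole list invalid.
    let items = body
        .split(',')
        .map(|s| s.trim().parse::<u32>().ok())
        .collect::<Option<Vec<u32>>>()?;
    Some(NumberList { items, dangling_comma })
}
```

Parsing `[1, 2, 3,]` with this sketch fills `dangling_comma` with `Some(())`, while `[1, 2]` leaves it as `None`; the items are the same either way.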

mcluck · on Feb 22, 2023

Perfect, thanks!

IshKebab · on Feb 22, 2023

What's the advantage of defining the grammar in Rust? JavaScript is admittedly an odd choice, but it works, there are a ton of grammars out there and it isn't needed at runtime.

Changing the runtime to Rust is very useful though - I've wanted that for a long time to make WASM and cross compilation easier.

hardwaregeek · on Feb 22, 2023

The ability to define the grammar via high quality Rust bindings is pretty inspired imo. It gets you waaaay better bindings than the homogeneous Node type that tree-sitter provides you.
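
A rough illustration of the difference, in plain Rust without either library. `UntypedNode` is a simplified stand-in for a homogeneous node type (it is not tree-sitter's actual `Node` API), and `Expr` is a hypothetical typed AST of the kind a grammar defined in Rust could give you.

```rust
/// Simplified stand-in for a homogeneous syntax node: every node has the
/// same shape, and its meaning is encoded in a string `kind`.
struct UntypedNode {
    kind: &'static str,
    text: &'static str,
    children: Vec<UntypedNode>,
}

/// Hypothetical typed AST: the compiler knows which shapes are possible.
enum Expr {
    Number(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Evaluating the untyped tree means string matching and runtime checks.
fn eval_untyped(node: &UntypedNode) -> Option<i64> {
    match node.kind {
        "number" => node.text.parse().ok(),
        // Nothing guarantees an "add" node has exactly two children.
        "add" => match node.children.as_slice() {
            [lhs, rhs] => Some(eval_untyped(lhs)? + eval_untyped(rhs)?),
            _ => None,
        },
        // A typo in a kind string only shows up at runtime.
        _ => None,
    }
}

/// Evaluating the typed tree is an exhaustive match with no failure path.
fn eval_typed(expr: &Expr) -> i64 {
    match expr {
        Expr::Number(n) => *n,
        Expr::Add(lhs, rhs) => eval_typed(lhs) + eval_typed(rhs),
    }
}
```

With the typed version, adding a new variant to `Expr` makes the compiler flag every `match` that forgot to handle it, which the string-keyed version can't do.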

dlahoda · on Feb 22, 2023

Does it support callbacks to handle languages with semantically significant whitespace, like Haskell and Python?

shadaj · on Feb 23, 2023

Not yet, but this is something I've been investigating. The general plan is to have safe Rust bindings to the underlying Tree Sitter APIs used by custom scanners, and then have the Rust Sitter proc macro expose a Rust scanner as an `extern` function that the Tree Sitter runtime can call back to.
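
For context on why these languages need a custom scanner at all: the lexer has to keep an indentation stack and emit synthetic indent/dedent tokens, which the grammar rules alone can't express. Below is a plain Rust sketch of just that bookkeeping. `IndentScanner` and `Token` are hypothetical names; this is not Tree Sitter's scanner interface and not anything Rust Sitter exposes today.

```rust
/// Synthetic tokens an indentation-aware scanner would hand to the parser.
#[derive(Debug, PartialEq)]
enum Token {
    Indent,
    Dedent,
    Newline,
}

/// Hypothetical scanner state: a stack of currently open indentation widths.
struct IndentScanner {
    stack: Vec<usize>,
}

impl IndentScanner {
    fn new() -> Self {
        // The base level (column 0) is never popped.
        Self { stack: vec![0] }
    }

    /// Call at the start of each non-blank line with its leading-space count.
    /// Inconsistent dedents (a width matching no open level) are not
    /// reported as errors in this sketch.
    fn scan_line(&mut self, width: usize) -> Vec<Token> {
        let current = *self.stack.last().expect("base level is always present");
        if width > current {
            // Deeper than the current block: open a new one.
            self.stack.push(width);
            return vec![Token::Indent];
        }
        let mut tokens = Vec::new();
        // Shallower: close every block deeper than this line.
        while width < *self.stack.last().expect("base level is always present") {
            self.stack.pop();
            tokens.push(Token::Dedent);
        }
        tokens.push(Token::Newline);
        tokens
    }
}
```

A real scanner would also need to save and restore this stack between parses, which is part of why exposing it safely from Rust takes some design work.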