This is great, and a step in the right direction. I wish tree-sitter had an official higher level API that allowed processing and pattern matching for use cases other than those required for text editors.
I’m currently using tree-sitter at work to build AST-based tools, as performance is amazing, even with huge codebases, but I’m finding it slightly frustrating to have to manually write recursive descent processors keyed by strings, with no compile time guarantees on the structure of the grammar.
This is compounded by the fact that grammars themselves don’t really follow any standard structure: some have named fields (presumably the ones created after GitHub contributed this feature), while others require hierarchical pattern matching.
I wish there existed a tool to consume a grammar and output a Rust ADT that we can simply match on. This would at least save me from redundant error handling. I’d build one myself, but I’m not that good at Rust yet.
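To make the ADT idea concrete, here is a rough sketch of what such a generator could look like. It assumes the input follows the shape of the `node-types.json` file that tree-sitter grammars ship (entries with `type`, `named`, and optional `fields`). The names `emit_enum`, `camel`, and `Ast` are made up for illustration, and the output is deliberately coarse: every field is typed as the whole enum rather than a precise per-field type.

```python
import json


def camel(name: str) -> str:
    """Turn a snake_case node type into a CamelCase Rust identifier."""
    return "".join(part.capitalize() for part in name.split("_") if part)


def field_type(spec: dict) -> str:
    """Map a field's multiplicity flags to a (coarse) Rust type."""
    if spec["multiple"]:
        return "Vec<Ast>"
    if not spec["required"]:
        return "Option<Box<Ast>>"
    return "Box<Ast>"


def emit_enum(node_types: list, enum_name: str = "Ast") -> str:
    """Emit Rust source text for one enum with a variant per named node type."""
    lines = [f"pub enum {enum_name} {{"]
    for entry in node_types:
        if not entry["named"]:
            continue  # skip anonymous tokens such as "+" or "("
        fields = entry.get("fields", {})
        if fields:
            # Note: field names that are Rust keywords (e.g. `type`) would
            # need escaping; this sketch does not handle that.
            body = ", ".join(
                f"{name}: {field_type(spec)}" for name, spec in sorted(fields.items())
            )
            lines.append(f"    {camel(entry['type'])} {{ {body} }},")
        else:
            lines.append(f"    {camel(entry['type'])},")
    lines.append("}")
    return "\n".join(lines)


# Hand-written sample in the node-types.json shape (not taken from a real grammar).
SAMPLE = json.loads("""[
  {"type": "binary_expression", "named": true, "fields": {
    "left":  {"multiple": false, "required": true,
              "types": [{"type": "expression", "named": true}]},
    "right": {"multiple": false, "required": true,
              "types": [{"type": "expression", "named": true}]}}},
  {"type": "identifier", "named": true},
  {"type": "+", "named": false}
]""")

if __name__ == "__main__":
    print(emit_enum(SAMPLE))
```

A real tool would also need code that converts a parsed tree into these variants, and that is where most of the redundant error handling would end up living.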
> I wish tree-sitter had an official higher level API that allowed processing and pattern matching for use cases other than those required for text editors.
Is the pattern matching API not sufficiently high level? In my experience, it's a huge improvement over implementing visitors for everything.
I’ve also encountered this problem using various tree-sitter grammars. I would love a data set that showed various implementations for different languages, along with some kind of consistent test coverage for each language that shows compatibility versus the compiler’s parser. And, of course, links to precompiled wasm modules. Basically, a tree-sitter package manager.
More or less, yes. CLI, offline, no need for a cloud account. Used ast-grep successfully to locate bad code blocks (dynamic typing, don't even get me started) and also to replace them with others. Highly recommended.
ast-grep is based on tree-sitter. I found Semgrep great for simple things, but edge cases made it impossible to use for complicated ones. ast-grep is more difficult for simple cases, but all the information you need is there for complex cases.
As the other sibling commenter said, both `ast-grep` and `gritql` are based on tree-sitter, which means that you can in fact just look for a certain function call and it will be found no matter how it's formatted. That is something plain grep cannot do, and I am not sure Semgrep always can either.
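To illustrate the formatting point without pulling in any of those tools, here is a small stand-in that uses Python's standard `ast` module instead of tree-sitter. It is not how `ast-grep` works internally, only the same idea on a single language: match on the syntax tree, not on the text. The helper name `find_calls` is my own.

```python
import ast


def find_calls(source: str, func_name: str) -> list:
    """Return the line numbers of every call to `func_name` in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            callee = node.func
            # Match bare names (`eval(...)`) and attribute access (`builtins.eval(...)`).
            if isinstance(callee, ast.Name):
                name = callee.id
            else:
                name = getattr(callee, "attr", None)
            if name == func_name:
                hits.append(node.lineno)
    return sorted(hits)


SOURCE = '''
x = eval("1 + 1")
y = eval(
    "2 + 2"
)
z = "eval(not a call)"
'''

if __name__ == "__main__":
    print(find_calls(SOURCE, "eval"))
```

The call split across several lines is found, and the string literal that merely contains the text `eval(` is not reported. A line-based regex tends to get both of those cases wrong.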
I have used `ast-grep` to devise my own linters with crushing success.
Question (caveat: first exposure to tree-sitter and tools like this):
Is there a reason the example demonstrates the use of depth as a variable instead of it being built in?
Nesting level of a particular "type" is general enough that it might be included OOTB.
What you want to do with this might be generalizable -
for example instead of
So depth is always of the nested levels of the same node type, but available out of the box.
For Markdown, headings, sections, and lists come to mind, but I might be wrong.
In any event, this looks really well thought-out, and now to check out the other tools mentioned in the comments...
The depth here can be context dependent. For example if you had a bunch of brackets and parens in your grammar, you might only care about paren depth. Or if your language had brackets and parens and function definitions, your "expression depth" might ignore function definitions (or even reset at a function definition boundary if you have inner functions!)
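A quick sketch of why a single built-in depth is hard to pick. This is plain Python over a hand-made tree of `(kind, children)` tuples, not tbsp or tree-sitter code, and `max_depth`, `counted`, and `reset_at` are names I made up for the example.

```python
def max_depth(node, counted, reset_at=frozenset(), depth=0):
    """Deepest nesting of `counted` kinds; depth restarts below any `reset_at` kind."""
    kind, children = node
    if kind in reset_at:
        depth = 0  # e.g. an inner function starts a fresh expression depth
    elif kind in counted:
        depth += 1
    deepest = depth
    for child in children:
        deepest = max(deepest, max_depth(child, counted, reset_at, depth))
    return deepest


# Roughly: ( [ ( function { ( ) } ) ] )
TREE = ("paren", [
    ("bracket", [
        ("paren", [
            ("function", [
                ("paren", []),
            ]),
        ]),
    ]),
])

if __name__ == "__main__":
    print(max_depth(TREE, {"paren"}))                         # paren depth only
    print(max_depth(TREE, {"paren", "bracket"}))              # any delimiter
    print(max_depth(TREE, {"paren"}, reset_at={"function"}))  # reset at functions
```

The same tree gives 3, 4, and 2 depending on which definition of depth you pick, so leaving the counter to the user seems like a defensible default.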
Impressive! The grammar.json file is just a little bit too underspecced to automate some things. Not to mention it's self-referential. How did you deal with extras and other 'specialisms' that are secretly hidden away in the C-level scanner and so on?
I ask because I wrote Combobulate [1], a structured editing and movement tool for Emacs using TS.
Also, there were several requests to create a more formal grammar to describe the grammars, but the tree-sitter developers don't like the idea and rejected them.
Hi, in case you're not already aware of the name clash, there's already a `rr` in the programming world. It's "record and replay": https://rr-project.org/.
As someone writing a Neovim plugin using tree-sitter, thank you! Languages like this help leverage tree-sitter in more interesting ways, whereas the current APIs are still a bit low-level.
The md-to-html demo is a good one, but worth mentioning that the Markdown parser[1] being used may not be suitable for more complex documents. From the README:
> "...it is not recommended to use this parser where correctness is important. The main goal for this parser is to provide syntactical information for syntax highlighting..."
There are also separate block-level and inline parsers, and I'm not sure how `tbsp` handles nested or multi-stage parsing.
Even worse, the README implies tree-sitter is just not going to work for Markdown at all[1]; this is not a matter of a little polish and bugfixing:
> These stem from restricting a complex format such as markdown to the quite restricting tree-sitter parsing rules.
[1]: Outside of something like a tree-sitter v2 with support for much more complex grammars. And frankly, I personally don't think making more complex grammars in JavaScript+C is a good way forward.
Adding a way to query the path at the current node would let you skip out on doing stuff like keeping track of `in_section`.
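Something like the following is what I have in mind, sketched in plain Python over `(kind, children)` tuples. It is not tbsp's API, and `walk`, `on_enter`, and the tree shape are all hypothetical. The walker hands each handler the tuple of ancestor kinds, so the handler can ask "am I inside a section?" directly.

```python
def walk(node, on_enter, path=()):
    """Visit every node depth-first, passing its ancestor kinds as `path`."""
    kind, children = node
    on_enter(kind, path)
    for child in children:
        walk(child, on_enter, path + (kind,))


def count_text_in_sections(tree):
    """Count `text` nodes that have a `section` ancestor, with no manual flag."""
    count = 0

    def on_enter(kind, path):
        nonlocal count
        if kind == "text" and "section" in path:
            count += 1

    walk(tree, on_enter)
    return count


DOC = ("document", [
    ("paragraph", [("text", [])]),
    ("section", [
        ("heading", [("text", [])]),
        ("paragraph", [("text", [])]),
    ]),
])

if __name__ == "__main__":
    print(count_text_in_sections(DOC))
```

Only the two `text` nodes under `section` are counted, and nothing has to be toggled on enter and cleared on exit.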
I wonder if the `enter|exit ...` syntax might be too limiting but for a lot of stuff it seems nice and easy to reason about. Easier than tree-sitter's own queries.
I think if you really wanted performance and whatnot, you might end up compiling the queries to another target and just reuse them.
I could see myself writing a lua DSL around compiling these kinds of queries `enter/exit` stanzas or an SQL one too.
Though being the abbreviation for tablespoon makes searching for this a fair bit harder. As long as code files using this language don't get called recipes...
This is really cool! I have a lot of short projects that are essentially “parse out 2 or 3 tags of HTML and convert that to CSV.” This will be perfect for that; in the past I’ve done it by hand with vim. Next time I’ll give this a shot.