Tbsp – treesitter-based source processing language

fellowmartian · 2024-09-02T05:23:32 1725254612

This is great, and a step in the right direction. I wish tree-sitter had an official higher level API that allowed processing and pattern matching for use cases other than those required for text editors.

I’m currently using tree-sitter at work to build AST-based tools, as performance is amazing, even with huge codebases, but I’m finding it slightly frustrating to have to manually write recursive descent processors keyed by strings, with no compile time guarantees on the structure of the grammar.

This is compounded by the fact that grammars themselves don’t really follow any standard structure: some have named fields (presumably the ones created after GitHub contributed this feature), while others require hierarchical pattern matching.

I wish there existed a tool to consume a grammar and output a Rust ADT that we can simply match on. This would at least save me from redundant error handling. I’d build one myself, but I’m not that good at Rust yet.

pedrovhb · 2024-09-02T11:11:48 1725275508

You may already be aware of it, but in case not - it sounds like tree-sitter-graph could be something you'd be interested in: https://docs.rs/tree-sitter-graph/latest/tree_sitter_graph/r...

I haven't gotten into it yet but it looks pretty neat, and it's an official tool.

sweetgiorni · 2024-09-02T06:45:52 1725259552

> I wish tree-sitter had an official higher level API that allowed processing and pattern matching for use cases other than those required for text editors.

Is the pattern matching API not sufficiently high level? In my experience, it's a huge improvement over implementing visitors for everything.

https://tree-sitter.github.io/tree-sitter/using-parsers#patt...

CGamesPlay · 2024-09-02T06:44:32 1725259472

I’ve also encountered this problem using various tree-sitter grammars. I would love a data set that showed various implementations for different languages, along with some kind of consistent test coverage for each language that shows compatibility versus the compiler’s parser. And, of course, links to precompiled wasm modules. Basically, a tree-sitter package manager.

rtpg · 2024-09-02T01:25:58 1725240358

So an awk but that knows how to walk structures instead of just lines. Excellent!

I'm a big fan of semgrep letting me query ASTs, this feels like something in a similar space. Down with lines, up with everything being trees!

pdimitar · 2024-09-02T01:47:02 1725241622

Have you checked ast-grep and gritql?

normie3000 · 2024-09-02T02:12:50 1725243170

Are these alternatives to semgrep?

pdimitar · 2024-09-02T02:54:36 1725245676

More or less, yes. CLI, offline, no need for a cloud account. Used ast-grep successfully to locate bad code blocks (dynamic typing, don't even get me started) and also to replace them with others. Highly recommended.

dbaupp · 2024-09-02T03:06:16 1725246376

Semgrep is also a CLI that can run offline and without a cloud account.

At work, we use it for enforcing a bunch of custom lint rules configured as a yaml file committed directly to our repo, entirely cloud-free.

(I may be overreading your comment as suggesting that these were reasons to use ast-grep over semgrep.)

gregwebs · 2024-09-02T10:04:21 1725271461

ast-grep is based on treesitter. I found Semgrep great for simple things but impossible due to edge cases for complicated things. ast-grep is more difficult for simple cases but all the information you need is there for complex cases.

beardedwizard · 2024-09-07T17:32:47 1725730367

Semgrep is also based on tree sitter

pdimitar · 2024-09-05T06:05:12 1725516312

As the other sibling commenter said, both `ast-grep` and `gritql` are based on Treesitter, which means that you can in fact just look for a certain function call and it will be found no matter how it's formatted, something that plain grep can't do and that I am not sure semgrep always can.

I have used `ast-grep` to devise my own linters with crushing success.
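To illustrate the "found no matter how it's formatted" point, here is a small sketch that uses Python's built-in `ast` module as a stand-in for a tree-sitter parse. It is not ast-grep's or gritql's API, and `find_calls` is a made-up helper. The search works on the syntax tree, so a call split across several lines is found, while the same text inside a string literal is not.

```python
import ast

SOURCE = '''
result = eval(user_input)
other = eval(
    user_input,
    {},
)
text = "eval(this is only a string)"
'''


def find_calls(source, func_name):
    """Return the line numbers where `func_name` is called by its bare name."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    )
```

A line-based `grep "eval("` over the same text would also report the string literal on the last line.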

sramam · 2024-09-02T02:42:00 1725244920

This is so cool.

Question (caveat: first exposure to treesitter and tools like this): Is there a reason the example demonstrates the use of depth as a variable instead of it being built in?

Nesting level of a particular "type" is general enough that it might be included OOTB. What you want to do with this might be generalizable - for example instead of

```

    enter section {
        depth += 1;
    }
    leave section {
        depth -= 1;
    }

    enter atx_heading {
        print("<h");
        print(depth);
        print(">");
    }
    leave atx_heading {
        print("</h");
        print(depth);
        print(">\n");
    }

```

It could simply be:

```

    enter atx_heading {
        print("<h");
        print(depth);
        print(">");
    }
    leave atx_heading {
        print("</h");
        print(depth);
        print(">\n");
    }

```

So depth is always the nesting level of the same node type, but available out of the box. For markdown, headings, sections and lists come to mind - but I might be wrong.
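One possible reading of this suggestion, sketched in Python on a toy tree rather than in tbsp itself: the walker keeps a nesting counter for every node kind, so a heading rule can ask for the current `section` depth without maintaining it by hand. The `walk` and `heading_tags` functions and the tuple-based tree are assumptions made for the example, not how tbsp works internally.

```python
from collections import Counter

# Toy tree: (kind, [children]); a stand-in for a tree-sitter syntax tree.
DOC = ("document", [
    ("section", [
        ("atx_heading", []),
        ("section", [
            ("atx_heading", []),
        ]),
    ]),
])


def walk(node, on_enter, depth=None):
    """Depth-first walk that maintains a nesting count for every node kind."""
    depth = Counter() if depth is None else depth
    kind, children = node
    depth[kind] += 1          # done by the walker on every "enter"
    on_enter(kind, depth)
    for child in children:
        walk(child, on_enter, depth)
    depth[kind] -= 1          # and undone on every "leave"


def heading_tags(tree):
    """Emit an opening tag per heading, using the built-in section depth."""
    out = []

    def on_enter(kind, depth):
        if kind == "atx_heading":
            out.append(f"<h{depth['section']}>")

    walk(tree, on_enter)
    return out
```

Note that the heading rule reads the depth of `section`, not of `atx_heading`: headings themselves do not nest in this tree, so a counter tied only to the current node's own kind would always be 1.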

In any event, this looks really well thought-out and now to check out the other tools mentioned in the comments.....

rtpg · 2024-09-02T05:52:52 1725256372

The depth here can be context dependent. For example if you had a bunch of brackets and parens in your grammar, you might only care about paren depth. Or if your language had brackets and parens and function definitions, your "expression depth" might ignore function definitions (or even reset at a function definition boundary if you have inner functions!)
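A small sketch of such a context-dependent depth, again on a toy tuple tree with made-up node kinds: only parens increase the depth, brackets are ignored, and an inner function definition resets the count.

```python
# Toy tree: (kind, [children]); node kinds are made up for illustration.
TREE = ("paren", [
    ("bracket", [
        ("paren", [
            ("function_definition", [
                ("paren", [("number", [])]),
            ]),
            ("number", []),
        ]),
    ]),
])


def paren_depths(node, depth=0, found=None):
    """Collect the paren depth seen at every `number` leaf."""
    found = [] if found is None else found
    kind, children = node
    if kind == "paren":
        depth += 1   # only parens count; brackets are ignored
    elif kind == "function_definition":
        depth = 0    # an inner function starts a fresh expression
    elif kind == "number":
        found.append(depth)
    for child in children:
        # `depth` is passed by value, so leaving a node restores the old count.
        paren_depths(child, depth, found)
    return found
```

A single built-in depth could not express both rules at once, which is the argument for keeping the counter in user code.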

mingodad · 2024-09-02T07:40:27 1725262827

For those that want to explore the grammars listed at https://github.com/tree-sitter/tree-sitter/wiki/List-of-pars... in a more friendly railroad diagram format, I made https://mingodad.github.io/plgh/json2ebnf.html that reads the "src/grammar.json" and tries its best to generate an EBNF understood by (IPV6) https://www.bottlecaps.de/rr/ui or (IPV4) https://rr.red-dove.com/ui where we get a nice navigable railroad diagram (see https://github.com/GuntherRademacher/rr for offline usage).
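To give a rough idea of what such a conversion involves, here is a sketch in Python. It is not the json2ebnf tool linked above, and the EBNF-like output is not guaranteed to be accepted by the railroad diagram generators. It assumes the usual `grammar.json` layout with a `rules` object whose entries are nested `SEQ`, `CHOICE`, `REPEAT`, `SYMBOL`, `STRING` and `PATTERN` nodes; the toy grammar is hand-written.

```python
import json

# Hand-written excerpt in the shape of a tree-sitter src/grammar.json.
GRAMMAR = json.loads("""
{
  "name": "toy",
  "rules": {
    "list": {"type": "SEQ", "members": [
      {"type": "STRING", "value": "["},
      {"type": "REPEAT", "content": {"type": "SYMBOL", "name": "item"}},
      {"type": "STRING", "value": "]"}
    ]},
    "item": {"type": "CHOICE", "members": [
      {"type": "SYMBOL", "name": "list"},
      {"type": "PATTERN", "value": "[0-9]+"}
    ]}
  }
}
""")


def to_ebnf(rule):
    """Render one grammar.json rule node as an EBNF-like string."""
    kind = rule["type"]
    if kind == "SEQ":
        return " ".join(to_ebnf(m) for m in rule["members"])
    if kind == "CHOICE":
        return "( " + " | ".join(to_ebnf(m) for m in rule["members"]) + " )"
    if kind == "REPEAT":
        return "( " + to_ebnf(rule["content"]) + " )*"
    if kind == "REPEAT1":
        return "( " + to_ebnf(rule["content"]) + " )+"
    if kind == "SYMBOL":
        return rule["name"]
    if kind == "STRING":
        return "'" + rule["value"] + "'"   # no escaping of quotes in this sketch
    if kind == "PATTERN":
        return "/" + rule["value"] + "/"   # regex passed through untranslated
    if kind == "BLANK":
        return "/* empty */"
    # Wrappers such as FIELD, PREC, TOKEN and ALIAS only annotate their content.
    if "content" in rule:
        return to_ebnf(rule["content"])
    raise ValueError(f"unsupported rule type: {kind}")


def grammar_to_ebnf(grammar):
    """Render every rule of the grammar, one production per line."""
    return "\n".join(
        f"{name} ::= {to_ebnf(rule)}" for name, rule in grammar["rules"].items()
    )
```

For the toy grammar this yields `list ::= '[' ( item )* ']'` and `item ::= ( list | /[0-9]+/ )`. Translating regex patterns into EBNF character classes and dealing with `extras` or external scanners are the hard parts, and this sketch skips them.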

mickeyp · 2024-09-02T09:00:15 1725267615

Impressive! The grammar.json file is just a little bit too underspecced to automate some things. Not to mention it's self-referential. How did you deal with extras and other 'specialisms' that are secretly hidden away in the C-level scanner and so on?

I ask because I wrote Combobulate [1], a structured editing and movement tool for Emacs using TS.

1: https://github.com/mickeynp/combobulate

mingodad · 2024-09-02T10:15:37 1725272137

Also, there were several requests to create a more formal grammar to describe the grammars, but the tree-sitter developers don't like the idea and rejected them.

But some people made nice attempts, like https://github.com/eatkins/tree-sitter-ebnf-generator, which I also adapted and exposed here https://mingodad.github.io/lua-wasm-playground/ to allow playing with it online (select "Tree-sitter-ebnf-generator" from examples then click "Run" to see a "grammar.js" generated from the content in "Input Text (arg[1])").

mingodad · 2024-09-02T17:41:37 1725298897

I've added more non-trivial grammars (Javascript, Java, Kotlin, PHP, C, CPP, Rust, Ruby, CSS, HTML, Python) using a quickjs script to convert "src/grammar.json" to an EBNF understood by https://mingodad.github.io/lua-wasm-playground/ (the script is here https://github.com/mingodad/plgh/blob/main/json2ebnf-lua.js).

mingodad · 2024-09-02T10:04:08 1725271448

I simply ignore them, as right now they don't seem relevant in most grammars for generating a usable railroad diagram.

yaantc · 2024-09-02T08:14:38 1725264878

Hi, in case you're not already aware of the name clash, there's already a `rr` in the programming world. It's "record and replay": https://rr-project.org/.

Very different, but a very fine tool too.

rafram · 2024-09-02T12:59:55 1725281995

It doesn’t seem like the rr that GP linked to is their own project, just something they’ve found useful.

In any case, in the non-software world, “RR” stands for railroad, as it does in the name of that tool. You can’t own a common two-letter abbreviation.

lukan · 2024-09-02T10:46:04 1725273964

Awesome!

Just yesterday I started some experiments in that direction, to visualize grammars, but now I can rather do something else ..

MantisShrimp90 · 2024-09-02T02:21:29 1725243689

As someone writing a neovim plugin using treesitter, thank you! Languages like this help leverage treesitter in more interesting ways, whereas current APIs are still a bit low-level.

freedomben · 2024-09-02T14:58:43 1725289123

What neovim plugin are you writing?

samgriesemer · 2024-09-02T05:59:31 1725256771

The md-to-html demo is a good one, but worth mentioning that the Markdown parser[1] being used may not be suitable for more complex documents. From the README:

> "...it is not recommended to use this parser where correctness is important. The main goal for this parser is to provide syntactical information for syntax highlighting..."

There's also a separate block-level and inline parser, not sure how `tbsp` handles nested or multi-stage parsing.

[1]: https://github.com/tree-sitter-grammars/tree-sitter-markdown

yencabulator · 2024-09-03T18:51:32 1725389492

Even worse, the README implies tree-sitter is just not going to work for markdown at all[1], this is not a matter of a little polish and bugfixing:

> These stem from restricting a complex format such as markdown to the quite restricting tree-sitter parsing rules.

[1]: Outside of something like tree-sitter v2 with much more complex grammar support. And frankly I personally don't think making more complex grammars in Javascript+C is a good way forward.

ashkankiani · 2024-09-02T06:44:59 1725259499

Adding a way to query the path at the current node would let you skip out on doing stuff like keeping track of `in_section`.
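A sketch of what a path query could look like, in Python on a toy tree (the `walk` signature and `paragraphs_in_sections` are assumptions made for illustration, not tbsp features): the walker hands every handler the kinds of its ancestors, so "am I inside a section?" becomes a lookup instead of a flag that is set on enter and cleared on leave.

```python
# Toy tree: (kind, [children]); a stand-in for a tree-sitter syntax tree.
DOC = ("document", [
    ("paragraph", []),
    ("section", [
        ("atx_heading", []),
        ("paragraph", []),
    ]),
])


def walk(node, on_enter, path=()):
    """Depth-first walk that hands each handler the kinds of its ancestors."""
    kind, children = node
    on_enter(kind, path)
    for child in children:
        walk(child, on_enter, path + (kind,))


def paragraphs_in_sections(tree):
    """For each paragraph, report whether it sits inside a section."""
    inside = []

    def on_enter(kind, path):
        # No `in_section` flag to set and clear: ask the path instead.
        if kind == "paragraph":
            inside.append("section" in path)

    walk(tree, on_enter)
    return inside
```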

I wonder if the `enter|exit ...` syntax might be too limiting but for a lot of stuff it seems nice and easy to reason about. Easier than tree-sitter's own queries.

I think if you really wanted performance and whatnot, you might end up compiling the queries to another target and just reuse them.

I could see myself writing a lua DSL around compiling these kinds of queries `enter/exit` stanzas or an SQL one too.

orra · 2024-09-02T08:34:01 1725266041

Not a technical comment (as cool as this is), but I love the name.

We always say naming things is one of the hard parts of programming. They avoided the default option of something like tawk.

askvictor · 2024-09-02T11:57:31 1725278251

Though, being the abbreviation for tablespoon makes searching for this a fair bit harder. As long as code files using this language don't get called recipes...

icy · 2024-09-02T11:43:02 1725277382

trawk (tree awk) was one of the initial names for this (not author, but know him personally)

linguistics__ · 2024-09-02T08:45:14 1725266714

I mean I'll be calling (pronouncing) it Tablespoon, that's a great name:)

toastal · 2024-09-02T03:54:21 1725249261

Always kudos towards taking a self-hosted-forge approach

otreblan · 2024-09-06T23:52:50 1725666770

https://aur.archlinux.org/packages/tbsp-git

lumb63 · 2024-09-02T11:40:47 1725277247

This is really cool! I have a lot of short projects that are essentially “parse out 2 or 3 tags of HTML and convert that to CSV”. This will be perfect for that; in the past I’ve done it by hand with vim. Next time I’ll give this a shot.

jpgvm · 2024-09-02T01:45:01 1725241501

Maybe update the link to https://git.peppe.rs/languages/tbsp/tree/readme.txt?

Terretta · 2024-09-02T01:51:49 1725241909

Some might prefer https://git.peppe.rs/languages/tbsp/about/

barlog · 2024-09-02T04:47:43 1725252463

Is it formerly peppe.rs ?

Here is the new account and doc for tbsp below.

https://oppi.li/posts/introducing_tablespoon/

icy · 2024-09-02T11:43:55 1725277435

The git is still hosted at peppe.rs.

orjicu98 · 2024-09-02T10:15:32 1725272132

very interesting paradigm of programming. i would recommend checking out, for inspiration: https://rosettacode.org/wiki/Category:Bracmat and https://www.egison.org/

they define themselves as non-linear pattern matching, a pretty niche and unique way to program, and i enjoyed playing with their code

thanks for posting, very nice

azeirah · 2024-09-02T06:56:22 1725260182

Awesome! I'd love to see this flourish.

vslira · 2024-09-02T11:40:52 1725277252

That's a lot of work to write lisp without parentheses /j

I joke, really interesting project, props to the team

PoppGolfer · 2024-09-02T10:47:56 1725274076

tablespoon - of course....