Stack Graphs (github.blog)
362 points by todsacerdoti on Dec 9, 2021 | 75 comments



Cool to see this is inspired by Eelco Visser's [1] group at TU-Delft, who originated the concept of scope graphs [2].

The back story of their language design research and tools is worth reading [3]. Not least because Nix originated as their build system!

[1] https://eelcovisser.org/

[2] https://pl.ewi.tudelft.nl/research/projects/scope-graphs/

[3] https://eelcovisser.org/blog/2021/02/08/spoofax-mip/



Hard agree, this is very much a “standing on the shoulders of giants” situation. Eelco's group has done amazing work on scope graphs, and we would not have been able to make this without that work as a foundation.


...and I should have said: great that you acknowledged it in your blog too.


OP author here. I also gave a talk about this at Strange Loop back in October if folks want to watch/listen instead of read: https://dcreager.net/talks/2021-strange-loop/


As an aside to the technical conversation: I appreciate the use of cooking as an analogy in your code samples in the posted article.

It's not just because I was a chef! Using nouns, adjectives and verbs from (familiar) concrete hierarchical analogies makes technical writing much more accessible. It reduces cognitive load by implying more about the structure of the relationships than variables like "intVar" and functions like "printVar" tied together in entirely contrived, abstract ways. Newer developers in particular, or ones unfamiliar with your language's paradigms, will benefit greatly.
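
To make that point concrete, here's a tiny hypothetical before/after (all names invented for illustration):

    # Contrived, abstract names say nothing about structure:
    intVar = 2
    def printVar(v):
        print(v)
    printVar(intVar)

    # Names from a familiar, concrete domain imply the relationships for free:
    servings = 2
    def plate(dish):
        print(dish)
    plate(f"salmon for {servings}")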

I implore my fellow developers to follow suit.


>Using nouns, adjectives and verbs from (familiar) concrete hierarchical analogies makes technical writing much more accessible.

This is a really good insight. I can remember, back in my university days, a teacher using a cooking analogy to explain something that was really hard to understand; the concept has stuck with me ever since.


Haha also I like food :-)


I haven't had time to watch the full talk yet, so sorry if this is answered there.

When Python resolves `import` statements, it looks for the modules based on the PYTHONPATH. Although not done that often, it is possible to modify the PYTHONPATH at runtime, changing what an imported symbol will resolve to. How do you handle situations like that?

From a purely hypothetical standpoint, someone could take advantage of this to make it look like a library links to a safe implementation of a function, so that people using this feature are directed to the safe implementation. Then, at runtime and without the user knowing, they could dynamically change the PYTHONPATH so that a malicious version of the function is loaded instead.
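
A minimal sketch of what that swap could look like (directory and module name hypothetical):

    import sys

    # Whichever directory appears first on sys.path (the runtime view of
    # PYTHONPATH) wins, so a runtime insertion redirects later imports.
    sys.path.insert(0, "/tmp/planted")   # attacker-controlled directory
    import crypto_utils                  # now resolved from /tmp/planted,
                                         # not the vetted copy linked above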


Ooh that's a good one. Right now, our lookups are only within the single repository. And so if you had two files that _could_ provide the same (fully qualified) symbol, we aren't doing any PYTHONPATH analysis to determine which one it is. We'll show you both.

We do eventually want to support cross-repository use cases, and there, the answer boils down to needing to find the set of dependencies in which to do the search. Once we have that, it's no different from the in-repo case: we look for any file in any of the repos (yours and your dependencies) that could provide the symbol that we're currently looking for.

So, short version: we'd be aiming for a solution where we'd be able to show you both the "good" and "bad" definitions, and let you, the user, decide how to use that information.


I have been doing some stuff where I analyze Python code via the AST to try to figure out symbol references, so it was top of mind when I read the article. My tool works at runtime by importing the user's code as a module, which means all the symbols are evaluated by the Python interpreter, and then I can inspect the loaded module to determine the references. This is all part of a larger framework that has lifecycle rules for how/when it will load user-defined code, which allows me some flexibility and information.

Even with that flexibility, there are still some things that just weren't possible because of how configurable python is at runtime. For example, someone could write a factory style class that dynamically creates python object instances based on a passed in string that represents the class the object will be of. Then they could pass user input into this factory making the created objects completely dependent on runtime input.
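
For example, something along these lines (a sketch; names hypothetical):

    import importlib

    def make_instance(qualified_name, *args):
        # "pkg.module.ClassName" -> class object, resolved only at runtime
        module_name, _, class_name = qualified_name.rpartition(".")
        cls = getattr(importlib.import_module(module_name), class_name)
        return cls(*args)

    # The type of `obj` depends entirely on runtime input, so no amount
    # of AST inspection can resolve references through it.
    obj = make_instance(input("class to instantiate: "))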

I would wager 99% of the Python that's written doesn't use these kinds of runtime abilities, and it probably isn't a great practice to use them in general from a maintainability point of view, but they do exist. My solution is that if you are sophisticated enough to be using these features, then you should be able to understand why my tool can't capture that information from the AST.

Not sure if that solution would work for what you are working on, but I figured I'd let you know about my experience, because it can get gnarly quickly once you start thinking about all the things that are possible in Python.


Yes! Those are exactly the kinds of examples that mess up this kind of analysis. Anything where the structure of your program depends on arbitrary computation: https://twitter.com/dcreager/status/1467654252516589571


Yep I feel you on that making life harder for these kinds of tools!


I think the (unstated) expectation with code navigation tools is that they are best-effort. Beyond your example, plenty of languages allow exotic and dynamic runtime behavior - eval(..)ing user/network input, monkeypatching, etc. - that makes it impossible to know a priori exactly what a call site might invoke.
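
Two tiny (hypothetical) illustrations of that:

    import json

    # Monkeypatching: every existing call site of json.dumps now does
    # something different, and nothing at those sites says so statically.
    json.dumps = lambda *args, **kwargs: "patched!"

    # eval of runtime input: the call target is unknowable a priori.
    eval(input("> "))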


Thank you very much for this and for open sourcing it!

For production, is there a good database system that can index this graph structure?

For incremental updates, how do you prune deprecated parts of the graph (for example, removed/renamed files/functions)?

And for this example:

    (function_definition name: (identifier) @name) @function {
      node @function.def
      attr (@function.def) kind = "definition"
      attr (@function.def) symbol = @name
      edge @function.containing_scope -> @function.def
    }

How can it guarantee the Python shadowing rule? It doesn't seem to encode any order preference. Does the code basically traverse the source file in reverse order?

And, probably not closely related to stack graphs, but about using tree-sitter for C/C++ understanding: how do you handle the preprocessor?

The C preprocessor can make the code look like a completely different language and mess up the parser.

And how do you prune and simplify the CST down to an AST at scale (supporting many languages)?


> For production, is there a good database system that can index this graph structure?

For a while, we were storing this in a (very large) MySQL database, sharded with Vitess. The sharding behavior worked great (since the repo ID gives you a nice sharding key), but we found that it wasn't elastic enough for our needs, since we quickly filled up the available capacity of the machines that we had reserved.

Since then we've switched over to storing this data in Azure Blob Storage, basically using it as a glorified key/value store. We had to write custom logic for deciding how to structure our data so that we can efficiently write it at index time and read it at query time, but so far it's been working quite nicely!

> For incremental updates, how do you prune deprecated parts of the graph

Short version is that we're storing everything on a per-file basis. So whenever a file is changed, we generate a new stack graph snippet for that file. There might be lots of content in that stack graph that is identical to the stack graph of the previous version of the file, but we don't try to do any structural sharing more fine-grained than the file.

Right now we aren't doing any pruning of old files that aren't touched by any active queries, but we could. Or we could move them to a colder storage tier in Blob Storage, something like that. At least for now, the marginal cost of storing the data for longer isn't our cost bottleneck.
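
As a sketch of what that per-file scheme can look like (the key layout here is hypothetical, not GitHub's actual format):

    import hashlib

    def fragment_key(repo_id: int, path: str, file_bytes: bytes) -> str:
        # Key each file's stack graph fragment by its content hash, so an
        # unchanged file maps to the same stored fragment across commits.
        digest = hashlib.sha256(file_bytes).hexdigest()
        return f"{repo_id}/{path}/{digest}"

    print(fragment_key(42, "chef/kitchen.py", b"def broil(): ...\n"))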


For at least some languages, it might even be important to have access to older versions of a file.

As a concrete example, Go imports (at least for module-enabled code) are version-locked, and the HEAD of the referenced code may no longer be representative of the code that would actually end up being compiled.

On the other hand, just having an easy navigational tool to get to roughly the right place is a very good help.


Right now code nav on GitHub only works within a repository, and so every link you follow keeps you within the commit that you’re already viewing. As we move to cross-repo code nav, you’re right that it will be difficult to determine the right commit to take you to when following a cross-repo link.


> And, probably not closely related to stack graphs, but about using tree-sitter for C/C++ understanding: how do you handle the preprocessor?

Ha yeah, that's a good question. Some uses of the preprocessor won't be problematic; it would require deep token mangling, for instance, to really start causing problems. You can treat more basic `#ifdef`-style conditional compilation as parsing/analyzing both sides and showing both as potential definitions. (And from there you could extend it further to try to identify (or define) "profiles" that have different preprocessor symbols defined, and use that to actually prune some of the results.)


Thank you very much for the answers! This is great work!

I'm thinking maybe stack graphs could be used to understand the preprocessor: finding the original toggle/condition that turns an #ifdef block on or off.

I've heard that a simple C++ hello world contains 5000 #defines introduced by the standard libs. If stack graphs can improve that exhaustive search somehow, that would be awesome.


> And how do you prune and simplify the CST down to an AST at scale (supporting many languages)?

We're not doing any pruning or CST→AST translation; we just operate directly on the CST. With the new graph DSL you should be able to implement something like that, since an AST is a tree, and a tree is one shape of graph that you could create. For our purposes, though, that isn't a meaningfully useful step, since we can just as easily generate the stack graph structures that we need directly from the CST that we get from the tree-sitter grammar.
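
For reference, getting that CST with the Python bindings looks roughly like this (a sketch assuming a recent py-tree-sitter and the tree-sitter-python package):

    import tree_sitter_python as tspython
    from tree_sitter import Language, Parser

    # Parse a file into a concrete syntax tree; graph construction rules
    # run directly over nodes like this, with no CST-to-AST step.
    parser = Parser(Language(tspython.language()))
    tree = parser.parse(b"def broil(dish):\n    return dish\n")
    print(tree.root_node)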


> How can it guarantee the Python shadowing rule? It doesn't seem to encode any order preference. Does the code basically traverse the source file in reverse order?

That snippet of graph DSL does not show the precedences being applied, but if you look at the diagram a bit earlier in the post, you'll see that some of the edges do have precedence values applied. In the graph DSL, that would appear as an additional statement in the stanza:

    attr (@function.containing_scope -> @function.def) precedence = 1
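
For reference, the Python behavior that the precedence is modeling (hypothetical snippet):

    broil = "a string"   # earlier binding

    def broil():         # this later binding shadows the one above
        pass

    print(broil)         # resolves to the function: later bindings win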


For people like me who don't have time to watch the talk, what's the answer to the question posed in the blog post? "Why aren't we using the Language Server Protocol (LSP) or Language Server Index Format (LSIF)?"


I go into some amount of detail in a talk I gave at last year's FOSDEM: https://dcreager.net/talks/2020-fosdem/

For LSP, the short version is that running separate sidecar services in production for every language that we want to support is a complete non-starter. That would completely eat up my team's time budget handling operational duties.

LSIF is a great technology that lets you run LSP servers in a “batch” mode. But we really need our analysis to be incremental, where we can reuse results for unchanged files when new commits come in. Language servers tend to do monolithic analyses, where every file needs to be reanalyzed whenever any new commit comes in. If you want to analyze your dependencies, as well, that exacerbates the problem. LSIF (the data format) has recently grown the ability to produce incremental data, but that requires language servers to work in an incremental mode as well. Very few (if any?) do, and because language servers tend to piggy-back on existing compiler technology (which is also not typically incremental), it will be a heavy lift to get incrementality into the LSP/LSIF world.

Whereas stack graphs have incrementality out of the box. (This was the primary thing that we added to “scope graphs”, the academic framework that stack graphs are built on.) It's the core algorithm (which is implemented once for all languages) where the incrementality happens. The only language-specific parts are figuring out which graph structures you need to create to mimic the name binding rules of your language.


This is way cool!! I don’t have any deep questions yet, but to start I’m a bit curious about some very minor things like the name and description. I’m curious why “stack graph” as opposed to “call graph” or “callstack graph” or something like that, I’m guessing you do have some thoughts there. Also curious about the way you described it, the process overall certainly sounds a lot like parsing, compiling, and linking at the end, but you haven’t really used that analogy. I guess I’m just wondering if you’re framing the name and description carefully and if there are specific reasons you’d be willing to discuss.


Great questions! The framework is based on some great existing academic work from Eelco Visser's group at TU Delft. Their framework is called “scope graphs”: https://pl.ewi.tudelft.nl/research/projects/scope-graphs/

We extended scope graphs to have the symbol stack (described in OP) and also a “scope stack”, which allows us to support the more advanced examples that I alluded to at the end. So we chose the name “stack graphs” because it was “scope graphs but using stacks”.


Just out of curiosity, what's the timeline look like for adding precise code navigation to other supported languages and what languages do you think will get support first?

No need for precise answers, just wondering which ones we're likely to see next after Python :)

Also, I'm looking at the list of supported languages here[1]. Maybe you're not the right person to ask, but are there any plans to add support for one of the lower level / systems programming languages like C, C++, or Rust, etc?

Finally, thank you so much for your and your team's hard work. This feature is _incredibly_ helpful, especially in Python!

[1]: https://docs.github.com/en/repositories/working-with-files/u...


It's not “easy”, but we've found that because it's based on a declarative DSL it's less effort than you might expect. And we're finding that there are common patterns that you use in your graph construction rules, because there are many aspects of name binding that end up working the same way in different languages. So, hand-wavily (not out of secrecy but because of not having rigorous data yet), we're finding that it's O(months) to get a new language out the door.

We do have a couple of other languages in the pipeline that my team has been working on, both in terms of writing stack graph rules to get precise support, and writing "fuzzy" tagging rules to get search-based support. And we definitely do plan to include lower-level languages like the ones you mentioned.

Lastly, one major reason that we're doing all of this in open-source projects is that we want to ensure that language communities can self-serve support for their languages, should they wish to. That will be especially useful for the long tail of languages that my team will honestly never be able to get to ourselves. We have some work to do to get the documentation written to properly support self-serve stack graph rules, but it's definitely a goal that we're aiming for.


Are there any publicly available examples of what it actually looks like to implement this for a given language? Is that somewhere in the linked rust crate? It seems very cool but I'm having some trouble imagining what the actual implementation for a given language would look like. Thanks!


Unfortunately not yet — we have working stack graph rules for Python, but they use an older internal version of the graph DSL. We're actively working on porting them to the open source tree-sitter-graph DSL and adding them to the public tree-sitter-python grammar repo. But it's not there yet.


Got it, that's really helpful in terms of estimating the work involved. Thanks for taking the time to answer all these questions and congrats again on the release!


Apologies if you answered this in the talk, but from the slides it doesn't seem you did.

How do you handle statically typed languages where type inference (which may rely on types from other imported files) and overloading deeply interact with name resolution? I can't see any easy way to model that in terms of a simple "parse a file at a time" model like tree-sitter's.


This relies on “scope stacks”, which are another piece that I didn’t really have a chance to discuss in either the blog post or Strange Loop talk. In brief, they allow you to “package up” context from one part of a file and “send it over” to another part of a (possibly different) file. We use that to model the (types of the) actual parameters passed into a function call or generic type instantiation, for instance. Scope stacks are essential for both of the more advanced examples I mention at the end of the post.


This early (and rough around the edges) design doc goes into more detail about scope stacks, and works through a couple of examples that rely on them: https://github.github.io/stack-graph-docs/


Thank you! I'll dig through this when I get the time.


Meanwhile, their tree-sitter-based Semantic parser[1] looks abandoned. There's even a pull request[2], rotting for years, that adds support for these same stack graphs.

[1] https://github.com/github/semantic

[2] https://github.com/github/semantic/pull/535


As mentioned elsewhere on this thread, stack graphs and Semantic were built by the same team (which I manage). Semantic is not abandoned, we've just been focusing on a different layer of our tech stack for the past year or so. https://news.ycombinator.com/item?id=29501389

That PR on the Semantic repo was our first attempt at implementing these ideas. We decided to reimplement it in a separate library (also open source, https://github.com/github/stack-graphs), which only builds on tree-sitter directly so that there's an easier story for us and language communities to add support for new languages. It's a fair point that we could have closed the Semantic PR to indicate that more clearly.


They’re no longer interested in anything from the pre-microsoft days.


I just want to say: This is an absolutely fantastic blog post. So well written, and really well done graphics that help understanding.


Thanks very much, I appreciate the kind words!


Looks neat!

I wish this was available for legal texts, making it easy to jump from one law to the referenced next legal provision. Many legal provisions, especially in very regulated areas, make use of “functions” “imported” from other, totally different laws.

Sorry for being off-topic, but if anyone knows a resource for that, I am super interested!


I started doing this for a niche area: US and European regulations and guidance documents for Good Laboratory Practice, and later for Canadian Cannabis regulations. Basically I created a standard XML schema for regulations and parsed them into XML [1]. This allowed for e.g. presenting tables of contents and section folding, pulling and linking definitions into their own search engine, etc. [2]

I thought that I could easily write a parser for each jurisdiction's formats, and then get predicate rules and related regulations for free.

I was wrong. a) there are many jurisdictions and sub-groups all doing their own thing; and b) most don't have any standard document formatting or tagging, let alone a defined structure. Even in the most structured formats (like the US eCFR's XML) the focus is on display rather than content. In the worst cases it was just whoever wrote up the Word document chose how they numbered and formatted chapters and sections etc.

There were so many special cases that it was a huge amount of work to add or update each document, and I ended up doing a lot of categorization and fixing by hand.

[1] I know people hate XML on HN, but I did my research and had specific reasons for choosing it at the time, including human readable, nesting sections, being able to easily publish and validate a schema, etc.

[2] See ReadtheRegs.com. You can browse the definitions page without an account.


This looks great! I share your sentiment: I looked into the XML files for the published German legal texts[1], and they seem to be made for display purposes only.

[1] Table of contents for XML files: https://www.gesetze-im-internet.de/gii-toc.xml


Crazy isn't it?

I actually pitched to the American Society of Quality Assurance a few years ago that we should be going to the various governing jurisdictions with a schema and encourage them to publish regulations in a standard format.

The benefits of treating regulations as data are enormous - not only do you have a standard way of consuming and linking regulatory requirements like in an API, you also get discoverability, the ability to make tools (syntax highlighting in legalese!), compile requirements over multiple jurisdictions, and more!

I had difficulty selling the idea among the non-computer-savvy (but technical) regulatory professionals, but I'm sure a few of you on HN can imagine the benefits of having a tree-sitter for legal code...

I could have pushed it further, taking the lead to pitch to the various regulators I work with in my consulting business, but in the end it was just too much work for a side project without interest from my peers.


> having a tree-sitter for legal code

This! I think this would enable so many services that make the legal system more approachable for many people.


I completely agree: in a lot of domains, freeform human language provides far more expressive power than you actually need, or want, for communicating ideas. My IANAL understanding of legalese is that it's an attempt to constrain the use of language to be more precise, but from an outsider's point of view it looks needlessly complicated.

Could be a https://xkcd.com/793/ situation though.


In this case I wasn't attempting to constrain the language so much as to capture the structure already implicit in the system: hierarchy of chapters, sections, clauses and sub-clauses, attributes such as definitions and exceptions, cross references, repeals and previous versions, interpretation notes, etc.

While the programmer/engineer in me likes the idea of trying to codify and constrain standard legal terms and grammar to some consistent interpretation, I do think this is an XKCD style oversimplification of a very complex system.

Though IANAL I am a "regulatory QA professional" who has to interpret intent, wording and current enforcement of various food, drug and cannabis regulations every day. It's a complete mess of spaghetti code and undefined behaviour, and worse it's the implied, imprecise and badly worded parts that turn out to be the most important.

It's a moving target of guidance documents, published inspection findings that reveal "the current thinking of the inspectorate" and "industry best practices" with no single point of reference. Not to mention the pharmacopoeia and published standards. Though there are so many ways we could improve things, I doubt you could ever actually get that ideal constrained language without turning it into a billion special cases.

It can be very frustrating to work with, especially trying to convince management why they can't do something that isn't expressly forbidden in the regulations! But this does show exactly why there's so much leaning on intent rather than precise requirements - much like tax code, organisations would and do find money-saving loopholes all the time that might put people at risk, hence the moving target of interpretation and best practices.


> I wish this was available for legal texts, making it easy to jump from one law to the referenced next legal provision. Many legal provisions, especially in very regulated areas, make use of “functions” “imported” from other, totally different laws.

I mean, it "is", to the extent that if you put in the work of hyperlinking all the things during the digitizing process they can be.

Légifrance is fairly highly (though nowhere near completely) hyperlinked for instance, here's one of the laws I selected from the front page: https://www.legifrance.gouv.fr/jorf/id/JORFTEXT000044446848

Many (though nowhere near all) the legal texts being referenced, modified, or inserted (as references, into other texts) are hyperlinked.


Baby lawyer and former dev here: don't we have that anyway? E.g. on Casetext, Lexis, all the usual legal research sites.

I personally haven't encountered a situation where it was totally lacking.


Many available options seem to be based on manual annotation and, therefore, cover a limited range of all legal texts. Especially with regard to regulatory topics, those research sites usually fall short.


> Many available options seem to be based on manual annotation

I’m not sure there’s an alternative: if a reference to an other text is complete (and thus fully disambiguated) it’s reasonably easy to infer it, but if it’s only partial and thus ambiguous (e.g. Article 54) then it becomes a lot more problematic: what happens legally if the system misinterprets the reference (e.g. to the current law’s article 54 but nearby contextual clues made it clear that it was some other text’s) and the reader follows this misinterpretation?


I would be interested to know where you're encountering these issues, specifically. I'm interested in legal tech and would like to know where the gaps are.


This is already a thing: https://coparse.com


This looks like something that's been available in Xcode (Swift) for a long time. I use it constantly. Makes navigating large codebases fairly straightforward. I'm glad to see it being made available to other languages.

I don't know that much about Python, but I think that it's sort of a "JIT compiled" language, so this capability is pretty damn impressive. Swift is compiled and has the LLVM Toolchain system. A lot of the symbol resolution and instrumentation depends on that.

I remember when MS bought GitHub, everyone was declaring that this was the end of everything, but I've been seeing GH do some really cool stuff since.

A lot of it isn't stuff that I can use, but I can still appreciate it.


You touch on an important difference between languages. A lot of existing tooling in this space focuses on static languages, exactly because there are existing tools that you can build on to get something implemented more quickly. But, very few of those existing tools work for multiple static languages. LSP does, in that it provides a standard data API that all of the language-specific tools can implement — but you still have to implement the concepts once for each language that you want to support.

There are relatively fewer existing tools in this space for dynamic languages. (Sorbet etc for Ruby are good examples of ones that do exist.) The ones that do require a fair bit of effort to implement, because you can't piggy-back on existing bits of a compiler, since there isn't one! (The analogous parts of a runtime interpreter tend to be much harder to piggy-back on for analysis purposes, since they all have a deeply ingrained assumption that you're in the middle of trying to execute the code.)

So, the end result is that the “delta” between existing tools and what stack graphs provide is a bit bigger for a language like Python than (for instance) Go or Java. And that's a roundabout description of one of the reasons we had for targeting Python first!


Yeah, it's also available in Eclipse, IDEA, and whatever other IDE under the sun.

But let's reinvent the wheel again :)


I talk a bit in my FOSDEM talk from last year about how the "local editor" and "hosted service" versions of code navigation have enough real differences that it's not obviously best to reuse local editor solutions in a hosted service like GitHub: https://dcreager.net/talks/2020-fosdem/


Very cool!

The StrangeLoop talk includes an example where you infer that Stove() returns a Stove object. If someone writes something like `f(x).broil()`, do you need to do some kind of type inference to figure out what class f(x) is?

What cases do Stack Graphs fail to handle? (e.g., I assume dynamic modification of .__dict__ can't be tracked; are there other representative examples?)


Your `f(x)` example is similar to one of the harder examples I mention (but don't dive into) at the end of the blog post. You need a way to pass along information about `x` (such as its type) so that whatever you've constructed to model the body of `f` can use that information, should it need to. We have a notion of "scope stacks", which live alongside the symbol stacks, and which we use to encode this kind of information. This early design doc goes into more detail about how scope stacks work, including a worked example that uses them: https://github.github.io/stack-graph-docs/

Dynamic modification of `__dict__` is definitely something that would be hard or impossible to track, depending on what kind of modification you're doing. If the keys are all string literals, then you could probably still get something approximate that's still useful. It's when the keys are constructed from arbitrary computation that it gets impossible, at least with the current stack graph framework. You'd have to have some way to lift that computation into the graph structure, so that the path-finding algorithm could simulate its execution. All while maintaining the zero-config and incremental requirements. https://twitter.com/dcreager/status/1467654252516589571
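
Concretely, the two flavors look something like this (a hypothetical sketch):

    class Recipe:
        pass

    r = Recipe()

    # String-literal keys: approximate static analysis is still plausible.
    r.__dict__["garnish"] = "parsley"

    # Keys from arbitrary computation: opaque to static analysis.
    key = input("attribute name: ")
    r.__dict__[key] = "???"

    print(r.garnish)     # attribute lookup finds the injected entry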


Are there any editors that let you edit code in a structure closer to the way the language handles it (e.g. graphs/stacks)?

You could achieve a lot more than the conventional file-per-module approach. Off the top of my head benefits could include: much easier refactoring, comments bound to specific tokens, function level versioning, much smarter git diffs, and so much more.

The mapping graphs to flat-files thing feels especially silly when I do FE dev. Manipulating JSX doms on top of the JS function stack is a constant reminder of how the flattening step feels unnecessary.


I don't know of any generic editor that can take a current (popular) language and do that. There is however a lot going on in this space, and I'll list some examples. Most are about editing the tree structure better, but some do branch out a bit more to the graph idea.

https://docs.darklang.com/structured-editing

https://hazel.org/

http://lighttable.com/

Emacs has some structural editing plugins like ParEdit

Many graph/node based editors like Blender shaders (3D), Reaktor (music)

https://www.unisonweb.org/ (this has some of the function versioning ideas)

https://enso.org/

Smalltalk

There's many many more, these are just some I remembered off the top of my head.


I appreciate the links!


The Dion editor [1] is trying to tackle this problem, if I understand correctly. [2] has some sweet GIFs for you.

[1]: https://dion.systems/dion_format.html

[2]: https://dion.systems/gallery.html


Lisp + SLIME is probably the closest thing that exists to what you're describing.


Nice post, though the title evoked another idea: visually representing the tech stack used in repositories.

But maybe we don't need a glorification of complex stacks.


How do I plumb this precise code jumping into vim?

My tags files just don’t seem to cut it any more.


The most straightforward option would probably be to write an LSP wrapper around the stack graphs code. One of the people on my team wrote a very “contrib directory” version of that for internal testing, which lets us test our stack graph rules for a new language in something like VS Code before deploying to production. If someone were to write a more polished version of that, that would be a great way to get code nav into any LSP-compatible editor for any language that supports stack graphs.


> One of the people on my team wrote a very “contrib directory” version of that for internal testing

If that is open-source, could you please share a link to it?

I'm curious what advantages this "stack-graphs LSP wrapper" has over existing language-specific LSP servers for an editor/IDE, maybe lower memory usage or better performance?


NeoVim has Tree-sitter and Language Server integration. I couldn't make it work for me, but lots of people really love it!


If you haven't tried again recently the neovim team has done a ton of work updating the documentation on nvim-lspconfig [1]. There's also projects like kickstart.nvim [2] which aim to provide a very simple starting point for new users. It's "batteries-included" neovim which notably includes LSP, TreeSitter, completion engines, and some basic git functionality.

[1] https://github.com/neovim/nvim-lspconfig [2] https://github.com/nvim-lua/kickstart.nvim


Jump over to neovim with built-in lsp + treesitter .. and enjoy :)

(happy to help clarify setup/configuration)


Is this from GitHub Semantic (https://github.com/github/semantic)?

Seems very suspicious, since it's the same goal using the same technologies. The latest commit is 4mo ago, but I assume they have a closed-source version they've been working on.


It's from the same team (which I am the manager of), but it's not using that same codebase. In Semantic, you would have to write Haskell code to add support for a new language, and we've found that the declarative DSLs that tree-sitter provides are a lower barrier to entry. (Semantic also uses tree-sitter for the parsing step, btw.) We do still have plans for Semantic, but our stack graph code does not live there.

[Edit] The stack graph implementation is also open source, just like Semantic, and we do our development on the core algorithms directly there. The Python extraction rules have not yet been moved over to the public tree-sitter-python language repo, but that's on the docket. Future language support would happen directly in each language's public open-source tree-sitter repo.

https://github.com/github/stack-graphs/


Are the extraction rules in the stack-graphs repository or other public repository, or in a private repository? It would aid understanding to have a fuller working example.


They're currently in a private repo, only because they use an old crusty implementation of the graph DSL. We're porting them over to the open-source graph DSL (tree-sitter/tree-sitter-graph), and we'll add them to the tree-sitter-python repo as part of that effort. As mentioned above, future language development will happen directly in the per-language tree-sitter repos using the open-source graph DSL implementation.


How does it compare to Doxygen's call graphs?



