Egg: E-Graphs Good

mannykannot · on March 26, 2021

If, like me, you have never heard of e-graphs, this takes you straight to an introduction:

https://docs.rs/egg/0.6.0/egg/tutorials/_01_background/index...

ztatlock · on March 26, 2021

Tate's POPL 2009 talk is also a great intro and companion to Willsey's POPL 2021 talk.

https://www.youtube.com/watch?v=LKELTEOFY-s https://www.cs.cornell.edu/~ross/publications/eqsat/

verdverm · on March 26, 2021

And the paper for reference

https://arxiv.org/abs/2004.03082

philzook · on March 26, 2021

Glad to see this getting play on here. Egg presents a very compelling algorithm and library for equational reasoning over syntax trees that I've been pretty fascinated with over the past couple months.

These e-graph data structures are in the heart of SMT solvers but giving easier access to end users and specializing the algorithm to the different needs of optimizing rewrites really changes the game.

There's some cool applications linked on the site that I recommend people check out. A kind of neat thing about it being in rust is that you can compile to WASM to throw stuff on the web. I had a little demo of using egg to rewrite some category theory on my blog here https://www.philipzucker.com/rust-category/

tinco · on March 26, 2021

So theory is not my strongest suit, maybe someone can clarify some things for me here. It seems like these equalities that we're looking for are things like the identity laws, but also extended by mathematical operations like bitshift being faster than division by 2.

Do these equalities carry all the way into general programming, to the point where we could for example say a short loop is equivalent to an unrolled loop, and thus suggest that optimization? Would we have to come up with an exhaustive list of equivalences, or is the list actually short and well known like they are in logic (Identity Laws)?

Will we at some point basically plug Egg into LLVM and have it find new optimization passes?

WJW · on March 26, 2021

Not necessarily new optimization passes, but currently the order of optimizations is more or less fixed (you can technically rearrange them between builds but almost nobody does this AFAIK except that one Uber dev optimizing iOS app size). Even worse, there is no globally optimal order as what is good for one codebase might not be optimal for another. E-graphs can find the optimum arrangement for all codebases without requiring excessive amounts of memory and so could produce better output from compilers.

They won't find new equivalences on their own though. I doubt that is even possible, you can't in general prove equivalence of functions without also solving the halting problem. Also, even just defining the equivalences we know about are often only valid in certain circumstances that I doubt a computer could figure out in the general case.

UncleEntity · on March 26, 2021

> They won't find new equivalences on their own though.

From Tate et al 2009:

Better yet, because optimizations are allowed to freely interact during equality saturation, without any consideration for ordering, our approach can discover intricate optimization opportunities that compiler writers may not have anticipated, and hence would not have implemented in a general purpose compiler.

webmaven · on March 28, 2021

> > They won't find new equivalences on their own though.

> From Tate et al 2009:

> Better yet, because optimizations are allowed to freely interact during equality saturation, without any consideration for ordering, our approach can discover intricate optimization opportunities that compiler writers may not have anticipated, and hence would not have implemented in a general purpose compiler.

That's not finding new equivalences, it is finding known equivalences that are nested or composed. I suppose you could look at it as a form of unobfuscating the equivalences.

tinco · on March 26, 2021

Ah that makes sense. So in LLVM's case, an optimization pass would be an equivalence.

gue5t · on March 26, 2021

Yes; see also the vellvm project, which develops proofs of a number of the equivalences LLVM uses: https://www.cis.upenn.edu/~stevez/vellvm/

HelloNurse · on March 27, 2021

The equivalences (between equivalent programs) correspond to specific rewrite rules. Equality saturation allows the consolidation of many such rules into a single optimization pass, without complicated ordering and assumptions, instead of arbitrary grouping that is likely to neglect opportunities.

nbcomplete · on March 26, 2021

I love this paper. Beyond including an interesting contribution, they give a fantastic introduction to the underlying ideas and algorithm that forms the background. Even if you only have time for the first few pages, it's a worthwhile read.

beaconstudios · on March 26, 2021

ooh this is serendipitous, I'm going to need to do some deep research into term rewriting soon.

mentally, I'm modelling term rewriting as a state space search where each edge transition is a single possible rewrite. As always with state space search, the goal is to find the global optimum.

It's funny how many problems come back to search heuristics.