Thanks! :) We should be very clear that the bulk of the work is Aaron Bembenek's...

chubot · on Aug 9, 2020

Ah OK thanks for the link. Since it depends on commercial software, I don't see a path to trying it (which is fine, because I probably don't have time anyway :-/ )

So are you saying that it's more conventional to serialize relations from C++ or Python, rather than serialize an AST as I was suggesting?

Your blog post mentions ASTs too, so I'm not quite clear on that point. I don't have much experience writing such analyzers, and I'd be interested if there is any wisdom / examples on serializing ASTs vs. relations, and if the relations are at the "same level" as the AST, or a higher level of abstraction, etc.

-----

FWIW I read a bunch of the papers by Yannis because I'm interested in experiences of using high level languages in production:

https://news.ycombinator.com/item?id=21666658

After all the reason I like shell and R is that they're both higher level than Python or JS :)

I like short code, and Datalog seems interesting in that regard. I also hacked on this Python type inferencer in Prolog awhile ago:

https://github.com/andychu/hatlog

I did get hung up on writing simple pure functions in Prolog. There seems to be a debate over whether unification "deserves" its own first-class language, or whether it should be a library in a bigger language, and after that experience, I would lean toward the latter. I didn't really see the light in Prolog. Error messages were a problem -- for the user of the program, and for the developer of the program (me).

So while I haven't looked at Formulog yet, it definitely seems like a good idea to marry some "normal" programming conveniences with Datalog!

mgreenbe · on Aug 10, 2020

I'd say it's conventional to reuse an existing parser to generate facts.

The AST point is a subtle one. Classic Datalog (the thing that characterizes PTIME computation) doesn't have "constructors" like the ADTs (algebraic data types) we use in Formulog to define ASTs. Datalog doesn't even have records, like Soufflé. So instead you'll get facts like:

``` next_insn(i0, i1). insn_type(i0, alloc). alloc_size(i0, 8). insn_type(i1, move). insn_dest(i1, i0). insn_value(i1, 10). ```

to characterize instructions like:

``` %0 = alloc(8) %0 = 10 ```

I'm not sure if that's you mean by serializing relations. But having ASTs in your language is a boon: rather than having dozens of EDB relations to store information about your program, you can just say what it is:

``` next_insn(i0, i1). insn_is(i0, alloc(8)). insn_is(i1, move(i0, 10)). ```

As for your point about Prolog, it's a tricky thing: the interface between tools like compilers and the analyses they run is interesting, but not necessarily interesting enough to publish about. So folks just... don't work on that part, as far as I can tell. But I'm very curious about how to have an efficient EDB, what it looks like to send queries to an engine, and other modes of computation that might relax monotonicity (e.g., making multiple queries to a Datalog solver, where facts might start out true in one "round" of computation and then become false in a later "round"). Query-based compilers (e.g., https://ollef.github.io/blog/posts/query-based-compilers.htm...) could be a good place to connect the dots here, as could language servers.