Introduction to Datalog

JoelMcCracken · on March 18, 2019

Can anyone recommend any implementation of Datalog (+ negation) that is not datomic?

I haven't tried datascript, which appears to support negation. Maybe I will try that if/when I revisit this interest someday.

x775 · on March 18, 2019

Hi Joel.

You can give http://www.dlvsystem.com/dlv/ a shot!

Alternatively, if you prefer open-source solutions, check https://abcdatalog.seas.harvard.edu/.

Due to a number of complications on my machine, I used DLV.

JoelMcCracken · on March 18, 2019

Ah, I usually automatically pass on closed source solutions (hence my dislike for datomic).

TY for the link to abcdatalog though!

qeshi · on March 18, 2019

Check out Datahike aswell, if you are interested in a durable datalog database.

https://github.com/replikativ/datahike

JoelMcCracken · on March 18, 2019

Cool, ty! Seems similar to DataScript.

stingraycharles · on March 18, 2019

It’s actually a port of Datascript — consider it a durable Datascript.

felixyz · on March 18, 2019

If you're interested in experimenting with Datalog rather than necessarily writing something for production, have a look at Datalog Educational System: http://des.sourceforge.net/

riza_on · on March 18, 2019

I have been playing with Souffle for my datalog-based static analysis project

https://souffle-lang.github.io/docs/home/

sdbrady · on March 18, 2019

Thanks for sharing. There is one very significant conceptual error early on, however, and it is captured first in this statement: "The :- means if and only if, or iff". `:-` means if - or more precisely represents material conditional - where the consequent is on the left and the antecedent is on the right. iff is logical biconditional.

YeGoblynQueenne · on March 18, 2019

Indeed, ":-" is meant to represent the left-facing arrow of implication. In logic programming papers it is common to typeset it as an actual arrow, for example:

  p(X,Y) ← q(Y,X)

etc.

x775 · on March 18, 2019

Hi Stephen. You are absolutely right, thank you for the feedback! I have edited accordingly.

maweki · on March 18, 2019

I always remind myself that it's not an if-and-only-if with this argument: since there could always be another rule that is satisfied to make that fact true, it can't be iff.

hombre_fatal · on March 18, 2019

Pretty cool deep-dive on Datalog.

The interactive tutorials on http://www.learndatalogtoday.org (Datomic's dialect) quickly sold me on the idea.

Though coming from Datomic, I'm curious how much of my knowledge is Datomic-specific rather than how you'd generally approach a database queryable with Datalog. For example, do you need four indexes like Datomic (https://docs.datomic.com/on-prem/indexes.html) to make Datalog queries fast?

nutjob2 · on March 18, 2019

If you have a lot of facts then you need indexes, otherwise you're scanning a lot of irrelevant data, many times over.

hombre_fatal · on March 18, 2019

Sure, was just curious where Datomic's EAV, AEV, AVE, VAE indexes fall between Datomic indexing impl detail and general Datalog indexing solution.

Datalog is fascinating, but the blog post makes me curious about more concrete impl-related follow-up questions.

nmadden · on March 18, 2019

Great post! Still working through it, but there is a slight error in the nested diagram at the start. Relational algebra has set difference, which is akin to negation-as-failure, but it lacks recursion. So the positive Datalog and RA circles should overlap without either containing the other. See http://www.lifl.fr/%7Ekuttler/elfe/biblio/datalog-overview-g...

x775 · on March 18, 2019

Hi there. Thanks so much for your feedback, and good catch! I will update the diagram accordingly.

x775 · on March 19, 2019

I can no longer edit the parent, but this has been updated. Thanks again!

burakemir · on March 18, 2019

Nice post. Still, I find the most accessible article describing datalog is "What you Always Wanted to Know About Datalog (And Never Dared to Ask)." by Ceri, Gottlob, Tanca (1989)

x775 · on March 18, 2019

Thanks for your feedback!

For those interested in the mentioned paper, see: https://www.utdallas.edu/~gupta/courses/acl/papers/datalog-p...

radomir_cernoch · on March 18, 2019

> The :- means if and only if, or iff.

Is it really the case?

  Human("Socrates").
  Animal("Turtle").
  Mortal(x) :- Human(x).
  Mortal(x) :- Animal(x).

Suppose :- means iff. Turtle is Mortal (lines 2+4, implication to the left). Because Turtle is Mortal, it must be a Human (line 3, implication to the right).

Is it really valid according to Datalog semantics?

x775 · on March 18, 2019

Hi! Thank you for highlighting this.

No, you and sdbrady who commented above are correct; the :- only means "if". I have edited accordingly and apologise for the misunderstanding!

bobjordan · on March 19, 2019

Sharing an interesting implementation in python which I stumbled upon yesterday. Repo: https://github.com/pcarbonn/pyDatalog Tutorial: https://sites.google.com/site/pydatalog/Online-datalog-tutor...

ComNik · on March 18, 2019

If model-theoretic semantics and the various ways to slice, dice, and extend Datalog are interesting to you, then almost any talk by Peter Alvaro might be as well.

In particular: https://www.youtube.com/watch?v=R2Aa4PivG0g

fspeech · on March 18, 2019

Can you specify symmetric and transitive closure in Datalog?

brian_cloutier · on March 18, 2019

Transitive closure is the first thing nearly every introduction to datalog (or Prolog, for that matter) will show you. All you had to do was click the link and scroll down:

    Edge("a", "b").
    Edge("b", "c").

    Path(x, y) :-
        Edge(x, y).

    Path(x, z) :-
        Path(x, y),
        Edge(y, z).

Symmetric closure, assuming I'm understanding correctly, is also trivial:

    SymmetricEdge(Left, Right) :-
        Edge(Left, Right).
    SymmetricEdge(Left, Right) :-
        Edge(Right, Left).

fspeech · on March 18, 2019

I was wondering about termination. With finite ground facts that appears not to be an issue. Complexity is another matter.

cfallin · on March 18, 2019

Re: complexity, read up on "semi-naïve evaluation" for the usual execution strategy, which basically involves keeping lists of newly-added tuples in each iteration of the fixpoint loop and only processing deltas related to those new additions. This, combined with good index inference for each relation (predicate), so that e.g. the node-neighbor lookup is O(log |V|), should result in efficient execution...

JadeNB · on March 18, 2019

Termination is achieved in brian_cloutier's example (https://news.ycombinator.com/item?id=19423320) by the use of different clauses (e.g., `Path` vs. `Edge`), so that all recursion is productive.

maweki · on March 18, 2019

Datalog always terminates because there is no way to derive new atoms.

brian_cloutier · on March 18, 2019

Yes, but there's more to it than just that. It's possible to write Prolog programs which never derive new atoms yet still fail to terminate.

It might be more accurate to say that Datalog programs cannot derive new atoms and this allows Datalog interpreters to use a search strategy which is guaranteed to terminate.

EDIT: Thinking about this more, I'm not sure why Prolog couldn't also use breadth-first search. So maybe both are necessary: Datalog not only disallows creating new atoms, but it also has a better search strategy; the combination of the two results in guaranteed termination.

maweki · on March 18, 2019

Datalog does not have a search strategy per se. The general "strategy" (or definition) is the fixpoint of the T_p operator that derives all one-step-derivable facts from the facts and the rules.

The minimal model (result of derivation) is defined as the fixpoint of this operation. And the minimal model is finite because of the fixed number of atoms.

Every other evaluation strategy must be equivalent to this. Both in terms of termination and minimal model.

Prolog could use breadth first instead of SLD (and there are other Prolog evaluation strategies that, for example, use tabling for derived facts) but then you might get an infinite number of backtracking points instead of terms of infinite depth. You haven't really won anything by doing that.

YeGoblynQueenne · on March 18, 2019

My understanding is that Datalog terminates because it does not allow functions as arguments to predicates. No functions means it's not possible to create infinite terms:

  P(f(x)).
  p(f(f(x)).
  p(f(f(f(x)))).
  ... etc

In fact I understand that termination is guaranteed even if a datalog program is executed by a Prolog interpreter. Or in other words, it's a result of the language semantics, not its implementation.

(but I might be wrong about this- corrections welcome).

maweki · on March 18, 2019

You are wrong.

p(X) :- p(X). does usually not terminate in a top-down (Prolog) system but does in a bottom-up (Datalog) system. Datalog doesn't even allow terms as predicate arguments so of course you can't construct them of infinite size. But also this isn't allowed (where it is in Prolog)

p(X) :- p(Y), X is Y + 1.

Given a fact p(0) you get p(X) true for any integer X or generate all positive integers. This doesn't terminate for some binding patterns.

YeGoblynQueenne · on March 18, 2019

>> p(X) :- p(Y), X is Y + 1.

is/2 is not Datalog though- it takes an arithmetic function as an argument; +/2 in your example. So the above would not terminate because it's Prolog, not because it's a Datalog program evaluated by Prolog.

>> Datalog doesn't even allow terms as predicate arguments so of course you can't construct them of infinite size.

Usual confusion: I think you mean "terms" as in Prolog terms, i.e. atoms of predicates (as opposed to terms in logic programming, i.e. variables, constants and functions). Datalog accepts constants and variables so with a finite vocabulary you can't create infinite atoms. I don't think that has to do with bottom-up or top-down evaluation.

To be honest, I haven't read any Datalog papers, but I'd be surprised if the termination guaranteed rested upon a specific implementation rather than the language semantics. Maybe not surprised- but it would be less interersting if you can only guarantee termination if you implement the language just so, vs. if it's a property of the language.

maweki · on March 18, 2019

The minimal herbrandt model does not have new atoms. Besides what is given in the rules and the starting facts (empty for least fix point).

So the "biggest" model is only as large as any atom in any position for any predicate and that is always finite for a finite number of sorts or predicates.

Is gave is/2 as an example of how Prologo allows you to generate new atoms/terms that have not been facts before.

YeGoblynQueenne · on March 18, 2019

Apologies- I misunderstood your example with is/2.

slifin · on March 18, 2019

Thank you, bookmarking this for later