Hacker News
Dhall 1.0.1: a total programming language specialized to configuration files (github.com/gabriel439)
16 points by sndean on Dec 5, 2016 | 10 comments



Reaction of a sysadmin: Nice. Let's not.

Configuration files should be passive data, not a programming language, and as easy to process in a variety of languages as possible.


I've got some experience working with similar configs, using a DSL which looks like a restricted Lisp.

> Configuration files should be passive data, not a programming language

Many of today's interesting config file formats amount to nested lists. That's enough for an S-expression representation.

It's arguable what a "configuration" actually does, especially if it's (safely) not Turing-complete. Setting "passive" values for some parameters in today's configs can lead to quite complex changes in the configured program's behavior.

> and as easy to process in a variety of languages as possible.

S-expressions can be parsed in many languages -- see e.g. http://rosettacode.org/wiki/S-Expressions . I don't see a problem with processing.
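To make the "easy to parse" claim concrete, here's a minimal, illustrative sketch in Python (the function names are my own, not from any particular library):

```python
# Minimal S-expression reader, for illustration only.
# Atoms are kept as strings; parenthesized forms become nested lists.

def tokenize(text):
    """Split input into '(', ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Consume tokens, building one atom or one nested list."""
    tok = tokens.pop(0)
    if tok == "(":
        form = []
        while tokens[0] != ")":
            form.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return form
    return tok  # a bare atom

def read(text):
    return parse(tokenize(text))
```

For example, `read("(server (port 8080))")` yields the nested list `['server', ['port', '8080']]`, which any language with lists and strings can represent.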


It's not a programming language: it's not Turing complete, and it is guaranteed to terminate. So what's the problem?


Still too complex for configuration. And at this point, it's only processable from Haskell.


> Evaluation always terminates, no exceptions

How do you know for certain?

Where's the proof, given the Halting Problem, in this case?

Claiming safety through termination is nice, but termination is not something that is simple to prove.


The language isn't Turing complete, and it offers no way to encode general recursion in any form (and it isn't offered by the 'compiler' as a primitive). The language is "strongly normalizing", meaning every term and program can always be reduced to a final "normal form", where no more reductions are possible (like how '2' is the normal form of '1+1') and this normal form exists for every program which can type-check. Therefore, the 'compiler' always terminates as well, when it is asked to evaluate the program. You don't even need to get to the point of talking about the halting problem, because the tool isn't asking if any particular program terminates. It does not need to. All programs that can be expressed in the language, by design, terminate.

But the details are more subtle. In particular, you can write a configuration expression that takes 10 years to compute, though you're unlikely to do so in practice. So you also need a way of asking "is this in normal form?", for things like explicitly untrusted inputs. But that's pretty doable, since you can traverse the input and easily see whether any terms haven't been reduced.
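That traversal can be sketched in a few lines. The tuple encoding of lambda terms below is hypothetical (nothing Dhall-specific), but it shows why the check is cheap: you just look for any application whose head is a lambda, without performing a single reduction.

```python
# Hypothetical term encoding, one constructor per tuple tag:
#   ("var", name) | ("lam", name, body) | ("app", fun, arg)
# A term is in normal form iff it contains no beta-redex, i.e.
# no application whose function position is a lambda.

def is_normal_form(term):
    tag = term[0]
    if tag == "var":
        return True
    if tag == "lam":
        return is_normal_form(term[2])
    if tag == "app":
        fun, arg = term[1], term[2]
        if fun[0] == "lam":
            return False  # found a redex: (\x. body) arg
        return is_normal_form(fun) and is_normal_form(arg)
    raise ValueError("unknown term tag: %r" % tag)
```

So `(\x. x) y` is flagged as unreduced, while `\x. x` passes; each subterm is visited once, hence linear time in the size of the term.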


> it offers no way to encode general recursion in any form

I'd need to look into that a bit more, but at the moment I don't believe that is actually true.

Dhall provides anonymous functions, or the basics of lambda calculus.

This probably means that I can encode the Y combinator in it, resulting in anonymous recursion, though unless functions are first class, it'd be 500+ LOC.

Again, I haven't reviewed the internals enough to see, but if you have functions, that's normally enough to build infinite anonymous recursion.

Even if it isn't possible for some reason I haven't seen, claiming safety in that fashion probably isn't appropriate.

I'm also not certain that it isn't Turing Complete.

The C Pre-Processor doesn't supply a way to encode general recursion, but as we can see here [0], it is entirely possible, though to a limited depth. Leading to some debate as to whether or not it is Turing Complete.

If a configuration program created by Dhall is not guaranteed to terminate in a timely manner, then you completely lose the safety there.

"Reduces to a normal form" is completely useless as a guarantee if it takes 10 years, and that is one of the reasons simple markup/notation formats like YAML and JSON, which are readable in O(1)/O(n) time, end up preferred over something that can explode into O(n^2) time.

[0] https://news.ycombinator.com/item?id=4795542


> This probably means that I can encode the Y combinator in it, resulting in anonymous recursion, though unless functions are first class, it'd be 500+ LOC.

Try writing that function. It has a type like '(a -> a) -> a', which you need to implement. You should try it, but I'll spoil it for you: Dhall has no recursive types, nor does it have recursive values. The Y combinator cannot be written in Dhall itself; it must be provided by the implementation. You can write it in Haskell using open recursion/recursive data types, but only because Haskell allows recursive types, and its type system is already inconsistent as a logic anyway. Dhall has neither of these properties: no form of recursion exists at all.

This is a property of System F, the core itself, which is quite well established.

That said, some weird bug in the implementation itself may cause some weird behavior (like most software). But System F itself? It's not Turing complete, and that has been a well-known mathematical result for a very long time. And having written my own implementation of the Calculus of Constructions: it's quite easy to get the core logic implemented correctly. In fact, with a few abstractions, you can correctly implement the Calculus of Constructions -- which is even more powerful than Dhall -- in about ~50 lines of Haskell code.

> Again, I haven't reviewed the internals enough to see, but if you have functions, that's normally enough to build infinite anonymous recursion.

No, it isn't. Functions have nothing to do with it, at all. In the world of lambda calculus/natural deduction/logic, the relevant bit is not functions; it is the question: can I construct any possible value, out of thin air? Or, in other words, can I create a value of type 'forall a. a', somehow? Can I create a paradox -- a logical inconsistency? In the unified world of programming languages & logic, "general recursion" is actually a form of logical inconsistency inside the system, which allows you to prove anything -- of the kind Gödel described.

This is, roughly speaking, the equivalent process to determining some logical system like ZFC is actually inconsistent: such as proving `True = False` inside ZFC with nothing else. If you can create that proof out of nothing but the basics, you've created something that shouldn't exist. It is a contradiction. If you can do this, you have proven inconsistency in the underlying theory -- because you can prove anything if you already have proven 'True = False' https://en.wikipedia.org/wiki/Principle_of_explosion

And, really, it is not "roughly" the same thing -- it is the exact same thing. To find a value of type 'forall a. a', in fact, is the exact same principle as proving or showing a paradox/inconsistency in an ordinary logic.

If what you said were true -- that "functions lead to Turing completeness" -- then the existence of modus ponens (modus ponens being the logical equivalent of the "function type" in the lambda calculus) would imply that systems like ZFC are inconsistent. But modus ponens and inconsistency have nothing to do with each other, and the fact that a theory contains modus ponens as a valid judgement rule does NOT mean it is inconsistent.

For example, the simply typed lambda calculus has functions. But it does not have any form of polymorphism, or type variables. Thus, the general Y combinator is untypeable in this language and cannot be written, and the STLC is not Turing complete: you cannot write a function of type 'forall a. (a -> a) -> a' inside the language. It's not even possible to abstract over types, because there are no type variables! Picking a specific type 'a' means nothing at all: '(Bool -> Bool) -> Bool' is extremely easy to implement and not inconsistent in any way ('fix k = k True', done, and perfectly consistent).

It's the "for every 'a'..." part, along with the ability to deduce 'forall a' (somehow, someway), that leads to general recursion. If you want the STLC to have recursion, your compiler must implement a magic version of 'fix' on its own.
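To see the contrast, the same "functions only" trick really does work in an untyped language. Here, in Python, the Z combinator (a call-by-value variant of the Y combinator) builds general recursion out of nothing but anonymous functions; this is exactly the term the STLC and System F reject at type-checking time unless the implementation supplies 'fix' as a primitive:

```python
# Z combinator: a fixed-point combinator built from nothing but
# anonymous functions. It is untypeable in the simply typed lambda
# calculus and in System F, but an untyped language accepts it
# happily -- which is the whole point of the contrast.

Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(
              lambda x: f(lambda v: x(x)(v)))

# Factorial written with open recursion, then tied shut with Z;
# note there is no named recursion anywhere in this definition.
fact = Z(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
```

`fact(5)` evaluates to 120, so functions alone do yield recursion here -- but only because nothing stops `x` from being applied to itself.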

> I'm also not certain that it isn't Turing Complete.

You should be, because it isn't. There's ample research on the design of typed programming languages, the Lambda Cube, System F, and Turing completeness.

A good book to start with, just for the design of these systems, is "Types and Programming Languages" by Benjamin Pierce.

> The C Pre-Processor doesn't supply a way to encode general recursion, but as we can see here [0], it is entirely possible, though to a limited depth. Leading to some debate as to whether or not it is Turing Complete.

No, it doesn't lead to anything. The C preprocessor isn't Turing complete. That's it. There is no scientific debate to be had about this, any more than there is a scientific "debate" about what temperature water boils at. The replies to the post you quoted illustrate this directly and express the difference. Whether or not "it can recurse 10,000,000 times if I allow it to by passing a flag, and 10,000,001 times if I change 10,000,000 to 10,000,001" is rather beside the point of whether it's Turing complete.

> "Reduces to a normal form" is completely useless as a guarantee if it takes 10 years, and that is one of the reasons simple markup/notation formats like YAML and JSON, which are readable in O(1)/O(n) time, end up preferred over something that can explode into O(n^2) time.

Correct. This is why being able to ask if any configuration is normalized is important (and asking this can be done quickly by simply seeing whether there are any unreduced terms, rather than trying to actively reduce them). But let's not pretend YAML and JSON don't have their deficiencies in similar realms -- O(1) is nice and all, yet YAML is actually relatively complex, to the point of introducing security vulnerabilities, and I've encountered my fair share of deficiencies/inconsistencies in JSON/YAML parsers.

You must also consider the usage scenarios. Purely for configuration, this is less likely to matter: you're unlikely to write a purposefully inconsistent/bizarre YAML file that fucks up your parser, much like you're unlikely to write a Dhall configuration which takes 10 years to reduce when evaluated. But when you are accepting arbitrary, untrusted inputs, YAML parsing security matters, just like Dhall evaluation time and the ability to ask "is this in normal form" do.

In most cases I don't see the differences or threat vectors being truly, meaningfully distinct, at all. If you trust your input sources, given the general way configuration files are used -- the evaluation time isn't really a problem. YAML O(1) or whatever doesn't really matter much either; it's generally a fixed cost anyway. But if you don't trust your input sources -- YAML should scare you just as much as Dhall.

That said, I don't think Dhall will offer a substantial benefit for most people who are already using YAML. So it's not like you should switch. To me it's sorta neat, but probably something I'd never use extensively just due to adoption factors.

But I'm afraid you're deeply mistaken on the design of the language, and don't seem to know much about programming language theory, at a glance. I don't blame you, since it's a rather dry field. But, to put it in perspective: "System F and the Lambda Cube alone are not turing complete, and always normalize" is so trivially obvious in the PLT world that if you asked someone to "prove it" to you, they would probably say "It's true by definition of the system itself, what are you even asking" before staring at you blankly (or, they realize you don't know the field, and that they're going to have to start from square 1, as far as computability/language theory goes, to explain why this is the case.) It's an old and well accepted result.

TAPL, the book I mentioned earlier, is a good introduction to many concepts like this and should help clear up any confusion you might have.


> This is a property of System F, the core itself, which is quite well established.

So when I repeatedly asked for the aspect of the language design that guaranteed this safety, and you responded with "the design guarantees it", you meant the polymorphic lambda calculus, which, as it lacks infinite terms, prevents the creation of recursion through unfolding.

Clearly stating the reliance on System F would go some way towards generating trust in the definition of the language.

> There is no scientific debate to be had about this, any more than there is a scientific "debate" about what temperature water boils at.

A language is considered Turing complete if it can compute anything that a Turing machine can.

Brainfuck was proven to be Turing complete by Müller, and languages such as Piet have proven their own completeness by emulating or translating arbitrary Brainfuck programs.

So, I'm afraid the only question with regard to the CPP is: is limited recursion acceptable for Turing completeness?

The Church-Turing thesis specifically ignores resource limitations, which leads to the question of whether the limited recursion is a result of the available resources or of the language at hand.

There are active debates on this topic, and it cannot be so readily dismissed.

However, the point I raised with this is the claimed safety of Dhall, in that it doesn't guarantee timely normalisation of its terms.

Which I believe you may have also noticed. [0]

> much like you're unlikely to write a Dhall configuration which takes 10 years to reduce when evaluated.

True, however the guarantee of evaluation safety is false.

There is no safety there when you can run out the available resource, such as time.

And people do insane, unlikely things with a language, such as breaching the type safety of Scala; in response, the DOT calculus was developed so that those guarantees could be given.

Trusting a programmer not to do something stupid is often the wrong thing to do. [1]

There is a line which must be drawn, but you can't make guarantees that mislead the programmer into feeling safe.

[0] https://github.com/Gabriel439/Haskell-Dhall-Library/issues/6 [1] https://github.com/kanaka/mal/tree/master/awk


Author of Dhall here: while Dhall can't guarantee timely normalization, Dhall can guarantee timely checking of whether a term is normalized, and you can write a program using the Dhall library (or add a flag to the default Dhall compiler) that rejects imports that aren't in normal form. Checking for normal form is linear in the size of the source code.

As others have mentioned, Dhall is based on System F, which guarantees that all programs halt. You can't write the Y combinator or any other infinitely looping lambda calculus term in Dhall because it won't type check. The typical example of an infinitely looping lambda term is the following untyped lambda calculus term:

    (λx → x x) (λx → x x)
... which won't type-check in Dhall. It's actually not even legal Dhall code, because you have to declare the type of all bound variables, and there is no type you can assign to x that will make it type-check. Even in a language with type inference it still won't type-check: you will get a unification error.
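For what it's worth, you can watch that same omega term misbehave in an untyped setting. Transliterated to Python (which, being untyped, accepts it), it recurses until the interpreter's stack limit trips; a type checker in the System F family rejects the term statically instead:

```python
# omega = (\x -> x x) (\x -> x x), transliterated to Python.
# An untyped language accepts this term and then diverges; here the
# divergence surfaces as a RecursionError once the stack runs out.

omega = lambda x: x(x)

def run_omega():
    try:
        omega(omega)
        return "terminated"      # unreachable
    except RecursionError:
        return "hit the recursion limit"
```

Calling `run_omega()` returns "hit the recursion limit": self-application loops forever in principle, and only the finite stack cuts it short.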

The theory behind Dhall's termination guarantees is sound. If a program loops indefinitely in Dhall, it can only be because of an implementation bug causing Dhall not to match the underlying theoretical foundation. The theory itself is unbreakable.



