
How would Idris provide a solution to the situation in my example?



Probably with a function with a type signature something like this (which is in Agda, another dependently typed language, but the idea holds):

    nGram : {l : Nat} -> Vec Char l -> (n : Fin l) -> List (Vec Char (toNat n))
To break it up:

    "nGram" is the name of the function
    
    "{l : Nat}" means l is a natural number (the braces make 
    it implicit, so the compiler infers it from the other 
    arguments)
    
    "Vec Char l" is the first argument, declared as a 
    vector of characters, with length l
    
    "(n : Fin l)" is the second argument, a finite number 
    smaller than l
    
    "List (Vec Char (toNat n))" is the return type, a list which 
    consists of vectors of characters where each vector is of 
    length n (toNat converts the Fin l value n back to a plain Nat)
For an explanation on how and why this works you can check out these papers:

http://www.cse.chalmers.se/~ulfn/papers/afp08/tutorial.pdf

http://www.cse.chalmers.se/~peterd/papers/DependentTypesAtWo...

The first is more about programming in Agda while the second goes into how dependently typed languages relate to logic. Together they provide sufficient definitions of Nat, Fin, Vec and List to declare the nGram function.

If you're unfamiliar with ML-style syntax the above definition probably looks weird, but it can be written in a pseudo-Java style like this:

   List<Vec<Char,N>> nGram<Nat L, Fin<L> N>(Vec<Char,L> input_string, N n){}
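The idea can even be approximated in plain Haskell, which lacks full dependent types but gets close with GADTs. The sketch below (all names hypothetical) defines length-indexed vectors and a vtake whose type demands compile-time evidence that n <= l, so asking for more elements than the vector holds is a type error rather than a runtime exception:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Type-level natural numbers
data Nat = Z | S Nat

-- Length-indexed vectors: the length is part of the type
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- Singleton naturals, for passing a length at the value level
data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('S n)

-- Evidence that n <= l; no such value exists when n > l
data LTE (n :: Nat) (l :: Nat) where
  LTEZero :: LTE 'Z l
  LTESucc :: LTE n l -> LTE ('S n) ('S l)

-- vtake requires a proof that n <= l, so "take more than
-- exists" cannot even be expressed
vtake :: SNat n -> LTE n l -> Vec l a -> Vec n a
vtake SZ     LTEZero     _             = VNil
vtake (SS n) (LTESucc p) (VCons x xs)  = VCons x (vtake n p xs)

toList :: Vec n a -> [a]
toList VNil         = []
toList (VCons x xs) = x : toList xs
```

Building the LTE proof by hand is tedious here; in Agda or Idris the compiler can often construct such evidence automatically.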


This makes no sense. No language can know the value of a variable at compile time, unless that variable is a constant. If the function you have defined is passed a string s and an int n, where n is greater than len(s), there is no way the compiler can know that will happen in advance. It will be a runtime error. IllegalArgumentException is also a runtime error. You have not solved anything.


Actually, joel_ms is correct; you can encode these "dynamic" properties of values in a static type system, in the same way that you can prove properties about a program on paper without actually running it. The connection between programs and proofs [1] is rather mind-blowing to learn about, and unfortunately no popular programming language supports this style of "certified" programming. But there are some interesting languages which do, such as Idris and Coq.

[1] https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspon...


Did you check out the papers I linked to? Because they explain how this "makes sense".

Dependently typed languages allow for type-level functions (in fact, they erase the separation between the type level and the value level). This is why you can define a type like "Fin n", meaning a number smaller than n: Fin is a function that takes one argument n and returns a type whose values are restricted to the set of (natural) numbers smaller than n.
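As a rough sketch of what "Fin n" looks like as a data definition (Haskell GADT syntax, hypothetical names; Agda's actual definition is analogous):

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

-- Fin n: the numbers strictly smaller than n.
-- No constructor targets Fin 'Z, so "a number smaller
-- than zero" has no values at all.
data Fin (n :: Nat) where
  FZ :: Fin ('S n)           -- 0 is smaller than any successor
  FS :: Fin n -> Fin ('S n)  -- if k < n, then k+1 < n+1

-- Forget the bound and recover the plain number
finToInt :: Fin n -> Int
finToInt FZ     = 0
finToInt (FS k) = 1 + finToInt k
```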

Arguably, Agda is more often used as a proof assistant, rather than as a programming language (although I've had the pleasure of watching Ulf Norell implement a limited programming language interpreter in Agda, using non-total functions, which was quite nice.) But Idris is a serious effort to create a language that's both dependently typed and performs decently when used as a regular programming language.

If you want to see some real world examples of using dependently typed languages, I would suggest looking at the CompCert C Compiler[0]. It is a C compiler that's been formally verified using the Coq proof assistant. Coq is a precursor to both Agda and Idris, and is, as far as I know, the first widely used dependently typed language.

To be fair, programming in a dependently typed language usually requires more effort upfront, but the ability to encode properties that are usually considered runtime errors into the type system does seem like an interesting future for programming languages.

[0] http://compcert.inria.fr/compcert-C.html

(edit: shout out to the excellently named sibling commenter "curryhoward", IMHO one of the most interesting results in programming language technology and type theory research!)


Well, I don't know Idris in detail, so you're better off reading the wiki page I linked to, but I will try to explain further.

Idris allows you to define types whose definition depends on a value - these are called "dependent types". This allows you to define a type such as "a pair of integers, where the second integer is greater than the first".

So, in your example, you could use a vector type that carries its length in its type. Then, you define a function that accepts a vector of characters of length "len", and a number whose value must be less than "len". The type signature might look something like this (if I understand the syntax correctly):

    ngram : (s : Vect len Char) -> (n : Fin len) -> List (Vect (finToNat n) Char)


A compiled language cannot know what the value of a variable is at compile time, unless the variable is a constant.

Therefore, there cannot possibly be a way to avoid the problem I described in my example, unless your program exists in a closed world where there are no inputs, and the values of all variables are defined before compilation.


Ah, you're referring to the halting problem? You're right, you can't know every possible value. The article I linked actually mentions how Idris approaches the problem:

    The word "total" invokes the totality checker[0] which will report an error if the function doesn't cover all possible cases or cannot be (automatically) proven to not enter an infinite loop.
So, it sounds like if you can't provably account for all possible inputs, the compiler will recognise this and throw an error. How? Magic! (And a solver, feeding your type constraints into the totality checker, I guess...)

(Also, bear in mind languages like Haskell use "lazy" or "non-strict" evaluation by default, where values aren't evaluated until they're actually needed; kiiiinda like Streams?)

I'm certainly not an expert on Idris, but I imagine dependent typing provides two benefits:

1) The type signature (which is separate from the implementation; kind of like an Interface in Java) can enforce constraints on your implementation. So, if you define a function whose type promises that the result equals the sum of the two inputs (in Idris, roughly "(a : Nat) -> (b : Nat) -> (c : Nat ** c = a + b)", a dependent pair bundling the result with a proof), your implementation must ensure that the returned value really is the sum of the two inputs.

2) You can look purely at the type signature to understand what a function is doing. Looking at that type signature above, I can see I want to use that function for addition, not multiplication.
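Point 1 can be sketched even within Haskell's type system: in the (hypothetical) append below, the declared result length Add n m constrains the implementation, so a version that dropped elements would simply not typecheck:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

data Nat = Z | S Nat

-- Type-level addition
type family Add (n :: Nat) (m :: Nat) :: Nat where
  Add 'Z     m = m
  Add ('S n) m = 'S (Add n m)

data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- The return type forces the result length to be the sum
-- of the input lengths; an implementation that dropped or
-- duplicated elements would be rejected by the compiler.
append :: Vec n a -> Vec m a -> Vec (Add n m) a
append VNil         ys = ys
append (VCons x xs) ys = VCons x (append xs ys)

toList :: Vec n a -> [a]
toList VNil         = []
toList (VCons x xs) = x : toList xs
```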

Anyway, I'm absolutely not doing justice to these ideas (mostly because I'm not well acquainted with the languages). If you're curious at all, I would highly recommend taking a course on functional programming as a minimum; once you realise that names are meaningless in code, you start seeing everything as reducible to very similar type signatures, which makes you yearn for a language that lets you write your boilerplate once, in a polymorphic way (beyond generics).

[0] https://en.wikipedia.org/wiki/Termination_analysis


> So, it sounds like if you can't provably account for all possible inputs, the compiler will recognise this and throw an error. How? Magic! (And a solver, feeding your type constraints into the totality checker, I guess...)

That would make many practical, real-world programs noncompilable. Any program which accepts data via API calls, or even reads strings from STDIN, cannot know what values are going to be passed to it before it runs. Compilers cannot predict the future.


> A compiled language cannot know what the value of a variable is at compile time, unless the variable is a constant.

This statement is false. I recommend reading about dependent types, which enable exactly what you claim is impossible.

It's true that in every mainstream language, the type system is not powerful enough to reason about types that "depend" on values, such as the type of MxN matrices, sorted lists, prime numbers, etc. But don't limit your sense of what is possible by your experience with mainstream languages; most have rather limited type systems compared to what we know is possible in theory.


To explain in simpler terms, what you do is promote values to types first. I.e. when a user enters a string, you have to somehow prove, using formal logic, that n < l. After that, the proof is encoded in the types and you can use functions like nGram above that depend on this proof.
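A Haskell sketch of that promotion step (hypothetical names): decideLTE is an ordinary runtime comparison, but when it succeeds it returns a proof object that downstream functions can require in their types:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

-- Singleton naturals: runtime values mirroring type-level Nats
data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('S n)

-- Evidence that n <= l
data LTE (n :: Nat) (l :: Nat) where
  LTEZero :: LTE 'Z l
  LTESucc :: LTE n l -> LTE ('S n) ('S l)

-- A runtime check that, on success, hands back
-- compile-time-usable evidence instead of a mere Bool
decideLTE :: SNat n -> SNat l -> Maybe (LTE n l)
decideLTE SZ     _      = Just LTEZero
decideLTE (SS _) SZ     = Nothing
decideLTE (SS n) (SS l) = fmap LTESucc (decideLTE n l)
```

So input validation happens once, at the boundary, and the rest of the program can rely on the proof rather than re-checking.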


You are incorrect. Your objection is like looking at a pen-and-paper proof that a program terminates and saying it is invalid because of the halting problem. If you want to read more about this subject, I recommend Adam Chlipala's "Certified Programming with Dependent Types": http://adam.chlipala.net/cpdt/


> Your objection is like looking at a pen-and-paper proof that a program terminates and saying it is invalid because of the halting problem.

That is a true statement for any program that accepts input, strictly speaking. The halting problem is unsolvable.


A one-state machine that immediately halts will always halt, regardless of the input.

The halting problem cannot be solved in general, but some programs can be proven to halt (or not halt). That's how the first few Busy Beaver numbers were found.

For something more substantial: a correct program that implements a regular language recognizer will always halt eventually on any input, because it always makes progress and doesn't have any nonterminating loops. It consumes the whole input and stops.
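For instance, a hedged Haskell sketch: the loop below is structural recursion on the input string, so every recursive call has strictly less input left, and termination follows for any input whatsoever (names hypothetical):

```haskell
-- runDFA consumes exactly one character per step, so it must
-- halt after length-of-input steps, no matter what the input is.
runDFA :: (s -> Char -> s) -> (s -> Bool) -> s -> String -> Bool
runDFA _    accept s []     = accept s
runDFA step accept s (c:cs) = runDFA step accept (step s c) cs

-- Hypothetical example: accept strings with an even number of 'a's
evenAs :: String -> Bool
evenAs = runDFA step even (0 :: Integer)
  where
    step n 'a' = n + 1
    step n _   = n
```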

The hard part is proving you wrote it correctly, but that's what Coq et al. are for.


No, this is not true. Writing a program to check termination proofs is not difficult. Writing a program to construct termination proofs isn't possible in the general case. If someone hands you a formal pen-and-paper proof of termination it is quite possible to check it for correctness.

Analogous statements are true for programming in a dependently typed language.



