Hacker News new | past | comments | ask | show | jobs | submit login
Idris 2 0.6.0 is now available for the JVM (github.com/mmhelloworld)
162 points by ska80 on Feb 27, 2023 | hide | past | favorite | 64 comments



It would be nice if this linked to the README rather than a dump of the commit log, which makes it hard to know what's new or interesting in this release. It would be double nice if the readme talked about JVM language interop or performance, as that's presumably the main selling points of running Idris on the JVM.


Hello, I am the author of Idris JVM. Thank you for your feedback. I will update the README to include a link for JVM interop. For now, this file https://github.com/mmhelloworld/idris-jvm/blob/main/libs/bas... , demonstrates how the interop looks like currently. The %foreign directive takes an FFI descriptor to talk to the respective backends. Here you can see few scheme and JVM descriptors starting with "scheme:" and "jvm:" respectively. The JVM descriptors there also show constructor calls `<init>`, instance methods `.lock` and static methods `getThreadData`.


Idris 2 is a purely functional programming language with first class types. This repository provides Idris 2 compiler targeting JVM bytecode so that Idris 2 compiler and Idris 2 programs can run on the JVM


To slightly elaborate: when this comment say "first class types" they're referring to dependent types. For programmers who are not aware: think about functions in functional programming languages. They're "first class citizens" in the sense that there is no difference between functions and any other values, such as `3` or `"hello"`. In dependently types languages, there is no difference between types and other values. Types are just any other value. For example, normally `Int` would be a type, but in these kind of programming languages `x: Int, x > 5` would also be a type even though the type depends on the value `x`.

In terms of software engineering, one bonus you get is being able to encode your unittests in your type system. For example, you can do (in Agda):

  _ : myFunction x y z == 33
  _ = refl -- refl is of type `x == x`
and compiler will run this test for you at compile-time while type-checking your program.

Two popular "practical" dependently typed languages are Agda and Idris. Of these two, Idris is relatively designed for software engineers, whereas Agda also wants to be a state-of-the-art automated theorem proving tool. I would strongly recommend checking both if you want to explore this area of programming languages. I wrote practical programs in Agda (e.g. language parsers, JSON parser etc) and I think it can make a programmer extremely productive, if you like this kind of workflow. Correctness is very easy to reason in these languages, and although it may look daunting at first, writing practical programs are very approachable (about as hard, maybe ever-so-slightly harder than writing code in Haskell). Have fun!


So I could also write all my code purely in the type system too, right?


Yes, all computation done at runtime can be done at compile time (within type system).

This is a problem in some cases e.g. if you write an infinite loop, this would mean compilation will infinitely loop. Different languages deal with this problem differently. E.g. in Agda, there is a "safe" subset of language that is not Turing Complete such that all programs can be algorithmically proven to be halting. But you can work around this via pragmas. Speaking from experience, the safe subset of Agda is all you need to write useful programs. You just need a tiny shell (maybe only few lines of code) that will handle IO and Haskell FFI. The rest of the code will be purely functional, safe, Turing Incomplete Agda.


I am wondering what kind of implications there are for Idris 2 to run on the JVM. For example: Does it still have TCO on the JVM? What about data structures? Does it have a set of functional data structures implemented, on top of Java ones? Or is it all implemented on top of Idris 2 "primitives"?


In the JVM backend, recursive calls are eliminated with loop. There is an option to enable trampoline for functions but I am currently working on replacing mutual recursion with loops so with that the trampoline option might be removed. As for data structures, there are Idris data structures like `List`, `SortedSet`, `SortedMap` etc. but only Idris `List` currently implements Java `List` interface.


Naturally it doesn't have TCO on JVM, at least not natively as bytecode representation for it.

The only possibility is to rewrite the code in similar way as clojure and Scala do it.


Which, to be clear, is not a zero-cost mechanical translation, and can be tricky and annoying when you start talking about things like generic monad transformer stacks.


I think TCO and JVM are orthogonal discussions -- by the time the bytecode is generated, the TCO may have already been performed.


I'm actually not sure about that. I think the Javascript compiler target also lacks full TCO. The Idris 2 compiler itself I think does not perform this optimization, because the Scheme reference backend does it already. I might be wrong about that, but I remember several active unresolved discussions on this topic in the community last year. I haven't been involved actively in it for several months, so my memory might be wrong.


The javascript back end handles mutually recursive tail calls. It finds connected components and rewrites them to use a trampoline-like thing. There is a nice explanation of how it works at the top of the file:

https://github.com/idris-lang/Idris2/blob/main/src/Compiler/...

One could argue that it's not "full" TCO, but it does cover a lot of use-cases. That rewrites to NamedCExp, so it could be reused by other backends.

The Java backend seems to be doing its own thing. There aren't a lot of comments or documentation in that repository, but I do see evidence that it's doing something interesting in src/Compiler/Jvm/Optimizer.idr:

    Pure $ if shouldTrampoline && hasNonSelfTailCall tailCallCategory
      then trampolineExpression True inlinedAndTailRecursionMarkedExpr
      else inlinedAndTailRecursionMarkedExpr


This link to the latest readme.md may be helpful.

https://github.com/mmhelloworld/idris-jvm/blob/latest/README...


Never seen this language before, but I can say I _really_ like the syntax. Just looking at it it's pretty easy to figure out what they were trying to do.


Have you used Haskell before? It seems almost identical syntactically at a glance, so I'm curious if that's what you mean by "what they were trying to do" or if it's your first time seeing syntax like this at all.


How does it interop with the existing Java libraries?


This file https://github.com/mmhelloworld/idris-jvm/blob/main/libs/bas... demonstrates how the interop looks like currently. The `%foreign` directive takes an FFI descriptor to talk to the respective backends. Here you can see few scheme and JVM descriptors starting with "scheme:" and "jvm:" respectively. The JVM descriptors there show constructor calls `<init>`, instance methods `.lock` and static methods `getThreadData`.


I can't speak for the JVM backend specifically, but in general Idris 2 code can call arbitrary functions in the backend with the "%foreign" pragma. So hopefully that would be supported for Java, although I don't know how it would work with classes.


saw that idris compiled to C . How realistic is it to build a small core library in idris that would then have hooks to external components in other PL using C as a bridging layer ?


Idris 2 does compile to C, but it uses its own reference-counting garbage collector and cannot currently be used to generate libraries. I believe this limitation is true for all of the compiler targets right now, except maybe Javascript (some work was actively in progress on this when I last checked).


One of most frequent usage of a type system is literal types.

Haskell, Rust,... and modern programming languages fail to do it. The only exception is Typescript.

For example, simple way to adding a extended boolean type:

type MyBoolean = null | true | false.

Or

type MyRange = 3 | 4 | 5

Curious.


> Haskell

That can be done in Haskell with algebraic data types (with constructors without arguments), no?

Languages that have enums: Java, Kotlin, Dart, Ada, C (not in the first version), Pascal (not in the first version), Modula-2, Visual Basic,...

Pascal had subrange types since the first specification. For example "var month : 1..12;"

I don't know anything about Typescript but from what I have found it seems that Typescript's literal types are similar to Pascal's subrange types, i.e., the new type is compatible to the basic type it is derived from (unlike enums in the above languages, which are completely new types). But there is no runtime check.


Typescripts types are more similar to OCaml's variant types[0]; sadly my knowledge of OCaml's type system is less than a tenth of what I would like so I cannot give precise parallels, but in typescript you can have type A = 0 | null | Array<string>, type B = Array<number> | 0 | Array<string> and construct the type A & B = 0 | Array<string>

In general I think that it is not possible to "just" add an type intersection operator to an Hindley–Milner type system without making it either incomplete[1] or unsound[2]

In typescript this is often used with records as {a:number} & {b:string} = {a:number, b:string}

[0] https://v2.ocaml.org/releases/5.0/htmlman/types.html#sss:typ...

[1][2] which typescript is

EDIT: fix typo


It's not quite the same. You can emulate literal string types using recursive ADTs (similar lists), storing one character at the time at the typesystem.

But the ergonomics are quite different.

For example, in Scala you can do this:

    val foo: "Must be this string" = "foo" // fails to compile
In Typescript it's similar. But how would "Must be this string" look in the case of a haskell ADT?


I am quite sure that Pascal had enumerations since the early days.

Section "6.1.1 Scalar Types", 1973 edition:

https://www.standardpascal.org/The_Programming_Language_Pasc...


My bad. In the newer edition they called it "enumerated types":

http://pascal.hansotten.com/uploads/books/Pascal_User_Manual...

I didn't realize that "scalar types" in the first edition were the same thing.


Your example in Haskell:

  —- ADTs in regular old Haskell
  data MyBoolean = MyNull | MyTrue | MyFalse

  —- Refinement types in Liquid Haskell
  {-@ type MyRange = {v:Int | ((v >= 3) && (v <= 5))} @-}



Why would I ever want to extend a type with null? That's exactly the opposite of what I want of a type system in the first place.


the type `null | true | false` is different from `true | false`, a type checker can assert that you handle the `null` case before using a function that wants a boolean. This is how rust handles it (with the Option<T> type).


That doesn’t make sense. `null | false | true` is not equivalent to `Option<bool>` or `Option<false | true>`. Just like `zero | one | two` is not equivalent to `Option<one | two>`.

Assuming that `true | false` is equivalent to something like `enum { true, false }`.


> Just like `zero | one | two` is not equivalent to `Option<one | two>`

Isn't it ? both cases represent a type than can express 3 variants


It is equivalent in the amount of information you can encode, but not in how you can use it.

Classical example is wrapping multiple times: Option<Option<one | two>>. If you have null | null | one | two, well... that just boils down to null | one | two.


Those two types are indeed equivalent by themselves. The problem comes when you want to store them somewhere nullable. `zero | one | two | zero` flattens back down to `zero | one | two`, you can't distinguish between the two zero/null cases.

On the other hand, `Option<Option<one | two>>` allows you to distinguish between None and Some(None).

This makes union types unsound in the presence of type parameters/generics.

TypeScript supports both anyway, because Hejlsberg cares more about being able to type existing JS antipatterns than about providing a sound type system.


> This makes union types unsound in the presence of type parameters/generics.

I'm not sure if "unsound" is a good adjective here. There are cases where this is actually desired behaviour and the rules can definitely be "sound".

For example, I might want to know what errors can appear, but not care where they come from. So `ErrorA | ErrorB` is what I want to see, not some nested structured that allows me to differentiate where ErrorA came from in case that there are multiple possible options.


I did not catch from you comment if you knew, but "sound" and "unsound" are specific concepts in type theory, and they are binary properties. A system either is or is not sound.


Yeah I know that.

So: > This makes union types unsound in the presence of type parameters/generics.

Sounds a bit strange to me. Why would union types + type parameters be generally unsafe? I doubt that that's true.


Sound type system is not one of the design goals of TypeScript

https://effectivetypescript.com/2021/05/06/unsoundness/


No.

There is an isomorphism between them, but they aren’t equivalent since for one you will have to match on the `Option` first in order to see whether it is `None` or `Some(Next)` and then inspect `Next` (if `Some(…)`).

Same reason that `Nothing | Pointer` is not equivalent to `Option<Pointer>`. And it makes a huge practical difference, since the first type allows for “nothing-pointer dereference” while the second one does not.


> And it makes a huge practical difference, since the first type allows for “nothing-pointer dereference”

That's not how strict TypeScript works. If you have a nullable you'll need to prove to the compiler first that it is not currently null before dereferencing.


Here’s what I originally replied to:

> the type `null | true | false` is different from `true | false`, a type checker can assert that you handle the `null` case before using a function that wants a boolean. This is how rust handles it (with the Option<T> type).

If the variant `null` here is handled specially in general in TS then yeah, I was wrong. However, I was mostly replying to the part about “this is how Rust handles it”.


In particular, you may want this to signify “I dont know” with NULL, just as in SQL. See the Wikipedia page for ternary logic for more info.


Note that 3VL (3 valued logic) is one of the more criticized aspects of SQL.

3VL is a gigantic leap of of complexity over 2VL (boolean). The number of possible operations is a lot higher and the choice of operations is not standardized. To quote wikipedia: "In two-valued logic there are 2 nullary operators (constants), 4 unary operators, 16 binary operators, 256 ternary operators" And we all agree which of those 16 is named what. "In three-valued logic there are 3 nullary operators (constants), 27 unary operators, 19683 binary operators, 7625597484987 ternary operators" There is no standard meaning of the 3'rd value and of the operations involving it.

That is not to say that 3VL isn't useful. I believe the reason for the criticism of 3VL in SQL it that the system made those choices for you and those choices are not always intuitive. Therefore a null in one attribute in one relation might mean something very different from a null in another attribute in another relation.

Also note that as opposed to digital signal processing (where binary and ternary refer to the number of possible values a bit/trit can have), when discussing logic arity (nullary, unary, binary, ternary, n-ary) refers to the number of arguments required by a function as opposed to the number of values the arguments can have (univalent, bivalent, trivalent, k-valent). This is confusing because the terms are used interchangeably.


Yes, so it's just about returning a (Maybe = Just bool | Nothing). No null needed.


exaclty, whether you call it 'Nothing' or 'null' does not matter, as long as it represents a distinct type than 'true' or 'false'


Null for me is a null pointer, and thus there's a difference. Anyway, if that's all you want than Haskell, etc. definitely handle that case.


The word "null" means "missing value" in e.g. SQL and most data science / analytics contexts. It's unfortunately an overloaded term with two different meaning. Conflating the two things is where we get the egregious mistake of using null pointers to represent null data!


Isn’t it contradictory to say it’s the most frequent usage, and in the next sentence that no modern programming language does it?


No modern PL explicitly allows for it or type checks for it. Examples are enum types and null pointers, which are used everywhere. Implicit usage includes things like natural numbers (ints more than 0), ranges, values of a type following some rule, etc.


Usually this is done by means of Enumerated types. As someone who didn't get conventional CS education, Enums were bit of me-too for me, but the moment I realised they can be used to make invalid states unrepresentable, I was hooked.

Sure, they can be better in java, but they cover a large enough ground, and as always, most people find that sufficient.


What's different to typical Algebraic-Types/Sum-Types/Nü-Enums in what they showed is that it's existing values being OR'd together, not types or new type constructors. That is presumably why they are obliquely mentioning it in the comments thread of an Idris release, because Idris is known for "Dependent Types", i.e. types that depend on actual values of existing types, not just existing types.


Many languages have enum types and algebraic data types (Java has both now). TypeScript is a little unique in that its core type system is much more structural than nominal (even more so than Haskell) -- and that's because of its JS heritage and interop -- but structural types have significant disadvantages as well advantages. The main advantage is succinctness as well as very powerful inference. The main downside is inscrutable type errors (if two types having the same structure are considered to be the same type, the compiler has little choice but to refer only to the structure in error messages rather than the type name).


> TypeScript is a little unique in that its core type system is much more structural than nominal (even more so than Haskell)

I'm a bit confused by the point about Haskell here -- IME Haskell's type system is pretty strictly nominal, not a lot of structural typing there


Function types are structural in Haskell.


Comment is getting some hate, but it’s somewhat interesting. This is something you can’t do in standard Haskell. You can certainly do: data MyRange = My3 | My4 | My5, then write a MyRangeToInt function.

But in Haskell (AFAIK, there may be a GHC extension) you cannot say to the type checker: I want to define a new type as a subset of the inhabitants of an existing type. That seems slightly more ergonomic in some situations if you then wanted to then run other functions that work on the original type — it at least saves you the conversion step.

Regarding MyBoolean, you’re now doing the previous trick, using a subset of the inhabitants of existing types, but they’re from different types and they’re being put in an untagged union, in Haskell this would be implemented with: type MyBoolean = Maybe Bool, where “Maybe” takes any type as an argument, IMO the Haskell way is a lot cleaner.


For the subcase where the literals are all members of a same overarching type (like your "MyRange" example), there has been some experimentation in this direction in Rust: https://github.com/rust-lang/rust/pull/107606


Hi, what do you mean when you say that literla types are both one of the most frequent uses of a type system, while at the same time modern languages doesn't support them?

Did you mean to say "most useful" rather than "most frequent"?


Anonymous union types? Scala has that. So did Ceylon.


Defining `MyType = 3 | "foo" | false` needs the language to allow arbitrary type changes of a variable, and while that's possible in JS it's generally not considered a good idea.

Rust forces you to define a single enum to represent this, and then you can implement the TryFrom trait to make it directly comparable to booleans, integers etc, but without having to deal with random `undefined` values popping up everywhere.

Is it less convenient? Yes, but so are seatbelts. Sometimes you need to build software where this tradeoff is worth it.


In your example, '3 | "foo" | false' is the type of the variable. What is the argument for this being a bad idea in general? You imply it's unsafe. How?


> In your example, '3 | "foo" | false' is the type of the variable

Depends. TypeScript would (usually) recognize 3 being a part of that union type, but it's still a normal integer that is fully compatible with other numbers. If you add 1 to it the variable is still a number but doesn't belong to this union type anymore. You could of course argue that this is just a consequence of TS being wrapped around JS, but whether the type changes between `3` and `false` or between `3` and `4`, it does change.

In Rust this is not allowed, hence you cannot just do an addition between e.g. an enum variant and a number unless you explicitly implement the interface to make them compatible, in which case the addition produces a new value with - again - an immutable type.

If 3, "foo" and false have different types by themselves then the language has to either allow changing the type of a variable or forbid a mix of them in one type.


Rust is more restrictive than it needs to be, in this case. What matters is whether it's possible to have a situation where the value of a variable or parameter is not concordant with its type in a given context and what it takes to make that happen.

Of course it's possible to tell the TS type checker to sod off in a given context (cast, use the "any" type, etc.), but without doing that, I think it's hard to contrive a situation such as I described above.

https://www.typescriptlang.org/play?#code/C4TwDgpgBAsiAq5oF4...


TypeScript lets you fit your types to your data, whatever that data is. If your data includes mixed types, then you can represent those. If I have a JSON array that mixes numbers and strings then I can write (number|string)[].

Most other statically typed languages force you to fit your data to your types. Many statically typed languages simply degenerate into dynamic typing when trying to consume external data, e.g. by representing JSON as an enumeration of every possible JSON value type.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: