Grids in Rust, Part 2: const generics

lopatin · on April 3, 2021

Very interesting to see that Rust's "const generics" can emulate some of the flagship use cases of dependently typed programming such as having the length of an array be part of it's static type!

mhh__ · on April 3, 2021

Isn't dependant types actually about being able to reason about the runtime value rather than at compile time i.e. D and C++ definitely don't have dependant types, or at least exciting kind.

Jhsto · on April 3, 2021

I feel like the real question here is what will the semantics of a programming language with dependent types look like without enforcing of GADTs all the way, and whether it will work. That is, thanks to GADTs functional languages with dependent types -- in a sense -- make traditional runtime look like static since the type is non-constant while the checking is static (meaning, dependent operations over a variable returns a variable with redefined type, not possible now in Rust). This seems to be the main difference for now, as Rust the type is constant. Yet, this might very well be enough for practical usefulness -- with GADTs all the way you can reason about nice properties such as termination (sometimes) and certain mathematical interest such as generalisations of programming, but this is likely uninteresting for most. Rust might very well apply the same kind of golden middle road to dependent types as it did for linearity via affine types. Will be interesting how this evolves further!

steveklabnik · on April 3, 2021

I am a bit under-educated about the exact specifics here, but I should note that this feature was originally literally dependent types.

* https://github.com/rust-lang/rfcs/pull/1657

* https://github.com/rust-lang/rfcs/issues/1930

* https://github.com/rust-lang/rfcs/pull/2000 (the design that was accepted, in the end)

(I am not actually 100% sure if rfc 2000 meets the by-the-book definition of "dependent types", my copy of Pierce has been collecting dust. Really gotta study up more on this stuff.)

habitue · on April 3, 2021

I believe in order to be dependent, a value needs to be able to affect the type somehow. So something like:

    fn make_array(n: usize) -> [u8; n]

would do the trick.

I am guessing this isn't actually the feature though, and that it's not allowing dependence in this way, but rather promoting some const values and const functions to the type level (which is really useful, but not exactly dependent types)

fluffything · on April 3, 2021

Thats the 3rd RFC in the trilogy. What has been shipped is a subset of the first RFC.

mhh__ · on April 3, 2021

Honestly I'm not totally sure either, however even if const-generics were technically dependant types enough to be called them with a straight face then I'm not sure if you get the benefits of them e.g. the textbook example seems to be static reasoning about the length of the result of appending two vectors of fixed but unknown lengths. The current proposal just seems like a convenient but theoretically shallow approximation of that due to the requirement of the const-parameter being known at compile time.

One roundabout way of doing this - which is very much possible in D but not worth doing because you can't really do useful work with it for the most part due to pain - is that if you extend (say) const-generics to work with basically anything evaluable at compile time, you can build up a data structure to represent and hold true some predicate about a runtime value in a type (as a template/generic parameter), then define how these combine under binary operations (trivial so far in D at least, impossible in C++, no idea about Rust - monomorphization differences aside), then use this datastructure for evil and profit (like giving invariants to the optimizer).

I would like to see how far the aforementioned idea can go, however it quickly becomes "write a compiler at compile time" or worse "write an SMT solver at compile time", both of which do not thrill me.

steveklabnik · on April 3, 2021

That's why the feature flag was "min_const_generics", we do plan on extending it more in the future, but you have to start somewhere, and even this tiny, tiny slice gives a lot of power.

nyanpasu64 · on April 3, 2021

I still want to try the ZZ language (https://github.com/zetzit/zz) someday. It compiles to C, and uses a SMT solver to prove that you don't index out-of-bounds at compile time. But I don't like how it lacks generics, uses C idioms, and compiles to C.

dathinab · on April 3, 2021

> In computer science and logic, a dependent type is a type whose definition depends on a value.

(source Wikipedia)

So rusts const generics are (very limited) dependent types.

C++ templates on the other hand you could argue are not types (but templates), but that is nit-picking.

Weather or not dependent types are can be runtime evaluated often is a separate question then weather or not a language has dependent types.

EDIT: The following part is partial nonsense, very POV dependent. I wanted to delete it but that wouldn't be right, so I just added this note.

Through not allowing run-time creation of dependent types makes them much more limited in usability. But then allowing it requires either reflections (still likely very limited/slow/etc.), or part of the type being stored as a "magic" field of the type instance (fast but is that still a dependent type?), or a interpreted language).

GrumpySloth · on April 3, 2021

Are you saying you consider std::array<int, 5> in C++ to be a dependent type? I'm pretty sure it's not right. I'm pretty sure that the "value" in that quote is, implicitly, a "runtime value", or "value of a variable", similar to how we often say "type" without saying "type of a variable". This "5" isn't the value of a variable, it's just an expression. If you continue on to the next paragraph on this Wikipedia page, you will read, first:

> The return type of a dependent function may depend on the value (not just type) of one of its arguments. For instance, a function that takes a positive integer n may return an array of length n, where the array length is part of the type of the array.

second:

> A dependent pair may have a second value of which the type depends on the first value. Sticking with the array example, a dependent pair may be used to pair an array with its length in a type-safe way.

Generally, it's the same nomenclature as with the explanation of static vs dynamic type systems: in static type systems, types are attributes of variables, functions and expressions, whereas in dynamic type systems types are attributes of values. And in the static type system of C, 5 has type int, because it's an expression. It representing a value is secondary. A side effect of it is: you can write a compile-time constant in C that will overflow its type, because the type is assigned based on the lexical category, not on the value.

As far as I understand, in a dependent type system it should be possible to implement a resizable array "std::vector<int, n>", where n is the length of the array (a variable), then write a "concat" function which concatenates two such arrays, and whose return (static) type depends on the lengths of the arrays passed as arguments. This is different than parametric polymorphism, such as the one that C++ has, since we can't know at compile-time all the possible lengths of arrays that we're going to get at runtime. Idris example here: https://idris2.readthedocs.io/en/latest/tutorial/typesfuns.h...

If someone has a better understanding of the topic, I'd be delighted to be shown wrong.

jcelerier · on April 3, 2021

> As far as I understand, in a dependent type system it should be possible to implement a resizable array "std::vector<int, n>", where n is the length of the array (a variable), then write a "concat" function which concatenates two such arrays, and whose return (static) type depends on the lengths of the arrays passed as arguments.

worksforme https://gcc.godbolt.org/z/x9GjE7KGM (ok, except the "resizeable" bit - but since we're doing functional programming we want immutable data structures, no ? :p)

GrumpySloth · on April 4, 2021

"n" is not a variable in your example. It's still just syntactic sugar for templates, where the n needs to be known statically. Even without the "resizable" bit, you still limit this function to operating on arrays of sizes known statically, which is a severe limitation, making the whole thing impractical.

mhh__ · on April 3, 2021

> C++ templates on the other hand you could argue are not types (but templates), but that is nit-picking.

Not only is it nitpicking, it seems wrong even in an academic sense of the term. Whether the template itself is a type or not probably depends on what theoretical language you use to discuss it.

The template is monomorphized into a new type per instantiation, yes, but the moment it actually has a number in it it's a type.

In D the template can quite happily accept pretty much anything as a parameter (particularly structs i.e. better code quality), and I'm surprised other languages don't allow this. Why have a bunch of template parameters on a line when you can just program the way you normally would.

steveklabnik · on April 3, 2021

> I'm surprised other languages don't allow this.

Rust doesn't (arbitrary values, anyway, const generics is the start of this being possible) because we don't have templates, we have generics. I agree if you're going with a template-based system, it makes a lot of sense to accept whatever, but if you aren't, then things are more complicated.

mhh__ · on April 3, 2021

I think the typechecking here should be possible for rust in the sense that it's just a value of a known type (this isn't like C++ hacks).

https://d.godbolt.org/z/Tb5jKv4ns

steveklabnik · on April 3, 2021

It's possible, (and I think might even work on nightly? Don't have the time to check right this second) but is just a lot more work to support than in a template-based system, that's all. We'll get there!

planetis · on April 3, 2021

I have used them in another language and in practise, itsnt really a useful feature. Say you implement a fixed point type or a matrix, more often than not you end up needing arithmetic ops between different "resolutions" or matrix dimensions. That blows both how many generic functions you write and the generic instatiations. In the end you are better of with moving the parameters in runtime.

humanafterall · on April 3, 2021

I think you can get the benefits of type safety while still using type inference or macros to avoid type impl explosion

tines · on April 4, 2021

C++ programmer learning Rust here: Is this just Rust starting to catch up to C++ in this area?

steveklabnik · on April 4, 2021

Pretty much, yes.

andoriyu · on April 3, 2021

I don't get how 1d array is slower than 1d vec. Both doing the same arithmetic, but vec has to do one extra operation.

The_rationalist · on April 3, 2021

Why do rust arrays have to have a size known at compile time? This restriction is not found in other programming languages. I'm sure this helps the borrow checker or bounds check but why precisely?

NobodyNada · on April 3, 2021

> This restriction is not found in other programming languages

It is. C used to have variable-length arrays as a required part of the standard, but that was made an optional feature in C99. C++ (like Rust) does not have variable-length arrays at all.

If you need a dynamically-sized array in any language, usually you allocate the array on the heap — for instance, with malloc/free, or with a vector type like C++’s std::vector or Rust’s Vec.

As for why, I’m not entirely sure — hopefully someone with more expertise than me can chime in. I’m not sure if it adds too much complexity in the compiler, or if they were simply deemed too unsafe even for C/C++ (since it’s way too easy to accidentally overflow the stack with VLAs).

jessa0 · on April 3, 2021

Slices in Rust are actually kinda similar to VLAs in C99, but usable in places other than aggregates, and use an additional word in pointers to them for the length, as a safety mechanism. You can even use slices as the last member of a struct, like VLAs, though they're not very ergonomic currently (you can only obtain references to them, as well as Boxes containing them, using "unsizing" coercion):

https://play.rust-lang.org/?version=stable&mode=debug&editio...

jessa0 · on April 4, 2021

Ok, turns out I was mixing up "flexible array member" with "variable-length array". Slices can be used on the stack like a VLA as well, though, with the unstable "unsized_locals" feature. This feature is controversial, however, because of the risk of unintentional stack blowup..

oconnor663 · on April 3, 2021

The Linux kernel devs have banned VLAs from their codebase, so you can get a lot of context from that discussion. (https://www.phoronix.com/scan.php?page=news_item&px=Linux-Ki...). One of the drawbacks of VLAs is that, maybe just because of an unfortunate choice of syntax, it's surprisingly easy to create one accidentally: https://lwn.net/Articles/749064/

kzrdude · on April 3, 2021

Vararrays were new (and required) in C99 and optional in C11

ynik · on April 3, 2021

Arrays in Rust are value types (same as std::array in C++, which also has a compile-time size). That means the array doesn't have a separate heap allocation, but is stored inline in the struct/stack frame containing the array. The compiler needs to know the size so that it can place other struct members/local variables after the array.

For dynamically-sized arrays you'd typically use a `Vec<T>` in Rust (or a `Box<[T]>`). These imply the array is stored as a separate heap allocation.

jimktrains2 · on April 3, 2021

The size of all stack-allocated items needs to be known at compile-time. It's the same reason you can't directly have a field in a struct that is the same type as the struct: the size can't be known at compile-time.

If you need an array of unknown size, then you can heap allocate it or use std::vec::Vec (usually via the vec! macro) to provide a nice interface for doing so.

If you need nested structs, e.g. for a tree, then you need to make the field, e.g. the children nodes, pointers or a Box type.

Rust tries to make what it's doing with memory obvious. Other languages with unknown array sizes do a lot of potentially unpredictable memory allocations and copies.

nsajko · on April 3, 2021

> Why do rust arrays have to have a size known at compile time?

Having a known-length array type in a programming language:

* allows the compiler to make certain optimizations using the information it has on the length, especially if the length is small

* allows the programmer to enforce constraints on the length of arrays at compilation time, which can make program correctness more evident and easier to accomplish

> This restriction is not found in other programming languages.

Ever heard about C or C++ or Go? Even dynamic languages like Julia allow defining such types.

dathinab · on April 4, 2021

They also have arrays which size is not known at compiler time (`[T]` instead of `[T; SIZE]`).

It's just that you (for now(1)) can't put dynamic sized types (DST) on the stack, but you can coerce a `&[T; SIZE]` to a `&[T]`. The later has the size encoded in the pointer. Which combined with rusts system to compiler time enforce proper aliasing and RAII allows nice sub-slicing like e.g the `split` at method which allows splitting a `&[T]` into two non-overlapping `&[T]`'s.

(1): There is ongoing work to allow DST on the stack in some limited circumstances which would help with some things, but I'm not the biggest fan of it. For example because this is in many but not all cases incompatible with `async/await` and the not yet existing generators.

moldavi · on April 4, 2021

Yes, Rust lacks a runtime-sized array.

I think the rest of these replies are presenting a false dichotomy; having a compile-time sized array doesn't change that.

Rust could easily have all three (compile-time-sized, runtime-sized, and Vec) but they chose not to. Probably because Vec is mostly good enough (albeit wasting a bit of memory).

brundolf · on April 4, 2021

A Vec is a length, capacity, and pointer. I don't see how you could implement Rust's safety guarantees with runtime-sized arrays that don't know their capacity (for bounds-checks). I guess you could theoretically do without the length/capacity distinction, so you could eliminate one value (but not two) by having "plain" runtime arrays. Overall I just can't imagine there are very many cases where eliminating a single number's worth of memory usage for an entire collection is worth having separate concepts with separate implementations, syntaxes, etc, not to mention all the downsides of not having reallocations/usage-length managed for you.

unanswered · on April 4, 2021

Length=capacity is already available, and always has been, as Box<[T]>

brundolf · on April 5, 2021

Can [T] be any length, decided at runtime?

unanswered · on April 5, 2021

brundolf · on April 5, 2021

That's news to me, and good to know. Seems moldavi's original comment was incorrect

steveklabnik · on April 5, 2021

I mean, it depends on the words you use. "Array" has a specific meaning in Rust, and that's [T; N]. You cannot have a run-time value for N, and so in that sense, they are correct. If by "array" you mean "any number of values laid out next to each other in memory," then they're wrong, but that's not usually what this term means in Rust specifically. Vectors and slices are also laid out this way, but they're not called "arrays."

Box<[T]> is close, but it's a pointer + length, where the pointer is to the heap, whereas [T; N] is a series of values, and can be on the stack. That's why it's a "boxed slice" and not an "array."

And yes, any slice type is runtime sized. That's why slices exist; they have a length stored in them to keep track of how long they are. This goes for &[T], Box<[T]>, Arc<[T]>, any of them.

brundolf · on April 5, 2021

So could I write something like this?

  // returns a heap-allocated buffer of length new_arr_length
  fn my_malloc<T>(new_arr_length: usize) -> Box<[T]> {
    todo!()
  }

The answer seems like no, looking at it, but it's possible there's a syntax for it I don't know about

I guess what it comes down to is: slices can't own their data, can they (I'm genuinely not sure)? If so, and this is indeed a slice, then it should be impossible for this to work in this way

Though, in that case it would also seem fairly pointless (as opposed to a &[T])

steveklabnik · on April 5, 2021

You can't, but not because of the slice stuff, but because you have to give it valid values. This works

  // returns a heap-allocated buffer of length new_arr_length
  fn my_malloc<T: Default + Clone>(new_arr_length: usize) -> Box<[T]> {
    vec![T::default(); new_arr_length].into_boxed_slice()
  }

> slices can't own their data, can they (I'm genuinely not sure)

The problem is that "slice" can mean both &[T] and [T]. [T] is an unsized type, so it needs to be behind a pointer. Putting it behind a & is the common case, and doesn't have ownership because &T doesn't own T, but you can also put it behind Box<T>, which does have owership over T.

> Though, in that case it would also seem fairly pointless (as opposed to a &[T])

Yes, it's very niche. I've never used one in all my years of Rust. But Box<[T]> is two thirds of the size of Vec<T> (no need for capacity, since capacity == length) and maybe there are cases where that is significant, for example.

brundolf · on April 5, 2021

I see! That makes sense now. Unsized types are one of the Rust concepts that I haven't gotten a good handle on yet, probably mainly because they don't come up very often in practice. The only other one I know of is str, and I'm not even sure how I'd go about using a plain str.

steveklabnik · on April 5, 2021

Totally. It works the same way: it's gotta be behind a pointer. Arc<str> would be a signature for a threadsafe interned string, for example.

    use std::sync::Arc;
    
    fn foobar(s: &str) -> Arc<str> {
        Arc::from(s.to_string())
    }

You almost always can only create these sorts of values by casting.

unanswered · on April 5, 2021

Most of what you read online is, especially about rust.

jlrubin · on April 3, 2021

https://doc.rust-lang.org/nightly/unstable-book/language-fea...

This should provide interesting reading -- it's a discussion on unsized local variables, e.g. VLA's on the stack.

andoriyu · on April 3, 2021

That's a requirment for any language that isn't allocating everything on heap. Nothing to do with borrow checker.

OutOfBounds errors aren't checked at compile time unless it's a `const` type IIRC.

kibwen · on April 3, 2021

> OutOfBounds errors aren't checked at compile time unless it's a `const` type IIRC.

There are circumstances where Rust can calculate the array index at compile time and, if it can, it will result in a compilation error if the index is out of bounds. Of course, this is a best-effort analysis and doesn't work for all possible cases. Here's some Rust code, the first three indexing operations fail at compile-time, but the last doesn't:

    let x = [1,2,3];
    
    x[4]; // compile error
    
    const FOUR: usize = 4;
    x[FOUR]; // compile error
    
    let y = 4;
    x[y]; // compile error
    
    fn foo() -> usize { 4 }
    x[foo()]; // runtime error

This analysis works on fixed-size arrays, but not on Vec.

The_rationalist · on April 3, 2021

I wonder if they could have used Smallvec, it sounds like the best of both worlds for performance: stack allocated up to a size.

Off-topic but the JVM will be the second platform to get const generics support -> https://www.reddit.com/r/java/comments/m2dfb5/parametric_jvm...

Are there other languages than rust that supports it?

nsajko · on April 3, 2021

> Are there other languages than rust that supports it?

I'm no expert on the terminology, but C++ templates can take values as template arguments (I think this is what is called "const generics" in Rust), and languages such as Julia are even more powerful.

fish45 · on April 3, 2021

> Are there other languages than rust that supports it?

Const generics are essentially a really small subset of fully dependent types, which are implemented in Idris 2 and other research languages.

mhh__ · on April 3, 2021

Unless you mean strictly generics, D has had the ability to do this and a lot more (e.g. accept user defined types as parameters) since at least 2014 or so

lenkite · on April 4, 2021

C++ has supported non-type template parameters for a very long time now. Since C++ 20, it also supports class literals to be used as non-type template parameters