
Why do Rust arrays have to have a size known at compile time? This restriction is not found in other programming languages. I'm sure this helps the borrow checker or bounds checking, but why precisely?



> This restriction is not found in other programming languages

It is. Variable-length arrays were added to C as a required part of the standard in C99, but were made an optional feature in C11. C++ (like Rust) does not have variable-length arrays at all.

If you need a dynamically-sized array in any language, usually you allocate the array on the heap — for instance, with malloc/free, or with a vector type like C++’s std::vector or Rust’s Vec.

As for why, I’m not entirely sure — hopefully someone with more expertise than me can chime in. I’m not sure if it adds too much complexity in the compiler, or if they were simply deemed too unsafe even for C/C++ (since it’s way too easy to accidentally overflow the stack with VLAs).


Slices in Rust are actually kinda similar to VLAs in C99, but usable in places other than aggregates, and use an additional word in pointers to them for the length, as a safety mechanism. You can even use slices as the last member of a struct, like VLAs, though they're not very ergonomic currently (you can only obtain references to them, as well as Boxes containing them, using "unsizing" coercion):

https://play.rust-lang.org/?version=stable&mode=debug&editio...
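As a rough sketch of the slice-as-last-member pattern described above (this is not the code behind the truncated playground link; `Record`, `boxed_record`, and `total` are made-up names for illustration):

```rust
/// Hypothetical struct whose last field is allowed to be unsized.
struct Record<T: ?Sized> {
    id: u32,
    data: T,
}

/// Builds a sized `Record<[u8; 3]>` and relies on unsizing coercion
/// to hand it back as a `Box<Record<[u8]>>`.
fn boxed_record() -> Box<Record<[u8]>> {
    let sized: Box<Record<[u8; 3]>> = Box::new(Record { id: 7, data: [1, 2, 3] });
    sized // coerced to Box<Record<[u8]>> at the return site
}

/// Works for any tail length, because the length travels with the pointer.
fn total(record: &Record<[u8]>) -> u32 {
    record.id + record.data.iter().map(|&b| u32::from(b)).sum::<u32>()
}
```

A `Record<[u8]>` cannot be constructed directly here; it is only reached by coercing a reference or `Box` to a sized `Record<[u8; N]>`, which is the ergonomic limitation the comment mentions.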


Ok, turns out I was mixing up "flexible array member" with "variable-length array". Slices can be used on the stack like a VLA as well, though, with the unstable "unsized_locals" feature. This feature is controversial, however, because of the risk of unintentional stack blowup.


The Linux kernel devs have banned VLAs from their codebase, so you can get a lot of context from that discussion. (https://www.phoronix.com/scan.php?page=news_item&px=Linux-Ki...). One of the drawbacks of VLAs is that, maybe just because of an unfortunate choice of syntax, it's surprisingly easy to create one accidentally: https://lwn.net/Articles/749064/


Vararrays were new (and required) in C99 and optional in C11


Arrays in Rust are value types (same as std::array in C++, which also has a compile-time size). That means the array doesn't have a separate heap allocation, but is stored inline in the struct/stack frame containing the array. The compiler needs to know the size so that it can place other struct members/local variables after the array.

For dynamically-sized arrays you'd typically use a `Vec<T>` in Rust (or a `Box<[T]>`). These imply the array is stored as a separate heap allocation.
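A small sketch of what "stored inline" means in practice. `Packet` is a hypothetical struct invented for this example; the sizes are computed with `std::mem::size_of`:

```rust
use std::mem::size_of;

/// Hypothetical struct: `payload` lives inline, inside the struct itself.
#[allow(dead_code)]
struct Packet {
    header: u32,
    payload: [u8; 12],
}

/// Size of the bare array and of the struct that embeds it.
fn inline_sizes() -> (usize, usize) {
    (size_of::<[u8; 12]>(), size_of::<Packet>())
}

/// A `Vec` is only a small handle; its size does not depend on the
/// element type, because the elements live in a separate heap allocation.
fn vec_handle_is_fixed_size() -> bool {
    size_of::<Vec<u8>>() == size_of::<Vec<[u8; 4096]>>()
}
```

Here the struct is 4 bytes of header plus the 12 inline payload bytes, which is only possible because the compiler knows the array length.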


The size of all stack-allocated items needs to be known at compile-time. It's the same reason you can't directly have a field in a struct that is the same type as the struct: the size can't be known at compile-time.

If you need an array of unknown size, then you can heap allocate it or use std::vec::Vec (usually via the vec! macro) to provide a nice interface for doing so.

If you need nested structs, e.g. for a tree, then you need to make the field, e.g. the children nodes, pointers or a Box type.
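For instance, a minimal binary tree might look like the sketch below. `Tree`, `leaf`, and `sum` are illustrative names; writing `left: Option<Tree>` instead would be rejected by the compiler as a recursive type with infinite size.

```rust
/// Hypothetical binary tree node. The `Box` gives each child a known,
/// pointer-sized footprint inside its parent.
struct Tree {
    value: i32,
    left: Option<Box<Tree>>,
    right: Option<Box<Tree>>,
}

/// Convenience constructor for a node without children.
fn leaf(value: i32) -> Tree {
    Tree { value, left: None, right: None }
}

/// Recursively adds up every value in the tree.
fn sum(tree: &Tree) -> i32 {
    let left = tree.left.as_deref().map_or(0, sum);
    let right = tree.right.as_deref().map_or(0, sum);
    tree.value + left + right
}
```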

Rust tries to make what it's doing with memory obvious. Other languages with unknown array sizes do a lot of potentially unpredictable memory allocations and copies.


> Why do rust arrays have to have a size known at compile time?

Having a known-length array type in a programming language:

* allows the compiler to make certain optimizations using the information it has on the length, especially if the length is small

* allows the programmer to enforce constraints on the length of arrays at compilation time, which can make program correctness more evident and easier to accomplish
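The second point can be sketched as follows. The function names are made up for this example, and the generic version assumes a compiler with const generics (stable since Rust 1.51):

```rust
/// Dot product of two 3-vectors. Passing an array of any other length
/// is a type error, so no runtime length check is needed.
fn dot3(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

/// The same idea for any length `N`: both arguments must share the
/// same compile-time length, or the call does not type-check.
fn dot<const N: usize>(a: &[f64; N], b: &[f64; N]) -> f64 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}
```

A call such as `dot(&[1.0, 2.0], &[1.0, 2.0, 3.0])` fails to compile, which is the kind of constraint a slice-based signature can only check at runtime.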

> This restriction is not found in other programming languages.

Ever heard about C or C++ or Go? Even dynamic languages like Julia allow defining such types.


Rust also has arrays whose size is not known at compile time (`[T]` instead of `[T; SIZE]`).

It's just that you (for now(1)) can't put dynamically sized types (DSTs) on the stack, but you can coerce a `&[T; SIZE]` to a `&[T]`. The latter has the length encoded in the pointer. Combined with Rust's compile-time enforcement of proper aliasing and RAII, this allows nice sub-slicing, e.g. the `split_at` method, which splits a `&[T]` into two non-overlapping `&[T]`s.

(1): There is ongoing work to allow DSTs on the stack in some limited circumstances, which would help with some things, but I'm not the biggest fan of it, for example because it is in many but not all cases incompatible with `async/await` and the not-yet-existing generators.
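A short sketch of the array-to-slice coercion and of `split_at` as described above (the function names are invented for illustration):

```rust
use std::mem::size_of;

/// Accepts a slice of any length; the length arrives with the pointer.
fn split_sums(data: &[i32]) -> (i32, i32) {
    let (left, right) = data.split_at(data.len() / 2);
    (left.iter().sum(), right.iter().sum())
}

/// A reference to a fixed-size array coerces to a slice at the call site.
fn from_array() -> (i32, i32) {
    let array: [i32; 5] = [1, 2, 3, 4, 5];
    split_sums(&array)
}

/// An array reference is one word; a slice reference is two
/// (pointer plus length).
fn reference_sizes() -> (usize, usize) {
    (size_of::<&[i32; 5]>(), size_of::<&[i32]>())
}
```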


Yes, Rust lacks a runtime-sized array.

I think the rest of these replies are presenting a false dichotomy; having a compile-time sized array doesn't change that.

Rust could easily have all three (compile-time-sized, runtime-sized, and Vec) but they chose not to. Probably because Vec is mostly good enough (albeit wasting a bit of memory).


A Vec is a length, capacity, and pointer. I don't see how you could implement Rust's safety guarantees with runtime-sized arrays that don't know their capacity (for bounds-checks). I guess you could theoretically do without the length/capacity distinction, so you could eliminate one value (but not two) by having "plain" runtime arrays. Overall I just can't imagine there are very many cases where eliminating a single number's worth of memory usage for an entire collection is worth having separate concepts with separate implementations, syntaxes, etc, not to mention all the downsides of not having reallocations/usage-length managed for you.
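To make the length/capacity distinction concrete, here is a small sketch (the function name is made up). Bounds checks go against the length, not the capacity:

```rust
/// Returns (length, whether capacity is at least 16, element at index 3).
fn len_vs_capacity() -> (usize, bool, Option<u32>) {
    // Reserve room for 16 elements up front, but only use three of them.
    let mut v: Vec<u32> = Vec::with_capacity(16);
    v.push(10);
    v.push(20);
    v.push(30);
    // Index 3 is within the capacity but beyond the length, so `get`
    // reports it as out of bounds.
    (v.len(), v.capacity() >= 16, v.get(3).copied())
}
```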


Length=capacity is already available, and always has been, as Box<[T]>


Can [T] be any length, decided at runtime?


Yes.


That's news to me, and good to know. Seems moldavi's original comment was incorrect


I mean, it depends on the words you use. "Array" has a specific meaning in Rust, and that's [T; N]. You cannot have a run-time value for N, and so in that sense, they are correct. If by "array" you mean "any number of values laid out next to each other in memory," then they're wrong, but that's not usually what this term means in Rust specifically. Vectors and slices are also laid out this way, but they're not called "arrays."

Box<[T]> is close, but it's a pointer + length, where the pointer is to the heap, whereas [T; N] is a series of values, and can be on the stack. That's why it's a "boxed slice" and not an "array."

And yes, any slice type is runtime sized. That's why slices exist; they have a length stored in them to keep track of how long they are. This goes for &[T], Box<[T]>, Arc<[T]>, any of them.


So could I write something like this?

  // returns a heap-allocated buffer of length new_arr_length
  fn my_malloc<T>(new_arr_length: usize) -> Box<[T]> {
    todo!()
  }
The answer seems like no, looking at it, but it's possible there's a syntax for it I don't know about

I guess what it comes down to is: slices can't own their data, can they (I'm genuinely not sure)? If so, and this is indeed a slice, then it should be impossible for this to work in this way

Though, in that case it would also seem fairly pointless (as opposed to a &[T])


You can't, though not because of the slice stuff: it's because you have to give it valid values. This works:

  // returns a heap-allocated buffer of length new_arr_length
  fn my_malloc<T: Default + Clone>(new_arr_length: usize) -> Box<[T]> {
    vec![T::default(); new_arr_length].into_boxed_slice()
  }
> slices can't own their data, can they (I'm genuinely not sure)

The problem is that "slice" can mean both &[T] and [T]. [T] is an unsized type, so it needs to be behind a pointer. Putting it behind a & is the common case, and doesn't have ownership because &T doesn't own T, but you can also put it behind Box<T>, which does have ownership over T.

> Though, in that case it would also seem fairly pointless (as opposed to a &[T])

Yes, it's very niche. I've never used one in all my years of Rust. But Box<[T]> is two thirds of the size of Vec<T> (no need for capacity, since capacity == length) and maybe there are cases where that is significant, for example.
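The size difference can be checked with `std::mem::size_of`. This is only a sketch with made-up function names; the three-word layout of `Vec` is what the standard library documents, but the exact byte counts depend on the target's pointer width:

```rust
use std::mem::size_of;

/// Handle sizes in machine words: (Vec<T>, Box<[T]>).
fn handle_words() -> (usize, usize) {
    let word = size_of::<usize>();
    (size_of::<Vec<u64>>() / word, size_of::<Box<[u64]>>() / word)
}

/// Converting a `Vec` drops any spare capacity, so the boxed slice
/// only needs to remember its length.
fn shrink(v: Vec<u64>) -> Box<[u64]> {
    v.into_boxed_slice()
}
```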


I see! That makes sense now. Unsized types are one of the Rust concepts that I haven't gotten a good handle on yet, probably mainly because they don't come up very often in practice. The only other one I know of is str, and I'm not even sure how I'd go about using a plain str.


Totally. It works the same way: it's gotta be behind a pointer. Arc<str> would be a signature for a threadsafe interned string, for example.

    use std::sync::Arc;
    
    fn foobar(s: &str) -> Arc<str> {
        Arc::from(s.to_string())
    }
You almost always can only create these sorts of values by casting.
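A few more ways of getting a `str` behind an owning pointer, as a sketch (the function names are hypothetical; the conversions themselves are standard-library `From` impls and `String::into_boxed_str`):

```rust
use std::sync::Arc;

/// Copies the text into a new heap allocation owned by a `Box<str>`.
fn owned(s: &str) -> Box<str> {
    Box::from(s)
}

/// Copies the text into a reference-counted, thread-safe allocation.
fn shared(s: &str) -> Arc<str> {
    Arc::from(s)
}

/// Turns a `String` into a `Box<str>`, discarding any spare capacity.
fn from_string(s: String) -> Box<str> {
    s.into_boxed_str()
}

/// Anything that dereferences to `str` can be passed where `&str` is expected.
fn shout(s: &str) -> String {
    s.to_uppercase()
}
```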


Most of what you read online is, especially about Rust.


https://doc.rust-lang.org/nightly/unstable-book/language-fea...

This should provide interesting reading: it's a discussion on unsized local variables, i.e. VLAs on the stack.


That's a requirement for any language that isn't allocating everything on the heap. Nothing to do with the borrow checker.

Out-of-bounds errors aren't checked at compile time unless it's a `const` type IIRC.


> OutOfBounds errors aren't checked at compile time unless it's a `const` type IIRC.

There are circumstances where Rust can calculate the array index at compile time and, if it can, it will result in a compilation error if the index is out of bounds. Of course, this is a best-effort analysis and doesn't work for all possible cases. Here's some Rust code in which the first three indexing operations fail at compile time, but the last doesn't:

    let x = [1,2,3];
    
    x[4]; // compile error
    
    const FOUR: usize = 4;
    x[FOUR]; // compile error
    
    let y = 4;
    x[y]; // compile error
    
    fn foo() -> usize { 4 }
    x[foo()]; // runtime error
This analysis works on fixed-size arrays, but not on Vec.
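When the index is only known at runtime, a common way to avoid the panic is `get`, which returns an `Option`. A minimal sketch, with `lookup` as a made-up helper name:

```rust
/// Looks up an index chosen at runtime. Out-of-range indices yield
/// `None` instead of panicking the way `values[index]` would.
fn lookup(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}
```

Because the parameter is a slice, the same helper accepts both `&[1, 2, 3]` (a fixed-size array) and a borrowed `Vec<i32>`.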



