Rust 1.50

brundolf · on Feb 11, 2021

> Some types in Rust have specific limitations on what is considered a valid value, which may not cover the entire range of possible memory values. We call any remaining invalid value a niche, and this space may be used for type layout optimizations. For example, in Rust 1.28 we introduced NonZero integer types (like NonZeroU8) where 0 is a niche, and this allowed Option<NonZero> to use 0 to represent None with no extra memory.

I didn't know about this, and it's super cool. It's part of a broader pattern where the rigid constraints Rust can impose on code allow both the compiler and the user to do things that would be wildly dangerous in a less-strict language.
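A minimal sketch of the quoted NonZero case, for anyone who wants to check it locally. The helper names `option_sizes` and `from_raw` are made up for illustration; the size equality for `Option<NonZeroU8>` is the one the standard library documents.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

/// Returns the sizes in bytes of (u8, Option<u8>, Option<NonZeroU8>).
fn option_sizes() -> (usize, usize, usize) {
    (
        size_of::<u8>(),
        size_of::<Option<u8>>(),
        size_of::<Option<NonZeroU8>>(),
    )
}

/// Converts a raw byte into Option<NonZeroU8>; the byte 0 becomes None.
fn from_raw(byte: u8) -> Option<NonZeroU8> {
    NonZeroU8::new(byte)
}
```

Since u8 has no spare bit pattern, `Option<u8>` has to be larger than u8, while `Option<NonZeroU8>` stays at one byte.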

steveklabnik · on Feb 11, 2021

For some history here, the most famous example is Option<&T>. References cannot be null, and so the Option can use the null pointer to represent None.

At first, this was special cased in the compiler, but then generalized.
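A quick way to see the reference case, as a sketch (the function names are hypothetical, chosen just for this example):

```rust
use std::mem::size_of;

/// True when Option<&T> occupies no more space than &T itself.
fn reference_option_is_free<T>() -> bool {
    size_of::<Option<&T>>() == size_of::<&T>()
}

/// The same check for Box<T>, which is also never null.
fn boxed_option_is_free<T>() -> bool {
    size_of::<Option<Box<T>>>() == size_of::<Box<T>>()
}
```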

drran · on Feb 11, 2021

Sadly, it's not generalized enough to represent C-like enums in Rust, which can be declared as (in _theory_):

  enum Foo {
    A = 0,
    B = 1,
    C = 2,
    Unknown(i32),
  }

where the Unknown field cannot contain the values 0, 1, or 2, so the Rust compiler would be able to pack this enum into a single i32, like in C.

johnsoft · on Feb 11, 2021

It's not really an enum if you can't enumerate all the values at compile time. Rust enums are not C enums.

I'd represent it like this:

    #[repr(transparent)]
    struct Foo(i32);

    impl Foo {
        const A: Foo = Foo(0);
        const B: Foo = Foo(1);
        const C: Foo = Foo(2);
    }

drran · on Feb 11, 2021

1) Enums in C are implemented that way, so C-like enums in Rust must implement it in the same way to be compatible with C.

2) This is required for forward compatibility. For example protobuf explicitly requires it.

3) Your code is not forward compatible. The proper way is to implement it as i32 and then unwrap it, or:

  enum Foo {
    A = 0,
    B = 1,
    C = 2,
    _UNKNOWN_3 = 3,
    _UNKNOWN_4 = 4,
    _UNKNOWN_5 = 5,
    _UNKNOWN_6 = 6,
    // ... repeat few dozen times, to be forward compatible
  }

Example: https://github.com/apoelstra/rust-bitcoin/blob/c37ab1f9c2392...

P.S.

IMHO, Rust should allow defining enums as:

  enum Foo {
    A = 0,
    B = 1,
    C = 2,
    _ , // Unknown values which must be handled by default case
  }

or

  enum Foo {
    A = 0,
    B = 1,
    C = 2,
    _OTHER , // Unknown values which must be handled
  }

because the current workaround is usable for i8, maybe even for i16, but not for i32.

andrewaylett · on Feb 11, 2021

Rust has an annotation to mark that an enum is non-exhaustive[1], and a mechanism for declaring an enum as being laid out in a manner compatible with C[2]. I've not tried using them together :).

In general, it _is_ possible for any C data layout to be represented in Rust -- but it's not necessarily the case that the representation has the same name. And it's also not the case that we can safely pass memory from C to Rust without validating the content, even if the representation is equivalent for all valid values.

[1] https://blog.rust-lang.org/2019/12/19/Rust-1.40.0.html#non_e...

[2] https://rust-lang.github.io/unsafe-code-guidelines/layout/en...

drran · on Feb 12, 2021

To my surprise, it works partially on latest stable Rust:

  #[non_exhaustive]
  #[repr(C)]
  #[derive(Debug)]
  #[allow(dead_code)]
  enum Foo {
    A = 0,
    B = 1,
    C = 2,
  }
  
  pub fn main() {
    let bar = unsafe { std::mem::transmute::<i32, Foo>(33) };
    println!("{:?}", bar);
    
    let s = match bar {
        Foo::A => ("A", bar as i32),
        Foo::B => ("B", bar as i32),
        Foo::C => ("C", bar as i32),
        _ => ("Unknown", bar as i32),
    };

    println!("{:?}", s);
  }

  Standard Output
  
  C
  ("Unknown", 33)

pornel · on Feb 11, 2021

Rust enums are not compatible with C, even with `#[repr(C)]`. Values not explicitly present in the enum are not allowed. Casting them to a Rust enum is UB, and it does actually cause miscompilation, because `match` can be a jump table.
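One way to stay out of UB is to validate the integer before it ever becomes an enum. A minimal sketch, assuming the raw value arrives as an i32; using the rejected integer as the error type is just a choice made for this example:

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
enum Foo {
    A = 0,
    B = 1,
    C = 2,
}

impl TryFrom<i32> for Foo {
    // Illustrative choice: hand the unrecognized raw value back to the caller.
    type Error = i32;

    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Foo::A),
            1 => Ok(Foo::B),
            2 => Ok(Foo::C),
            other => Err(other),
        }
    }
}
```

With this, an unexpected value such as 33 comes back as `Err(33)` instead of being transmuted into an invalid `Foo`.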

lilyball · on Feb 12, 2021

In what way is johnsoft’s code not forwards-compatible?

> Your code is not forward compatible. The proper way is to implement it as i32 and then unwrap it

That’s what they’re doing. It’s a newtype for i32 with a few named values but it can represent any i32, and you can unwrap it with .0 to get the i32 value.
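To make that concrete, here is a sketch of the newtype in use. The `describe` function and the derives are additions for illustration; the derives are there so the associated constants can be used as match patterns.

```rust
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Foo(i32);

impl Foo {
    const A: Foo = Foo(0);
    const B: Foo = Foo(1);
    const C: Foo = Foo(2);
}

/// Names the known values and falls back to the raw integer otherwise.
fn describe(foo: Foo) -> String {
    match foo {
        Foo::A => "A".to_string(),
        Foo::B => "B".to_string(),
        Foo::C => "C".to_string(),
        // Any other i32 is still a valid Foo and is handled here.
        Foo(other) => format!("Unknown({})", other),
    }
}
```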

steveklabnik · on Feb 11, 2021

Yes, if you look at the actual PR for this change, you'll see that doing this is using a rustc-specific interface, for similar reasons.

azornathogron · on Feb 11, 2021

For that use-case can you get away with just setting #[repr(i32)] on the enum and leaving Unknown out of the list?

I guess the compiler can't help you keep known and unknown values separate with that structure though, so maybe it's not enough.

kzrdude · on Feb 11, 2021

If you do that, then 3 etc is not a valid value for the enum. Only valid values for that type are the variants.

k__ · on Feb 11, 2021

Do C/C++ use such optimizations or is their type system too brittle?

steveklabnik · on Feb 11, 2021

I am a non-expert in the exact semantics of C and C++ layout, but as far as I can recall off the top of my head, they do not, because the general semantics are "lay out the struct in the order as declared, add padding for alignment reasons, done."

Whereas the semantics in Rust are "we can do whatever we want unless you add an explicit #[repr] attribute, in which case you choose the semantics of layout and we don't mess with it."

It's not about the strength of the type system, it's about history.
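A small sketch of the difference, assuming a target where u32 is 4-byte aligned (true on mainstream platforms). The struct names are hypothetical. Only the `#[repr(C)]` size is something you can rely on; the default layout is deliberately unspecified, so the second number is whatever your compiler picks.

```rust
use std::mem::size_of;

/// Fields are laid out in declaration order, with padding for alignment.
#[allow(dead_code)]
#[repr(C)]
struct DeclaredOrder {
    a: u8,
    b: u32,
    c: u8,
}

/// Same fields with the default representation; the compiler may reorder them.
#[allow(dead_code)]
struct CompilerChoice {
    a: u8,
    b: u32,
    c: u8,
}

/// Returns (size with repr(C), size with the default representation).
fn layout_sizes() -> (usize, usize) {
    (size_of::<DeclaredOrder>(), size_of::<CompilerChoice>())
}
```

With `#[repr(C)]` the struct needs 3 bytes of padding after `a` and 3 more after `c`, for 12 bytes total. The default representation is free to group the two u8 fields and can fit in 8.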

MaulingMonkey · on Feb 11, 2021

Theoretically you could roll your own such optimizations manually in a C++ codebase by abusing partial specialization. That is, you could write:

    template < typename T >
    class optional< std::vector<T> > {
        // ...
    };

And write an implementation that (ab)uses knowledge of the exact layout of `std::vector<T>` to avoid an extra bool. This would of course be a breaking change to the ABI of optional<std::vector<T>> if implemented by std::optional, so compiler vendors tend to avoid such things.

There is one stdlib example of this, but sadly it's more infamous than inspirational: std::vector<bool>. The standard relaxes several guarantees of std::vector<T> in the case of T == bool, to allow it to bit-pack the booleans instead of using sizeof(bool) - e.g. an entire byte - per element.

Some of those relaxed guarantees make std::vector<T> unsuitable for interop with C style APIs taking pointers + lengths if T might be a bool. Worse still, many (most?) implementations don't even implement the space optimization those guarantees were relaxed for - a worst of both worlds situation. If you actually need such space optimizations, you'll reach for an alternative such as boost::dynamic_bitset which provides them consistently.

steveklabnik · on Feb 11, 2021

This is a good point; given that it's automatic in Rust, I assumed that the parent was talking about it happening automatically, but you are right that you absolutely can do this manually, and it's good to have that nuance. After all, this PR required manual intervention to get the automatic parts to kick in!

MereInterest · on Feb 11, 2021

Sort of. You're right on what the abstract C++ machine does, but C++ also has the "as if" rule. A compiler can make whatever changes it wants so long as all observable effects are as if it had generated exactly the program as requested. This is the rule that allows for compiler optimizations to be done. Values stored in memory are not considered to be observable effects, so the compiler is allowed to make whatever changes it wants on that end.

However, to the best of my understanding, most types are exposed to other compilation units, so it is hard to establish invariants about how they are used. Unless the compiler can verify that nowhere in the program ever takes the address of a boolean variable in a struct, it isn't allowed to rearrange the struct to avoid having that extra boolean. That might be possible at link-time for statically linked programs, but I'm not sure.

Also, my knowledge is mostly from programming in C++ and watching CppCon talks, so I may be out of date from the latest optimization techniques.

tjalfi · on Feb 11, 2021

That particular optimization is unlikely but there have been C compilers that optimized struct layout.

179.art, one of the SPEC2000 benchmarks, has some poorly laid out structs. Sun introduced targeted optimizations for this benchmark and several other vendors also did so. I have also read papers about profile-guided struct reordering but don't have a citation on hand.

GCC also had a pass[0] for this optimization but it may have been removed.

[0] https://www.research.ibm.com/haifa/dept/_svt/papers/golovane...

k__ · on Feb 11, 2021

I see, thanks.

Pretty cool to see that Rust can do more optimizations than other languages.

And it seems to play out well. I fondly remember Java being touted as being superior to C/C++ because it can do runtime optimizations, which will outperform C/C++. Somehow this never happened.

brundolf · on Feb 11, 2021

There are some cases, like the Option<&T>::None case, where C/C++ do roughly the same thing just without any type guards. So in this case: pointers in C/C++ can simply be null, which is not expressed at a type level but is lumped in as a special case of the pointer value itself instead of being a separate bit. i.e. the optimization in this case is only necessary for Rust because it normally represents the enum variant as a separate component of the value in addition to its contents. The optimization brings it back in line with C/C++ from a memory usage standpoint.

However, because there are no type guards for this stuff in C/C++ you're generally constrained by convention/intuitiveness, when picking these special values, in order to (hopefully) guard against accidental misuse. So in the NonZero case above, you probably wouldn't do that because it's pretty unusual, and you'd therefore use more memory.

And then, even when the "special values" are chosen "reasonably", they sometimes get misused anyway and we end up with bugs like last year's trivial sudo exploit: https://bit-sentinel.com/to-sudo-or-not-to-sudo-demystifying...

> Reading the manual for those functions we learn that -1 is a special value

> Supplying a value of -1 for either the real or effective user ID forces the system to leave that ID unchanged.

> If one of the arguments equals -1, the corresponding value is not changed.

> So, what happens when we send -1 as a parameter? Well, somewhere along the underlying code lines is a convention that -1 is represented as 4294967295, thus either values will get the wanted result.

Edit: In practice the above may be more common in C than modern C++
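The -1 versus 4294967295 equivalence in that quote is just two's complement reinterpretation. A tiny sketch, assuming the ID type is a 32-bit unsigned integer (as uid_t commonly is on Linux); `as_raw_uid` is a made-up helper:

```rust
/// Reinterprets a signed 32-bit value as the unsigned value with the same bits.
fn as_raw_uid(id: i32) -> u32 {
    id as u32
}
```

So `as_raw_uid(-1)` is `u32::MAX`, which is 4294967295, and both spellings hit the same "leave unchanged" sentinel.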

lpapez · on Feb 11, 2021

When it comes to std::optional, the answer is no. In fact, std::optional<T&> is explicitly forbidden (see here: https://www.fluentcpp.com/2018/10/05/pros-cons-optional-refe...)

It is used in some other places though. For instance, std::string implementations typically employ small-string optimization (SSO) which means that there might be a limit on string::size() since some of the bits inside that value are used for different purposes (example: https://akrzemi1.wordpress.com/2014/04/14/common-optimizatio...). The "niche" in this case is the realization that strings are unlikely to be exabytes in size, so there is no reason to always store zeros in high bits of _size.

steveklabnik · on Feb 11, 2021

Fun trivia fact: Rust's stdlib String type cannot, and therefore does not, support SSO.

pdimitar · on Feb 11, 2021

I recently wondered if a library like `smol_str` will be upstreamed one day. I feel people would rejoice if they can use stack-allocated strings (up to a certain size).

Do you think that's possible?

steveklabnik · on Feb 11, 2021

Anything is possible, but it's not clear to me what advantage being in the standard library would actually bring. It is a pretty niche use-case (and I work in embedded these days), and it is pretty trivial to use a package.

Only the libs team can really say, and I'm not on that team.

pdimitar · on Feb 11, 2021

I'm looking at it through the lens of "let Rust have the best possible newcomer experience", which of course isn't a sentiment that has to be shared by the creators.

I suppose I appreciate SmallString's ability to alias itself to Rust's String.

But I'd immediately agree that this adds hidden complexity and the potential for it to bite you in the worst possible moment.

brundolf · on Feb 11, 2021

Going along with the other replies, I disagree that this would improve the experience for newcomers. SmallString is a hyper-optimization for specialized use-cases which (I would assume) is rarely worthwhile. I would actively discourage people from using SmallString without first profiling to identify Strings as their bottleneck, especially newcomers. String will almost always be "fast enough".

pdimitar · on Feb 11, 2021

Agreed that it's a very likely unneeded optimization. As shared in another sibling comment, I got frustrated with Rust's strings and figured that maybe one more complexity on top wouldn't be that bad. But thinking of it now, yeah, it's definitely a bad idea.

pornel · on Feb 11, 2021

I like that Rust's String is not trying to be smart. It makes it predictable and easy to reason about.

Because Rust has &str as the lowest common denominator for all string types, "non-standard" strings aren't difficult to use.

pdimitar · on Feb 11, 2021

I agree that less smartness is a good thing, it's just that sometimes I get a bit frustrated even with basic usages of Rust's String and &str. It gets easier with time but the first few months were a big struggle.

steveklabnik · on Feb 11, 2021

Even with that perspective, you can disagree. Offering even more string types may be more confusing for users.

pdimitar · on Feb 11, 2021

Yep. I was kind of thinking out loud but was curious about your perspective. Thank you.

steveklabnik · on Feb 11, 2021

Totally! That's how progress is made, thinking out loud :) Thank you too!

drran · on Feb 11, 2021

If you need [no_std] version, use SmallString/SmallVec. Otherwise, use SmartString, SmolStr, etc.

colejohnson66 · on Feb 11, 2021

> The "niche" in this case is the realization that strings are unlikely to be exabytes in size,...

That’s what we say now, but a few dozen years down the line? Old Mac OS versions (pre X) would store extra bits of information in the MSBs of a pointer. “32 bit clean” Mac OS had to be created to allow usage on computers with more than a few dozen megabytes. That was an all too common problem when multiple gigabytes of RAM became commonplace.

Granted, I highly doubt we’ll see exabyte level string lengths, but you never know.

Fun fact: x86-64 mandates pointers be in “canonical” form where all unused bits of the address are the same as the MSB. So on a processor that has 48 address lines, the upper 16 must match the 48th bit. If you try to access memory with a “non canonical” pointer, the processor will throw an exception. This is because they don’t want people using them for flag bits and having a repeat of the “unclean” software problem.
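The canonical-form rule for the 48-bit case can be sketched as a sign-extension check. This is only an illustration of the rule as described here, not how any OS or CPU exposes it, and `is_canonical_48` is a hypothetical name:

```rust
/// Returns true if bits 48..=63 are all copies of bit 47.
fn is_canonical_48(addr: u64) -> bool {
    // Drop the top 16 bits, then shift back arithmetically so that
    // bit 47 is replicated into the upper 16 bits.
    let sign_extended = ((addr << 16) as i64 >> 16) as u64;
    sign_extended == addr
}
```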

TL;DR: I agree, but never say never

Dylan16807 · on Feb 11, 2021

There are two problems with using high pointer bits as flags.

The small problem is that your program will be limited in memory space. That's usually okay. If you want a vast expansion of memory space you probably need to start using different algorithms anyway.

The big problem is that your program depends on the CPU ignoring those bits, and now it can't run at all on updated hardware. As you noted, x86-64 solves this problem by forcing programs to clean up pointers before actually using them.

lpapez · on Feb 11, 2021

I think we have quite some time before us until the need for these high bits becomes reality :)

Now that you mention pointers, Apple is using unused bits in their values for signing and authentication purposes https://googleprojectzero.blogspot.com/2019/02/examining-poi...

Pretty neat in my opinion

KMag · on Feb 11, 2021

And it's too bad that most kernels (including Linux and Windows) put userspace in the low half of the address space. In the upper half of a 48-bit address space (or even 51-bit), all valid pointers, if interpreted as IEEE-754 doubles, are NaN, so you get NaN boxing without performing any arithmetic. (Though, if you want NaN-boxed nullable pointers, you'll need to pick a special value for nullptr.)

colejohnson66 · on Feb 12, 2021

Why are you treating pointers as floating point numbers?

KMag · on Feb 13, 2021

Pointers aren't being treated as doubles. There are 2**53-2 bit patterns for NaN values, and most FPUs (including x87 and the Aarch64 FPU) generate only 2 of those, leaving 2**53-4 values available as a variant on tagged pointers called NaN-boxing/nun-boxing.

NaN-boxing / nun-boxing is a common optimization, as seen in Firefox's SpiderMonkey, Safari's JSC, LuaJIT, etc. If all of your valid pointers are at the top of the address space, you can use 0x0 as the constant you either add/subtract or xor with your values to box/un-box. In other words, NaN-boxing/unboxing becomes a no-op.
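A sketch of why upper-half addresses read as NaN, assuming 48-bit canonical addresses (so the top 17 bits are all ones). The names are made up for this example. The constant spells out the count: 11 exponent bits all set, a nonzero 52-bit mantissa, and either sign.

```rust
/// Number of distinct NaN bit patterns in an IEEE-754 double: 2 * (2^52 - 1).
const NAN_PATTERNS: u64 = (1u64 << 53) - 2;

/// Reinterprets address bits as an f64 and reports whether that value is NaN.
fn pointer_bits_are_nan(addr: u64) -> bool {
    f64::from_bits(addr).is_nan()
}
```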

Drup · on Feb 11, 2021

Hi steve, since you are around: is there a (formal?) description of how this "niche" layout optimization behaves on ADTs in general, or any documented work on the topic?

steveklabnik · on Feb 11, 2021

I think there is, but I cannot find it right now. If you make a post on internals.rust-lang.org, I bet someone can point you to the right place.

johnisgood · on Feb 12, 2021

You may want to check out Ada, then: https://docs.adacore.com/spark2014-docs/html/ug/en/source/ty...

You can catch a hell lot of things at compile-time. Correctness is huge in Ada/SPARK. You write safety-critical software in it. I have a comment on Ada/SPARK in great detail, but I do not have the time to find it sadly. :(

Consider this:

  Total : Integer range 0 .. Integer'Last

"Any attempt to assign a negative value to variable Total results in raising an exception at run time. During analysis, GNATprove checks that all values assigned to Total are positive or null."

Analysis here means static analysis.

You can do stuff like this, too, for example:

  subtype Natural  is Integer range 0 .. Integer'Last;
  subtype Positive is Integer range 1 .. Integer'Last;

The above is actually declared in the visible part of package Standard.

volta83 · on Feb 12, 2021

Does Ada perform layout optimizations on the niches ?

johnisgood · on Feb 13, 2021

I think the terminology is completely different, so I am not quite sure. Perhaps someone who knows Ada/SPARK and its compiler better may be able to answer you.

volta83 · on Feb 11, 2021

I write a lot of code using the

    #[rustc_layout_scalar_valid_range_start(x)]
    #[rustc_layout_scalar_valid_range_end(y)]
    struct MyInt(i32);

to create integers with only valid values in range [x, y) that benefit from the niche optimizations.

Like, I literally have a project with almost 100k LOC that is built entirely on top of this.

brundolf · on Feb 11, 2021

Wow, I did not know about those!

I'm surprised how little a Google search turns up for them; the best I found was a mention (but no description or examples) in docs.rs: https://docs.rs/rustc-ap-syntax_pos/634.0.0/rustc_ap_syntax_...

Do these enforce against out-of-bounds values by panicking, the way arrays do? Or are they unsafe?

jleahy · on Feb 11, 2021

They're not only unsafe, but as far as I'm aware they're a compiler implementation detail that isn't really supposed to be used in user code (along with rustc_nonnull_optimization_guaranteed). It's actually how NonZero is implemented.

See https://github.com/rust-lang/rust/blob/master/library/core/s...

brundolf · on Feb 11, 2021

Ah, that explains things

tmzt · on Feb 11, 2021

It would be nice to have a #[range(a..b)] version of this that could be standardized.

volta83 · on Feb 12, 2021

> Or are they unsafe?

They are unsafe. The safe API of your type must prevent constructing invalid values.

oblio · on Feb 11, 2021

The revenge of Pascal integer ranges! :-D

johnisgood · on Feb 12, 2021

It makes me chuckle when they perceive it to be something of an innovation when it has been in languages long before Rust, in a much better way; I mean, just look at Ada, really. You even have formal verification in Ada/SPARK.

volta83 · on Feb 12, 2021

I don't use these hints for verification; in fact, these hints don't verify anything.

It is trivial to create an integer type in Rust that only accepts certain values, and doing so doesn't require these hints, e.g.,

    #[bounded_int(start, end)]
    struct MyBoundedInt(i32);

creates a type called MyBoundedInt that never will take a value outside the range [start, end); this can be proved formally, i.e., there is no safe Rust program that can violate that. No need to run any kind of verification on top.
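For readers without such a macro, here is a hand-written sketch of the same idea. `Percent`, `START` and `END` are hypothetical names, and the guarantee only holds for code outside the module that defines the type, since the field is private to that module. This version does not get any niche layout optimization.

```rust
/// An integer restricted to the half-open range [START, END).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Percent(i32);

impl Percent {
    const START: i32 = 0;
    const END: i32 = 101; // exclusive upper bound

    /// The only way to build a Percent; rejects out-of-range values.
    fn new(value: i32) -> Option<Percent> {
        if (Self::START..Self::END).contains(&value) {
            Some(Percent(value))
        } else {
            None
        }
    }

    /// Returns the wrapped value, which is always within [START, END).
    fn get(self) -> i32 {
        self.0
    }
}
```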

The place where Rust differs from Ada, and the reason I use these attributes, is to enable the layout optimizations that Rust performs (e.g. Option<NonZero> having the same size in bytes as NonZero, and many others).

bogeholm · on Feb 12, 2021

Rust doesn’t claim to have invented this, or many of the other aspects of the language at all really.

From the Rust reference (https://doc.rust-lang.org/reference/influences.html):

> Rust is not a particularly original language, with design elements coming from a wide range of sources. Some of these are listed below ...

johnisgood · on Feb 13, 2021

> Rust doesn’t claim to have invented this, or many of the other aspects of the language at all really.

Oh, I did not mean to suggest that Rust does, just some people do, because they are not familiar with other programming languages and whatnot.

TazeTSchnitzel · on Feb 11, 2021

Not only is it more efficient, it aligns with a safety principle I like: “make invalid states unrepresentable”.

pjungwir · on Feb 11, 2021

I don't use Rust often (sadly), but I really appreciate its goals and its steady progress. I read every one of these Rust 1.x stories on HN. Around Christmas I made an n-player chess clock with Rust and websockets [0] (because my friends are really slooow Agricola players :-), and it was incredibly easy compared to my first projects 5 years ago. I can't believe it's been that long. So thank you to all the folks working on it, and also thank you to everyone helping out 5-year newbies like me. Especially steveklabnik, I think you've helped me personally several times. :-)

[0] https://github.com/pjungwir/multiclock

weinzierl · on Feb 11, 2021

I've been doing Rust for a while now, but this is the first time I've heard about the niche concept[1]. Sounds really useful.

I know that other languages have subranges or refinement types but niches seem to solve a different problem - namely to use the "holes" in a data type for optimization. Does any other language have some comparable concept to niches in Rust?

KirillPanov · on Feb 12, 2021

In Standard ML the bitwidth of "int" is machine-specific, like in C's "int" and Rust's "isize". Most implementations of Standard ML choose the bitwidth to be one bit less than the machine's natural word width, so that the GC mark bit can be stashed in there. For example, 63-bit ints on a 64-bit machine.

http://www.itu.dk/~sestoft/mosmllib/Int.html#precision-val

I recall that MLton would stash data in the lower log_2(alignment) bits of pointers for types with alignment requirements (or architectures with memory load alignment requirements), but can't find a reference. There was a paper (ICFP, I think?) about using these bits for reference-counting GC with a fallback to mark+sweep when the count hit the maximum value. Don't think it ever got implemented in a production compiler.

I'm pretty sure the "None:Option<T> as null pointer when T is represented as a pointer" optimization happens under the hood in a lot of Haskell+ML compilers.

Rust is the first language I've come across that managed to include pretty much all the cool features ML had.

FuckButtons · on Feb 12, 2021

This sounds like a really interesting idea that I hadn’t really encountered before. Do you have any suggestions for further reading?

masklinn · on Feb 12, 2021

The second paragraph is pointer tagging, it’s used quite a bit, including in the upper bits for some (e.g. objc stashes the ref count in the class pointer).

I think OCaml also does something interesting with integers, IIRC they’re 1 bit less than the normal range and a shift stores a flag in the freed low bit: if the bit is set it’s a shifted integer, otherwise it’s an unshifted pointer to an integer.
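The low-bit scheme described above can be sketched like this. It is a simplified model of the idea, not the actual runtime code of any ML implementation, and all three function names are made up:

```rust
/// Encodes a 63-bit integer as a machine word with the low bit set.
fn tag_int(n: i64) -> u64 {
    ((n << 1) | 1) as u64
}

/// A word with the low bit set is an integer; aligned pointers have it clear.
fn is_int(word: u64) -> bool {
    word & 1 == 1
}

/// Recovers the integer with an arithmetic shift, which preserves the sign.
fn untag_int(word: u64) -> i64 {
    (word as i64) >> 1
}
```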

pimeys · on Feb 11, 2021

I hope 1.51 brings the new cargo resolver to stable. I was fighting with a problem recently, solved in the new resolver. If we'd think a dependency such as:

  [target.'cfg(target_os = "macos")'.dependencies]
  foo = { version = "1", features = ["x"] }

  [target.'cfg(not(target_os = "macos"))'.dependencies]
  foo = { version = "1", features = [] }

With the current resolver, the dependency `foo` will always be compiled with a feature `x`, no matter which target os you're using. Debugging this took me a while and I was surprised it's actually just how the current resolver works.

varajelle · on Feb 11, 2021

I am happy that bool::then is stable. I can simplify quite some code by removing the `else { None }` branch

k__ · on Feb 11, 2021

Could you give a before/after example?

Fiahil · on Feb 11, 2021

Before:

if predicate() { Some(thing) } else { None }

After:

predicate().then(|| thing)
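A compilable version of the two forms side by side, as a sketch (the function names and the doubling are placeholders):

```rust
/// The explicit branch, as written before bool::then was stable.
fn with_branch(n: i32) -> Option<i32> {
    if n > 0 { Some(n * 2) } else { None }
}

/// The same logic using bool::then, which is stable as of Rust 1.50.
fn with_then(n: i32) -> Option<i32> {
    (n > 0).then(|| n * 2)
}
```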

Shadonototro · on Feb 11, 2021

i prefer the Before one, it is clear what it does, it's branching, common, even in plain english or any latin language

the second, that magic is not clear, i'll keep using the Before personally

momothereal · on Feb 11, 2021

IMO it depends if you're already in a chaining context. I wouldn't use it in a simple if-else branch, but it also removes the need to use if-else when you're already in a 10-function long chain.

Before:

if a().b().c().d().e().f() { Some(g().h().i()) } else { None }

Now:

a().b().c().d().e().f().then(|| g().h().i())

Even better if you have multiple boolean returns in the chain.

csomar · on Feb 11, 2021

The naming can be confusing since "then" is usually used for async operations.

steveklabnik · on Feb 11, 2021

That ship had already sailed in Rust https://doc.rust-lang.org/stable/std/?search=then

hobofan · on Feb 11, 2021

Option and Result already have a `and_then` in addition to the futures methods you mention, so it should just always be assumed that then/and_then have slightly different meanings depending on the context.
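For comparison, a small sketch that uses `Option::and_then` and `bool::then` in the same chain. `parse_even` is a hypothetical helper:

```rust
/// Parses a decimal integer and keeps it only if it is even.
fn parse_even(s: &str) -> Option<i32> {
    s.parse::<i32>()
        .ok()
        // and_then: continue only if the previous step produced a value.
        .and_then(|n| (n % 2 == 0).then(|| n))
}
```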

alfiedotwtf · on Feb 12, 2021

I used to like explicit if/then and using match, but chaining really does reduce code. It can sometimes take a while to golf it down, but damn is it satisfying when it’s all so compact. Followed by a `cargo fmt` and it feels way more elegant than branching

trevor-e · on Feb 11, 2021

Can you also please explain what the "||" is doing? That looks really weird to me coming from Swift and don't see it in the docs.

edit: thanks all, should have known it was a lambda :D, in Swift we can omit this. My first thought seeing "||" was some type of OR logic since it's in the context of if/else

Sharlin · on Feb 11, 2021

Rust lambdas look like this:

    |ar,gs| expr

So `|| expr` is just a lambda function that takes no arguments and evaluates to `expr`.

varajelle · on Feb 11, 2021

It is a lambda function with no parameters.

The argument of then() is a function which is evaluated only if the bool is true.
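The "evaluated only if true" part can be observed with a counter. A minimal sketch; `count_calls` is a made-up name:

```rust
use std::cell::Cell;

/// Returns the result of `flag.then(..)` and how many times the closure ran.
fn count_calls(flag: bool) -> (Option<u32>, u32) {
    let calls = Cell::new(0);
    let result = flag.then(|| {
        // This body only runs when `flag` is true.
        calls.set(calls.get() + 1);
        42
    });
    (result, calls.get())
}
```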

laszlokorte · on Feb 11, 2021

empty parameter list of a closure.

rust: |x| x+x

swift: {x in x+x}

rust: || 42

swift: { 42 }

oblio · on Feb 11, 2021

Rust is awesome but I swear the language developers have no aesthetic eye. I mean, they do have one, but it definitely isn't that of Elvis (and definitely not that of Mort).

https://web.archive.org/web/20080218051638/http://www.nikhil...

staticassertion · on Feb 11, 2021

const generics are a feature I've been looking forward to since 1.0, really cool to see that work making it to stable.

This solves another of the ergonomic issues in Rust. It really feels like within 2021 Rust will hit a point where it feels totally consistent.

CodesInChaos · on Feb 11, 2021

Unfortunately the version of const generics stabilized in 1.51 has many limitations. The core problem seems to be that evaluating const functions can panic, while generics should only produce errors when constraints are not met, not when instantiating them.

The current limitation I find the most annoying is that you can't use associated constants in generics.

est31 · on Feb 11, 2021

const generics themselves will only become stable in the upcoming release, 1.51. This release only adds implementations of specific traits to arrays of all lengths. Users themselves can't declare types yet that are generic on numbers or write impl blocks generic on numbers.
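As one example of such a trait impl: as far as I know, 1.50 is the release where arrays of any length implement `Index` directly, so an array can be passed to code that is generic over `Index`. A sketch, with `third` as a hypothetical helper:

```rust
use std::ops::Index;

/// Reads element 2 from anything indexable by usize that yields u8.
fn third<C>(container: &C) -> u8
where
    C: Index<usize, Output = u8>,
{
    container[2]
}
```

Calling `third(&[0u8; 40])` works for this length or any other, without the caller converting the array to a slice first.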

felipellrocha · on Feb 11, 2021

I hit this limitation recently, and although it was a bummer, it’s good to hear that a solution is being worked on

staticassertion · on Feb 11, 2021

Ah! Bummer, but also, that's enough to deal with what felt like a big inconsistency in Rust - inability to work well with arrays.

conradludgate · on Feb 11, 2021

I'm already using them in nightly and they are amazing.

The only thing left that I want with generics is variadics/tuples. The only way currently to implement a trait over a generic tuple is to write a macro and call it for however many sized tuples you want. I've not seen any convincing RFCs about it yet though so I'm not confident we'll get them any time soon
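For anyone who hasn't seen the macro workaround, a minimal sketch. The `Arity` trait and the `impl_arity!` macro are invented for this example; real crates usually generate the invocations for a dozen or more tuple sizes.

```rust
/// Hypothetical trait: reports how many elements a tuple type has.
trait Arity {
    const ARITY: usize;
}

/// Implements Arity for one tuple size; it must be invoked once per size.
macro_rules! impl_arity {
    ($count:expr; $($name:ident),+) => {
        impl<$($name),+> Arity for ($($name,)+) {
            const ARITY: usize = $count;
        }
    };
}

impl_arity!(1; A);
impl_arity!(2; A, B);
impl_arity!(3; A, B, C);
```

A 4-tuple has no impl here, which is exactly the limitation variadic generics would remove.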

remexre · on Feb 11, 2021

I use https://docs.rs/frunk/0.3.1/frunk/ to get around this in the meantime.

PoignardAzur · on Feb 12, 2021

I wouldn't call it a convincing RFC, but I did recently write an analysis on the subject:

https://gist.github.com/PoignardAzur/aea33f28e2c58ffe1a93b8f...

slmjkdbtl · on Feb 11, 2021

I'm not familiar with Rust's rfc / implementation / go stable flow, but I'm curious how f32::clamp took so many years to go stable

vlang1dot0 · on Feb 11, 2021

Looks like the two main issues were:

1. Some crates in the ecosystem have "extension traits" that add the same method to `f32`. Adding this into std will cause conflicts with those methods so users will need to disambiguate or remove their use of those extension traits. (This is allowed breakage under the stability guidelines)

2. Should this take two arguments or a range (`x.clamp(1.0, 5.0)` or `x.clamp(1.0..5.0)`)?

I think part of the reason this took so long was that by having widely used crates like `numtools` in the ecosystem that provided this functionality, it took a lot of the pressure off having this in std/core.
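For reference, the form that was stabilized takes two arguments. A short sketch with a made-up wrapper name:

```rust
/// Restricts x to the closed interval [1.0, 5.0] using f32::clamp.
fn clamp_to_range(x: f32) -> f32 {
    x.clamp(1.0, 5.0)
}
```

Values inside the interval pass through unchanged, and a NaN input stays NaN.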

steveklabnik · on Feb 11, 2021

You can read the relevant history here: https://github.com/rust-lang/rust/issues/44095

Short summary:

First delay was ecosystem breakage; many people had defined their own functions with the same name. We're allowed to make these changes but tend to try not to unnecessarily break people.

It then sat for about a year. The original author was a bit tired from all of the work to get to that point, and reasonably let it sit.

There was then a small discussion about taking individual parameters vs ranges.

Six months after that, it finally actually landed, thanks to some other changes that would help mitigate the breakage.

It then sat for a while, until the libs team proposed merging. That brought up a lot of the previous design questions, which some people thought weren't resolved to their satisfaction.

It then finally got stabilized in October of last year, and then it had to wait for the release trains.

So, TL;DR: a surprising amount of details for such a small feature, which can lead to burnout, along with not enough people wanting to push it over the line.

slmjkdbtl · on Feb 12, 2021

Thanks for the reply Steve! Have been enjoying your works / content :D.

_34vz · on Feb 11, 2021

im still waiting for fixes for https://github.com/rust-lang/rust/issues/40552 and https://github.com/rust-lang/rust/issues/75263 in one of these releases

lots of upvotes for them too since its for privacy in the compiler

steveklabnik · on Feb 11, 2021

I am not aware of anyone actively working on them, though I am also not on the compiler team.

Looks like someone who cares about this needs to step up! You can make it happen sooner than later. It is one of the best things about open source.

_34vz · on Feb 12, 2021

except it needs someone very experienced with rust and compilers to fix which isnt me. even then if its ignored now whats to say a pr wont be too

haolez · on Feb 11, 2021

I was just wondering... is there any hope to have minimalist Rust compilers? Something like TinyCC for C[0]?

[0] https://tiny.cc/

kibwen · on Feb 11, 2021

Your link there is to a link shortener, I think you mean TCC. :)

It really depends on what you're actually asking for. Do you want, for example, a "slimmer" Rust compiler that jettisons all the stuff that supports older language editions? Do you want a "simpler" compiler that only uses straightforward algorithms at the expense of compiler speed? Or do you want a faster compiler that does less optimization at the expense of code generation quality? The latter, at least, is something that the Cranelift backend for rustc hopes to achieve.

haolez · on Feb 11, 2021

Yes and I've lost the window to edit my comment. Thanks for the correction :)

pornel · on Feb 11, 2021

I don't think so. The type system is pretty advanced, and type inference is required to compile it. Macros, proc macros, and modules handle tons of details, and aren't just textual inclusion like in C.

The borrow checker is technically optional (as proven by mrustc), but if you wanted to implement it, it's no longer a set of simple scope rules, but more like a flavor of Prolog inside the compiler.

These things are awesome for the power and usability of the language, but aren't tiny.

twic · on Feb 11, 2021

The closest thing i am aware of is mrustc:

https://github.com/thepowersgang/mrustc

But AIUI that is not aimed at being a Rust compiler you would actually use day-to-day, just at being a way to bootstrap a Rust toolchain without needing a Rust compiler.

masklinn · on Feb 11, 2021

An interpreter might make more sense, as compiling rust in a straightforward manner (without optimisations) is really slow and takes a lot of space. It might not be slower to just interpret it, and avoiding binary generation might be advantageous.

steveklabnik · on Feb 11, 2021

I am not aware of anyone pursuing any.

no_wizard · on Feb 11, 2021

Disclaimer: I'm new to Rust and lower-level programming of this type in general. The 'lowest' level language I've worked with on the regular before this is C#[2]

I know Rust already has Tuples[0], so I'm assuming Const Generic Indexing For Arrays[1] is a happy path for the compiler to optimize what amounts to a finite sized Generic Tuple? (Finite size in terms of memory not elements)

Excellent feature, I just see some (admittedly only high level) surface overlap in this language feature.

[0]: https://doc.rust-lang.org/rust-by-example/primitives/tuples....

[1]: https://blog.rust-lang.org/2021/02/11/Rust-1.50.0.html#const...

[2]: Which, while generally accepted to be a strongly typed language, is not near the level of Rust

DasIch · on Feb 11, 2021

Arrays and tuples are very different. Arrays are homogeneous structures, all entries have the same type. Tuples are heterogeneous, so entries can have different types.

Blikkentrekker · on Feb 11, 2021

They're not very different in that arrays can be thought of as a more specialized form of tuples that can thus coerce to slices.

In theory, arrays do not need to exist, and could in theory be expressed as tuples, but that would lead to rather unpleasant syntax. `(A, A, A, A, A, A, A, A, A, A)` is certainly not as nice as `[A; 10]`; — Rust could even have chosen to make the latter syntactic sugar for the former and make them interchangeable.

The crux to this, is that in Rust, unlike in many other languages, the length of an array is static and known at compile time. Many languages lack such a datatype, as it isn't strictly needed per the aforementioned reasons.
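A small sketch of that relationship, with names made up for illustration: an array and a same-typed tuple hold the same data, but only the array coerces to a slice. The printed sizes are what rustc produces in practice; tuple layout is not formally guaranteed.

```rust
use std::mem::size_of;

// Accepts any slice of u8; arrays coerce to slices, tuples do not.
fn sum(xs: &[u8]) -> u32 {
    xs.iter().map(|&x| u32::from(x)).sum()
}

fn main() {
    let array: [u8; 4] = [1, 2, 3, 4];
    let tuple: (u8, u8, u8, u8) = (1, 2, 3, 4);

    // Both hold four u8 values; the length is part of the type in both cases.
    println!("array: {} bytes", size_of::<[u8; 4]>());
    println!("tuple: {} bytes", size_of::<(u8, u8, u8, u8)>());

    // The array can be passed where a slice is expected.
    println!("sum of array: {}", sum(&array));

    // The tuple has no slice coercion; fields are accessed by position.
    let tuple_sum = u32::from(tuple.0) + u32::from(tuple.1)
        + u32::from(tuple.2) + u32::from(tuple.3);
    println!("sum of tuple: {}", tuple_sum);
}
```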

steveklabnik · on Feb 11, 2021

... and in some sense, tuples are also isomorphic to structs.

Product types: woo!

andrewflnr · on Feb 12, 2021

There's at least one language (Haxe?) that uses exponentiation syntax for array types, playing on the product type idea: (A,A,A,A) = A^4

no_wizard · on Feb 11, 2021

Thanks, after further thought, I misread the example code they posted, and realized this. I'm still getting used to how Rust specifies generics vs the way you would specify something as generic in TypeScript (a language I am far more familiar with as I use it every day in my job and all my other projects). That's what got me.

Addendum Explanation (feel free to skip readers):

When I say that, this is what I mean:

In TypeScript, an accepted thing about generics is that I can have multiple declared types either via a union e.g.

`TypeA | TypeB`

or intersection

`TypeA & TypeB`

In general practice, it is assumed you can do this, and not vice versa. If this is supposed to be avoided, you typically will use a constraining Type or specify a specific interface / type instead of making it outright generic.

This is not always the case with a language like Rust, it's much finer grained, so I must try and get out of the habit of thinking this way. It's the opposite, that generic typing is constrained by the structure you put them in, as I understand it.

This of course, is a super general overview of what I'm getting at, and there is a lot more nuance to TypeScript as well, but my everyday practice of using the language for the last 6 years or so has proven this to be the case often. I'm finding (and I do like this for what it's worth) Rust is not like this outwardly.

matt_kantor · on Feb 11, 2021

I'm interested in understanding your addendum because I often find myself teaching programmers new languages by mapping concepts from languages they already know.

Here's a TypeScript analogy that might help clarify things: https://www.typescriptlang.org/play?#code/C4TwDgpgBASgrgZ2AG...

I'm not sure what you're getting at when mentioning unions & intersections. Are you comparing a type like `Foo<number | boolean>` with `Bar<number, boolean>`? Those are conceptually very different (even in TypeScript), but maybe I'm misunderstanding.

The big difference is structural vs nominal typing and the fact that Rust doesn't really have subtyping (besides lifetimes). TypeScript types are defined by their structure—an object type you define can be compatible with a different object type from a separate library that never heard of you, just by nature of sharing the same properties. However Rust types are delineated by their names/identities—two separate types are not directly interchangeable even if they're defined identically.
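Here is a minimal sketch of the nominal side, using two hypothetical types (`Point` and `Vector`) that are defined identically. Passing a `Vector` where a `Point` is expected is a compile error, so the conversion has to be spelled out.

```rust
// Two hypothetical types with exactly the same structure.
struct Point {
    x: i32,
    y: i32,
}

struct Vector {
    x: i32,
    y: i32,
}

// An explicit conversion; nothing like this happens implicitly.
impl From<Vector> for Point {
    fn from(v: Vector) -> Self {
        Point { x: v.x, y: v.y }
    }
}

fn describe(p: Point) -> String {
    format!("({}, {})", p.x, p.y)
}

fn main() {
    let v = Vector { x: 1, y: 2 };
    // describe(v) would be rejected: expected `Point`, found `Vector`.
    println!("{}", describe(v.into()));
}
```

In TypeScript the equivalent object types would be assignable to each other without the `From` impl.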

no_wizard · on Feb 11, 2021

Yeah they're definitely different. My point being that in TypeScript, though, it's generally assumed that (unless a constraint type or an explicit type / interface is used) you can get away with using a union or an intersection to specify a generic type value.

So for instance, `Foo<number | boolean>` or `Bar<number, boolean>` wouldn't be out of place on say, a function's return type depending on what it does. Not that you should but you most certainly can. In some cases (like dealing with fetch results) it's infinitely useful when doing foundational library work on a project / app / library. In other cases, I may want to use something like an intersection type to compose a return type or value type of two interfaces that are similar enough to be merged into one, or I have a composition function that merges data structures together etc.

Fundamentally, with Rust, I have found (again, disclaimer: I'm really new at this language) that saying something is generic does not imply you can saturate type values like the examples I gave above (the real thing I was getting at). It's not commonplace to see this (at least, not that I've read or seen yet). Typically, there is a level of explicitness even within generics, like mentioned with the array. It has to be homogeneous, so a union type is a no go (at least, homogeneous in my mind means of one uniform type). An intersection type might be idiomatic but I don't know if Rust even has this equivalent in that way. The way I think of it myself is that the compiler is trying to do the right thing not just for the developer but for the program, so these constraints likely tie back to memory safety and efficiency, so yes, the interchangeability aspect throws me for a loop when reading (it's the same words but a vastly different context) sometimes.

Even C# was less strict in this way (but still a bit more strict than TypeScript, though you could just box and unbox the `object` type, which is very annoying to see in practice most of the time, since all types derive from the C# object type). F# was better in that you could do type aliasing, which made certain things better (like generic Record types). I still believe F# is a better language overall in terms of ergonomics. I wonder if I could just leverage that instead of Rust, but I think the binaries would be much bigger (easily 50mb plus) for what I'm trying to achieve, which is web dev tools, a la things like SWC.

matt_kantor · on Feb 11, 2021

TypeScript-style untagged unions don't exist in (safe) Rust. Instead, "or" is done with enums (which are tagged/discriminated unions—https://en.wikipedia.org/wiki/Tagged_union). One difference is that in TypeScript `number | number` reduces to just `number`, but Rust enums don't work that way (`enum Foo { A(f64), B(f64) }` has two distinct variants).

Check out these equivalent-ish programs:

TypeScript: https://www.typescriptlang.org/play?#code/C4TwDgpgBA6gTgQzJO...

    type Wrapper<T> = {
        value: T
    }
    
    function toBoolean(x: Wrapper<boolean | number>): boolean {
        if (typeof x.value === "number") {
            return x.value !== 0
        } else {
            return x.value
        }
    }
    
    function unwrap<T>(x: Wrapper<T>): T {
        return x.value
    }
    
    const a = { value: 1.23 }
    const b = { value: true }
    toBoolean(a)
    toBoolean(b)
    const c = { value: "whatever" }
    unwrap(c)

Rust: https://play.rust-lang.org/?version=stable&edition=2018&gist...

    struct Wrapper<T> {
        value: T,
    }
    
    enum BooleanOrNumber {
        Boolean(bool),
        Number(f64),
    }
    
    fn to_bool(x: Wrapper<BooleanOrNumber>) -> bool {
        match x.value {
            BooleanOrNumber::Boolean(b) => b,
            BooleanOrNumber::Number(n) => n != 0.0,
        }
    }
    
    fn unwrap<T>(x: Wrapper<T>) -> T {
        x.value
    }
    
    fn main() {
        let a = Wrapper { value: BooleanOrNumber::Number(1.23) };
        let b = Wrapper { value: BooleanOrNumber::Boolean(true) };
        to_bool(a);
        to_bool(b);
        let c = Wrapper { value: "whatever" };
        unwrap(c);
    }

In Rust you also have traits to play with: https://play.rust-lang.org/?version=stable&edition=2018&gist...

    struct Wrapper<T> {
        value: T,
    }
    
    trait Boolable {
        fn to_bool(self) -> bool;
    }
    impl Boolable for bool {
        fn to_bool(self) -> bool {
            self
        }
    }
    impl Boolable for f64 {
        fn to_bool(self) -> bool {
            self != 0.0
        }
    }
    
    fn to_bool(x: Wrapper<impl Boolable>) -> bool {
        x.value.to_bool()
    }
    
    fn unwrap<T>(x: Wrapper<T>) -> T {
        x.value
    }
    
    fn main() {
        let a = Wrapper { value: 1.23 };
        let b = Wrapper { value: true };
        to_bool(a);
        to_bool(b);
        let c = Wrapper { value: "whatever" };
        unwrap(c);
    }

I find there's a decent mental shift switching between TypeScript and Rust, despite some superficial similarities. I use both languages in different codebases and it usually takes me a bit to adjust when hopping back and forth. Both type systems have useful worldviews, but they're pretty different, and they nudge you towards different ways of structuring your code.

zucker42 · on Feb 11, 2021

I don't know if you have the right impression of what the feature does.

Consider if you write a generic function which takes as an argument any type that has an indexing operation. In the past that function couldn't take an array as an argument. Now it can.
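A sketch of the kind of function I mean. `first_two` is a made-up name; the only bound is `std::ops::Index<usize>`, so it accepts a `Vec`, and as of 1.50 an array as well.

```rust
use std::ops::Index;

// Accepts any container indexable by usize whose elements are i32.
fn first_two<C>(c: &C) -> (i32, i32)
where
    C: Index<usize, Output = i32>,
{
    (c[0], c[1])
}

fn main() {
    let array: [i32; 3] = [10, 20, 30];
    let vector: Vec<i32> = vec![1, 2, 3];

    // The array call relies on arrays implementing Index directly.
    println!("{:?}", first_two(&array));
    println!("{:?}", first_two(&vector));
}
```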

mkesper · on Feb 11, 2021

Seems to have vanished, was put out too early?

OK, can see it now, too. Caches...

steveklabnik · on Feb 11, 2021

No, it's there, not too early. Best guess is that you have an old cache. This happens sometimes, from what I hear from release team folks. Should sort itself soon.

no_wizard · on Feb 11, 2021

(I'm noting this separately from my other comment)

I found that not using the ISP DNS on your router helps with this problem on several sites, this being one of them, among other benefits

I recommend everyone use something like OpenDNS, Google's public DNS resolver (if you're comfortable with that) or CloudFlare via 1.1.1.1 (and associated addresses), or any other myriad of reputable third party DNS services

no_wizard · on Feb 11, 2021

Here's the full release notes on GitHub as an alternate if you're having trouble with the link:

https://github.com/rust-lang/rust/blob/master/RELEASES.md#ve...

PartiallyTyped · on Feb 11, 2021

Still vanished for me, here is a webarchive link.

https://web.archive.org/web/20210211144406/https://blog.rust...

nitsky · on Feb 11, 2021

With all the exciting changes coming to Rust soon, I think this release would be appropriately called "the calm before the storm".

mlindner · on Feb 14, 2021

What are those? I thought Rust was finally stabilizing and stopped so many rapid fire changes.

steveklabnik · on Feb 14, 2021

The initial version of const generics is in the next release.

We have stabilized, anything added does not break. But we are still adding new things, though they are mostly minor at this point. This is one of two or three big features that people are still expecting us to add.

vlovich123 · on Feb 11, 2021

Are there any plans to extend the ability so that user code can use niche values that aren't 0? Unless I failed at figuring out how to find the relevant docs, I only see NonZero formally documented but this release is clearly utilizing -1 instead (& I'm assuming it's not just doing +1/-1 math when accessing just to leverage NonZero).

steveklabnik · on Feb 11, 2021

Off the top of my head, I don't think there is, but not due to some objections to the feature, but because doing this is a pretty niche feature (pun absolutely intended.) Nobody has been motivated enough to do the design and consensus building work.

vlovich123 · on Feb 11, 2021

This came up a ton when I was doing particle filters on mobile. We had optional doubles that were initialized to NaN but NaN isn't a valid value. Being able to use optionals instead would be great (to avoid having to backtrace where an accidental NaN propagation may have started from) but the memory hit was impractical. I bet some parts of the science community might need this but the need is so spread out (each individual application) that it might not be as obvious as a popular library identifying the need.

I wish I knew language design & compiler development better & had the time to propose it & see it through.
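For a rough idea of the memory hit, here is a small sketch. `option_overhead` is a hypothetical helper; the exact numbers depend on the target, but on typical 64-bit platforms `f64` has no niche, so the `Option` needs a separate tag plus padding, while `NonZeroU64` costs nothing extra.

```rust
use std::mem::size_of;
use std::num::NonZeroU64;

// Hypothetical helper: extra bytes that wrapping T in Option costs.
fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}

fn main() {
    // Every f64 bit pattern is a valid value, so there is no niche to reuse.
    println!("Option<f64> overhead: {} bytes", option_overhead::<f64>());
    // 0 is forbidden for NonZeroU64, so None can be stored there.
    println!("Option<NonZeroU64> overhead: {} bytes", option_overhead::<NonZeroU64>());
}
```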

conradludgate · on Feb 11, 2021

FiniteFloat or NotNaN would be nice additions to the stdlib

jsheard · on Feb 11, 2021

NonMax types would be nice to have, since the natural "null" value for an index into some linear data structure is the highest one

In the meantime it can be done by wrapping NonZero and storing !value, but that's a (tiny) runtime overhead on each get/set
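A minimal sketch of that workaround. `NonMaxU32` is a hypothetical type, not something in std: it stores the bitwise complement, so `u32::MAX` maps onto the forbidden 0 and `Option` gets the niche for free.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Hypothetical wrapper: holds !value, so u32::MAX is the one unrepresentable input.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NonMaxU32(NonZeroU32);

impl NonMaxU32 {
    // Returns None only for u32::MAX, whose complement is 0.
    fn new(value: u32) -> Option<Self> {
        NonZeroU32::new(!value).map(NonMaxU32)
    }

    // Undo the complement on every read; this is the small runtime cost.
    fn get(self) -> u32 {
        !self.0.get()
    }
}

fn main() {
    let index = NonMaxU32::new(7).expect("7 is not u32::MAX");
    println!("value: {}", index.get());
    println!("Option<NonMaxU32>: {} bytes", size_of::<Option<NonMaxU32>>());
}
```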

da_big_ghey · on Feb 11, 2021

I am looking to learn Rust, but am wondering, what resources are good for learning the latest language version? I don't want to pick up something that teaches me an old version so I then have to go and get used to the new one.

steveklabnik · on Feb 11, 2021

New versions of Rust come out every six weeks, so you basically can never find something that is covering the absolute latest head.

However, new features are additive, so you won't learn stuff that's incorrect, you just may not be aware of later, new things.

lmkg · on Feb 11, 2021

I agree that versions aren't worth worrying about, but which edition is something I would pay attention to.

_xrjp · on Feb 11, 2021

Yeah, however I think that a good one could be 2018 Edition IMO.

vincenv · on Feb 11, 2021

I found the Rust Programming Language book [1] very helpful for learning, and with rust installed you can open it by typing rustup doc --book.

[1] https://doc.rust-lang.org/stable/book/

baq · on Feb 11, 2021

pick any stable version released in the last few months, or perhaps the one shipped with a recent version of your favorite distro (if running linux) - changes are very rarely important enough to absolutely have to know them.

1.50 is a nice round number and for that reason i can recommend it :)

Waterluvian · on Feb 11, 2021

So does all this const work not break any existing usage but promotes the functions to "we can know more about and do more with at compile time"?

steveklabnik · on Feb 11, 2021

Correct.
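A quick sketch of what that looks like in practice. `cube` is a hypothetical function; the point is that the same `const fn` works in a constant and in an ordinary runtime call, and that `pow` on integers can now be used in constants too.

```rust
// Hypothetical const fn: usable at compile time and at runtime.
const fn cube(x: u32) -> u32 {
    x * x * x
}

// Evaluated by the compiler.
const CUBE_OF_THREE: u32 = cube(3);

// Integer pow is one of the methods usable in const contexts as of this release.
const TWO_TO_THE_TENTH: u32 = 2_u32.pow(10);

fn main() {
    // Existing runtime calls keep working exactly as before.
    let n = 4;
    println!("{} {} {}", CUBE_OF_THREE, TWO_TO_THE_TENTH, cube(n));
}
```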

cevans01 · on Feb 11, 2021

Couldn't any negative number be used as a niche for file descriptors in Unix? Could Option<File> use -2 to specify None?

duckerude · on Feb 11, 2021

Based on https://internals.rust-lang.org/t/can-the-standard-library-s... it looks like that's probably the case, but that it's not beyond all doubt that it holds across all Unix-likes.

> On Linux, the answer is pretty obviously no. Linux file descriptors are stored in an array of structs that has its capacity bounded to INT_MAX, so any negative int would either be considered nonsense (if treated as negative) or be higher than INT_MAX (if it was bit-reinterpreted as an unsigned value).

> The Single UNIX Specification explicitly says that open can't return a negative file descriptor, and says in the page for dup2 that you should get EBADF if you try to claim a negative file descriptor.

> Unfortunately, their description of file descriptor allocation never explicitly says that the negative range is out of bounds, but open references this algorithm while simultaneously claiming that it never gives a negative result.

> Also, Wikipedia says it can't be negative, but they don't give a source on that particular claim (urgh!).

-1 was picked as a conservative choice that's enough for the common case of Option.
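One way to check the effect yourself is to compare sizes. This is only a sketch, and the equality is only expected on Unix-like targets, so it is guarded with `cfg!(unix)`.

```rust
use std::fs::File;
use std::mem::size_of;

fn main() {
    println!("File: {} bytes", size_of::<File>());
    println!("Option<File>: {} bytes", size_of::<Option<File>>());

    // On Unix, -1 is never a valid descriptor, so None can live in that slot.
    if cfg!(unix) {
        assert_eq!(size_of::<Option<File>>(), size_of::<File>());
    }
}
```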

baq · on Feb 11, 2021

i don't know, but -1 has the prettiest two's complement representation of all negative ints :)

coldtea · on Feb 11, 2021

  pub fn clamp(self, min: f64, max: f64) -> f64

Rust has generics iirc, so why is there this?

masklinn · on Feb 11, 2021

I guess because floats implement PartialOrd, not Ord, and what would you return if one of self, min, and max is non-ordered? Returning an `Option<T>` would usually be inconvenient.

Therefore clamp is implemented on Ord (meaning there's always a value to return), and there's an efficient implementation on floats which can define its behaviour with respect to NaN.
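A short sketch of the difference, using only std calls: integers go through `Ord::clamp`, while the float version has its own documented NaN behaviour (a NaN input comes back as NaN).

```rust
fn main() {
    // Integers are totally ordered, so Ord::clamp always has an answer.
    assert_eq!(15_i32.clamp(0, 10), 10);

    // Floats are only PartialOrd: comparing with NaN yields no ordering.
    assert_eq!(f64::NAN.partial_cmp(&1.0), None);

    // f64::clamp defines what happens anyway: a NaN input stays NaN.
    assert!(f64::NAN.clamp(0.0, 1.0).is_nan());

    // Ordinary values behave as expected.
    assert_eq!(1.5_f64.clamp(0.0, 1.0), 1.0);
}
```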

If you want the actual details,

* https://internals.rust-lang.org/t/clamp-function-for-primiti...

* https://github.com/rust-lang/rust/issues/44095

pornel · on Feb 11, 2021

Also the float implementation was chosen to optimize well with SSE, even in presence of NaNs.

crazypython · on Feb 11, 2021

[flagged]

ibraheemdev · on Feb 11, 2021

Hacker news is a democracy in that the people decide what gets to the top. Just because you do not like the topic does not mean others share your sentiment.

password321 · on Feb 11, 2021

Seems like anything to do with Rust these days.

AlchemistCamp · on Feb 11, 2021

Really too bad that after another minor version, I'm still seeing:

  Warning: can't set `control_brace_style = ClosingNextLine`, unstable features are only available in nightly channel.

It seems like a trivial and desirable feature but it's only been available in nightly for over a year now. The poor (and rigidly enforced) default on this same issue was why I ripped prettier out of every project I work on.

Also, I don't know how hard it is to contribute to this kind of issue, but I'm glad to put in some time (with my still newbie level of Rust skills) if that can help resolve it.

johnisgood · on Feb 12, 2021

I wonder how difficult https://github.com/rust-lang/rfcs/blob/master/text/0066-bett... is given that it is from 2014-05-04.