According to this comment: https://old.reddit.com/r/rust/comments/p0ul6b/when_zero_cost... , `rustc`/`llvm` is indeed able to optimize the wrapped example into one big `memset`. Sure, in this specific case that's different from the first example, which delayed the zeroing, but it does not unintelligently `clone` many times as claimed in the article.
I don't really feel like this is an issue of "zero cost abstractions". The baseline expectation should be that it will do something reasonable, which in this case is probably "if Copy, do bit-setting with memset/bzero where possible; otherwise a clone() loop", which is what it appears to do according to a reddit comment linked elsewhere in the thread.
I'm not really sure it follows, though, that you should ever expect more than that. It's nice if you can get it but the presence of a very specific optimization shouldn't be taken as the new "zero cost" baseline, and I don't think it should be taken as given that a newtype will inherit all possible optimizations of its enclosed type.
I think a proper solution would be to expose the trait `IsZero` [0], as that is what is used to figure out when it is correct to use `calloc` instead of having to do `malloc` -> `memset` [1].
I don't know if there are any plans for exposing that trait, but it would be a nice thing to see.
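For reference, the internal trait is roughly this shape (a sketch modelled on the standard library's private `IsZero`; details may differ):

    // Unsafe because a wrong answer would let Vec hand out calloc'd
    // memory as if it were a properly initialized value.
    unsafe trait IsZero {
        /// Is this value's in-memory representation all zero bytes?
        fn is_zero(&self) -> bool;
    }

    unsafe impl IsZero for u8 {
        fn is_zero(&self) -> bool {
            *self == 0
        }
    }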
(As you likely know) Normally in Rust a trait is a gentleman's agreement. If my type does not have total equivalence then I don't implement the Eq trait, and so if a type has Eq then it's promising to deliver total equivalence. You can, in fact, just defy this and make types that do something "completely wrong" while implementing the syntax of the trait, and I provide a suite of them (the misfortunate crate) so that other people don't need to do that experiment.
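For example, a lawless implementation in the spirit of misfortunate's Always might look like this (a sketch, not the crate's actual source):

    // Implements the syntax of PartialEq/Eq while breaking the semantics:
    // every Always compares equal to everything, so Eq's promise of a
    // total equivalence relation is a lie the compiler cannot detect.
    #[derive(Clone, Copy, Debug)]
    struct Always;

    impl PartialEq for Always {
        fn eq(&self, _other: &Self) -> bool {
            true // "equal", no matter what
        }
    }

    impl Eq for Always {}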
However, Rust does have two other tricks up its sleeve:
1. Unsafe traits. You can declare a trait to be "unsafe". Implementations of this trait likewise need the "unsafe" keyword, and it alerts the programmer that, as with unsafe function calls, they are responsible for actually getting the implementation details correct. Not many standard library traits are unsafe because that's a considerable burden, but a few are, including Send and Sync (see the sketch after this list).
You wouldn't learn much by implementing unsafe traits wrongly in a library like misfortunate; it's the same lesson as a C++ library that scribbles on random memory addresses.
2. Rust knows whether you implemented a trait by hand or merely derived it. In a few cases it can be reasonable for the language to distinguish these cases for built-in traits, since in the latter case it knows the derived trait does what is needed, whereas in the hand-implemented case the compiler has no idea whether your implementation has the desired properties.
It is possible that future improvements to Const Generics will rely on this latter idea. So long as you derive Eq rather than implementing it, Rust can reason that your type really is suitable as a constant type parameter whereas misfortunate::Always is not suitable (it would cause mayhem if permitted because of its idea of what "equality" is) despite claiming to be Eq.
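A minimal sketch of what point 1 looks like in practice (the trait name is invented for illustration):

    // Declaring the trait `unsafe` moves the burden of proof to
    // implementors: the compiler cannot check the contract, only that
    // each implementor acknowledged it with the `unsafe` keyword.
    unsafe trait AllZeroesValid {
        // Contract (unchecked): the all-zeroes bit pattern is a valid Self.
    }

    // The implementor asserts the contract holds for u8.
    unsafe impl AllZeroesValid for u8 {}

    // Safe code may now rely on the contract without re-checking it.
    fn zeroed<T: AllZeroesValid>() -> T {
        // SAFETY: guaranteed valid by the AllZeroesValid contract.
        unsafe { std::mem::zeroed() }
    }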
There needs to be some thought about how it works with enums too. Rust currently has no built-in standard for enum defaults (you have to implement Default yourself). It would be nice to somehow mark one of the variants to be represented with the 0 discriminant (like how Option::None is implemented).
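What you can do today is pin explicit discriminants and write Default by hand; a minimal sketch (names assumed):

    // Field-less enums can already fix their discriminants, pinning Idle
    // to the all-zeroes bit pattern...
    #[repr(u8)]
    enum State {
        Idle = 0,
        Busy = 1,
    }

    // ...but Default must still be implemented manually, and nothing ties
    // the zero discriminant to "this is what zeroed memory decodes as".
    impl Default for State {
        fn default() -> Self {
            State::Idle
        }
    }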
Yes, I can see uses for this. Forty-two is obviously not something you'd actually want, but I can see needing numbers in the range 0-100 and being frustrated that, since I need 0, I can't carve out a niche, even though I don't mind losing 101 through 255 from a u8 instead of making my type wider.
Still, the existing tricks get a lot done for relatively little work. I have used Option<NonZeroUsize> to do roughly what I'd use a single integer for in C, but making explicit that zero isn't just zero, but "I dunno, invalid". Would I have done that even if it cost more space? Probably, but it was cool that I didn't even need to consider that. "Zero cost abstraction".
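The "didn't cost more space" part is easy to check (a small demonstration):

    use std::num::NonZeroUsize;

    // The niche optimization: Option<NonZeroUsize> reuses the forbidden
    // all-zeroes bit pattern to encode None, so the Option adds no bytes.
    fn main() {
        assert_eq!(
            std::mem::size_of::<Option<NonZeroUsize>>(),
            std::mem::size_of::<usize>()
        );
    }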
Yeah, something like this would be how I would like to see it; the trait is already there and is pretty much that (without the Copy bound, though, as that is not needed). Hopefully it will end up public and implementable with a derive.
Knowing that they usually are at the end doesn't help, since you still need to write a general runtime check that expresses "are all the bits for this type zero or uninit". AFAIK there is currently no way to do this without causing UB; it would need some new intrinsic.
> The baseline expectation should be that it will do something reasonable
Surely the baseline is that it does something correct. Reasonable (or just "not pathologically stupid", if you separate the two, with reasonable next) comes after. To my mind an optimising compiler is a compiler first, an optimiser second.
> the desire to optimize one specific case while neglecting the general case
This feels like a recurring pattern in Rust's design whether we're talking about performance, what's accepted by the compiler, etc. I'm not sure it's a bad thing exactly, but it leads to a lot of unintuitive footguns. Of course the consequence of these footguns is de-optimization or a compiler error, instead of a runtime exception or a memory error. But nevertheless it can make for a confusing landscape of behavior to navigate.
Should they refrain from spot-optimizing common cases where they can, just for the sake of predictability? No, I don't think so. And so I don't know what the answer is. I just often (and even more so when I was new to it) find myself surprised by Rust's general-rules-that-actually-have-notable-exceptions. Though who knows, maybe it's just not possible to make a language this powerful that doesn't run into this problem.
> Should they refrain from spot-optimizing common cases where they can, just for the sake of predictability? No, I don't think so. And so I don't know what the answer is. I just often (and even more so when I was new to it) find myself surprised by Rust's general-rules-that-actually-have-notable-exceptions. Though who knows, maybe it's just not possible to make a language this powerful that doesn't run into this problem.
I think the industry as a whole has already accepted "try to optimize common cases on a best-effort basis, at the cost of unpredictable performance cliffs in unusual cases" as the default paradigm.
From multi-tier JITs and auto-vectorization all the way down to branch prediction, source code is a leaky abstraction. If you don't care about performance, you still get the benefits most of the time. If you do care, then you need to peel back the abstraction and have a good understanding of what the compiler/JIT/CPU does behind the scenes.
As explained in the blog post, the first one can be optimized via specialization into a single calloc call. The other cannot use the same specialization, as the standard library does not seem to be able to specialize on the type of the iterator yet. This means it will be a malloc followed by a memset when compiled.
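There is a workaround, though: something like the following (a sketch, assuming the article's newtype is a `WrappedByte(u8)`),

    struct WrappedByte(u8);

    fn make(n: usize) -> Vec<WrappedByte> {
        // Build the Vec<u8> first (this path hits the calloc
        // specialization), then wrap each element on the way into
        // the new vector.
        vec![0u8; n].into_iter().map(WrappedByte).collect()
    }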
which will give you a list of WrappedBytes, but initialized with calloc. I assume this is because LLVM can see that the map is an identity function in this case and can then optimize it out.
If you write some of the "clever" pop count algorithms in C or C++, good compilers will go "Oh you're doing bit counting" and on a modern platform with a bit counting CPU instruction your whole algorithm turns into one CPU instruction.
[In Rust the standard library provides count_ones() on integer types, and that's actually safely wrapping an intrinsic pop count feature]
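A quick demonstration of that safe wrapper (on x86-64 with the popcnt feature this compiles down to a single instruction):

    fn main() {
        let x: u32 = 0b1011_0010;
        // count_ones() wraps the popcount intrinsic safely.
        assert_eq!(x.count_ones(), 4);
    }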
In this case it's a combination of compiler magic and specialization in the Rust standard library that makes it easy for the compiler to understand that this particular iterator shape is a no-op.
What does that post prove, except that "allocating memory that must be zeroed on use but is never used" is a million times faster than "allocate memory and zero it right away"?
Sure, I can see how the code doesn't immediately tell you the difference, but once you understand it, isn't it completely logical?
The title seems clickbaity and not accurate when you dig into the article.
As a Rust beginner, I would never assume code I write won't be executed, or will be cut out by the compiler. I know this sometimes happens, but I wouldn't count on it, for lack of understanding. I don't think that's the expectation of anyone who doesn't know the inner workings well, really.
The entire article could basically be boiled down to your first sentence + the two code examples.
And IMO the title is still inaccurate, because it attacks a general promised/desired property of Rust (and other languages), and I honestly don't see the connection. The observed behaviour looks like an unfortunate side effect of the `newtype` pattern and seems very amenable to fixing and optimization in future versions. It's not a failure of the "zero-cost abstraction" ideal, which is being chased after every day and is still in flux.
> nor should anyone expect that they do
I'm not a systems programmer, so I might be wildly off base in what I'm about to say. However, isn't it in fact expected of systems programmers that they do understand in detail what the compiler will do with their code?
Yeah, that was obviously not an optimization. You can't allocate 16 gigs in 5 micros. It might be problematic if you were relying on the lazy allocation of course.
The second example doesn't seem compelling to me either...is a newtype of a reference type a common application?
> The point of this post is not to bash the Rust team, but rather to raise awareness. Language design is a difficult process full of contradictory tradeoffs. Without a clear vision of which properties you value above all else, it is easy to accidentally violate the guarantees you think you are making. This is especially true for complex systems languages like C++ and Rust which try to be all things to all people and leave no stone of potential optimization unturned.
This paragraph is worth highlighting. Say yes to everything and vision becomes meaningless. Can't move in one direction when you're pulled by an infinite number of stakeholders on all sides.
This gets a strong second from me. I don’t particularly care for Rust, but that is personal preference. This point stands strong for a wide swath of languages (both compiled and interpreted). There is a seemingly endless pull to be all the things to all the people for a lot of languages. I would much prefer seeing a smaller set of highly focused languages with literally seamless interop between them.
The article is not about what you think it's about.
1. Turns out that zeroing u8 is a special case, so if you allocate 17 GB of your custom structure, Rust is gonna init that for you.
2. Okay, so you don't understand how RefCell works? The caller of this function should be holding r, not passing it into the function. Of course that will just correctly panic, so it's not wonderful. But that is the point of RefCell.
Generally more stuff runs fast in Rust than in other languages, but you can always find bad counterexamples in any language.
> 2. Okay, so you don't understand how RefCell works? The caller of this function should be holding r, not passing it into the function. Of course that will just correctly panic, so it's not wonderful. But that is the point of RefCell.
That’s not the interesting part. The point is that Rust won’t optimize a reference stored inside a struct the same way it will optimize a bare reference. It can’t assume that the reference will be valid for the whole function (as it can with a plain `&`/`&mut` parameter). This limits the amount of reordering the optimizer can do.
The example you were looking at was explaining why that is the case.
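A sketch of the shape being discussed (names are illustrative, not the article's code): rustc can emit noalias/dereferenceable attributes for top-level reference parameters, but not for a reference tucked behind a struct field.

    struct Wrapper<'a> {
        inner: &'a mut i32,
    }

    // `x` is a top-level &mut parameter: the compiler can tell LLVM it
    // is valid and unaliased for the whole call, enabling reordering.
    fn through_reference(x: &mut i32) {
        *x += 1;
    }

    // The same reference one level down: those guarantees attach to the
    // struct value, not the field, so the optimizer has less to work with.
    fn through_wrapper(w: Wrapper<'_>) {
        *w.inner += 1;
    }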
Ultimately I'm happy for Rust to only be so smart, of course there are going to be limitations. But you are right, the author was making a clear example and my comment doesn't make sense in that context.
> However, when v has an arbitrary type, it will instead just clone() it for every single element of the array, which is many orders of magnitude slower. So much for zero cost abstraction.
Arguably the WrapperType, i.e. any arbitrary type is the general case and thus the baseline performance. `u8` is the special case (hence specialization) that performs the alloc_zeroed optimization. So it's not really that the abstraction adds a cost. It's just that the special case removes a cost paid by everything else.
In the future the vec initialization might be fixable, but this requires turning potentially undefined values into some valid, if arbitrary, bit pattern (i.e. LLVM's freeze) so they can be compared against zero.
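Concretely, the two shapes under discussion look something like this (a sketch; `WrappedByte` stands in for the article's newtype):

    #[derive(Clone, Copy)]
    struct WrappedByte(u8);

    fn main() {
        // Specialized path: the all-zeroes value is recognized, so the
        // whole thing becomes a single alloc_zeroed (calloc) call.
        let a: Vec<u8> = vec![0u8; 1 << 30];

        // General path: allocate, then write every element (a memset at
        // best, a clone() loop at worst).
        let b: Vec<WrappedByte> = vec![WrappedByte(0); 1 << 30];

        assert_eq!(a.len(), b.len()); // keep both vectors observably alive
    }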
Most people consider "cost" in "zero-cost abstraction" to refer to runtime cost. That presentation is saying that if you let it refer to other kinds of cost too, then nothing is zero-cost, but to me that just means he picked a bad definition.
And even then, I've often heard it stated as "zero cost if you don't use it", e.g. exceptions have zero runtime cost unless an exception is actually thrown, so, given that they should only be thrown in exceptional cases, that really shouldn't impact your program. (Whether or not it's truly zero cost depends, I guess: e.g. does memory count as cost? What if it makes something not fit in cache?)
I love that exceptions are the classic case for this, because it's not even true that "zero-cost exceptions" are zero _runtime_ cost on the non-exceptional path. The most trivial example is that they block vectorization.
Well, exceptions shouldn't be the classic case here, because - as you say - they're typically not zero cost.
Zero-cost abstraction is what you get when your abstraction gets compiled away to nothing. Like properly written value wrappers in C++, or (presumably) newtype in Rust. These things disappear from final assembly.
The definition of zero-cost abstractions, as introduced in C++, was that "it can't be slower than the equivalent hand-written code" (i.e. code not using the abstraction).
In that regard, exceptions are interesting: if you're on the happy path (which should be 99.999% of the time; a normal program that uses exceptions as its error handling method should not encounter any exception if you do a `catch throw` on an average run of the software), they can cost less than return-value-based error handling (https://nibblestew.blogspot.com/2017/01/measuring-execution-...). If you're on a "sad path", though, they will cost more.
What is pretty clear is that since compilers learned to put the "sad" path in a .cold section, the code-size issue has become a 100% non-issue: the "sad" path won't bloat the hot, exception-less path. In my experience, exceptions are, in the cases that matter, a negative-cost abstraction.
That’s a disingenuous misinterpretation of what “zero-cost” means. Zero-cost refers to runtime performance of compiled code.
It’s also disingenuous to pretend there isn’t some downside to not using the abstraction. Obviously you need to evaluate the tradeoffs.
High level code itself is a sort of abstraction. We could write raw assembly all the time to remove that abstraction and associated costs, but clearly that’s not very productive.
Chandler is simply wrong here. Every single line of code has a cost. There's nevertheless massive benefit to that cost not occurring on millions of end-user machines and instead only once in the mind of the developer.
Personally, this is why I think specialization is a bad idea. It means that minor refactors can have very strange impacts on performance, and that every API now has an (often undocumented) set of types for which it is unexpectedly much more performant.
Wrapper types in C++ really are zero cost. Rust is simply broken here, sorry. That happens when you throw away 30 years of work to reinvent the wheel because you like variable names before types or something.
The 'zero cost' attitude is one of C++ problems. Zero cost abstractions are never really entirely zero cost. You still pay, often in something that you didn't bother to measure, which (in C++) is usually compile time or programmer productivity or programmer cognitive cost or compiler complexity.
There's a reason a lot of the complaints about C++ mention bloat or the difficulty of practically choosing a safe subset (you could use a subset, but you're very likely dependent on coworkers, existing codebases and library authors which may not share your subset). A lot of people added 'zero cost' stuff, and all of that has a cost.
For example, let's say Rust's newtype pattern worked perfectly everywhere, and all of the author's examples ran fine, just like u8, with no runtime cost. There would likely still be a small cognitive cost to learn this, a small cost to unwrap the types (for other programmers reading the code), and a tiny cost in compile time. Typedefs are worth it, and it's all very reasonable to pay this!
But there's still a cost, and when people never believe there's a cost a language ends up like C++.
'Zero cost' is generally meant to mean 'zero runtime cost'. For most workloads, an increase in compile times is okay for an optimized production build. In the rare cases where that's not enough, I find that using precompiled headers and splitting code into translation units helps significantly.
Yet many people never look at the other costs, so de facto they're assuming "insignificant costs for everything", which isn't true.
For example, C++'s template language as originally implemented was technically 'zero cost', but C++ programmers paid for it a lot for a long time, in inscrutable error messages and slow compile times (this was fixed to a large degree with modern implementations and standards).
People didn't understand that they were paying in programmer / compile time / bad error messages? I don't believe that for a second. Those costs are extremely visible. As visible as it is possible to be. They wouldn't get more visible if you dressed them in reflective jackets, slapped a police light bar on top, and turned on a siren to make sure everyone was looking in the right direction. Who undergoes that kind of suffering and doesn't even notice? Nobody.
In contrast, the benefits of Zero Cost Abstraction are quite subtle. "Why would I want that? Ever heard of Moore's law? Caring about perf is so 1990s!" goes the immediate thinking. If you never have to write high-perf code, that reasoning is even correct! Of course, there are still many places where performance does matter, and being able to use high level language features on the very innermost loops, the places that halve or quarter the throughput of your $6000 graphics card(s) if you carelessly toss in even a single call of overhead, is quite something to those in a position to take advantage.
Since the caveats of ZCA are obvious and the benefits are subtle, I think it's perfectly fair to use the term as a way to draw attention to the latter.
The problem with this viewpoint is that the baseline is pretty arbitrary. Using a newtype carries a conceptual cost of wrapping and unwrapping... but using u8 directly carries the conceptual cost of remembering what any given u8 means! What makes that the baseline and newtype wrapping a cost over that as opposed to using domain concepts in types as the baseline and using native types the cost?
Well, imagine that your code is serializing and deserializing someone else's type. You probably care more in that context about the underlying type and not the newtype, and the newtype doesn't help you.
Now, I did say this option was worth it. And there are cases of being too conservative (Golang?). But when designing and programming, one should look at the tradeoff. Abstraction is not always worth its costs [EDIT: and sometimes you can have less costs by a better abstraction, but being aware of the costs helps to think about the better abstraction].
That's exactly what should keep you awake at night. Because the conceptual cost is arbitrary and not meaningfully measurable in a quantitative sense, people tend to gloss over it; but though it's not measurable, that cost is no less real, and it might bubble up in the form of someone losing money or, in some cases, suffering physical harm.
Zero cost refers to runtime cost, as in the compiled code.
> You still pay, often in something that you didn't bother to measure, which (in C++) is usually compile time or programmer productivity or programmer cognitive cost or compiler complexity.
Obviously everything is about tradeoffs. If an abstraction is making everything worse for you, then don’t use it. I don’t think it’s fair to try to list every possible downside of an abstraction, as there are obviously also downsides to not using abstractions otherwise we wouldn’t have these options.
Zero cost refers to the compiled code and runtime performance.
'zero cost' is what makes some things possible. If your perspective is one where CPU and memory is effectively infinite then yes, who cares about zero cost. The last C++ project I worked on I had like 32k of storage and some kilobyte sized amount of RAM. If C++'s abstractions weren't zero cost it would have been impossible to use in this environment. Java, C#, Go -- none of these are even close to possible.
Why wouldn't I just use C? Because of improved programmer productivity and readability of the code.
The reason C++ exists as a language today (and why Rust is its direct competitor) is zero-cost abstractions. Because in many situations programmer productivity and compiler complexity are second to performance.
My takeaway wasn't 'never abstract'. That's obviously absurd. Even C itself has abstractions which it tries to make zero cost. Only that it's sometimes not worth it in the language, and C++ neglected to look at the other costs in the past.
No, that's the point of C++. If you want a language that doesn't do zero cost abstractions, there are plenty to choose from. If you want a language a step above C that gives you fine-grained control over exactly what code is generated and what memory is allocated, then you have very few choices.
You're basically just arguing that C++ shouldn't be C++ (and Rust shouldn't be Rust). We already have plenty of languages that provide high-level "costly" abstractions. The reason we need C/C++/Rust is for that zero-cost aspect.
Historically, there were much fewer choices for languages and arguably C++ has been used for building applications that didn't need zero-cost features. But now we have plenty of alternatives and C++ still exists for that valuable niche that it provides. This is the reason Rust was created -- to provide this level of control without the baggage of C++.
Bjarne and the C++ community are very clear on what the zero overhead principle is. If it isn’t holistic enough for you, that’s no fault of the principle. Your objection is either off-topic to what ‘zero cost’ means, or equivocation.
Why did a Rust defect become such a diatribe on C++? As far as I am aware, using a typedef/using type does not slow down zero-based initialisation of primitive types in C++ (last time I checked a few years ago; I will apologise if this has changed).
Sorry, got bit a bit too much by C++ templates and a few other things. Rust views itself as a successor language, and I wouldn't like it to make the same mistakes.
This is in fact mostly an example of a big problem with benchmarks. What's measured here is the difference between never using an object (so the actual work to prepare it can be elided, albeit by the OS rather than the programming language) and doing all the work up front in the expectation that you'll use it, but then not using it.
The u8 version turns into "Hey, Linux kernel, zero these pages if I ever read them" (and then the pages are never read), whereas the opaque type turns into a memset() zeroing all the pages.
Not doing any work is in fact a million times faster but your real program would have needed to do work, otherwise why bother having the vector? Whereupon the benefit disappears.
Where possible design your benchmarks to really do the thing you think you're measuring. If what you're measuring is nothing then be sceptical about supposed "performance" measured for that, since it's nothing, you're probably exploring the same space as the people who wanted to find out how much the human soul weighs (trick question, there is no such thing, but they put a lot of effort into trying to measure it anyway).
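A sketch of a benchmark that actually does the work it claims to measure (size reduced for practicality): touch every page, so the kernel's lazy zeroing has to happen too.

    fn main() {
        let n = 1 << 30; // 1 GiB instead of the article's 16 GB

        // With the specialization this is one calloc: pages are only
        // reserved, not yet zeroed.
        let v = vec![0u8; n];

        // Reading every byte forces each page to be realized; now both
        // the lazy and the eager strategies pay the full cost.
        let sum: u64 = v.iter().map(|&b| u64::from(b)).sum();
        assert_eq!(sum, 0);
    }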
It's just an example to show that zero cost abstractions are actually not zero cost at all.
The language would have you believe the abstraction is zero cost. If you really believed it, you would never think about whether you need to allocate a vector of u8 or of your custom byte type. They should be one and the same. That's the point of the promise of zero cost abstractions.
By the way, pre-allocating a lot of virtual memory upfront is not unheard of.
In this situation, to avoid paying the cost of the abstraction, you would have to stop and think: "I can't allocate _my_ byte type! I must allocate u8, then cast the result to a vector of my custom byte type. Maybe the compiler will not like the cast, so I have to create an `unsafe` block and do some pointer casting?" (I don't know if that's what you would need to do in Rust, or if it would have been something else.)
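For what it's worth, the unsafe version might look roughly like this (a sketch; it relies on the newtype being `#[repr(transparent)]` so the layouts are guaranteed identical):

    use std::mem::ManuallyDrop;

    #[repr(transparent)]
    struct WrappedByte(u8);

    fn wrap(v: Vec<u8>) -> Vec<WrappedByte> {
        // Take ownership of the buffer without running Vec's destructor.
        let mut v = ManuallyDrop::new(v);
        let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
        // SAFETY: WrappedByte is repr(transparent) over u8, so size,
        // alignment and layout match; ownership moves exactly once.
        unsafe { Vec::from_raw_parts(ptr.cast::<WrappedByte>(), len, cap) }
    }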
> you would have to stop and think "I can't allocate _my_ byte type! I must allocate u8
No, this is premature optimization. Rather, you would write the code in the most obvious way, and then profile it to figure out which optimizations are worth doing. At that point you can comment your code explaining why it looks all wonky :)
And applying this thinking to everything means that your entire codebase is pervaded with slowness with no obvious hot spots, and you jump with joy since that obviously means your code is as efficient as it possibly could be (because otherwise the profiler would show some peaks, right?).
I know this was meant as (light) sarcasm, but have you ever profiled a nontrivial program that turned out to have zero peaks? My gut tells me this would be really difficult to do by accident. The same way that writing crypto code that is resistant to timing attacks is hard.
This definitely brings it closer, but on my machine, touching the entire array still ends up being 2 seconds slower (5.4s for the calloc side, 7.8s for the clone side)
It's not using clone but memset. Of course memsetting memory to 0 when the operating system already set it to 0 is still silly and something that could be further optimized, but it's not cloning things.
It might be optimized to a memset, but it's the clone codepath, as opposed to the u8 specialization codepath discussed in the article.
But the original commenter is talking about how the benchmark isn't useful because it doesn't touch memory, but you get similar results even if you do touch memory.
I think it's more accurate to say that there's an optimization that applies only to a very specific situation that's millions of times faster than without the optimization.
It's a bit strange to say that before the u8 specialization this was not a problem and it's suddenly a problem now.
Very well said. I actually came here to write a comment criticizing the post, because the fact that this reads as “negative” has to do with the order in which the experiment took place. Flip it and you have a different sentiment. A similar point can be made, but an explanation of “it’s very slow” presumes that every statement in Rust will be maximally optimized via specialization, which I consider an unreasonable expectation.
I’m not arguing against you, but why do you feel like expecting maximal optimization is unreasonable? Is it just when limited to specialization or do you think it applies more broadly in terms of general optimization?
Rust has "typedefs" and those are called type aliases.
The intention of the newtype pattern is specifically to create a new type without the behaviour of the wrapped type, so if there's a special case just for u8, I would expect it not to apply to the newtype.
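The distinction in code (a sketch):

    // A type alias is the same type under a new name: Byte inherits every
    // behaviour and optimization of u8, including the zeroing
    // specialization.
    type Byte = u8;

    // A newtype is a genuinely distinct type and deliberately inherits
    // nothing automatically, which is why the u8 special case doesn't
    // apply.
    #[derive(Clone)]
    struct WrappedByte(u8);

    fn main() {
        let _fast: Vec<Byte> = vec![0; 1 << 20]; // u8 specialization
        let _slow: Vec<WrappedByte> = vec![WrappedByte(0); 1 << 20]; // not
    }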
It’s misleading. One version is reserving the memory and will zero it upon use. The other version allocates the memory and zeros every byte of the memory up front by actually writing zeroes to it.
It’s basically competing abstractions. Letting the kernel reserve pages and only zeroing them when used will hide the cost and amortize it across access.
It’s worth knowing about but it’s also an artificial problem revealed by benchmarks.
Technically it postpones initializing the memory, and then it exits before that is needed.
If you actually use the memory, both cases work out the same. There are too many languages where you can easily make the mistake of reading uninitialized memory, and that can't happen here: if you ask to sum() the vector you'll get a zero answer in both cases... and both programs will be similarly slow.
(Note that you can't actually "sum" the code sample's wrapper type because it lacks the appropriate operations. I spent a while trying to make a working example where I trusted I was measuring something "real" and not artefacts, and I eventually gave up. Definitely begin with an actual problem, so that you know what you're trying to do and can measure whether you're doing it; otherwise you may just be wanking.)
It's doing different things:
In the first case it's not doing anything and it will request memory dynamically. In the second case it's zeroing 16 GB of RAM.
The explanation linked at the end explains it better.