Introducing a new, advanced Visual C++ code optimizer (microsoft.com)
223 points by nikbackm on May 5, 2016 | hide | past | favorite | 61 comments



Undefined behavior notice:

  > Historically, Visual C++ did not take advantage of the 
  > fact that the C and C++ standards consider the result 
  > of overflowing signed operations undefined. Other 
  > compilers are very aggressive in this regard, which 
  > motivated the decision to implement some patterns which 
  > take advantage of undefined integer overflow behavior. 
  > We implemented the ones we thought were safe and didn’t 
  > impose any unnecessary security risks in generated code.

  > A new undocumented compiler flag has been added to 
  > disable these optimizations, in case an application 
  > that is not standard-conformant fails: 
  > -d2SSAOptimizerUndefinedIntOverflow-. Due to security 
  > concerns, we have seen cases where these patterns 
  > should not be optimized, even though following the C 
  > and C++ standards allows us to [...]
It sounds like they're not doing anything that GCC/Clang aren't already, but if you've only ever compiled with MSVC then be careful not to overlook this.


There are only a few places where the undefined behavior is exploited right now. For those cases, if there is overflow in the original expression, the program is pretty much screwed anyway; those optimizations are also done by LLVM/GCC. The case that mostly concerns people when they hear about undefined behavior (folding a+C > a to true) was avoided - it would silently break too many applications.


Since when does an overflow mean the program is screwed? Perhaps I misunderstand your meaning, but not all integer overflows, and in fact probably only a small percentage of integer overflows, mean that a program is going to die or have some similarly bad behavior.


What I meant to say is that if you have an overflow in those cases, the result of the expression is definitely not what you wanted - this is going to propagate and "damage" other expressions. Applying the optimization in that case might produce a different result. An example is this new transformation from the blog post: (a * C1) / C2 -> a * (C1/C2), where a * C1 might overflow. For 8-bit numbers with a = 106 and C1 = C2 = 17 we get

initial: (106 * 17) / 17 = 10 / 17 = 0 (the multiply overflows, wrapping 1802 to 10)

optimized: 106 * (17 / 17) = 106 * 1 = 106

So in this case the optimized version gives the expected result - it's still different than the initial expression, so it falls under the "undefined overflow" optimizations category.


IMO they should make -d2SSAOptimizerUndefinedIntOverflow- a permanent, documented option (i.e. not a hidden flag). I also think they should automatically enable it (that is, disable the optimizations) if the /sdl option is passed to the compiler (I'd accept this instead of making it a permanent flag). The logic being that you really don't care about the performance if you're using /sdl to begin with and would rather it just generate safer code.

Edit: Also keep in mind that to disable these optimizations the complete flag is "-d2SSAOptimizerUndefinedIntOverflow-" without the quotes. That is, it includes the dash/minus at the end to disable it.


Why would you have it automatically enabled?

If you write MSVC only code you can add it to your compiler options for the projects that use this new compiler.

If you write cross platform code this is (potentially) one less thing that is different between compilers.


Interesting. Can anybody explain in a few sentences (if this is possible), why we have undefined behaviour in the first place? I know there are optimization reasons, but do we have data that confirms that turning on stricter flags (I think all major compilers let you avoid undefined behaviour) breaks lots of software?


(Not guaranteed to be correct)

Long story short: historical accident. Modern safe systems languages like Rust and Ada¹ avoid undefined behavior unless you explicitly ask for unsafe code, and even then there aren't many _undefined_ things you can do (violate strict aliasing I guess).

Most undefined behavior could be implementation-defined, but the edge cases from porting C+Unix from the PDP-11 (which I understand to have been... quirky) to many other systems happened before the spec, so the spec authors had to make the Ideal C Machine a sort of subset of all the existing behaviors. Thus, some things that could've been implementation-defined ended up totally undefined (god knows what computers did on signed overflow back then - the compiler may not have been able to guarantee anything, hence undefined) and we've kinda been stuck that way ever since.

That said, the people who shame the compiler writers for "exploiting undefined behavior to optimize microbenchmarks and look good" annoy me to no end. That's just how the spec turned out, put up or shut up (or use a nicer language that isn't trying to stab you in the back as soon as you let down your vigilance...).

Here's a fun history of C: http://pastebin.com/UAQaWuWG

¹ Strictly speaking, I recall the Ada spec has some wording similar to undefined behavior, but it's not something you hit often.


  > even then there aren't many _undefined_ things you can do
There's actually plenty of things you can do in an `unsafe` block in Rust that count as undefined behavior (I don't know how you "explicitly ask for unsafe code" in Ada, but I'm confident the same is true there as well). Some of this falls out of the fact that Rust compiles down to LLVM IR, which itself has undefined behavior, but others are inherent (like demanding the preservation of the aliasing guarantees, as you mentioned).

For anyone curious about seriously using unsafe code in Rust (which is to say, using unsafe code in Rust at all, in any capacity), I recommend reading through the Rustonomicon: http://doc.rust-lang.org/nightly/nomicon/


> I don't know how you "explicitly ask for unsafe code" in Ada

You need to import a virtual package and sometimes also make use of specific pragmas.

Ada and Modula-3 are very much "in your face" about unsafe code.

For example, if you intend to do a conversion between data types without a guarantee that the target can hold a valid data representation, you need to import Ada.Unchecked_Conversion.

And do something like this:

http://www.adaic.org/resources/add_content/docs/95style/html...

Same applies to everything else deemed unsafe.


Undefined behaviour allows faster code. It also allows handling of situations that would otherwise have to be defined in some 'awkward' way, such as throwing an exception on out-of-bounds array reads.

Edit: To add to my reply - for example, Java has almost no undefined behaviour. (The single exception AFAIK is some kind of multithreaded data race situation). The cost paid is that e.g. bounds checks must be done (in general) on array accesses, as the behaviour is defined to be an exception thrown in that case. Similarly for null reference dereferences.


Yes, but for 99% of code written out there there is nothing to gain from faster code due to undefined behaviour.

I have been coding since the mid-80s, always a fan of the Algol lineage of programming languages, with C++ being the only exception, as it allows many similar features.

Never have I had a situation where better performance thanks to undefined behaviour was an advantage for the use case we were trying to solve.

I think outside games, HPC and winning compiler benchmarks there's hardly any advantage in exploiting it.


> Yes, but for 99% of code written out there there is nothing to gain from faster code due to undefined behaviour.

That's not true. Load-load forwarding benefits tons of code that isn't "games, HPC, or compiler benchmarks".


Sorry if I disagree.

If the solution delivered is within the time constraints required by the customer, any time spent optimizing ms out of the application is just wasted money.


It's because C is supposed to be a thin wrapper over the CPU. So when you write '+' on two ints, it's supposed to do whatever the CPU's ADD instruction does. This can vary from CPU type to CPU type and the standard didn't want to impose behavior on an implementation that would be impossible or slow to implement on a given CPU.

For example most CPUs wrap from most positive to most negative on integer overflow, but one kind might just clamp the value at the most positive value. Or give an error, so forcing them to behave in a given way would make it hard to write efficient C programs on some platforms.

Recently some compiler writers decided that they could interpret undefined behavior to mean "Do anything we like".

So for example a sane compiler on x86-64 would compile this :-

    int test(int x)
    {
        return (x > x + 1);
    }
into code which adds one to the variable, then compares it with the original - and this absolutely can return both false and true. However, compiler writers have gone "Aha! Overflow is undefined, so we'll 'optimize' this into always returning false." That is faster code, and who wouldn't want faster code!

Personally I think this was never the intent of the standard, and compiler writers are abusing the meaning of undefined in order to sneak in optimizations, but really they are just producing compilers that are less and less trustworthy, as they no longer do what people want or need.


That's why that pattern is not applied in the new optimizer, unless the Bit Estimator proves x+1 does not overflow. After I implemented it to behave like other compilers, I analyzed several applications and libraries and it would have introduced silent security problems - I explained this in the blog post.


Undefined behaviour exists because the creators of the C language wanted to leave compiler writers as much flexibility as possible to ensure that they produced code that was fast.

C was always a language that was intended to be used across multiple hardware targets and different processors have different behaviours. Restricting behaviour on certain arithmetic operations might require slow code to 'work around' the processor and generate compliant code. The creators didn't want that so specified as little as possible, leaving the rest to be 'Undefined'.


Signed data implies that at least 1 bit represents the signedness of the underlying bytes. C did not want to impose which bit must represent the sign of the integer, because that may have implications for the performance of hardware that was built to represent signedness a different way.


It's a bit more complicated than that (in case of two's complement). BTW, is there anything modern that's not using two's complement?


Pretty sure some DSPs don't. (In most cases of "is there anything modern", the answer is some DSP.)


Makes me wonder why they called it "d2SSAOptimizerUndefinedIntOverflow" and not the existing `-fwrapv`.


Visual C++ does not use the GCC naming for flags. The flag will likely be renamed to something shorter in the final release in Update 3.


So they finally got SSA? Neat, right up there with LLVM and GCC.

It also sounds like they have something like LLVM's InstCombine pass, it'll be interesting to compare which cases they handle. Despite it being a peephole pass, InstCombine is actually one of the most important scalar optimizations in LLVM's arsenal.

I also wonder if they form SSA in the face of C++ and/or SEH exceptions.

Like if you have something like:

  bool f() {
    int x = 2;
    try {
      throw 0;
    } catch (...) {
      x = 4;
    }
    return x & 1;
  }
Will they insert a PHI of 2 and 4? The MSVC of today cannot optimize the return to a constant.


Hi, I'm the author of the blog post and optimizer. We've had SSA since around 1999-2000; it's just that some important components, like the expression optimizer, were written before that and never updated to support SSA. The difference is a lot more than just SSA - the blog has just a few examples. The new framework started out somewhat similar to InstCombine; since then it has grown much larger, and it will be the place where more optimizations are integrated, hopefully replacing some older ones. To answer your question: SSA is built with both types of exceptions and the new optimizer works as expected, except in a few cases where it would be unsafe. It seems the rejection rules are a bit too aggressive for your example - it's obviously safe to optimize in this case. Thanks for the example!


Cool, good to know! Another interesting example is:

  bool f(bool b) {
    int x;
    try {
      if (b) {
        x = 2;
        throw 0;
      }
      x = 4;
      throw 0.0;
    } catch (...) {
    }
    return x & 1;
  }
Here, a phi is needed on the catch.

Out of curiosity, can you give details on what your EH representation looks like in SSA?


The EH info looks the same when in SSA form - when building SSA the right PHIs are inserted for the values potentially defined in EH blocks.


For reference, clang targeting MSVC mode can do this:

  $ clang t.cpp -target x86_64-pc-win32 -S -emit-llvm -o - | opt -S -sroa -instcombine | grep 'ret i1'
    ret i1 false


First time I'm hearing about SSA. Is it something I, as a coder, should be thinking about while writing code, or is it purely a compiler phase? Thanks.


This is not something you should think about while writing code. Unless that code comprises a compiler :)

SSA is a program representation which is easy for compilers to analyze and optimize. See https://en.wikipedia.org/wiki/Static_single_assignment_form for more.



Thumbs up to Microsoft for investing that much into their C/C++ toolchain.


I believe it is just a C++ toolchain and the C part is neglected. Has something changed?


Since VS2013 some useful and popular C99 language features are supported. And then there's Clang/C2 but it can't be used via command-line yet.

Their standard library (besides being intentionally not fully compliant) is missing some C11 features though. I guess we have to wait at least for C++17:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p006...


Not all of them, and it's 2016 already - important C11 features like _Generic are still not supported. It would also be awesome if MSVC supported GNU extensions, like Clang does. For now I recommend using the clang-cl wrapper: http://clang.llvm.org/docs/MSVCCompatibility.html


For Visual C++ only to the extent as required by the C++ standard.

Clang/C2 is exactly the path for those that still care about C on Windows.


This is both C and C++ - they share a common backend (c2.dll) where this work was done. They have different front ends (c1.dll / c1xx.dll).


I believe the Bit Estimator example could be better:

  int test(unsigned char a) {
    short b = a;    // b: 00000000________, a: ________ 
    b <<= 4;        // b: 0000________0000 
    b |= 3;         // b: 0000________0011
    return b != 0;  // -> return true   
  }
The analyzer doesn't need any information about the two low bits of b to know b != 0. Had it been b ^= 3 instead of b |= 3, it would be a different matter.


I tried to have a simple example, maybe it was too simple. You are right, an OR with a constant is enough to know it is not zero. For this case the Bit Estimator also knows b = [3, 4083], writing something that takes advantage of this info would have been more interesting.


The tl;dr of this seems to be:

    * Visual C++ now has very aggressive SSA-based optimization rules. 

    * One of these optimizations is the bit estimator, which tracks the state of each bit of local variables.
I found some hits in the CS literature, but I'm not aware of LLVM/GCC or any other compiler using a bit estimator, and it seems like a pretty potent optimization!


LLVM has had the same thing for a very long time, under the name SimplifyDemandedBits.


tl;dr: it generated a lot more CMOVs. While conditional moves are a cheap way to get performance in poorly branch-predicted programs, speculatively inserting them is _NOT_ a good idea for general performance - it may hurt performance (because a CMOV serializes the dependency chain) and should only be considered with profiling information.


Love the a % 2 != 0 ? 4 : 2; optimization. This used to be one of my favorite interview questions for low-level positions - conditional assignment without branches.


Nice question, I had to think for a while. :)


It bothers me that they say implementing SSA eliminates the need for "complicated" data-flow analysis. Data-flow has a really nice theory behind it and enables a lot of optimizations. For example, dead-code elimination. But otherwise, kudos to MS!


Except they are exactly right!


I should add more context here. A data-flow analysis framework has significant overlap with an SSA-based analysis framework, except that the SSA-based one leads to simpler analyses which are more powerful.

A good comparison is CCP (conditional constant propagation) vs SCCP (sparse conditional constant propagation) - the SCCP version is naturally flow-sensitive (that's a property of SSA form), so it can eliminate entire branches, leading to more dead code being eliminated.


Aren't SSA analyses generally only "kinda flow-sensitive" since values are only renamed where control flow merges, not where it splits? E.g.

  int x = ...
  if(x > 0) {
    use(x)
  } else {
    use(x)
  }
In a DFA where you track information about each variable at each program point, you could push the constraint implied by the condition into each branch. If you do a sparse analysis on SSA form, that doesn't come naturally, right?


To handle this case some SSA implementations add a concept of "pi" nodes, which are used to artificially split variables on branches that establish some useful data flow property.

    x1 = ...
    if (x1 > 0) {
        x2 = pi(x1 & RANGE[1..])
        use(x2)
    } else {
        x3 = pi(x1 & RANGE[..0])
        use(x3)
    }
    x4 = phi(x2, x3) // if used
I have placed the pi nodes in the blocks, but semantically they are placed along the control edge.

Ref e.g. e-SSA in the ABCD value range inference algorithm.


That's correct; it doesn't come for free.


I wonder what is the level of .NET's CLR optimization in comparison to this new optimizer for C++. I hope they will reuse some of those ideas to make .NET better.


The optimizer will be ported to .NET Native soon, it should help code quality quite a lot. The JIT is a completely different matter though: it's a different project, done by a different team - copy-paste of the source code is definitely not going to work. JIT compilers also have different priorities, unlikely they would port all parts of a large optimizer.


Hopefully, they will just recompile the binaries, and the CLR will just get faster...


That's not really the way it works. The CLR compiles .NET code to native machine code (either at runtime or ahead of time), which means you need to implement these optimizations at the code generation layer of the CLR. You can make the CLR implementation itself faster, but that's negligible compared to the run time of the generated code.


Are you saying the "code generation layer" of the CLR isn't itself written in C++?

All of this stuff trickles down without a doubt. I realize it isn't going to magically optimize the bytecode, however, the thing that runs the bytecode will be faster.


The CLR doesn't have any bytecode interpreter that could get faster with this change. It's a JIT compiler that compiles the bytecode to native code.


Interpreter or not, the JIT compiler will certainly get faster from the C++ optimizations. The libraries that are linked against will get faster. The underlying OS components will get faster. Since .NET assemblies are mostly just glue code over framework code, which is mostly C++, everything should get a speedup out of this.


Why do they call it "Visual C++"? What does the term "Visual" mean to them? Also, is their compiler standards-compliant? (C++11, C++98)


Visual C++ is just the product name, it's branding. It has no functional meaning beyond that.

As for the standards, there's pretty good partial support for C++11/14/17. One fairly recent blog post by them has some tables. [1]

[1] https://blogs.msdn.microsoft.com/vcblog/2016/01/22/vs-2015-u...


Fun fact: the compiler's version (19) is larger than the IDE's version (14) because the compiler predated the addition of "Visual" to the branding. (And the IDE skipped 13, while the compiler didn't.)


Years ago Microsoft introduced Visual Basic, which had some drag-and-drop programmability. It was popular, so Microsoft management decided to put "Visual" in front of everything. And now it means nothing. Kinda like how IMAX used to mean large movie screens; that was popular, so now the company calls everything they do "IMAX" and it now means nothing.


To add a bit more context to TwoBit's answer: since the mid-90's, the only C++ compiler that was really visual has been C++ Builder[1].

It is great for GUI applications: you get to develop C++ applications as if they were Delphi or VB ones, with all the power of C++.

For Microsoft, the lame MFC was already visual enough. To be fair to Microsoft, the original MFC was much more powerful, .NET style, but many didn't like it that way, hence the thin layer over the Win32 APIs[2].

However, since Windows 8 and the introduction of the Windows App Model, Visual C++ has actually become visual. When developing Windows apps based on the new APIs introduced to replace Win32, you can develop pure native applications in C++ using XAML and exactly the same RAD tooling as .NET WPF applications.

[1] https://www.embarcadero.com/products/cbuilder

[2] https://groups.google.com/forum/#!topic/microsoft.public.vc....



