This is wrong, you absolutely can have an *idea* of what a C (and most of the ti...

undershirt · on Oct 17, 2019

From: https://queue.acm.org/detail.cfm?id=3212479

> Compiler writers let C programmers pretend that they are writing code that is “close to the metal” but must then generate machine code that has very different behavior if they want C programmers to keep believing that they are using a fast language

Crinus · on Oct 18, 2019

That article relies on the flawed premise that because modern CPUs do not expose their real inner workings then C is not a low level language. However this is irrelevant because as a programmer you do not have any access below what the CPU itself exposes - if the CPU exposes an API (its instruction set) that pretends to be serial then it doesn't matter if underneath the seams things happen in parallel since you simply are not given any control over that. From the perspective of someone who is working against such an instruction, C is a low level language since there is little lower level between it and what is exposed to the programmers by the CPU.

Beyond that it doesn't really invalidate anything i wrote and is only tangentially relevant to my comment (where i didn't even mentioned C as a low level language, i only said that you can have an idea of what sort of instructions a C compiler will generate for a piece of code if you study its output for a while), why did you post it without any comment of your own?

gpderetta · on Oct 17, 2019

To be fair, most of the set of optimization parameters enabled for the various for -Ox, including x=fast, is usually documented.

At least in GCC though, there are a few optimizations included in the various -O flags that have no corresponding fine grained flag (usually because they affect optimization pass ordering or tuning parameters).

Crinus · on Oct 18, 2019

Yes they are documented, though the documentation is really something like "-fawesome-optimization, enabled by default on -O3" and the "-fawesome-optimization" has documentation like "enables awesome optimization" without explaining much more than that.

And even then pretty much every project out there uses "-Ofast" instead of whatever "-Ofast" enables without caring about what it does or how its behavior will change across compilers.

gpderetta · on Oct 18, 2019

-Ofast enables fast-math optimizations and generally is not standard compliant. I hope projects do not deliberately enable it without thinking (as they say, it is hard to make stuff fool proof because fools are so resourceful).

Crinus · on Oct 18, 2019

My point was that options like -O<number> and -Ofast aren't the actual optimization switches, they turn on other switches and you do not know what you'll get - essentially wishing for fast code and hoping you'll get some (i mentioned -Ofast explicitly because of its name).

For example according to the documentation in GCC 7.4 -O3 turns on:

    -fgcse-after-reload
    -finline-functions
    -fipa-cp-clone
    -fpeel-loops
    -fpredictive-commoning
    -fsplit-paths
    -ftree-loop-distribute-patterns
    -ftree-loop-vectorize
    -ftree-partial-pre
    -ftree-slp-vectorize
    -funswitch-loops
    -fvect-cost-model

whereas in GCC 9.2 -O3 turns the above, plus:

    -floop-interchange 
    -floop-unroll-and-jam 
    -ftree-loop-distribution 
    -fversion-loops-for-strides

So unless you control the exact version of the compiler that will generate the binaries you will give out, you do not exactly know what specifying "-O3" will do.

Moreover even though you do know the switches, their documentation is basically nothing. For a random example what "-floop-unroll-and-jam" does? The GCC 9.2 documentation combines it with "-ftree-loop-linear", "-floop-interchange", "-floop-strip-mine" and "-floop-block" and all it says is:

> Perform loop nest optimizations. Same as -floop-nest-optimize. To use this code transformation, GCC has to be configured with --with-isl to enable the Graphite loop transformation infrastructure.

...what does that even mean? What sort of effect will those transformations have on the code? Why are they all jumbled in one explanation? Are they exactly the same? Why does it say that they are the same "-floop-nest-optimize"? Which option is the same? All of them? The -"floop-nest-optimize" documentation says:

> Enable the isl based loop nest optimizer. This is a generic loop nest optimizer based on the Pluto optimization algorithms. It calculates a loop structure optimized for data-locality and parallelism. This option is experimental.

Based on the Pluto optimization algorithms? Even assuming that this refers to "PLUTO - An automatic parallelizer and locality optimizer for affine loop nests" (this is a guess, no other references in the GCC documentation as far as i can tell), does it mean they are the same as the the code in pluto, that they based on the code and are modified or that they are based on the general idea/concepts/algorithms?

--

So it isn't really a surprise that most people simple throw out "-Ofast" (or -O3 or -O2 or whatever) and hope for the best. They do not know better and they cannot know better since their compiler doesn't provide them any further information. And this is where all the FUD and fear about C's undefined behavior comes - people not knowing what exactly happens because they are not even told.