
This is wrong, you absolutely can have an idea of what a C (and most of the time, C++) compiler will generate. You may not know the exact instructions, but if you are familiar with the target CPU you can have a general idea what sort of instructions will be generated. And the more you check the assembly that a compiler generates for pieces of code, the better your idea will be.
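
As a rough illustration of the kind of small piece worth inspecting, here is a minimal sketch. The function name `sum_array` and the file name `sum.c` are made up for this example. Compiling it with something like `gcc -O2 -S sum.c` writes the generated assembly to `sum.s`. Comparing that against the `-O0` or `-O3` output is one way to build up the intuition described above. What you actually see depends on the compiler version and the target CPU.

```c
#include <stddef.h>

/* Hypothetical example function: adds up `count` ints.
 * A simple counted loop like this is a good first candidate for
 * reading compiler output, because the source maps onto only a
 * handful of instructions. */
long sum_array(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

Reading the `-O0` output first usually makes the optimized version easier to follow, since the unoptimized code tends to mirror the source statement by statement.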

Note that you almost never need to care about what the entirety of a "complex program" will generate - but often you need to care about what specific pieces you are working on will generate.

The C language itself might be defined in terms of an abstract machine, but it is still implemented by real compilers - compilers that, btw, you also have control over and often provide a lot of options on how they will generate code.

And honestly, if you have "absolutely no idea what kind of machine code" your C compiler will generate then perhaps it will be a good idea to get some understanding.

(though i'd agree that it isn't easy since a lot of people treat compiler options as wells of wishes where they put "-O90001" and compiler developers are perfectly fine with that - there is even a literal "-Ofast" nowadays - instead of documenting what exactly they do)




From: https://queue.acm.org/detail.cfm?id=3212479

> Compiler writers let C programmers pretend that they are writing code that is “close to the metal” but must then generate machine code that has very different behavior if they want C programmers to keep believing that they are using a fast language


That article relies on the flawed premise that because modern CPUs do not expose their real inner workings, C is not a low level language. However this is irrelevant because as a programmer you do not have any access below what the CPU itself exposes - if the CPU exposes an API (its instruction set) that pretends to be serial then it doesn't matter if things happen in parallel underneath, since you simply are not given any control over that. From the perspective of someone who is working against such an instruction set, C is a low level language since there is little between it and what the CPU exposes to programmers.

Beyond that it doesn't really invalidate anything i wrote and is only tangentially relevant to my comment (where i didn't even mention C as a low level language, i only said that you can have an idea of what sort of instructions a C compiler will generate for a piece of code if you study its output for a while). Why did you post it without any comment of your own?


To be fair, most of the set of optimization parameters enabled for the various -Ox levels, including x=fast, is usually documented.

At least in GCC though, there are a few optimizations included in the various -O flags that have no corresponding fine grained flag (usually because they affect optimization pass ordering or tuning parameters).


Yes they are documented, though the documentation is really something like "-fawesome-optimization, enabled by default on -O3" and the "-fawesome-optimization" has documentation like "enables awesome optimization" without explaining much more than that.

And even then pretty much every project out there uses "-Ofast" instead of whatever "-Ofast" enables without caring about what it does or how its behavior will change across compilers.


-Ofast enables fast-math optimizations and generally is not standard compliant. I hope projects do not deliberately enable it without thinking (as they say, it is hard to make stuff fool proof because fools are so resourceful).
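
One concrete reason fast-math is not standard compliant: it allows the compiler to treat floating point addition as associative, which it is not. The sketch below (function names are hypothetical) shows two groupings of the same three values that give different results under IEEE 754 double arithmetic. Once reassociation is allowed, the compiler is free to pick either grouping.

```c
/* Hypothetical helpers: the same three operands, grouped two ways. */

/* Computes (a + b) + c. */
double sum_left_to_right(double a, double b, double c)
{
    return (a + b) + c;
}

/* Computes a + (b + c). */
double sum_right_to_left(double a, double b, double c)
{
    return a + (b + c);
}

/* With a = 1e20, b = -1e20, c = 1.0 and strict IEEE 754 doubles:
 *   (a + b) + c  ->  0.0 + 1.0          = 1.0
 *   a + (b + c)  ->  1e20 + (-1e20)     = 0.0
 * because 1.0 is far below the precision available at magnitude 1e20
 * and is lost when added to -1e20. */
```

Without fast-math the compiler has to keep the grouping written in the source, so the difference between the two functions is preserved.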


My point was that options like -O<number> and -Ofast aren't the actual optimization switches, they turn on other switches and you do not know what you'll get - essentially wishing for fast code and hoping you'll get some (i mentioned -Ofast explicitly because of its name).

For example according to the documentation in GCC 7.4 -O3 turns on:

    -fgcse-after-reload
    -finline-functions
    -fipa-cp-clone
    -fpeel-loops
    -fpredictive-commoning
    -fsplit-paths
    -ftree-loop-distribute-patterns
    -ftree-loop-vectorize
    -ftree-partial-pre
    -ftree-slp-vectorize
    -funswitch-loops
    -fvect-cost-model
whereas in GCC 9.2 -O3 turns on the above, plus:

    -floop-interchange 
    -floop-unroll-and-jam 
    -ftree-loop-distribution 
    -fversion-loops-for-strides
So unless you control the exact version of the compiler that will generate the binaries you will give out, you do not exactly know what specifying "-O3" will do.
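
One partial mitigation, offered here as an assumption rather than something proposed in this thread, is to at least record which compiler produced a binary. `__VERSION__` and `__OPTIMIZE__` are macros predefined by GCC (Clang defines them too). The helper names below are hypothetical.

```c
/* Hypothetical helpers that bake build information into the binary. */

/* Returns the compiler's version string, or "unknown" if the
 * compiler does not predefine __VERSION__. */
const char *build_compiler_version(void)
{
#ifdef __VERSION__
    return __VERSION__;
#else
    return "unknown";
#endif
}

/* Returns 1 if the translation unit was compiled with optimization
 * enabled (GCC defines __OPTIMIZE__ at any level above -O0), else 0.
 * It does not say which level or which individual passes ran. */
int build_was_optimized(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```

This only tells you which compiler was used, not what "-O3" meant for it. For the installed GCC, `gcc -Q --help=optimizers -O3` prints which optimization flags are enabled at that level.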

Moreover even though you do know the switches, their documentation is basically nothing. For a random example, what does "-floop-unroll-and-jam" do? The GCC 9.2 documentation combines it with "-ftree-loop-linear", "-floop-interchange", "-floop-strip-mine" and "-floop-block" and all it says is:

> Perform loop nest optimizations. Same as -floop-nest-optimize. To use this code transformation, GCC has to be configured with --with-isl to enable the Graphite loop transformation infrastructure.

...what does that even mean? What sort of effect will those transformations have on the code? Why are they all jumbled in one explanation? Are they exactly the same? Why does it say that they are the same as "-floop-nest-optimize"? Which option is the same? All of them? The "-floop-nest-optimize" documentation says:

> Enable the isl based loop nest optimizer. This is a generic loop nest optimizer based on the Pluto optimization algorithms. It calculates a loop structure optimized for data-locality and parallelism. This option is experimental.

Based on the Pluto optimization algorithms? Even assuming that this refers to "PLUTO - An automatic parallelizer and locality optimizer for affine loop nests" (this is a guess, no other references in the GCC documentation as far as i can tell), does it mean they are the same as the code in pluto, that they are based on the code and modified, or that they are based on the general idea/concepts/algorithms?
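
For reference, the textbook meaning of "loop interchange" can be written out by hand. The sketch below is only that classic transformation, with hypothetical function names. Whether GCC's "-floop-interchange" performs it on a given loop nest is exactly what the documentation leaves unsaid.

```c
#include <stddef.h>

/* Hypothetical example: `m` is a rows x cols matrix stored row by row. */

/* Original nest: the inner loop walks down a column, so consecutive
 * iterations touch elements that are `cols` ints apart in memory. */
long sum_inner_over_rows(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t j = 0; j < cols; j++) {
        for (size_t i = 0; i < rows; i++) {
            total += m[i * cols + j];
        }
    }
    return total;
}

/* Interchanged nest: the two loops are swapped, so the inner loop
 * walks along a row and touches adjacent elements. The result is the
 * same because integer addition does not depend on the order. */
long sum_inner_over_cols(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            total += m[i * cols + j];
        }
    }
    return total;
}
```

The second form is generally friendlier to caches on large matrices, which is the "data-locality" part of the quoted description.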

--

So it isn't really a surprise that most people simply throw in "-Ofast" (or -O3 or -O2 or whatever) and hope for the best. They do not know better and they cannot know better since their compiler doesn't provide them any further information. And this is where all the FUD and fear about C's undefined behavior comes from - people not knowing what exactly happens because they are not even told.



