
-Ofast enables fast-math optimizations and is generally not standards-compliant. I hope projects do not enable it deliberately without thinking (as they say, it is hard to make stuff foolproof because fools are so resourceful).
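To make the "not standards-compliant" part concrete: floating-point addition is not associative, and fast-math permits the compiler to regroup sums as if it were. The sketch below does not show what any particular GCC version does with -Ofast. It only shows that the two groupings can give different answers. The function names `sum_left` and `sum_right` are made up for illustration.

```c
#include <stdio.h>

/* Evaluates (a + b) + c, the order written in the source. */
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

/* Evaluates a + (b + c), a regrouping that is only valid
   if addition is treated as associative. */
double sum_right(double a, double b, double c) {
    return a + (b + c);
}

/* Prints both groupings for inputs where the small term is
   absorbed by the large one in the second grouping. */
void show_reassociation(void) {
    double a = 1e20, b = -1e20, c = 1.0;
    printf("(a + b) + c = %g\n", sum_left(a, b, c));
    printf("a + (b + c) = %g\n", sum_right(a, b, c));
}
```

Without fast-math the compiler has to keep the grouping written in the source, so the two functions return different values for these inputs. With fast-math it is allowed to treat them as interchangeable.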



My point was that options like -O<number> and -Ofast aren't the actual optimization switches. They turn on other switches and you do not know what you'll get: essentially you are wishing for fast code and hoping you'll get some (I mentioned -Ofast explicitly because of its name).

For example, according to the GCC 7.4 documentation, -O3 turns on:

    -fgcse-after-reload
    -finline-functions
    -fipa-cp-clone
    -fpeel-loops
    -fpredictive-commoning
    -fsplit-paths
    -ftree-loop-distribute-patterns
    -ftree-loop-vectorize
    -ftree-partial-pre
    -ftree-slp-vectorize
    -funswitch-loops
    -fvect-cost-model
whereas in GCC 9.2 -O3 turns on the above, plus:

    -floop-interchange 
    -floop-unroll-and-jam 
    -ftree-loop-distribution 
    -fversion-loops-for-strides
So unless you control the exact version of the compiler that will generate the binaries you will give out, you do not exactly know what specifying "-O3" will do.
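One partial workaround is to ask the compiler itself: GCC can print the switches a given level enables with `gcc -Q --help=optimizers -O3`. A build can also record what it was compiled with. The sketch below relies on the predefined macros `__VERSION__`, `__OPTIMIZE__` and `__FAST_MATH__`, which GCC and Clang define. The function names are hypothetical, and other compilers may define none of these macros, hence the `#ifdef` guards.

```c
#include <stdio.h>

/* Returns 1 if this file was compiled with fast-math semantics
   (the compiler defines __FAST_MATH__ in that case), else 0. */
int built_with_fast_math(void) {
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the compiler reports an optimizing build, else 0. */
int built_with_optimization(void) {
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Writes a one-line build description into buf and returns the
   value returned by snprintf. */
int describe_build(char *buf, size_t len) {
#ifdef __VERSION__
    const char *version = __VERSION__;
#else
    const char *version = "unknown compiler";
#endif
    return snprintf(buf, len, "compiler: %s, optimized: %d, fast-math: %d",
                    version, built_with_optimization(),
                    built_with_fast_math());
}
```

This only tells you the compiler version and two coarse properties of the build, not which individual -f switches were active, so it narrows the problem rather than solving it.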

Moreover, even if you do know the switches, their documentation says next to nothing. As a random example, what does "-floop-unroll-and-jam" do? The GCC 9.2 documentation combines it with "-ftree-loop-linear", "-floop-interchange", "-floop-strip-mine" and "-floop-block", and all it says is:

> Perform loop nest optimizations. Same as -floop-nest-optimize. To use this code transformation, GCC has to be configured with --with-isl to enable the Graphite loop transformation infrastructure.

...what does that even mean? What sort of effect will those transformations have on the code? Why are they all jumbled into one explanation? Are they exactly the same? Why does it say that they are the same as "-floop-nest-optimize"? Which option is the same? All of them? The "-floop-nest-optimize" documentation says:

> Enable the isl based loop nest optimizer. This is a generic loop nest optimizer based on the Pluto optimization algorithms. It calculates a loop structure optimized for data-locality and parallelism. This option is experimental.

Based on the Pluto optimization algorithms? Even assuming that this refers to "PLUTO - An automatic parallelizer and locality optimizer for affine loop nests" (this is a guess; there are no other references in the GCC documentation as far as I can tell), does it mean the transformations are the same as the code in Pluto, that they are based on that code and modified, or that they are based on the general ideas/concepts/algorithms?
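For reference, "unroll and jam" as a textbook technique means unrolling an outer loop by some factor and then fusing ("jamming") the resulting copies of the inner loop into one. The hand-written sketch below shows that general idea only. It is not a claim about what GCC does behind "-floop-unroll-and-jam", and the function names and the row-major matrix layout are assumptions made for the example.

```c
#include <stddef.h>

/* Baseline: y[i] += sum over j of a[i][j] * x[j], for an n-by-m
   row-major matrix a stored as a flat array. */
void matvec_plain(size_t n, size_t m, const double *a,
                  const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            y[i] += a[i * m + j] * x[j];
}

/* Same computation with the outer loop unrolled by 2 and the two
   inner-loop copies fused into one, so each x[j] is loaded once
   and used for two rows. */
void matvec_unroll_and_jam(size_t n, size_t m, const double *a,
                           const double *x, double *y) {
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        for (size_t j = 0; j < m; j++) {
            y[i]     += a[i * m + j] * x[j];
            y[i + 1] += a[(i + 1) * m + j] * x[j];
        }
    }
    /* Remainder row when n is odd. */
    for (; i < n; i++)
        for (size_t j = 0; j < m; j++)
            y[i] += a[i * m + j] * x[j];
}
```

Each y[i] still accumulates its terms in the same order in both versions, so the results are identical. What changes is the memory access pattern, which is exactly the kind of effect the documentation could have spelled out.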

--

So it isn't really a surprise that most people simply throw in "-Ofast" (or -O3 or -O2 or whatever) and hope for the best. They do not know better, and they cannot know better, since their compiler doesn't give them any further information. And this is where all the FUD and fear about C's undefined behavior comes from: people not knowing what exactly happens because they are not even told.



