Going from -O2 to -O3 has made things worse in some benchmarks. In others it is a massive win, but in general you should not go above -O2 without benchmarking your code. Some optimizations can make things better or worse for reasons the compiler cannot know.
Over-optimizing your "cold" code can also make things worse for the "hot" code, e.g. by growing code size so much that briefly entering the cold code kicks the hot code out of the caches.
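For ahead-of-time compiled C, one way to keep cold code from crowding out hot code is to tell the compiler which is which. A minimal sketch, assuming GCC (Clang accepts the same attribute spelling); the `struct config`, `parse_config`, and `config_port` names are made up for illustration. GCC documents `cold` as optimizing the function for size and, on many targets, placing it in a separate subsection of the text section, away from the rest of the code. Whether it helps a given program is still something to benchmark.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical config type, for illustration only. */
struct config {
    int port;
    int verbose;
};

/* Runs once at startup: ask the compiler to favor size over speed
 * and to keep it away from the hot code. */
__attribute__((cold))
int parse_config(const char *text, struct config *cfg)
{
    assert(text != NULL && cfg != NULL);
    cfg->port = 0;
    cfg->verbose = 0;

    const char *p = strstr(text, "port=");
    if (p != NULL)
        cfg->port = atoi(p + 5);      /* skip past "port=" */

    p = strstr(text, "verbose=");
    if (p != NULL)
        cfg->verbose = atoi(p + 8);   /* skip past "verbose=" */

    /* Treat a missing or non-positive port as a parse failure. */
    return cfg->port > 0 ? 0 : -1;
}

/* Called on every request: mark it hot so it is optimized for speed. */
__attribute__((hot))
int config_port(const struct config *cfg)
{
    return cfg->port;
}
```

Profile-guided optimization (`-fprofile-generate` then `-fprofile-use` in GCC) gets at the same thing without hand annotations, at the cost of needing a representative training run.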
I have often lamented not being able to hint to the JIT when I’ve transitioned from startup code to normal operation. I don’t need my Config file parsing optimized. But the code for interrogating the Config at runtime better be.
Everything before listen() is probably run once. Except not every program calls listen().