
Visual C++ has had things like profile-guided optimization for over a decade internally at MSFT, and nearly that long shipping externally (http://blogs.msdn.com/b/vcblog/archive/2008/11/12/pogo.aspx).

In general, the Microsoft and Intel C++ compilers have pretty incredible performance on the x86 platform. The last time I talked to the Intel folks, they didn't really consider GCC to be competitive. It's possible things have changed in the last five years on that front; apologies if I'm misrepresenting the state of GCC optimization quality.

The PGI Fortran folks are pretty incredible in the parallel space.

On embedded platforms, most serious companies seem to buy the Green Hills C++ compiler.

One of the program-management types from the Visual C++ team could probably do this much better justice than my quickly-fading memories. But, I think the HN crowd would be quite shocked by the market share that commercial compilers have, even on *NIX platforms.

It is not unusual to get 2x performance with icc vs gcc. But of course, their priorities are different. Sun's SPARC compilers are (were?) also excellent. Those are the benefits of targeting only one architecture and developing the compiler in close collaboration with the hardware designers.

gcc is "free", but is it free enough to justify doubling the hardware bill? Benchmark and find out...


I have never seen such a speed difference. Can you reference anything?


Benchmarks we did at my last company (trading app in C++). But like I say, try it for yourself. There's a reason people still pay $$$ for compilers when GCC is free!


Well, I am doing compiler research, which means testing is mostly SPEC CPU. Sometimes I wonder whether those measurements reflect reality, because every serious compiler is heavily tuned for these special cases. There is no 2x difference there.


You should definitely measure something from real life.

I measured something as simple as a loop in which some short floating-point calculations are made (actually only two additions and one FP comparison per iteration), and my measurements on the latest Intel i5 give (in seconds):

    0.31  VS2010 -O2 SSE2
    0.35  VC 6 -O2
    0.44  cygwin gcc-4.3.2 -O2
    0.95  cygwin gcc-3.4.4 -O2

I haven't tested the later 4.x gcc releases, but the results are clear: even VC 6, 12 years old, is better than a quite recent gcc, at least for FPU calculations.
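
For reference, the loop was roughly of this shape; the snippet below is only a sketch of the idea (the constants, iteration count and timing harness are illustrative, not my original benchmark code):

    // Two FP additions and one FP comparison per iteration, timed with
    // std::clock so it also builds with the older compilers listed above, e.g.:
    //   cl /O2 bench.cpp       (MSVC)
    //   g++ -O2 bench.cpp      (gcc)
    #include <cstdio>
    #include <ctime>

    int main() {
        const long iters = 500000000L;
        double sum = 0.0, x = 0.0;

        std::clock_t t0 = std::clock();
        for (long i = 0; i < iters; ++i) {
            x += 0.25;          // first addition
            if (x > 1000.0)     // the FP comparison
                x = 0.0;
            sum += x;           // second addition
        }
        std::clock_t t1 = std::clock();

        // Printing sum keeps the loop from being optimized away entirely.
        std::printf("sum=%f  elapsed=%.2fs\n", sum,
                    double(t1 - t0) / CLOCKS_PER_SEC);
        return 0;
    }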


gcc 3.x is also really old by now. Apart from that, you should really use -O3 -march=native -msse2 -fomit-frame-pointer for best gcc performance. Anyway, at some point you have to look at the generated assembly to really know what is going on.


-march=native implies -msse2 (when possible). You might have meant -mfpmath=sse. -fomit-frame-pointer is probably not the default in VC6, so that might be unfair.

(It actually is the default on Linux and Darwin now, because DWARF unwind tables make frame pointers unnecessary for most uses.)


Calculating yield curves is pretty "real world", as is transactions/sec. If the benchmark is representative, then it is a good benchmark, and if the compiler is optimized for it, then it is optimized for real-world use cases too.


AFAIR from a course at university, the status quo (about five years ago) was that gcc did not implement many of the advanced optimization techniques, such as IPS (integrated pre-pass scheduling). I have not checked myself; I'm just promulgating hearsay (competitive hearsay, at that)...


gcc and llvm don't have a technique with that name, but it sounds a little like an implementation detail to me. What is it prior to and/or integrated with?

gcc supports profile-guided optimization just fine; llvm has some code for it, but I'm not sure it's hooked up. Neither of them uses iterative techniques for optimization; they're already too slow as it is for most people, anyway.
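
For anyone who hasn't tried it, the gcc side is just a three-step build; the file name and toy workload below are my own example, not anything from gcc's documentation:

    // Sketch of gcc's PGO workflow (file name hot.cpp is illustrative):
    //   g++ -O2 -fprofile-generate hot.cpp -o hot   # instrumented build
    //   ./hot                                       # representative run; writes *.gcda profile data
    //   g++ -O2 -fprofile-use hot.cpp -o hot        # rebuild using the recorded profile
    #include <cstdio>

    // A branchy helper whose rarely-taken path the profile lets the
    // compiler move out of the hot code path.
    int classify(int x) {
        if (x % 97 == 0) return -1;   // rare with the input below
        return x & 1;
    }

    int main() {
        long odd = 0;
        for (int i = 1; i <= 50000000; ++i)
            if (classify(i) == 1) ++odd;
        std::printf("odd count: %ld\n", odd);
        return 0;
    }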


IPS means that instruction scheduling is performed before register allocation, but with register allocation kept in mind to reduce register pressure.

There was some other stuff, but I cannot remember it (I am actually glad to have been able to come up with IPS at all).


See http://gcc.gnu.org/ml/gcc-patches/2009-09/msg00003.html.

Instruction scheduling of any kind doesn't really help on x86 anyway, and register pressure is usually surprisingly good already (since temporary values are moved close to their uses when combining instructions). I think the most important missing thing is rematerialization: recalculating values instead of saving them on the stack would save a lot of memory loads.
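
To make that concrete, here is a sketch of the kind of value rematerialization targets; the function and numbers are invented for illustration:

    // If register pressure in the loop pushes `limit` out of a register, a
    // conventional allocator spills it to a stack slot and reloads it at the
    // final use. A rematerializing allocator would instead just recompute it
    // there (a single load-immediate, since it is a constant), saving the
    // stack store and the memory load.
    long clamp_sum(const long* table, int n) {
        const long limit = 1L << 20;       // trivially recomputable
        long acc = 0;
        for (int i = 0; i < n; ++i)        // long live range, high register pressure
            acc += table[i] ^ (acc << 1);
        return acc < limit ? acc : limit;  // use of `limit` far from its definition
    }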


I wonder which compilers Google uses.
