
Visual C++ has had things like profile-guided optimization for over a decade internally at MSFT, and nearly that long shipping externally (http://blogs.msdn.com/b/vcblog/archive/2008/11/12/pogo.aspx).

In general, the Microsoft and Intel C++ compilers have pretty incredible performance on the x86 platform. The last time I talked to the Intel folks, they didn't really consider GCC to be competitive. It's possible things have changed in the last five years on that front; apologies if I'm misrepresenting the state of GCC optimization quality.

The PGI Fortran folks are pretty incredible in the parallel space.

On embedded platforms, most serious companies seem to buy the Green Hills C++ compiler.

One of the program-management types from the Visual C++ team could probably do this much better justice than my quickly-fading memories. But, I think the HN crowd would be quite shocked by the market share that commercial compilers have, even on *NIX platforms.

It is not unusual to get 2x performance with icc vs gcc. But of course, their priorities are different. Sun's SPARC compilers are (were?) also excellent. Those are the benefits of targeting only one architecture and developing the compiler in close collaboration with the hardware designers.

gcc is "free", but is it free enough to justify doubling the hardware bill? Benchmark and find out...


I have never seen such a speed difference. Can you reference anything?


Benchmarks we did at my last company (trading app in C++). But like I say, try it for yourself. There's a reason people still pay $$$ for compilers when GCC is free!


Well, I am doing compiler research, which means testing is mostly SPEC CPU. Sometimes I wonder whether those measurements reflect reality, because every serious compiler is heavily tuned for these special cases. There is no 2x difference there.


You should definitely measure something from real life.

I measured something as simple as a loop in which some short floating-point calculations are made (actually only two additions and one FP comparison per iteration), and my measurements on the latest Intel i5 give (in seconds):

    0.31  VS2010 -O2 SSE2
    0.35  VC 6 -O2
    0.44  cygwin gcc-4.3.2 -O2
    0.95  cygwin gcc-3.4.4 -O2

I haven't tested the later 4.x gcc releases, but the results are clear: even VC 6, 12 years old, is better than a quite recent gcc, at least for FPU calculations.
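
For reference, the loop was roughly of this shape; the snippet below is only a sketch of the idea (the constants, iteration count and timing harness are illustrative, not my original benchmark code):

    // Two FP additions and one FP comparison per iteration, timed with
    // std::clock so it also builds with the older compilers listed above, e.g.:
    //   cl /O2 bench.cpp       (MSVC)
    //   g++ -O2 bench.cpp      (gcc)
    #include <cstdio>
    #include <ctime>

    int main() {
        const long iters = 500000000L;
        double sum = 0.0, x = 0.0;

        std::clock_t t0 = std::clock();
        for (long i = 0; i < iters; ++i) {
            x += 0.25;          // first addition
            if (x > 1000.0)     // the FP comparison
                x = 0.0;
            sum += x;           // second addition
        }
        std::clock_t t1 = std::clock();

        // Printing sum keeps the loop from being optimized away entirely.
        std::printf("sum=%f  elapsed=%.2fs\n", sum,
                    double(t1 - t0) / CLOCKS_PER_SEC);
        return 0;
    }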


gcc 3.x is also really old by now. Apart from that, you should really use -O3 -march=native -msse2 -fomit-frame-pointer for best gcc performance. Anyway, at some point you have to look at the generated assembly to really know what is going on.


-march=native implies -msse2 (when possible). You might have meant -mfpmath=sse. -fomit-frame-pointer is probably not the default in VC6, so that might be unfair.

(It actually is the default on Linux and Darwin now, because DWARF unwind tables make frame pointers unnecessary for most uses.)


Calculating yield curves is pretty "real world", as is transactions/sec. If the benchmark is representative, then it is a good benchmark, and if the compiler is optimized for it, then it is optimized for real-world use cases too.


AFAIR from a course at university, the status quo (about five years ago) was that gcc did not implement many of the advanced optimization techniques, such as IPS (integrated pre-pass scheduling). I have not checked myself; I'm just promulgating hearsay (competitive hearsay, at that)...


gcc and llvm don't have a technique with that name, but it sounds a little like an implementation detail to me. What is it prior to and/or integrated with?

gcc supports profile-guided optimization just fine; llvm has some code for it, but I'm not sure it's hooked up. Neither of them uses iterative techniques for optimization; they're already too slow as it is for most people, anyway.
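
For anyone who hasn't tried it, the gcc side is just a three-step build; the file name and toy workload below are my own example, not anything from gcc's documentation:

    // Sketch of gcc's PGO workflow (file name hot.cpp is illustrative):
    //   g++ -O2 -fprofile-generate hot.cpp -o hot   # instrumented build
    //   ./hot                                       # representative run; writes *.gcda profile data
    //   g++ -O2 -fprofile-use hot.cpp -o hot        # rebuild using the recorded profile
    #include <cstdio>

    // A branchy helper whose rarely-taken path the profile lets the
    // compiler move out of the hot code path.
    int classify(int x) {
        if (x % 97 == 0) return -1;   // rare with the input below
        return x & 1;
    }

    int main() {
        long odd = 0;
        for (int i = 1; i <= 50000000; ++i)
            if (classify(i) == 1) ++odd;
        std::printf("odd count: %ld\n", odd);
        return 0;
    }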


IPS means that instruction scheduling is performed before register allocation, but with register allocation kept in mind to reduce register pressure.

There was some other stuff, but I cannot remember it (I am actually glad to have been able to come up with IPS at all).


See http://gcc.gnu.org/ml/gcc-patches/2009-09/msg00003.html.

Instruction scheduling of any kind doesn't really help on x86 anyway, and register pressure is usually surprisingly good already (since temporary values are moved close to their uses when combining instructions). I think the most important missing thing is rematerialization: recalculating values instead of saving them on the stack would save a lot of memory loads.
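
To make that concrete, here is a sketch of the kind of value rematerialization targets; the function and numbers are invented for illustration:

    // If register pressure in the loop pushes `limit` out of a register, a
    // conventional allocator spills it to a stack slot and reloads it at the
    // final use. A rematerializing allocator would instead just recompute it
    // there (a single load-immediate, since it is a constant), saving the
    // stack store and the memory load.
    long clamp_sum(const long* table, int n) {
        const long limit = 1L << 20;       // trivially recomputable
        long acc = 0;
        for (int i = 0; i < n; ++i)        // long live range, high register pressure
            acc += table[i] ^ (acc << 1);
        return acc < limit ? acc : limit;  // use of `limit` far from its definition
    }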


I wonder which compilers Google uses.
