You should definitely measure something from real life.
I measured something as simple as a loop that performs a short floating-point calculation (only two additions and one FP comparison per iteration). My measurements on the latest Intel i5 (in seconds):
0.31 VS2010 -O2 SSE2
0.35 VC 6 -O2
0.44 cygwin gcc-4.3.2 -O2
0.95 cygwin gcc-3.4.4 -O2
I haven't tested the later 4.x gcc releases, but the results are clear: even VC 6, 12 years old, beats a fairly recent gcc, at least for FPU calculations.
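For anyone who wants to try something similar, here is a minimal sketch of the kind of loop described above: two FP additions and one FP comparison per iteration. This is not the exact code that was measured; the function names (fp_loop, time_fp_loop), the constants, and the use of clock() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel: two FP additions and one FP comparison per iteration. */
static double fp_loop(long iterations)
{
    double x = 0.0;
    double sum = 0.0;
    long i;

    for (i = 0; i < iterations; ++i) {
        x += 0.5;           /* FP addition 1 */
        sum += x;           /* FP addition 2 */
        if (sum > 1.0e12)   /* FP comparison */
            sum = 0.0;      /* reset so the value stays bounded */
    }
    return sum;             /* returned so the loop cannot be discarded */
}

/* Runs fp_loop once and returns the CPU time it took, in seconds.
   The loop result is stored through out_result. */
static double time_fp_loop(long iterations, double *out_result)
{
    clock_t start, end;

    assert(out_result != NULL);
    start = clock();
    *out_result = fp_loop(iterations);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call time_fp_loop from main with a large iteration count (say, a few hundred million) and print both the time and the result. Build it with the same optimization level on every compiler you compare, and repeat the run a few times, since a single timing can be noisy.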
gcc 3.x is also really old by now. Apart from that, you should really use -O3 -march=native -msse2 -fomit-frame-pointer for best gcc performance. Anyway, at some point you have to look at the generated assembly to really know what is going on.
-march=native implies -msse2 (when possible). You might have meant -mfpmath=sse. -fomit-frame-pointer is probably not the default in VC6, so that might be unfair.
(It actually is the default on Linux and Darwin now, because DWARF unwind tables make frame pointers obsolete for most uses.)