I find the same with Java (I've worked with it for most of my career): change the hardware just slightly and you can get wildly different results, which makes other users' benchmark conclusions about GC and JIT not very helpful. Like armpits, the only benchmarks that don't stink are your own :)
That's exactly my experience with the Java JIT and GC, having worked on, uh, "optimizations" of related metrics in two or three projects.