You can run the test with and without C2 (e.g. with -Xint, which turns off JIT compilation altogether) to see the effect on inlining and escape analysis, and (on Linux) with -prof asm to see where the code is spending its time.
The header is only 12 bytes with compressedOops (which most people use), but the long is aligned on an 8-byte boundary, so it still starts at offset 16 and the 4 bytes saved are lost to padding here.
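To make the arithmetic concrete, here is a rough sketch of the alignment calculation. It does not inspect a real JVM; the header sizes (12 and 16 bytes) and the 8-byte alignment of a long are simply the numbers from the comment above, and the class and method names are made up for illustration.

```java
public class FieldOffsetSketch {
    // Round offset up to the next multiple of alignment.
    // Assumes alignment is a power of two.
    public static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    public static void main(String[] args) {
        int compressedHeader = 12; // assumed header size with compressed oops
        int fullHeader = 16;       // assumed header size without them
        int longAlignment = 8;     // a long field is aligned on 8 bytes

        int offsetCompressed = alignUp(compressedHeader, longAlignment);
        int offsetFull = alignUp(fullHeader, longAlignment);

        System.out.println("long offset after 12-byte header: " + offsetCompressed);
        System.out.println("long offset after 16-byte header: " + offsetFull);
        System.out.println("padding after 12-byte header: "
                + (offsetCompressed - compressedHeader));
    }
}
```

Both headers put the long at the same offset, which is why the smaller header buys nothing for an object whose only field is a long. A tool such as JOL would be the way to check the actual layout on a given JVM.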
I always love to see under-the-hood concepts, but I open articles like these and give up halfway through; as the text goes on I get completely lost. Is there a book that explains what the JVM does under the hood? If so, that would be great!