
I'd not put any trust in Geekbench.

If you read the whitepaper[0], you'll notice some quite weird scenarios. Take the compression test: it uses zstd, which is a major plus, but it compresses some 10k small files totaling 75MB. Compressing small files individually is just pointless; zstd is not an archiver (I suppose the compressors/decompressors are re-created for each file, too). The total size is tiny as well. It also uses an in-memory virtual file system with AES; no idea how the latter is implemented, but I suppose it lives entirely in userland.

Due to the small sizes, the workload would be very L2/L3 sensitive. However, it doesn't represent any common way that compression, esp. zstd, is used in practice.

[0]: https://www.geekbench.com/doc/geekbench6-benchmark-internals...




In general, I'd say disregard benchmarks that aren't open source and don't disclose their full compilation stack.

Don't want a repeat of Cinebench compiled with ICC or https://hothardware.com/news/antutu-mobile-benchmark-cited-b... and https://www.anandtech.com/show/7384/state-of-cheating-in-and...


I don't see why a CPU test should be only about number crunching, and not about RAM and cache access efficiency, provided that the test is not incurring gratuitous cache thrashing.

Also, working with a ton of small files is a typical real-world task.


>Also, working with a ton of small files is a typical real-world task.

Yes, but not compressing them individually while staying in userland (and running SHA1 on each, just for kicks). Real-world scenarios tend to make one big archive (e.g. tar) and compress that.
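A rough sketch of the per-file vs. single-archive difference. It uses zlib from Python's standard library as a stand-in for zstd (an assumption, since zstd is not in the standard library of most Python versions), and the small "files" are made-up config-like records, not Geekbench's dataset. It only compares output sizes; it says nothing about Geekbench's actual implementation or timings.

```python
import zlib


def make_small_files(count=500):
    """Build hypothetical small 'files': short records that share structure."""
    return [
        (f'{{"id": {i}, "name": "item-{i}", "enabled": true, '
         f'"tags": ["alpha", "beta", "gamma"], "path": "/var/data/{i}.json"}}\n').encode()
        for i in range(count)
    ]


def compressed_sizes(files, level=6):
    """Return (sum of per-file compressed sizes, size of one compressed bundle)."""
    # Each file gets its own stream: no shared history, per-stream overhead.
    individually = sum(len(zlib.compress(f, level)) for f in files)
    # One stream over the concatenation: redundancy across files is exploited.
    bundled = len(zlib.compress(b"".join(files), level))
    return individually, bundled


if __name__ == "__main__":
    files = make_small_files()
    raw = sum(len(f) for f in files)
    individually, bundled = compressed_sizes(files)
    print(f"raw: {raw} B, per-file: {individually} B, bundled: {bundled} B")
```

With inputs like these, the bundled stream should come out much smaller than the sum of the per-file streams, because the compressor can reuse what it has seen in earlier files.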

>and not about RAM and cache access efficiency

It's about the size of the cache: workloads that fit in L2 show an amazing performance boost over those that don't. Pretty much, performance drops off a cliff once the working set no longer fits in L2.
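A minimal sketch of how one might look for that cliff: chase a random cycle of indices through working sets of increasing size and time each access. The function names and the sizes are illustrative assumptions, and Python's interpreter overhead blurs the effect considerably; the same loop in C would show it far more sharply.

```python
import random
import time
from array import array


def build_chain(n, seed=0):
    """Return an array where chain[i] is the next index to visit.

    The indices form one random cycle covering all n slots, so following
    the chain touches the whole working set in a cache-unfriendly order.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)
    chain = array("q", [0]) * n
    for pos, idx in enumerate(order):
        chain[idx] = order[(pos + 1) % n]
    return chain


def ns_per_access(working_set_bytes, accesses=200_000):
    """Time dependent loads over a working set of roughly the given size."""
    n = max(2, working_set_bytes // 8)  # 'q' items are 8 bytes each
    chain = build_chain(n)
    idx = 0
    start = time.perf_counter()
    for _ in range(accesses):
        idx = chain[idx]  # each load depends on the previous one
    return (time.perf_counter() - start) / accesses * 1e9


if __name__ == "__main__":
    # Sizes are arbitrary picks meant to straddle typical L1/L2/L3 capacities.
    for kib in (32, 256, 2048, 16384):
        print(f"{kib:>6} KiB: {ns_per_access(kib * 1024):.1f} ns/access")
```

The absolute numbers depend entirely on the machine; the interesting part is where the per-access time starts climbing as the working set grows.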

Overall, microbenchmarks are notoriously difficult to get right, and more often than not they are gamed. However, the compression/decompression test in Geekbench is just bad.


I agree with all your points and also agree that honestly, these scenarios aren't far off from real world tasks.

I get the main issue, which is that you could adjust the workload by 10% and see a 50% performance loss if the change happens to cross the cache threshold.

However, I see CPUs as unique in that I rank them _for_ these scenarios. A particular CPU might be ranked unfairly, but as long as the test is equal, the better one is in fact better, just not by the 50% the test might show but it's still going to be 5% better. I expect my GPU to be idle when it isn't training AI or rendering frames, but the CPU is general purpose in real life, and anything goes.


>just not by the 50% the test might show but it's still going to be 5% better.

Sort of, indeed. Yet when you see any promotional/marketing material, you see all those phallic bar graphs and how much bigger it is. Other than that, heavy cache utilization hides an inferior memory subsystem (latency/throughput), and the latter tends to be quite important in the real world. Overall, benchmarks/tests that use a handful of MB as datasets and run in 100s of ms should not be used as representative... for most use cases.

That was my initial point - 'don't trust'.


Geekbench 5 and 6 scores generally correspond well to SPEC CPU 2017, which is pretty much the industry standard.



