
The L3 issue isn't quite that simple. Sure, if your dataset fits in Intel's L3, that's great. The problem is that a single shared L3 (for the same amount of effort/transistors) has much lower bandwidth than several smaller, separate L3s.

So a dual-socket AMD system has 8 Zeppelin dies and 16 L3 caches of 8MB each. I'd be quite surprised if Intel could match the aggregate bandwidth of those 16 L3 caches. Additionally, if there are enough cache misses, AMD has a 33% advantage in both outstanding memory references and bandwidth, since a dual-socket system has 16 memory channels versus Intel's 12.
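To make the arithmetic explicit, here is a rough sketch assuming first-generation EPYC ("Naples") topology: 4 Zeppelin dies per socket, 2 CCXs per die, 8MB of L3 per CCX, and 8 DDR4 channels per socket versus 6 for Skylake-SP. These are spec-sheet figures, not measurements, and the helper names are made up for illustration.

```python
# Spec-sheet topology assumptions (first-gen EPYC vs. Skylake-SP), not measurements.
EPYC_DIES_PER_SOCKET = 4       # Zeppelin dies per socket
EPYC_CCX_PER_DIE = 2           # each CCX has its own separate L3
EPYC_L3_PER_CCX_MB = 8
EPYC_CHANNELS_PER_SOCKET = 8   # DDR4 memory channels
SKX_CHANNELS_PER_SOCKET = 6    # DDR4 memory channels


def epyc_l3_caches(sockets: int) -> int:
    """Number of separate L3 caches in an EPYC system (hypothetical helper)."""
    return sockets * EPYC_DIES_PER_SOCKET * EPYC_CCX_PER_DIE


def channel_advantage(sockets: int) -> tuple[int, int, float]:
    """Return (AMD channels, Intel channels, AMD's fractional advantage)."""
    amd = sockets * EPYC_CHANNELS_PER_SOCKET
    intel = sockets * SKX_CHANNELS_PER_SOCKET
    return amd, intel, amd / intel - 1.0


if __name__ == "__main__":
    n = epyc_l3_caches(2)
    print(n, "L3 caches,", n * EPYC_L3_PER_CCX_MB, "MB total")
    print(channel_advantage(2))
```

For two sockets this gives 16 L3 caches (128MB in total, though each core only allocates into its own CCX's 8MB) and 16 vs. 12 channels, i.e. the roughly 33% figure.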

Basically, both architectures are HUGELY complicated. Even minor things like the choice of compiler or compiler flags can make a big difference. Now more than ever it's important to benchmark your own workload; any simple rule of thumb is likely to be useless.
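A minimal harness for "benchmark your workload" might look like the sketch below: run the real workload several times and compare medians, rather than trusting a single run. `bench` is a hypothetical helper built on the standard `timeit` and `statistics` modules; for cache-level effects you would time the compiled workload itself, since Python interpreter overhead would mask them.

```python
import statistics
import timeit
from typing import Callable


def bench(workload: Callable[[], object], repeat: int = 7, number: int = 1) -> float:
    """Return the median wall-clock seconds per call of `workload`.

    timeit.repeat runs `number` calls per trial and `repeat` trials; the median
    is less sensitive than the mean to one-off noise (page faults, frequency
    ramp-up, other processes).
    """
    times = timeit.repeat(workload, repeat=repeat, number=number)
    return statistics.median(t / number for t in times)


if __name__ == "__main__":
    # Compare two builds/configurations of the same job by swapping the callable.
    print(f"{bench(lambda: sorted(range(100_000, 0, -1))):.6f} s")
```

The same harness can be pointed at two binaries (e.g. via `subprocess.run`) built with different compilers or flags.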




Intel's new chips allow you to logically dedicate segments of L3 to different programs/VMs.

https://software.intel.com/en-us/articles/introduction-to-ca...
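On Linux this partitioning is exposed through the resctrl filesystem (usually mounted at `/sys/fs/resctrl`): you create a group directory, write an L3 capacity bitmask into its `schemata` file, and move PIDs into its `tasks` file. The sketch below assumes that interface; the helper names are hypothetical, the bitmask must consist of contiguous set bits that fit within `info/L3/cbm_mask`, and applying it needs root plus a CAT-capable CPU.

```python
from pathlib import Path

RESCTRL = Path("/sys/fs/resctrl")  # assumed mount point of the resctrl filesystem


def contiguous_cbm(n_ways: int, first_way: int = 0) -> int:
    """Capacity bitmask with `n_ways` contiguous set bits starting at `first_way`.

    Cache Allocation Technology rejects masks whose set bits are not contiguous.
    """
    if n_ways < 1 or first_way < 0:
        raise ValueError("need at least one way and a non-negative offset")
    return ((1 << n_ways) - 1) << first_way


def l3_schemata_line(masks: dict[int, int]) -> str:
    """Format one L3 schemata line: {cache_id: mask} -> 'L3:0=f0;1=f0' (hex, no 0x)."""
    return "L3:" + ";".join(f"{cid}={mask:x}" for cid, mask in sorted(masks.items()))


def assign_group(name: str, schemata: str, pids: list[int]) -> None:
    """Create a resctrl group, restrict its L3 ways, and move `pids` into it.

    Hypothetical helper; requires root and is not executed here.
    """
    group = RESCTRL / name
    group.mkdir(exist_ok=True)
    (group / "schemata").write_text(schemata + "\n")
    for pid in pids:
        # The tasks file takes one PID per write.
        (group / "tasks").write_text(f"{pid}\n")
```

For example, `l3_schemata_line({0: contiguous_cbm(4, 4), 1: contiguous_cbm(4, 4)})` yields a line giving a group four L3 ways (bits 4 to 7) on both sockets, leaving the remaining ways to other groups.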


Skylake-SP has up to 28 L3 slices (one per core), which together should provide significant bandwidth.
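The two layouts trade off differently per socket, as the rough sketch below shows. It assumes 1.375MB of L3 per Skylake-SP core (38.5MB on a 28-core part, shared by all cores over the mesh) versus 8 CCXs of 8MB each on first-gen EPYC, where a core allocates only into its own CCX's L3. The dictionary keys are made up for illustration.

```python
# Per-socket spec-sheet assumptions, not measurements.
SKX_CORES = 28
SKX_L3_PER_CORE_MB = 1.375   # one L3 slice per core, all slices shared
EPYC_CCX_PER_SOCKET = 8
EPYC_L3_PER_CCX_MB = 8       # private to the 4 cores of that CCX


def l3_summary() -> dict[str, float]:
    skx_total = SKX_CORES * SKX_L3_PER_CORE_MB
    epyc_total = EPYC_CCX_PER_SOCKET * EPYC_L3_PER_CCX_MB
    return {
        "skx_slices": SKX_CORES,
        "skx_total_mb": skx_total,
        # Any Skylake-SP core can use every slice, so one thread can see all of it.
        "skx_per_thread_mb": skx_total,
        "epyc_caches": EPYC_CCX_PER_SOCKET,
        "epyc_total_mb": epyc_total,
        # An EPYC core allocates only into its own CCX's L3.
        "epyc_per_thread_mb": EPYC_L3_PER_CCX_MB,
    }


if __name__ == "__main__":
    for key, value in l3_summary().items():
        print(f"{key}: {value}")
```

This is the "dataset fits in Intel's L3" case in numbers: a single working set up to about 38.5MB stays in Intel's L3, while EPYC offers more total capacity split into 8MB pieces.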



