
Yeah, no kidding. The author did their homework and this is really good code to study. I'm with you, it's almost certainly not a coincidence.

It's aging a bit, but Ulrich Drepper's seminal paper on memory comes to mind: https://www.akkadia.org/drepper/cpumemory.pdf

Say what you want about Drepper's style and personality, but that paper told me (a) he's an incredibly knowledgeable dude and (b) I'll always have more to learn, especially about caches.




Looks like he's not writing to the "reserved" field, so the CPU will definitely need to read the cache line holding the current log entry before the write (RFO, read for ownership).

I'm not sure whether current CPUs are smart enough, but in theory writing all 64 bytes at once to a cache-line-aligned address could avoid the RFO. If it's an L1/L2/L3 miss, that could save 100-200 cycles.
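
To make that concrete, here's a rough sketch of a full-line write. Everything in it is assumed for illustration: the `log_entry` layout, its field names, and `write_entry_full_line` are hypothetical and not taken from the code under discussion. It also swaps in a different technique from what I speculated about above: on x86 with SSE2 it uses non-temporal stores, which go through write-combining buffers and bypass the cache, so the line doesn't have to be read first. Whether ordinary stores covering a whole line skip the RFO is exactly the part I'm unsure about. Elsewhere it falls back to a plain `memcpy`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si128, _mm_loadu_si128, _mm_sfence */
#endif

#define CACHE_LINE 64

/* Hypothetical 64-byte log entry; the layout is an assumption. */
struct log_entry {
    uint64_t timestamp;
    uint64_t payload[6];
    uint64_t reserved;   /* written too, so the store covers the whole line */
};
static_assert(sizeof(struct log_entry) == CACHE_LINE,
              "entry must fill exactly one cache line");

/* Copy one entry into a slot that the caller guarantees is 64-byte aligned. */
static void write_entry_full_line(struct log_entry *dst,
                                  const struct log_entry *src)
{
#if defined(__SSE2__)
    const __m128i *s = (const __m128i *)src;
    __m128i *d = (__m128i *)dst;
    /* Four 16-byte non-temporal stores cover all 64 bytes of the line. */
    for (int i = 0; i < 4; i++)
        _mm_stream_si128(d + i, _mm_loadu_si128(s + i));
    /* Order the streaming stores before any later stores. */
    _mm_sfence();
#else
    /* Portable fallback: ordinary stores, no claim about RFO behavior. */
    memcpy(dst, src, sizeof *dst);
#endif
}
```

The trade-off is that a non-temporal store leaves the line out of the cache, so it only helps if nobody reads the entry back soon. Same caveat as before: didn't profile.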

It might also have been better to do RDTSCP first. Otherwise it will also prevent instructions from being reordered and sabotage the attempt to avoid the RFO.

Anyways, not sure, didn't profile. (And by profiling I mean using those countless CPU performance counters to figure out what's going on in the mysterious black box.)



