How do you write fast multithreaded C code? The article argues that C is too disconnected from the real hardware, which in this case has multiple cores.
Do you need to call (non-portable?) library code to set up mutexes manually? (Not that it would be a problem.)
How do you use the CPU's underlying CAS operation? (by inline assembly?)
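For what it's worth, here is a minimal sketch of what CAS without inline assembly might look like, assuming a C11 compiler that ships `<stdatomic.h>`. The helper names `atomic_store_max` and `atomic_store_max_selfcheck` are made up for illustration; only the `atomic_*` calls come from the standard header.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: atomically raise *target to candidate if candidate
 * is larger than the value currently stored. Built on a CAS retry loop. */
void atomic_store_max(atomic_int *target, int candidate) {
    int observed = atomic_load(target);
    /* atomic_compare_exchange_weak writes the current value back into
     * `observed` when it fails, so each retry re-checks fresh data. */
    while (observed < candidate &&
           !atomic_compare_exchange_weak(target, &observed, candidate)) {
        /* retry: another thread changed *target, or the weak CAS failed spuriously */
    }
}

/* Single-threaded sanity check of the helper above. */
void atomic_store_max_selfcheck(void) {
    atomic_int v = 5;
    atomic_store_max(&v, 9);   /* 9 > 5, so the value is replaced */
    assert(atomic_load(&v) == 9);
    atomic_store_max(&v, 3);   /* 3 < 9, so the value is left alone */
    assert(atomic_load(&v) == 9);
}
```

On x86 a compiler would typically lower the compare-exchange to a `lock cmpxchg`, but that is a property of the compiler and target, not something the C standard promises.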
As an example: the guys who wrote the very fast LMAX Disruptor in Java relied on the fact that Java provides methods in the AtomicXXX classes that use CAS operations under the hood. Sadly, they couldn't pick the operation they wanted, which would have been faster than the one Java decided to use (there's an RFE for it, if I recall correctly: they'd like Oracle to modify Java so that it uses the faster version when it makes sense).
I take it that in C you can use inline assembly and do whatever you want?
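To make the question concrete, this is roughly what I imagine picking the instruction yourself would look like. It is only a sketch, assuming GCC or Clang (extended asm syntax and the `__atomic` builtins are compiler extensions, not ISO C) and using an atomic fetch-and-add via x86-64 `lock xadd` purely as an example instruction. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: atomically add delta to *addr and return the
 * value that was stored there before the addition. */
int64_t fetch_add_i64(volatile int64_t *addr, int64_t delta) {
#if defined(__x86_64__) && defined(__GNUC__)
    /* lock xadd swaps the register with memory and stores the sum in
     * memory, so the register ends up holding the old value. */
    __asm__ __volatile__("lock xaddq %0, %1"
                         : "+r"(delta), "+m"(*addr)
                         :
                         : "memory", "cc");
    return delta;
#else
    /* Fallback for other targets: GCC/Clang builtin, assumed available. */
    return __atomic_fetch_add(addr, delta, __ATOMIC_SEQ_CST);
#endif
}

/* Single-threaded sanity check of the helper above. */
void fetch_add_i64_selfcheck(void) {
    int64_t x = 40;
    int64_t old = fetch_add_i64(&x, 2);
    assert(old == 40);  /* previous value is returned */
    assert(x == 42);    /* memory now holds the sum */
}
```

The obvious cost is portability: the asm branch only builds for x86-64 with a GCC-compatible compiler, which is why the sketch keeps a builtin fallback.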