1024cores

chrisaycock · on Dec 31, 2010

The "many core" problem is this decade's C10K. I look forward to more expert discussion on scaling across massively multi-core architectures.

zerothehero · on Dec 31, 2010

Is it really? Scaling today means running on more than one machine (google, facebook, twitter, etc.)

That means no shared memory. He helpfully makes this distinction on his front page ("I'm mostly interested in shared-memory system, so if you are looking for information about clusters, web-farms, distributed databases and the like, it's the wrong place")

According to Google's Jeff Dean, "to Google, multi-core computers look like separate servers with really fast interconnections" (i.e. memory).

So if you are running your applications on many machines anyway, you might as well drastically simplify your code by writing it "single-threaded" and running #cores copies on each machine.

dvyukov · on Jan 1, 2011

> Scaling today means running on more than one machine

Of course it's not. Hundreds of millions of people use just a single computer for a lot of tasks.

>you might as well drastically simplify your code by writing it "single-threaded" and running #cores copies on each machine.

It's not the worst approach. However, there may be significant penalties in terms of performance and latency in some contexts. You definitely don't want to use your approach for games and browsers. As for server software, it depends on the performance/latency requirements. For example, if you use it in the High-Frequency Trading market, count on losing all your money.

JabavuAdams · on Dec 31, 2010

Games, movies, multimedia, etc.

sgt · on Dec 31, 2010

Interesting, but seriously, take those ads off. Are you expecting to make money off of this? Even if you did, it would be pocket change at most.

DaniFong · on Dec 31, 2010

I would add -- if you're expecting to make money off this, you're probably better off advertising your own consulting services -- at suitably high rates. You're the expert now... :-)

hmottestad · on Dec 31, 2010

I have ads on my own site, never made much off of them, not even enough to partially cover my hosting costs.

However I leave them there just as a reminder that someone is footing the bill.

If someone doesn't like ads, then they can use an ad blocker :)

dvyukov · on Jan 1, 2011

I am very close to covering a first year of hosting :) As for consulting and, perhaps, paid libraries, well, first I need a good site, right?

codex · on Dec 31, 2010

So far, nobody has mentioned the author's C++ based race detection tool. Conceptually it is similar to Corensic's commercial product, Jinx (http://www.corensic.com) but practically Jinx supports more languages, doesn't require recompilation, and is most likely much faster.

dvyukov · on Jan 1, 2011

It's "similar" to a lot of software out there: Intel Thread Checker, Chord, Zing, Spin, RacerX, CheckFence, Sober, Coverity Thread Analyzer, CHESS, KISS, PreFast, Prefix, FxCop. However, what you are missing is that most of these tools (and Jinx too, as far as I can see; I can't find a clear description on the site, mostly vague marketing stuff) help you if you are an application developer who writes in terms of mutexes, while RRD helps you if you are implementing the mutexes themselves. Can you verify an involved mutex algorithm down to the possible memory access reorderings? I doubt it.

> and is most likely much faster.

Or an order of magnitude slower.

Kaya · on Jan 1, 2011

Jinx can help verify mutex implementations themselves, although the example code that ships with the product is a little more advanced (lock-free stack). Some of the underlying technology is described here: http://s3.amazonaws.com/corensic/whitepapers/DeterministicSh... and here: http://www.corensic.com/WhyYouNeedJinx/CorensicHasaUniqueTec.... Because it's a hypervisor, it can aid in verifying synchronization primitives that are a mix of userspace and kernel code.

dvyukov · on Jan 1, 2011

I do not see anything about memory fences. If Jinx does not support them, then it's pretty much useless for verification of synchronization algorithms. I've implemented dozens of advanced synchronization algorithms, and I can say that fences are crucial. Also, if it works at the binary level (does not require re-compilation), that also renders it useless, because at that level you lose information about the order of memory accesses, memory fences, and atomicity. For example, if you see a plain x86 MOV instruction, what is it? A non-atomic store? An atomic relaxed store? An atomic release store?
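
To make the MOV question concrete, here is a small C++11 sketch of the three kinds of store mentioned above. The variable and function names are made up for illustration. On x86-64, mainstream compilers typically emit an ordinary MOV for each of the three stores, so the distinction exists in the source but not in the binary; treat that as a statement about common compilers, not a guarantee.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names, for illustration only.
int plain_x = 0;                  // ordinary, non-atomic variable
std::atomic<int> relaxed_x{0};
std::atomic<int> release_x{0};

void store_three_ways(int v) {
    // Non-atomic store: a data race if another thread touches plain_x concurrently.
    plain_x = v;
    // Atomic store with no ordering guarantees relative to other accesses.
    relaxed_x.store(v, std::memory_order_relaxed);
    // Atomic store that also orders all earlier writes in this thread before it.
    release_x.store(v, std::memory_order_release);
}

// Single-threaded check that the three stores landed; returns their sum.
int run_store_demo(int v) {
    store_three_ways(v);
    assert(plain_x == v);
    return plain_x
         + relaxed_x.load(std::memory_order_relaxed)
         + release_x.load(std::memory_order_acquire);
}
```

Calling `run_store_demo(7)` from a `main` of your own performs all three stores; comparing the generated assembly for `store_three_ways` (for example with `-O2 -S`) is one way to see what a binary-level tool has to work with.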

viraptor · on Dec 31, 2010

> ..., atomic-free synchronization algorithms

Actually, I'm not sure whether it was supposed to be funny or serious. I can see it as a joke on the "everything-free" list, but I can also imagine that there are actions you can perform non-atomically (relative to other actions) that still give you synchronisation.

Anyone?

dkersten · on Dec 31, 2010

I'm not quite sure what you mean, but synchronization without atomic operations is possible.

An example of mutual exclusion without any atomic operations, taken from the book "The Art of Multiprocessor Programming"[1], is (paraphrased) as follows:

Two threads, A and B, want to access some memory. Each thread has a flag.

When thread A wants to access the shared memory:

    Set flag A
    Wait for flag B to become unset
    Access memory
    Unset flag A

When thread B wants to access the shared memory:

    Set flag B
    While flag A is set {
        Unset flag B
        Wait for flag A to become unset
        Set flag B
    }
    Access memory
    Unset flag B

Obviously this isn't a general purpose solution, but rather an easy to understand example demonstrating that atomic operations are not required.

[1] http://www.amazon.com/Art-Multiprocessor-Programming-Maurice...
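
For anyone who wants to run the protocol, here is a minimal C++11 sketch of the two-flag scheme above. It is not the book's code: the names (`flag_a`, `flag_b`, `run_flag_demo`) are invented for illustration, and it assumes `std::atomic<bool>` with the default sequentially consistent loads and stores, which stands in for the in-order memory behaviour the pseudocode relies on. Only plain loads and stores are used, no read-modify-write operations.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names; a sketch under the assumptions stated above.
std::atomic<bool> flag_a{false};
std::atomic<bool> flag_b{false};
long shared_counter = 0;  // plain data, protected only by the flag protocol

void a_enter() {
    flag_a.store(true);                 // set flag A (seq_cst store)
    while (flag_b.load()) {             // wait for flag B to become unset
        std::this_thread::yield();
    }
}
void a_leave() { flag_a.store(false); } // unset flag A

void b_enter() {
    flag_b.store(true);                 // set flag B
    while (flag_a.load()) {             // A wants in: B gives way
        flag_b.store(false);            // unset flag B
        while (flag_a.load()) {         // wait for flag A to become unset
            std::this_thread::yield();
        }
        flag_b.store(true);             // set flag B again and re-check A
    }
}
void b_leave() { flag_b.store(false); } // unset flag B

// Each thread increments the shared counter `iterations` times under the protocol.
long run_flag_demo(int iterations) {
    assert(iterations >= 0);
    shared_counter = 0;
    std::thread a([iterations] {
        for (int i = 0; i < iterations; ++i) { a_enter(); ++shared_counter; a_leave(); }
    });
    std::thread b([iterations] {
        for (int i = 0; i < iterations; ++i) { b_enter(); ++shared_counter; b_leave(); }
    });
    a.join();
    b.join();
    return shared_counter;
}
```

If mutual exclusion holds, `run_flag_demo(n)` returns `2 * n`. Weakening the loads and stores to `memory_order_relaxed` removes the ordering the protocol depends on, so the result is no longer guaranteed.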

tedunangst · on Dec 31, 2010

That only works with coherent in order memory operations. Once you add the appropriate memory barriers, it looks a lot more "atomic".

dkersten · on Dec 31, 2010

I chose that example because it's easy to understand. Obviously, on modern processors with out-of-order execution and whatnot, you would need something a lot more elaborate.

> Once you add the appropriate memory barriers, it looks a lot more "atomic"

Well, they force in-order memory access. That doesn't look terribly "atomic" to me, but I understand your point.

dvyukov · on Jan 1, 2011

The intention was actually something like "atomic-RMW-free", that is, "costly-operation-free". It should also cover expensive memory fences and excessive sharing. I have an explanation of "atomic-free" in the Russian article, but I did not transfer it to the English page on xxx-free. Sorry for any confusion. I think I'd better just remove it from the main page for now.
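
As one possible reading of "atomic-RMW-free", here is a C++11 sketch of a single-producer/single-consumer ring buffer that synchronizes using only atomic loads and stores with acquire/release ordering, with no read-modify-write operations. The class and function names are hypothetical and not taken from the site, and the 64-byte alignment is an assumed cache-line size used to keep the two indices from sharing a line.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical SPSC ring: exactly one producer thread and one consumer thread.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "one slot is always left empty");
public:
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        slots_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[head];
        head_.store((head + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }
private:
    T slots_[N];
    alignas(64) std::atomic<std::size_t> head_{0};  // written only by the consumer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written only by the producer
};

// Pushes 1..count through the ring and returns the sum seen by the consumer.
long long run_spsc_demo(int count) {
    assert(count >= 0);
    SpscRing<int, 64> ring;
    long long sum = 0;
    std::thread producer([&ring, count] {
        for (int i = 1; i <= count; ++i)
            while (!ring.try_push(i)) std::this_thread::yield();
    });
    std::thread consumer([&ring, &sum, count] {
        int v = 0;
        for (int i = 0; i < count; ++i) {
            while (!ring.try_pop(v)) std::this_thread::yield();
            sum += v;
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

Each index is written by exactly one thread, so no compare-and-swap or fetch-and-add is needed. On x86 the release stores are typically plain MOVs, while weaker architectures may need explicit barrier instructions for them.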

_kp6z · on Dec 31, 2010

The speaker admits his English isn't perfect; probably an error as much of the site seems geared toward lock-free _atomic_ operations.