I mean, the law part is still completely obvious. If you have a part that scales proportionally with workers (p) and a part that does not scale with workers (1 - p), then only the part that scales proportionally with workers actually scales with the number of workers (p / s); it is pretty much a tautology.
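For concreteness, here is that formula as a tiny sketch (function name and example numbers are mine, just to illustrate): p is the fraction that parallelizes, s the number of workers.

  # Minimal sketch of Amdahl's law: speedup with s workers when a fraction p
  # of the work parallelizes perfectly and (1 - p) stays serial.
  def amdahl_speedup(p: float, s: int) -> float:
      return 1.0 / ((1.0 - p) + p / s)

  print(amdahl_speedup(0.95, 8))          # ~5.9x
  print(amdahl_speedup(0.95, 1_000_000))  # ~20x: the serial 5% caps the speedup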
The only reason it is non-obvious how to take advantage of it in practice is that real-world problems very rarely break down so cleanly into a completely serial piece and a trivially, arbitrarily parallelizable piece. Even if you are lucky enough to find such a problem at a certain scale, the very act of breaking it down almost always eventually causes other previously ignorable constant or scaling factors to become important, either as you chop the problem into smaller pieces or as you scale to bigger problems. The difficulty lies entirely in posing the problem to be "solved", not in applying the law itself.
There are a lot of things that are obvious when stated, but for various reasons people don't really grasp their ramifications. Put another way, it often seems difficult to grasp just how significant an already superficially internalized concept really is.
E.g. the pigeonhole principle -> no compression scheme can losslessly shrink every input (among other similar proofs)
Another favorite of mine is appreciating just how "weird" it is that pi, a constant, exists. Take a rope and tie it around the Earth so that it fits snugly on the Earth's surface. How much more rope does it take so that the entire rope circling the whole of the Earth can be lifted 10 feet off its surface? About 63 feet (this was at least surprising to me the first time I was asked, which was long after I first learned about pi).
It doesn't have anything to do with pi though. If you had a cube-shaped earth and the same rope, you would need 80 ft of additional rope (instead of about 63 ft for a spherical planet).
We're talking about a rope around the equator, right? Circumference = 2*pi*radius, so extra circumference = 2*pi*(extra radius): lifting the rope 10 feet everywhere takes about 63 more feet of rope, regardless of how big the planet is.
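A quick check of the geometry (my arithmetic; note that the answer does not depend on the planet's radius at all):

  import math

  # Raising a circular loop by h everywhere adds 2*pi*h to its circumference,
  # regardless of the original radius.
  h = 10
  print(2 * math.pi * h)  # ~62.8 extra feet for a round planet
  # If the planet is a cube and the rope turns sharp corners (each side simply
  # pushed out by h, which is how I read the 80 ft figure above), the extra
  # length is 2*h per side instead:
  print(8 * h)            # 80 extra feet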
Yes. This is a constant issue when non-scientists discuss scientific theories. A law is just a convenient equation that works sometimes, in certain conditions.
Except this is closer to a mathematical theorem than a law like, say, Boyle's law. You might be able to rewrite an algorithm so it has a larger portion that's parallelizable, but that's the only way to get around it. You can't just keep adding parallel computers and enter a qualitatively different regime.
It's a model that tells you how something will behave under certain conditions. It works if the program being run can be neatly separated into purely serial and purely parallel (infinitely scalable) parts. In reality it breaks down at some point, because nothing scales forever and there is some communication overhead.
It is similar to the ideal gas law, for example, which can be derived from the kinetic theory of non-interacting molecules and yet fails when the molecules start interacting.
Laws span the whole spectrum from purely empirical to provable within some theories' frameworks.
I predicted that because some computations are hard or impossible to parallelize, attention would shift to solving real-world problems with computations that can be parallelized.
Maybe this is one factor in ML: deep learning neural nets vs. symbolic approaches?
However, engineers tend not to develop techniques without actual hardware [1]. And CPU core-count has remained much lower than I expected [2]. Therefore, my prediction has not come to pass, yet. But it totally will!
1. Speaking for myself: I can imagine a maybe-possible situation one or two steps away from present reality but not see the consequences... yet the instant I learn it is definitely possible, I can see the consequences.
2. Yes, GPU core-count is sky-high, but is limited by memory access.
I'm saying that techniques that are easy to parallelize will become more popular.
Of course he's right that if you can make a serial technique faster, you should do that.
I see a way to support his position: in past decades we got free speedups from silicon - then we hit a kind of "Peak Silicon". Just as for Peak Oil, it makes sense to try techniques that weren't worth it before.
Personally, I think the first step should be to give up all the horrific layers of bloat-on-bloat; but instead we got (e.g.) faster JVMs and JIT JS compilation.
But what about the step after that? He thinks we should mine the tailings (and he's surely right there's a lot to be found there).
I think multi-core seems the only extendable solution (til I guess computers are the size of houses again). But because it's so difficult to parallelize code, we'll instead change our approach, and favour techniques that are easy to parallelize. Arguably, this is currently happening with DL.
I've only skimmed the links you gave, so I don't have a deep appreciation of his position.
Formal problems often have theoretical limits, and solutions approaching them. But for "real-world problems", you can often change the problem, and even change the context of where that problem originates - perhaps eliminating that specific problem entirely [1].
So we'll change problems/contexts to ones that are easily parallelizable.
In that dispelling-myths paper, Patrick Madden notes that a less efficient but faster-because-parallel method still uses more energy. A good point, but I'll note that because energy consumption is superlinear with frequency (e.g. a 2-core method at half the clock speed of a 1-core method will use less energy, assuming perfect parallelization), a parallel solution can be both faster and more energy efficient. GPU compute shows this, e.g. bitcoin mining.
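A rough back-of-the-envelope for that energy point (my numbers; assuming dynamic power scales roughly with f^3 because supply voltage is scaled down along with frequency, and assuming the work parallelizes perfectly):

  # Hypothetical comparison: 1 core at frequency f vs 2 cores at f/2.
  # Assumptions: throughput ~ cores * freq, per-core power ~ freq**3.
  def relative_energy(cores: int, freq: float) -> float:
      time = 1.0 / (cores * freq)   # time to finish a fixed amount of work
      power = cores * freq ** 3     # total power draw under the f^3 assumption
      return power * time

  print(relative_energy(1, 1.0))  # 1.0  (baseline)
  print(relative_energy(2, 0.5))  # 0.25 (same finish time, roughly 4x less energy)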
His last paragraph begins:
> Despite these challenges, there is no choice but to forge ahead with parallelism
...which doesn't sound like disagreement. He's mainly criticising breathy papers.
Finally... why is he known by the name of the Half-Life protagonist? Does he look like him, or is there some thematic connection?
1. Maybe the entire context can be formalized and solved, so you can't "change the problem", but even then, real-world problems tend to keep changing, because of changing customer demands, competition, technology, legislation etc.
>Personally, I think the first step should be to give up all the horrific layers of bloat-on-bloat; but instead we got (e.g.) faster JVMs and JIT JS compilation.
I think this is already happening. The trending languages I see on HN right now are Rust, Nim, and Zig. We're going back to native, and in Rust's case also with fearless concurrency as a paradigm.
Right, but how does that compare to the incumbents? The JVM and .NET CLR don't seem to be going anywhere.
In this day and age I don't think the abstraction layer is the issue. Modern bytecode VMs are already able to be within an order of magnitude of native code, and I'd imagine there's still plenty of room there for improvement (WASM comes to mind). Unless you're on constrained hardware, or developing directly against hardware (rather than using an operating system, which is itself a horrific layer of bloat-on-bloat), a bytecode VM is still perfectly reasonable even for soft-real-time applications (let alone for applications that are throughput-sensitive rather than latency-sensitive).
>Modern bytecode VMs are already able to be within an order of magnitude of native code
Looking at the slowing of hardware advances, this could be a 10-year lag in performance. I think that's quite significant.
I think byte code VMs have their place, and for many applications that may be enough, but if you have high performance needs it may be worth checking if the price is one you're willing to pay.
The significance of a "10 year lag in performance" has wildly different meanings between the present day and, say, 10 years ago. 2010 (serial) performance on 2020 hardware would be hardly perceptible. 2000 performance on 2010 would be much more noticeable. 1990 performance on 2000 hardware would be catastrophic.
And yet, Java, C#, Erlang, Perl, Python, Ruby, Tcl... all of these languages and their VMs are decades old now. And yeah, the languages toward the end of that list ain't exactly speed demons, but the ones toward the front are already commonly used for high-performance applications with pretty dang good success, and even the ones toward the back of the pack are there for other reasons beyond just the VM (global interpreter locks, parsing overhead, things like that).
The VM, that is to say, ain't the issue. Hell, things like SPIR-V demonstrate that VMs are perfectly fine even for GPU computation. There's a price, sure, but eliminating that price is about as premature of an optimization as it gets for all but the most constrained of environments. There are almost certainly worse bottlenecks - and those bottlenecks are probably around I/O and memory, knowing most applications (and a bytecode VM often helps here, since the opcodes can be tuned for minimum size, often giving even RISC CPU opcodes a run for their money).
If it makes your life harder, yes, it's premature optimization. But Rust, Nim and Zig are quite high-level languages; they don't drag me down in terms of developer productivity.
With C/C++ I agree: depending on how long-lived your application is, you'll spend weeks or even years on debugging that you wouldn't have in Java or C#.
In the top 10 you will always find bytecode/VM-based solutions not far from the C/Rust implementations. Of course, these are highly optimized, but the assumption that native is always better is not always true.
I've seen a few of them; the highly optimized ones usually fall back to some form of manual memory management, which then becomes way more complicated than in a language like Rust which has all this by default.
What I find so beautiful about the native/close-to-the-metal languages is that you can have clean code that's also near optimal, without optimizing a lot. Compared to the lengths people go to in order to get some Python code to run fast, it's the easier way.
> which then becomes way more complicated than in a language like Rust which has all this by default.
Um, no:
1. Manual memory management doesn't have to be complex at all - and is arguably much easier to reason about than garbage collection, even if it's more error prone / less guaranteed-to-be-safe
2. Rust's manual memory management (i.e. the default kind of memory management in Rust) is of at least the same complexity as any other sort of manual memory management - if not more due to the need to be explicit around lifetimes and borrowing.
The memory management strategy is, in any case, orthogonal to whether there's a VM involved; there are garbage-collected native languages (Lisp, D) and not-garbage-collected VMs (WASM, or languages like Perl and Tcl if you count reference-counting as "not-garbage-collected") galore.
... how high did you expect CPU core count to be at this point with close to uniform memory access costs, and what changes did you expect to enable that degree of concurrent bus access without catastrophic stalling due to contention?
I was thinking of GPU cores having random write access, but yes that has concurrent access problems.
BTW how do GPUs solve the concurrent bus access you mention, with many cores writing pixels?
Also, is this GPU-style processing, where each core writes a separate result (a texture/display pixel or transformed vector) but can read randomly (from textures), the only approach to parallelization that works? It's a form of scatter-gather.
I thought about Amdahl's Law a lot when I started Sauce Labs. To make browser test automation faster, the initial plan was to throw more hardware at the problem by getting more servers and running tests in parallel. But we were always looking for other things to optimize, because we knew "throw more hardware at the problem" was going to stop working at some point. One example was virtual machine startup speed. In the early days on EC2, Windows VMs took 10-ish minutes to boot up; I think part of the reason was that they had to do a disk image transfer first. But if you keep lots of redundant copies of disk images in lots of places, you can reduce that waste of time on pre-boot network data transfer and ride the Amdahl's Law curve a little longer. Another trick is having the VM "pre-booted" by keeping a VM snapshot to jump to, avoiding the entire boot process. Of course, getting this much low-level control over the boot process to optimize for parallelizability means building out your own cloud, which is another story for another day...
One thing I was taught in university related to this is that in optimisation, "every little bit" does NOT help. This insight applies to everything, and is the cause of great frustration. The world should know about this law, and that optimising the 3% case is useless in any field if there's an 80% case staring you in the face.
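To put rough numbers on the 3% vs 80% point (same Amdahl-style arithmetic, applied to optimization in general rather than to parallelism; the improvement factors are made up):

  # Overall speedup from improving a fraction f of the runtime by a factor k.
  def overall_speedup(f: float, k: float) -> float:
      return 1.0 / ((1.0 - f) + f / k)

  print(overall_speedup(0.03, 1000))  # ~1.03x: eliminating the 3% case buys almost nothing
  print(overall_speedup(0.80, 2))     # ~1.67x: merely halving the 80% case buys far more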
Amdahl's law describes the scaling behavior of systems with contention ("serialized workload portion"). This was generalized by Neil Gunther into the Universal Scalability Law, which additionally takes into account the "cross talk" needed for coordination between nodes. Systems with cross talk always run into configurations where adding more nodes will decrease the throughput of the system.
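For reference, a sketch of the USL as I understand it (parameter names and example values are mine): sigma models contention, kappa models the coherency cross talk.

  import math

  # Universal Scalability Law (Gunther): relative throughput with n nodes.
  def usl_throughput(n: int, sigma: float, kappa: float) -> float:
      return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

  sigma, kappa = 0.05, 0.001
  for n in (1, 8, 32, 128, 512):
      print(n, round(usl_throughput(n, sigma, kappa), 1))
  # With kappa > 0 the curve peaks near n = sqrt((1 - sigma) / kappa) (~31 here)
  # and then declines: past that point, adding nodes reduces throughput.
  print(round(math.sqrt((1 - sigma) / kappa)))  # 31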
Rule 42a.1: if someone has created a novel thread, the community should support it with upvotes even if they disagree with the conclusions or assertions.
Is there a separate rule (or phrase/idiom) one can quote for the "Serial Programs" part of Amdahl's Law?
This is one area where teams waste a lot of time; spending too much effort improving an area which doesn't matter so much. Amdahl's law could be quoted, but the articles generally skew heavily towards parallelism.
This concept generalizes really well to many things outside of computing. For example, this is why adding more engineers to a team can slow things down.
Amdahl's law is more simplistic than that. You're suggesting that the benefit of adding an additional worker is negative. Amdahl's law is not representative of most systems; your generalization seems better. The marginal benefit as a function of resources depends on the system in question. It isn't just a flat positive percentage.
I’m not so sure of that. Amdahl's law tells you the speedup to expect when adding more cores to the parallelized portion of an algorithm, whereas adding a new member to a team slows down one or more members for a matter of weeks/months, because those members are spending time teaching the new member that they would otherwise spend producing.
The insight from Amdahl's law is what led to the creation of hybrid processors like the Cell (used in the PlayStation 3 console). They quickly became irrelevant as GPGPU became more mainstream.
> ... if a program needs 20 hours to complete using a single thread, but a one-hour portion of the program cannot be parallelized ... then regardless of how many threads are devoted to a parallelized execution of this program, the minimum execution time cannot be less than one hour.
It really does seem obvious. You can either do the sync part before or after the parallelized part. Since you cannot have infinite horizontal compute, the parallelized part will take some time ε > 0 to finish, and the total execution time will be 1 + ε hours.
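With the numbers from the quote (1 serial hour plus 19 perfectly parallelizable hours split across s threads):

  # 20-hour job: 1 hour serial, 19 hours perfectly parallelizable.
  def runtime_hours(s: int) -> float:
      return 1 + 19 / s

  for s in (1, 4, 19, 1000):
      print(s, round(runtime_hours(s), 2), round(20 / runtime_hours(s), 1))
  # The runtime approaches 1 hour but never reaches it, so the speedup is
  # capped at 20x no matter how many threads are added.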
I had a special topics course in parallel computing in university - Amdahl's law is probably the single biggest takeaway from that course.
It's amazing how applicable it is to more general things in life as well - can't remember examples right now but I've reasoned about many day-to-day things and realised Amdahl's law applies
It also applies to chemistry, physics, and everything else. If you have a reaction that depends on step A and B, and A must happen before B, but B can be sped up arbitrarily with diminishing returns by introducing an enzyme, then the whole reaction will still take slightly longer than the time it takes to complete A.
In my research group we sometimes benchmark parallel systems (databases with parallelizable queries). Whenever there's a student thesis, I make the student explain Amdahl's Law and contextualize their work with respect to it.
From this, most students correctly get the idea that they have to analyze the parts of the workload separately, as their parallelization behaviour is not uniform.
For example, loading the data from a spinning disk - given one disk - behaves differently than the evaluation of a query once the data has been loaded.
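A toy model of what "analyze the parts separately" looks like (entirely made-up numbers; loading is bound by the number of disks, evaluation by the number of cores, each phase assumed to scale perfectly within its own resource):

  # Two phases of a query that scale with different resources.
  def query_seconds(load_s: float, eval_s: float, disks: int, cores: int) -> float:
      return load_s / disks + eval_s / cores

  print(query_seconds(60, 60, 1, 64))  # ~60.9 s: more cores alone barely help
  print(query_seconds(60, 60, 4, 64))  # ~15.9 s: the loading phase needs its own scaling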
I often use a variation of this line of thinking for determining if something is worth optimizing.
Sometimes it can be hard to tell from profiling output or flame graphs just how much time is spent somewhere. Before trying to make something faster the first thing I do is turn it into a no-op.
If avoiding doing something entirely makes the program run 2% faster, then making that run in half the time is only going to give an overall 1% improvement.
I don't know if there's a name for this variation. It's just the same thing applied to different components and not specifically related to parallelism.
Technically that's still just Amdahl's law; the statement has nothing to do with parallelism. I'm not sure when it became so heavily associated with parallelism.
"In computer architecture, Amdahl's law (or Amdahl's argument[1]) is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It is named after computer scientist Gene Amdahl, and was presented at the AFIPS Spring Joint Computer Conference in 1967."
Maybe I'm missing something here but isn't this mind-numbingly obvious? I've never read this before but this is exactly how my mental model for parallelisation has always worked.
...later I became a developer. Turns out it's not obvious.