In the days of OS/360 and OS/370 (IBM's mainframe OS in the '60s and '70s) no instructions at all executed in the wait state. There was no OS-level idle process. Instead, the internal CPU microcode itself ran a little loop without dispatching any machine instructions until the tentacles that reached out from that loop were tickled by an external interrupting event. That loop had to be very short for latency reasons and very prickly, testing for all the conditions that could take the processor out of it to resume dispatching instructions or do critical internal things.
I know this because I wrote that 6 (initially 8) micro-instruction loop for the IBM System/370 Model 155-157 in the late '60s. I was told back then that since that model became the most widely sold machine ever I had written the code most executed in the history of computing. :-)
The status latch in the machine that indicated that loop was running was fed to a light on the console and, yep, it was nearly always lit other than when booting the OS.
They kept that trick up for a long time. As of AIX 3.2.5 back in the 1990s, an "idle" RS/6000 ran at 70% CPU utilization. They were super-fast and efficient in their day, too.
Dunno how true that is today, but I wouldn't be surprised if AIX still uses the same kind of tight event loop internally.
I think John Cocke was working out the architecture of that line then. He's not often credited with it, but he invented RISC with that effort. Hardware was built in research (called the 80x, where I don't remember x) but the actual PowerPC was quite a ways in the future.
To me the brilliance in Radin's 801 paper was the belief that you could trust the compiler to do a better job. Essentially, a compiler can have more than 10 fingers to find its way through optimizing code (optimizing assembly often means marking places with your fingers and glancing back and forth as you trace the logic of execution). That realization struck me at the time like a thunderbolt.
Cocke deserved his Turing, no question about it, however I see the intellectual antecedents of RISC being machines like Gordon Bell's PDP-6/10 (with a small, clean, orthogonal instruction set) and Cray's 6600 (with its overlapping execution units). CISC was a doomed effort to make assembly programmers more efficient.
Though the most popular compiler output today is a CISC instruction set, that output is simply run on microprogrammed emulators.
Yes, John and Fran Allen wrote the first optimizing Fortran compiler and developed most of the optimization techniques and analyses used by subsequent compilers.
I knew John and it was that work that led him to design an instruction set optimized for compilers that was implemented in the 801. I'm less aware of George's contribution. Perhaps you could enlighten me.
This can be true on modern CPUs as well. In fact, if the software chooses, it can even remove power from the core dynamically so one doesn't even have to pay the leakage power.
Oh, all those machines used linear ECL (Emitter Coupled Logic) circuits for speed. Terribly hot logic family, but complementary FET-based devices just couldn't yet cut the mustard.
FWIW, the transition from the Model 155 to the 157 heralded the introduction of table driven virtual memory. We called it "relocation" hardware then. :-)
The first engineering model of the 155 used core memory but transistor memory became fast, reliable and cheap enough during its development that a very disruptive re-design was performed across the whole family of machines.
That was my first job out of the university and what fun it was! And what an incredible experience working for IBM was then.
What a fun gig. And what interesting times. I remember the headlines in Computerworld when the cost of a megabyte of memory had dropped to $15,000.
I did very little work on the mainframes, lots on the IBM 1800, a process control computer, and my first consulting gig was on a System/32. Don't ask me what programming language it was.
It was all discrete, hard logic circuits centered around a unit called the scoreboard. That term was borrowed from either Burroughs or CDC, I can't remember which. That machine pioneered many of the scalar optimizations that are common in today's architectures.
I got the hardware guys to merge some of the conditions that caused a loop exit into condition groups that were tested by other micro-instructions until a couple went away. Getting out quick was important, sorting out why was less so.
Yeah, CPU idleness is pretty fascinating and, as it turns out, quite important. Shameless plug: I just published a paper [1] showing that if you mess up how deeply you go to sleep in cpuidle, you can lose up to ~15% in latency in datacenter services like websearch.
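If you want to see how deep your own kernel is willing to go, the cpuidle states are exposed through sysfs. Here's a minimal C sketch (mine, not from the paper) that walks the standard /sys/devices/system/cpu/cpu0/cpuidle/stateN/ entries and prints each state's name, advertised wakeup latency, and whether it has been disabled; the exact states and numbers will vary with your CPU and idle driver:

    /* List the idle states (C-states) the Linux cpuidle framework exposes
     * for CPU 0, using the standard sysfs layout. */
    #include <stdio.h>
    #include <string.h>

    static int read_field(int state, const char *field, char *buf, size_t len) {
        char path[128];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/%s", state, field);
        FILE *f = fopen(path, "r");
        if (!f)
            return -1;                     /* state or field doesn't exist */
        if (!fgets(buf, (int)len, f))
            buf[0] = '\0';
        fclose(f);
        buf[strcspn(buf, "\n")] = '\0';    /* strip trailing newline */
        return 0;
    }

    int main(void) {
        char name[64], latency[32], disabled[32];
        for (int i = 0; read_field(i, "name", name, sizeof(name)) == 0; i++) {
            read_field(i, "latency", latency, sizeof(latency));
            read_field(i, "disable", disabled, sizeof(disabled));
            printf("state%d: %-12s wakeup latency %s us, disabled=%s\n",
                   i, name, latency, disabled);
        }
        return 0;
    }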
This reminds me: if you're stuck on a Linux kernel before 3.9 and you're using Sandy Bridge hardware or newer, turn off C1E in BIOS. The "C1E" setting controls a misfeature in which the CPU will look at the OS's request for a light sleep and say "Ha! You must have asked for light sleep by accident! I'm going to go all the way to sleep instead!" This causes surprisingly severe performance problems (we've seen milliseconds of unexplained random hiccups).
Linux 3.9 and higher is immune: it just forcibly disables this feature.
Hah. I guess that's a major benchmark win - decreases power usage in common cases, and few CPU benchmarks test wake-from-sleep latency either directly or indirectly.
Tiny little correction: On x86 multi-processor machines, the CPU can be woken from idle by an interrupt or by a write to a designated address in main memory from another CPU. Linux uses the latter technique to very efficiently wake up a process on one CPU when it receives data from another CPU. This is a big speedup on some workloads.
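For the curious, this is the MONITOR/MWAIT pair mentioned elsewhere in the thread: one CPU arms a monitor on a cache line and goes to sleep, and a store to that line from another CPU wakes it. A rough sketch of the idea is below; note these instructions are privileged on most x86 parts, so this only makes sense in kernel context, and wake_flag is just a made-up stand-in for the per-CPU word another processor would write to (not how Linux actually names or lays this out):

    /* Sketch of the MONITOR/MWAIT wakeup idea. Compile with -msse3;
     * would fault at user level on CPUs without user-mode wait support. */
    #include <pmmintrin.h>   /* _mm_monitor / _mm_mwait */
    #include <stdint.h>

    volatile uint64_t wake_flag;   /* another CPU writes here to wake us */

    void idle_until_poked(void) {
        while (!wake_flag) {
            /* Arm the monitor on the cache line holding wake_flag. */
            _mm_monitor((const void *)&wake_flag, 0, 0);
            /* Re-check: a write could have landed before the monitor was armed. */
            if (wake_flag)
                break;
            /* Sleep until that line is written or an interrupt arrives. */
            _mm_mwait(0, 0);
        }
    }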
DOS-style operating systems were written at a time when halting the CPU to reduce power consumption was basically not a concern, so the "idle loop" is actually an infinite loop (TSRs often hooked into this loop to do "background processing", a rudimentary form of task switching.)
I seriously doubt that it is healthy for any computer to run at 100% CPU usage for extended periods.
Computers in the DOS and Win9x days would be at 100% CPU usage for basically as long as they were switched on; and a modern one should be absolutely fine running at 100% for an indefinite amount of time, provided that things like the cooling system are in good working condition. The biggest difference between CPUs now and CPUs then is the gap in power consumption between idle/sleep and full power - e.g. a 66MHz 486 consumes <5W at full power (with not much difference when halted), vs. ~5W at idle and close to 100W under load for a recent Haswell i7.
If I have a DOS guest running for some time in VirtualBox, with full CPU usage, OS X eventually pops up a nice half-transparent window. It tells me that my computer now needs to be shut down and that I should hold the power button long enough to do so. It's the Mac version of the BSOD.
That suggests insufficient cooling. If it's a system that's been in use for a while, it's probably time to clean out the heatsinks and fans. If a new system is overheating from 100% CPU usage, then I would consider it defective.
Yeah, there's definitely something wrong with his hardware. If the CPU is overheating, even on my constantly thermally-constrained first-gen MacBook Pro Retina, it should just throttle down the CPU speed to something safe and painfully slow. He's definitely got some hardware fault if it's panicking (check the panic.log).
> That suggests insufficient cooling. If it's a system that's been in use for a while, it's probably time to clean out the heatsinks and fans. If a new system is overheating from 100% CPU usage, then I would consider it defective.
Yes, definitely. The thing to advise is: clean the heatsinks and fans, and then run a stress test such as Prime95 overnight to verify that the machine can indeed run at 100% usage for a prolonged time. Only then would I trust a machine that was prone to crashing under load with my work again.
Not sure why this is being downvoted, it's actually a good question. If all of your VMs are idle, you would want your host CPU to idle as well, although my guess is this doesn't happen in things like Xen, VMware, etc.
There have actually been problems with that: there was a bug in Citrix XenServer [0] where the CPU would, if left idle for a long time, step down and turn off more or less all cores, which resulted in a locked-up and unresponsive host. You can imagine what that did to the virtual machines supposedly running on that host.
Off the top of my head: the VM executes either HLT or MWAIT, and those instructions are generally programmed to cause a VM exit, so the hypervisor can take action. The hypervisor will deschedule the VM and possibly go idle itself.
TBH, I have no idea how MONITOR works in a VM. It might be quite messy.
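To make the HLT case concrete, here's roughly what a KVM-based hypervisor's vCPU loop might do with that exit. KVM_RUN and KVM_EXIT_HLT are the real ioctl and exit reason; wait_until_vcpu_has_work() and handle_other_exit() are hypothetical helpers, and all VM/vCPU setup and error handling is omitted:

    /* Minimal vCPU run loop: when the guest executes HLT we get a VM exit
     * and can block in the host instead of burning host CPU. */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    void wait_until_vcpu_has_work(int vcpu_fd);   /* hypothetical helper */
    void handle_other_exit(struct kvm_run *run);  /* hypothetical helper */

    void run_vcpu(int vcpu_fd, struct kvm_run *run) {
        for (;;) {
            ioctl(vcpu_fd, KVM_RUN, 0);           /* enter the guest */
            switch (run->exit_reason) {
            case KVM_EXIT_HLT:
                /* Guest went idle: sleep until there's an interrupt or
                 * other work pending for this vCPU. */
                wait_until_vcpu_has_work(vcpu_fd);
                break;
            default:
                handle_other_exit(run);
            }
        }
    }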
I believe that these days unused CPU cores are physically shut down by turning off the voltage supply that's feeding them, in order to save power. This is a commonly used technique in chip design called "power gating". It is one of the most important techniques for prolonging battery life in mobile devices.
"The fundamental axiom here is that at any given moment, exactly one task is active on a CPU."
Maybe I missed it (I did check the earlier post, and I understand this is trying to be a simplification), but what was the argument for why I should accept the above axiom? Was there one?
Or maybe I am just being too literal, and all the author is really trying to say is that the current task pointer must always be valid in Linux. Which is probably true enough (I don't know Linux well) but a much less powerful statement.
From the post contents I suspect the author is using OS as a synonym for Linux.
Back In The Day, before Windows implemented calling the HLT instruction in the idle loop, overclockers would run a userland program that did the same thing, called cpuidle.
It's to avoid halting the CPU when there are other pending tasks ready to run.
When the idle task is the only task running, it halts the CPU to save power. When the scheduling timer interrupt fires, it wakes up the CPU and calls the scheduler to schedule tasks. If there are other tasks ready to run, the idle task is swapped out to let another task run, but it is still in the ready state. Depending on the scheduling algorithm, the idle task might get picked to run again before other ready tasks. In that case, it should not halt the CPU but rather just busy-loop for the rest of its quantum. When the timer interrupt then fires, it schedules another ready task to run right away without needing to wake up the CPU.
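In other words, the idle task checks whether anything else is runnable and only halts when the answer is no. A simplified sketch of that logic (not actual Linux code; the helper names are made up):

    #include <stdbool.h>

    bool other_task_is_ready(void);        /* hypothetical: is the runqueue non-empty? */
    void yield_to_scheduler(void);         /* hypothetical: let the scheduler pick a task */
    void arch_halt_until_interrupt(void);  /* hypothetical: e.g. HLT with interrupts enabled */

    void idle_task(void) {
        for (;;) {
            if (other_task_is_ready()) {
                /* Don't halt: give the CPU back so a ready task can run. */
                yield_to_scheduler();
            } else {
                /* Nothing to do: stop executing instructions until the next
                 * interrupt (timer, device, or a wakeup from another CPU). */
                arch_halt_until_interrupt();
            }
        }
    }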