In the days of OS/360 and OS/370 (IBM's mainframe OS in the '60s and '70s) no instructions at all executed in the wait state. There was no OS-level idle process. Instead, the internal CPU microcode itself ran a little loop without dispatching any machine instructions until the tentacles that reached out from that loop were tickled by an external interrupting event. That loop had to be very short for latency reasons and very prickly, testing for all the conditions that could take the processor out of it to resume dispatching instructions or do critical internal things.
I know this because I wrote that 6 (initially 8) micro-instruction loop for the IBM System/370 Model 155-157 in the late '60s. I was told back then that since that model became the most widely sold machine ever I had written the code most executed in the history of computing. :-)
The status latch in the machine that indicated that loop was running was fed to a light on the console and, yep, it was nearly always lit other than when booting the OS.
They kept that trick up for a long time. As of AIX 3.2.5 back in the 1990s, an "idle" RS/6000 ran at 70% CPU utilization. They were super-fast and efficient in their day, too.
Dunno how true that is today, but I wouldn't be surprised if AIX still uses the same kind of tight event loop internally.
I think John Cocke was working out the architecture of that line then. He's not often credited with it, but he invented RISC with that effort. Hardware was built in research (called the 80x, where I don't remember x) but the actual PowerPC was quite a ways in the future.
To me the brilliance in Radin's 801 paper was the belief that you could trust the compiler to do a better job. Essentially, a compiler can have more than 10 fingers to find its way through optimizing code (optimizing assembly often means marking places with your fingers and glancing back and forth as you trace the logic of execution). That realization struck me at the time like a thunderbolt.
Cocke deserved his Turing, no question about it, however I see the intellectual antecedents of RISC being machines like Gordon Bell's PDP-6/10 (with a small, clean, orthogonal instruction set) and Cray's 6600 (with its overlapping execution units). CISC was a doomed effort to make assembly programmers more efficient.
Though the most popular compiler output today is a CISC instruction set, that output is simply run on microprogrammed emulators.
Yes, John and Fran Allen wrote the first optimizing Fortran compiler and developed most of the optimization techniques and analyses used by subsequent compilers.
I knew John and it was that work that led him to design an instruction set optimized for compilers that was implemented in the 801. I'm less aware of George's contribution. Perhaps you could enlighten me.
This can be true on modern CPUs as well. In fact, if the software chooses, it can even remove power from the core dynamically so one doesn't even have to pay the leakage power.
Oh, all those machines used linear ECL (Emitter Coupled Logic) circuits for speed. Terribly hot logic family, but complementary FET-based devices just couldn't yet cut the mustard.
FWIW, the transition from the Model 155 to the 157 heralded the introduction of table driven virtual memory. We called it "relocation" hardware then. :-)
The first engineering model of the 155 used core memory but transistor memory became fast, reliable and cheap enough during its development that a very disruptive re-design was performed across the whole family of machines.
That was my first job out of the university and what fun it was! And what an incredible experience working for IBM was then.
What a fun gig. And what interesting times. I remember the headlines in Computerworld when the cost of a megabyte of memory had dropped to $15,000.
I did very little work on the mainframes, lots on the IBM 1800, a process control computer, and my first consulting gig was on a System/32. Don't ask me what programming language it was.
It was all discrete, hard logic circuits centered around a unit called the scoreboard. That term was borrowed from either Burroughs or CDC, I can't remember which. That machine pioneered many of the scalar optimizations that are common in today's architectures.
I got the hardware guys to merge some of the conditions that caused a loop exit into condition groups that were tested by other micro-instructions until a couple went away. Getting out quick was important, sorting out why was less so.
Yeah, CPU idleness is pretty fascinating and, as it turns out, quite important. Shameless plug: I just published a paper [1] showing that if you mess up how deeply you go to sleep in cpuidle, you can lose up to ~15% in latency in datacenter services like websearch.
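If you want to see how deep your own kernel is willing to go, the cpuidle states are exposed through sysfs. Here's a minimal C sketch (mine, not from the paper) that walks the standard /sys/devices/system/cpu/cpu0/cpuidle/stateN/ entries and prints each state's name, advertised wakeup latency, and whether it has been disabled; the exact states and numbers will vary with your CPU and idle driver:

    /* List the idle states (C-states) the Linux cpuidle framework exposes
     * for CPU 0, using the standard sysfs layout. */
    #include <stdio.h>
    #include <string.h>

    static int read_field(int state, const char *field, char *buf, size_t len) {
        char path[128];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/%s", state, field);
        FILE *f = fopen(path, "r");
        if (!f)
            return -1;                     /* state or field doesn't exist */
        if (!fgets(buf, (int)len, f))
            buf[0] = '\0';
        fclose(f);
        buf[strcspn(buf, "\n")] = '\0';    /* strip trailing newline */
        return 0;
    }

    int main(void) {
        char name[64], latency[32], disabled[32];
        for (int i = 0; read_field(i, "name", name, sizeof(name)) == 0; i++) {
            read_field(i, "latency", latency, sizeof(latency));
            read_field(i, "disable", disabled, sizeof(disabled));
            printf("state%d: %-12s wakeup latency %s us, disabled=%s\n",
                   i, name, latency, disabled);
        }
        return 0;
    }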
This reminds me: if you're stuck on a Linux kernel before 3.9 and you're using Sandy Bridge hardware or newer, turn off C1E in BIOS. The "C1E" setting controls a misfeature in which the CPU will look at the OS's request for a light sleep and say "Ha! You must have asked for light sleep by accident! I'm going to go all the way to sleep instead!" This causes surprisingly severe performance problems (we've seen milliseconds of unexplained random hiccups).
Linux 3.9 and higher is immune: it just forcibly disables this feature.
Hah. I guess that's a major benchmark win - decreases power usage in common cases, and few CPU benchmarks test wake-from-sleep latency either directly or indirectly.
Tiny little correction: On x86 multi-processor machines, the CPU can be woken from idle by an interrupt or by a write to a designated address in main memory from another CPU. Linux uses the latter technique to very efficiently wake up a process on one CPU when it receives data from another CPU. This is a big speedup on some workloads.
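For the curious, this is the MONITOR/MWAIT pair mentioned elsewhere in the thread: one CPU arms a monitor on a cache line and goes to sleep, and a store to that line from another CPU wakes it. A rough sketch of the idea is below; note these instructions are privileged on most x86 parts, so this only makes sense in kernel context, and wake_flag is just a made-up stand-in for the per-CPU word another processor would write to (not how Linux actually names or lays this out):

    /* Sketch of the MONITOR/MWAIT wakeup idea. Compile with -msse3;
     * would fault at user level on CPUs without user-mode wait support. */
    #include <pmmintrin.h>   /* _mm_monitor / _mm_mwait */
    #include <stdint.h>

    volatile uint64_t wake_flag;   /* another CPU writes here to wake us */

    void idle_until_poked(void) {
        while (!wake_flag) {
            /* Arm the monitor on the cache line holding wake_flag. */
            _mm_monitor((const void *)&wake_flag, 0, 0);
            /* Re-check: a write could have landed before the monitor was armed. */
            if (wake_flag)
                break;
            /* Sleep until that line is written or an interrupt arrives. */
            _mm_mwait(0, 0);
        }
    }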
DOS-style operating systems were written at a time when halting the CPU to reduce power consumption was basically not a concern, so the "idle loop" is actually an infinite loop (TSRs often hooked into this loop to do "background processing", a rudimentary form of task switching.)
I seriously doubt that it is healthy for any computer to run at 100% CPU usage for extended periods.
Computers in the DOS and Win9x days would be at 100% CPU usage for basically as long as they were switched on; and a modern one should be absolutely fine running at 100% for an indefinite amount of time, provided that things like the cooling system are in good working condition. The biggest difference between CPUs now and CPUs then is the gap in power consumption between idle/sleep and full power - e.g. a 66MHz 486 consumes <5W at full power (with not much difference when halted), vs. ~5W at idle and close to 100W under load for a recent Haswell i7.
If I have a DOS guest running for some time in VirtualBox, with full CPU usage, OS X eventually pops up a nice half-transparent window. It tells me that my computer now needs to be shut down and that I should hold the power button long enough to do so. It's the Mac version of the BSOD.
That suggests insufficient cooling. If it's a system that's been in use for a while, it's probably time to clean out the heatsinks and fans. If a new system is overheating from 100% CPU usage, then I would consider it defective.
Yeah, there's definitely something wrong with his hardware. If the CPU is overheating, even on my constantly thermally-constrained first-gen MacBook Pro Retina, it should just throttle down the CPU speed to something safe and painfully slow. He's definitely got some hardware fault if it's panicking (check the panic.log).
> That suggests insufficient cooling. If it's a system that's been in use for a while, it's probably time to clean out the heatsinks and fans. If a new system is overheating from 100% CPU usage, then I would consider it defective.
Yes, definitely. The thing to advise is: clean the heatsinks and fans, and then run a stress test such as Prime95 overnight to verify that the machine can indeed run at 100% usage for a prolonged time. Only then would I trust a machine that was prone to crashing under load with my work again.
Not sure why this is being downvoted, it's actually a good question. If all of your VMs are idle, you would want your host CPU to idle as well, although my guess is this doesn't happen in things like Xen, VMware, etc.
There have actually been problems with that: there was a bug in Citrix XenServer [0] where the CPU would, if left idle for a long time, step down and turn off more or less all cores, which resulted in a locked-up and unresponsive host. You can imagine what that did to the virtual machines supposedly running on that host.
Off the top of my head: the VM executes either HLT or MWAIT, and those instructions are generally programmed to cause a VM exit, so the hypervisor can take action. The hypervisor will deschedule the VM and possibly go idle itself.
TBH, I have no idea how MONITOR works in a VM. It might be quite messy.
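To make the HLT case concrete, here's roughly what a KVM-based hypervisor's vCPU loop might do with that exit. KVM_RUN and KVM_EXIT_HLT are the real ioctl and exit reason; wait_until_vcpu_has_work() and handle_other_exit() are hypothetical helpers, and all VM/vCPU setup and error handling is omitted:

    /* Minimal vCPU run loop: when the guest executes HLT we get a VM exit
     * and can block in the host instead of burning host CPU. */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    void wait_until_vcpu_has_work(int vcpu_fd);   /* hypothetical helper */
    void handle_other_exit(struct kvm_run *run);  /* hypothetical helper */

    void run_vcpu(int vcpu_fd, struct kvm_run *run) {
        for (;;) {
            ioctl(vcpu_fd, KVM_RUN, 0);           /* enter the guest */
            switch (run->exit_reason) {
            case KVM_EXIT_HLT:
                /* Guest went idle: sleep until there's an interrupt or
                 * other work pending for this vCPU. */
                wait_until_vcpu_has_work(vcpu_fd);
                break;
            default:
                handle_other_exit(run);
            }
        }
    }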
I believe that these days unused CPU cores are physically shut down by turning off the voltage supply that's feeding them, in order to save power. This is a commonly used technique in chip design called "power gating". It is one of the most important techniques for prolonging battery life in mobile devices.
"The fundamental axiom here is that at any given moment, exactly one task is active on a CPU."
Maybe I missed it (I did check the earlier post, and I understand this is trying to be a simplification), but what was the argument for why I should accept the above axiom? Was there one?
Or maybe I am just being too literal, and all the author is really trying to say is that the current task pointer must always be valid in Linux. Which is probably true enough (I don't know Linux well) but a much less powerful statement.
From the post contents I suspect the author is using OS as a synonym for Linux.
Back In The Day, before Windows implemented calling the HLT instruction in the idle loop, overclockers would run a userland program that did the same thing, called cpuidle.
It's to avoid halting the CPU when there are other pending tasks ready to run.
When the idle task is the only task running, it halts the CPU to save power. When the scheduling timer interrupt fires, it wakes up the CPU and calls the scheduler to schedule tasks. If there are other tasks ready to run, the idle task is swapped out to let another task run, but it is still in the ready state. Depending on the scheduling algorithm, the idle task might get picked to run again before other ready tasks. In that case, it should not halt the CPU but rather just busy-loop for the rest of its quantum. When the timer interrupt then fires, it schedules another ready task to run right away without needing to wake up the CPU.
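In other words, the idle task checks whether anything else is runnable and only halts when the answer is no. A simplified sketch of that logic (not actual Linux code; the helper names are made up):

    #include <stdbool.h>

    bool other_task_is_ready(void);        /* hypothetical: is the runqueue non-empty? */
    void yield_to_scheduler(void);         /* hypothetical: let the scheduler pick a task */
    void arch_halt_until_interrupt(void);  /* hypothetical: e.g. HLT with interrupts enabled */

    void idle_task(void) {
        for (;;) {
            if (other_task_is_ready()) {
                /* Don't halt: give the CPU back so a ready task can run. */
                yield_to_scheduler();
            } else {
                /* Nothing to do: stop executing instructions until the next
                 * interrupt (timer, device, or a wakeup from another CPU). */
                arch_halt_until_interrupt();
            }
        }
    }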