Understanding Linux CPU Load - when should you be worried? (scoutapp.com)
151 points by colinprince on July 26, 2011 | 41 comments



There are two contributions to the load figure: the number of processes/threads on the ready-to-run queue and the number blocked on I/O. The processes blocked on I/O show up in the "D" state in ps and top, and they count towards the load even though they aren't consuming CPU.

This article entirely ignores the number of processes blocked on I/O. A load average exceeding the number of CPUs (cores, whatever) does not automatically mean the CPUs are overloaded.
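For instance, a quick way to see which tasks are sitting in D state at any given moment (assuming a procps-style ps):

    # show state, pid and command for tasks in uninterruptible sleep
    ps -eo state,pid,comm | awk '$1 ~ /^D/'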


Thank you; finally someone mentioned it. At one point I was thinking I had misunderstood the whole concept of load, but you've just confirmed my initial suspicion: they are trying to sell a product.

How did this end up on HN? There are equally good articles on load at Wikipedia.


Not all processes blocked on I/O are in D state - for a common example, processes blocked on I/O to a network socket or terminal will simply be in the S state, and not count towards load.


In practice, processes typically go into D state ("uninterruptible sleep") when they're blocked on access to a local disk, whether that's explicit I/O (read/write) or implicit (paging). Not coincidentally, this is also the one type of blocking I/O that you can't get knocked out of by a signal.


Actually, they'll get blocked on access to NFS or other network-based disks, too (though you can configure the NFS mount to allow signals to interrupt). I've seen NFS problems lead to loads over 100.


Three comments.

1. I can't comment on the original article. Are comments closed, or am I dumb?

2. The author seems to have assumed a web server responding to bursty traffic. Several people have pointed out workloads to which the 0.7 heuristic doesn't apply - compute servers, I/O bound servers, compile jobs, desktops. He should have stated that assumption up front.

3. Hyperthreads. For purposes of load monitoring, you should be counting the number of threads, not the number of cores. Yes, hyperthreads are slower than cores, but that doesn't matter. The load average is the ratio of work available to work being done (oversimplified, I know), and, as such, it's scaled to the actual throughput of the threads available.

Fortunately, the author suggested counting CPUs by reading /proc/cpuinfo, and /proc/cpuinfo lists threads, not cores. So those two errors cancel out. (-:
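For the curious, the count he gets is effectively:

    # one "processor" entry per hardware thread - the unit the scheduler,
    # and hence the load average, actually works against
    grep -c ^processor /proc/cpuinfo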


Point 3 depends on the workload. Most SMT [1] implementations replicate integer functional units - otherwise the threads would stall on basic things like computing addresses - but they don't replicate floating-point units. So if you have a lot of floating-point-heavy work, you're limited by the number of cores, not by the total number of SMT contexts those cores provide.

So it's not that SMT pipelines are slower, it's just that they share resources with the other SMT pipelines.

[1] Simultaneous multithreading (SMT), http://en.wikipedia.org/wiki/Simultaneous_multithreading, is the generic name for what Intel calls hyperthreading.


You are correct. I should have had two oversimplification disclaimers in that sentence. (-:


1) No, you are not dumb; this is an advertisement for their product. (Also note the blog posting is either dated in the future or over a year old; not sure which.)

2) They make that assumption because they are selling web server monitoring tools; if you don't have a web server, you aren't their target user :-)

3) Now you actually want to talk about the real details of monitoring performance, but the goal of this particular article is to sell their product to people who run web servers and probably don't want to delve too deeply into actual performance analysis.

Hope that helps.


> I can't comment on the original article. Are comments closed, or am I dumb?

It was posted in 2009, so the comments are probably closed :-)


I love reading stuff like this. As a kind-of sysadmin by need rather than by choice, I'm often confused and intimidated by systems that other sysadmins seem to be born knowing about. It's always refreshing to read a straightforward explanation for one of those important concepts that seems to be common knowledge for everyone but me, and never seems to be explained anywhere.


Load average is an easy number to monitor, so lots of people focus on it. However, it doesn't provide much information on its own. When your load is high, you have to examine other values (e.g. CPU time spent in user mode, system mode, and iowait) to determine why the load is high before you can start to resolve the problem. If you monitor and alert directly on those other values, you'll save time.
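For example, something like this gives you that breakdown directly (mpstat is in the sysstat package):

    # per-CPU user/system/iowait percentages, refreshed every second
    mpstat -P ALL 1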


Exactly. Looking at the load curves over time is useful for gauging how well you're doing overall and for spotting potential trouble, but when it comes to actually dealing with those issues or predicting an imminent collapse, I tend to look at MySQL threads and performance, the slow query log, concurrent users, etc. to determine what needs to be dealt with now and what will hit us on the head in the near future.


(If you don't know this, you should read the link):

If (# CPU Cores / Load) > 1, shit has hit the fan

I disagree with 0.7 being the starting point for investigating extraneous load; you should be more worried about changes in the 1st or 2nd derivatives of the load (velocity and acceleration). To use the link's traffic analogy: you don't care too much about steady traffic; it's when traffic starts bursting at the seams that you worry.

A shared machine (say, a development database) running at 0.75 load might actually just mean your resources are being used regularly. That said, seeing that average load climb slowly towards ~1.0 means you need to fix things before the pipes clog shut.
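A crude sketch of watching that velocity by hand, if you just want to eyeball it from /proc/loadavg (assumes bc is available):

    # print the 1-minute load and its change since the previous sample
    prev=$(cut -d' ' -f1 /proc/loadavg)
    while sleep 60; do
        cur=$(cut -d' ' -f1 /proc/loadavg)
        echo "load=$cur delta=$(echo "$cur - $prev" | bc)"
        prev=$cur
    done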


Don't you mean (Load / # cores) > 1?


yep, whoops


I am not sure it's even possible to measure 1st- and 2nd-derivative changes in the load finely enough to quickly detect bursts and distinguish them from noise. Also, the analogy about the link isn't quite right; you shouldn't be pushing the link to its maximum capacity in the first place.

This is because traffic is rarely "smooth." Even if you say the link is operating at 90% utilisation, you are usually referring to the average. Pushing a system to high load can lead to instabilities and unpredictable performance.


Would there be any reason for perceived performance to decrease when the CPU load is at 50% of the total number of cores? We have an X5660 with 24 cores, and once the one-minute average gets over 12, page-load times increase dramatically.


Check your disk stats, or memory. Most of the time when I've had high load averages, it's been because the disk system is swamped and there's a whole bunch of processes waiting on I/O.

For example, in the classic Apache DoS failure state, you wind up with enough Apache processes that some of them are forced out of memory, and then you start swapping and fall over. What you'll see then is a really high load average and long page-load times. Looking at something like 'vmstat 1', top, or iotop, you can see whether it looks like memory, disk, or something else.

From your description, you probably have enough CPU resources to saturate something else - maybe the DB server, maybe your memory. When that happens, your processes stack up and your load average rises. It looks like you don't have enough CPU, but that's probably not it.
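If you want to check the swapping theory specifically, the si/so columns are the tell:

    # si/so = pages swapped in/out per second; sustained nonzero values
    # mean you're in the swap spiral described above
    vmstat 1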


What's likely going on is fairly complicated to delve into in a comment thread like this. One possibility, though, is that Linux just has a terribly hard time scheduling with that many cores. Processes attempt to maintain locality up to a point, but they tend to move between CPUs when another has more free time. Moving means a cold cache that needs refreshing, and that significantly slows down the work.


Make sure you are looking at the physical core count, not hyperthreaded cores (which should be a 2x difference for your CPU).


A hyperthreaded CPU actually emulates an extra core: a CPU with a single physical core plus hyperthreading is seen by the OS as two cores, and the OS doesn't know anything about the hyperthreading. This means a load average of 2.0, not 1.0, is the point at which such a system is fully loaded.

It is much easier to go from 1.0 to 2.0 than from 0 to 1.0 on such a CPU, though, because the CPU can't handle much more work before it gets overloaded.


Lacking any other information, I would look into this explanation first - I have seen many experiments on cores with SMT contexts where the performance plateaus at the total number of cores, not SMT contexts.


Am I using the wrong command to count physical cores? (I'm guessing so.)

    grep 'model name' /proc/cpuinfo | wc -l

What should I use to count physical cores?


The entries in /proc/cpuinfo are hyperthreaded cores, as exposed to the OS. For basically all modern multi-core Intel Xeon CPUs you can divide that number by 2. You can also find out by looking at the "physical ID" and "cpu cores" fields: on a 64-core (hyperthreaded) machine, I see physical ID 0..4 and cpu cores 8, which would mean 8*4=32 physical cores.
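In command form, that works out to something like:

    # number of sockets (distinct "physical id" values)
    grep 'physical id' /proc/cpuinfo | sort -u | wc -l
    # cores per socket (one line per socket, e.g. "cpu cores : 8")
    grep 'cpu cores' /proc/cpuinfo | sort -u
    # physical cores = sockets * cores per socket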


small correction: "physical ID 0..3"


`grep 'core id' /proc/cpuinfo | sort -u | wc -l` will work. See http://serverfault.com/questions/262867/how-to-find-out-if-m...


> I disagree with 0.7 being the starting point for investigating extraneous load; you should be more worried about changes in the 1st or 2nd derivatives of the load (velocity and acceleration). To use the link's traffic analogy: you don't care too much about steady traffic; it's when traffic starts bursting at the seams that you worry.

Caring about 0.7 load means you've got some capacity left if traffic does burst. You generally don't get a warning, so keeping a healthy amount of CPU available is a good idea.


Incidentally, on a Mac, this will give you your number of cores, along with other handy stuff:

    system_profiler SPHardwareDataType

    Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro5,1
      Processor Name: Intel Core 2 Duo
      Processor Speed: 2.4 GHz
      Number of Processors: 1
      Total Number of Cores: 2
      L2 Cache: 3 MB
      Memory: 4 GB
      Bus Speed: 1.07 GHz
      Boot ROM Version: MBP51.007E.B05
      SMC Version (system): 1.41f2
      Serial Number (system): [snip]
      Hardware UUID: [snip]
      Sudden Motion Sensor:
          State: Enabled


Nicely written... but looking at the last section, "Bringing it Home", I'd like to point out that if you were to, say, run make -j<#cores> in the Linux source tree (or any other bloated GNU monstrosity), you'd get a 15-minute load well over the desired 70% :) But that's not a bad thing... it just means Firefox will run like crap for a while. Also, don't do that on your web server, which is probably what he was talking about anyway.


"it just means Firefox will run like crap for a while."

Put an "ionice -c 3" on that job and you probably won't notice the performance effect on Firefox anymore. (You probably don't need conventional "nice" because compile jobs tend to get their priorities dropped anyhow because they are using a lot of CPU without yielding, but dropping the scheduler that hint can still be helpful in some cases.)

(Annoyingly, unlike nice, ionice requires you to specify the class; I wish it would just default to -c 3, the way nice has a reasonable default.)


ionice will just affect the I/O scheduling (obviously), but you can do the same thing with the CPU scheduler:

    schedtool -B -e ionice -c3 make -j10
That's the idiom I commonly use on large, long-running compile jobs. It means that anything else will always get the CPU or I/O time it needs (maybe with some increased latency, which usually isn't too bad), while all idle time is taken up by the big compile job. This makes for a very happy desktop system when upgrading (Gentoo user here).


Linus Torvalds' stance on "nice make":

https://lwn.net/Articles/418739/

Seriously. Nobody _ever_ does "nice make", unless they are seriously repressed beta-males (eg MIS people who get shouted at when they do system maintenance unless they hide in dark corners and don't get discovered). It just doesn't happen.

:)


If you don't have schedtool available on your distro, you can use 'chrt --batch 0' instead.


ionice(1) only works if you're using the CFQ I/O scheduler, and even then, only on reads.
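You can check which scheduler a given disk is using (sda here, purely as an example); the bracketed entry is the active one:

    cat /sys/block/sda/queue/scheduler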


"make -jN" is peanuts.

Now, "make -j" on Linux source tree is not something most machines recover from.


I had heard that the load average was the size of the scheduler's ready queue. If that's correct, wouldn't a load of more than 1.00 on a multiprocessor machine still be bad, since processes are ready to run but are waiting for the next jiffy?



Each CPU has its own scheduler queue; load average represents the summed length of all of them.
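You can see the raw inputs in /proc/loadavg:

    # the 1-, 5- and 15-minute averages, then runnable/total scheduling
    # entities (processes and threads), then the most recently created PID
    cat /proc/loadavg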


http://video.google.com/videoplay?docid=-8002801113289007228

ftp://crisp.dyndns-server.com/pub/release/website/dtrace/


So how does this change for logical/non-physical cores? Should a hyperthreaded dual-core system be considered full at a load of 2.00, 4.00, or something in between, like 3.00?



