top -- What is it telling us?

EddieJennings

While I think I have an idea of what load and %cpu mean, methinks I'm mistaken. So hopefully this thread will help not only me, but others who might have a similar misunderstanding.

load average: 0.55, 0.52, 0.58

Since this is a single-vCPU Vultr VM, top is telling me that over the last minute 55% of my CPU's capacity was being used, over the last 5 minutes 52% of capacity was used, and over the last 15 minutes 58% of my CPU's capacity was being used.

PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
5904 asterisk  20   0 1632188  61960   6052 S  0.0  6.1   0:17.64 asterisk

Here top is telling me that the asterisk process with PID 5904 is taking 0% of my CPU's resources and 6.1% of my RAM at that moment in time.

At one point today, my FreePBX test system had load averages of 8.0. I panicked and restarted it (I have since learned the error of my ways), but before the restart I did run top, and glancing down the %CPU column I saw no set of values whose sum would've been close to 100%, which, based on my likely incorrect understanding, seems odd.

For reference, here is the result of top -b -n 1 sent to a text file.

top - 14:44:42 up 30 min,  1 user,  load average: 0.55, 0.52, 0.58
Tasks: 110 total,   2 running, 108 sleeping,   0 stopped,   0 zombie
%Cpu(s):  9.0 us,  3.3 sy,  0.0 ni, 87.3 id,  0.2 wa,  0.0 hi,  0.0 si,  0.2 st
KiB Mem :  1016380 total,   156324 free,   578004 used,   282052 buff/cache
KiB Swap:  2097148 total,  2068624 free,    28524 used.   211464 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
    1 root      20   0   43400   3112   2016 S  0.0  0.3   0:01.89 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:00.07 ksoftirqd/0
    6 root      20   0       0      0      0 S  0.0  0.0   0:00.06 kworker/u2:0
    7 root      rt   0       0      0      0 S  0.0  0.0   0:00.00 migration/0
    8 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcu_bh
    9 root      20   0       0      0      0 S  0.0  0.0   0:00.89 rcu_sched
   10 root      rt   0       0      0      0 S  0.0  0.0   0:00.01 watchdog/0
   12 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kdevtmpfs
   13 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 netns
   14 root      20   0       0      0      0 S  0.0  0.0   0:00.00 khungtaskd
   15 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 writeback
   16 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kintegrityd
   17 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 bioset
   18 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kblockd
   19 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 md
   25 root      20   0       0      0      0 S  0.0  0.0   0:00.45 kswapd0
   26 root      25   5       0      0      0 S  0.0  0.0   0:00.00 ksmd
   27 root      39  19       0      0      0 S  0.0  0.0   0:00.34 khugepaged
   28 root      20   0       0      0      0 S  0.0  0.0   0:00.00 fsnotify_mark
   29 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 crypto
   37 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kthrotld
   39 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kmpath_rdacd
   40 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kpsmoused
   41 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 ipv6_addrconf
   60 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 deferwq
   95 root      20   0       0      0      0 S  0.0  0.0   0:00.01 kauditd
  246 root      20   0       0      0      0 S  0.0  0.0   0:00.22 kworker/0:3
  265 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kworker/u2:2
  266 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 ata_sff
  282 root      20   0       0      0      0 S  0.0  0.0   0:00.00 scsi_eh_0
  283 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 scsi_tmf_0
  284 root      20   0       0      0      0 S  0.0  0.0   0:00.00 scsi_eh_1
  285 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 scsi_tmf_1
  288 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 ttm_swap
  360 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kdmflush
  361 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 bioset
  371 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kdmflush
  372 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 bioset
  386 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 xfsalloc
  387 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 xfs_mru_cache
  388 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 xfs-buf/dm-0
  389 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 xfs-data/dm-0
  390 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 xfs-conv/dm-0
  391 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 xfs-cil/dm-0
  392 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 xfs-reclaim/dm-
  393 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 xfs-log/dm-0
  394 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 xfs-eofblocks/d
  395 root      20   0       0      0      0 S  0.0  0.0   0:00.73 xfsaild/dm-0
  471 root      20   0   36116   4036   3800 S  0.0  0.4   0:00.67 systemd-journal
  486 root      20   0  200816   7168    836 S  0.0  0.7   0:00.01 lvmetad
  501 root      20   0   43612   1200   1000 S  0.0  0.1   0:00.06 systemd-udevd
  519 root      20   0       0      0      0 S  0.0  0.0   0:00.02 hwrng
  520 root      20   0       0      0      0 S  0.0  0.0   0:00.00 vballoon
  582 root      20   0       0      0      0 S  0.0  0.0   0:00.00 jbd2/vda1-8
  583 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 ext4-rsv-conver
  597 root      16  -4   55412   1304   1236 S  0.0  0.1   0:00.06 auditd
  617 root      20   0  217936   3976   3724 S  0.0  0.4   0:00.36 rsyslogd
  622 root      20   0   24188   1548   1296 S  0.0  0.2   0:00.25 systemd-logind
  625 root      20   0   27100    596    596 S  0.0  0.1   0:00.00 xinetd
  627 avahi     20   0   27976    976    908 S  0.0  0.1   0:00.03 avahi-daemon
  629 root      20   0   14988    864    828 S  0.0  0.1   0:00.00 incrond
  637 avahi     20   0   27976     24      0 S  0.0  0.0   0:00.00 avahi-daemon
  638 root      20   0    6964    232    188 S  0.0  0.0   0:00.00 mdadm
  639 nobody    20   0   15540    872    868 S  0.0  0.1   0:00.00 dnsmasq
  642 root      20   0  541040  22900  14524 S  0.0  2.3   0:00.33 httpd
  643 polkitd   20   0  527508   4404   2112 S  0.0  0.4   0:00.18 polkitd
  650 dbus      20   0   24544   1364   1136 S  0.0  0.1   0:00.52 dbus-daemon
  655 chrony    20   0  115844   1176   1000 S  0.0  0.1   0:00.03 chronyd
  670 root      20   0  553148  11584   2508 S  0.0  1.1   0:00.25 tuned
  671 root      20   0  105476   2540   2460 S  0.0  0.2   0:00.01 sshd
  698 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 cfg80211
  717 root      20   0   25840    916    716 S  0.0  0.1   0:00.00 atd
  724 root      20   0  126252   1644    964 S  0.0  0.2   0:00.06 crond
  737 root      20   0  110032    812    688 S  0.0  0.1   0:00.00 agetty
  744 mysql     20   0  113252   1560   1280 S  0.0  0.2   0:00.02 mysqld_safe
  748 mongodb   20   0  482916  15728   2072 R  0.0  1.5   0:04.99 mongod
  782 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 dio/dm-0
  992 asterisk  20   0  739892  28808  12044 S  0.0  2.8   0:00.29 httpd
  994 asterisk  20   0  744500  48900  29736 S  0.0  4.8   0:03.28 httpd
  997 asterisk  20   0  744936  45172  23484 S  0.0  4.4   0:00.98 httpd
 1003 asterisk  20   0  658320  46432  26168 S  0.0  4.6   0:01.47 httpd
 1018 mysql     20   0 1172020 138964   5032 S  0.0 13.7   0:14.98 mysqld
 1144 root      20   0   89032   1952    944 S  0.0  0.2   0:00.01 master
 1151 postfix   20   0   89136   3508   2508 S  0.0  0.3   0:00.00 pickup
 1152 postfix   20   0   89204   3500   2492 S  0.0  0.3   0:00.00 qmgr
 1211 root      20   0  112876  12608    124 S  0.0  1.2   0:00.00 dhclient
 1215 asterisk  20   0  195328   6384   2236 S  0.0  0.6   0:00.04 pnp_server
 5881 root      20   0  185668   6808   2024 S  0.0  0.7   0:00.09 fail2ban-server
 5888 root      20   0  394884  18704  10032 S  0.0  1.8   0:01.64 php
 5898 root      20   0  115240    816    568 S  0.0  0.1   0:00.00 safe_asterisk
 5904 asterisk  20   0 1632188  61960   6052 S  0.0  6.1   0:17.64 asterisk
 6939 asterisk  20   0  922860  23244   6524 S  0.0  2.3   0:02.23 PM2 v2.7.1: God
 7782 asterisk  20   0  414292  40116  10536 S  0.0  3.9   0:00.30 php
 8639 asterisk  20   0 1082328  31412   6604 S  0.0  3.1   0:02.38 node /var/www/h
 8739 asterisk  20   0  647284  20756   9604 S  0.0  2.0   0:00.51 httpd
 8773 asterisk  20   0  656384  41708  21416 S  0.0  4.1   0:00.63 httpd
 8774 asterisk  20   0  649608  22596   9024 S  0.0  2.2   0:03.78 httpd
 9228 asterisk  20   0 1273280  73748   6740 S  0.0  7.3   0:08.26 letschat
 9330 asterisk  20   0  649604  23316   9756 S  0.0  2.3   0:03.90 httpd
 9389 asterisk  20   0  656468  43516  23380 S  0.0  4.3   0:01.69 httpd
15485 asterisk  20   0  649120  22596   9540 S  0.0  2.2   0:00.14 httpd
15879 root      20   0       0      0      0 S  0.0  0.0   0:01.12 kworker/0:0
16191 root      20   0  148384   5872   4528 S  0.0  0.6   0:00.31 sshd
16193 root      20   0  115380   2060   1676 S  0.0  0.2   0:00.03 bash
17097 root       0 -20       0      0      0 S  0.0  0.0   0:00.02 kworker/0:2H
17817 root      20   0       0      0      0 S  0.0  0.0   0:00.66 kworker/0:1
18071 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:1H
18953 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H
19124 root      20   0  157568   2036   1500 R  0.0  0.2   0:00.00 top

Alex Sage

Paging @scottalanmiller

scottalanmiller

Load is not a measure of CPU utilization. Load has to do with run queue depth. A run queue can be very deep while the processor is idle, and the CPU can be extremely busy without there being a run queue. These are two very different aspects of your processing.

scottalanmiller

CPU % is a reference to how much work your CPU is doing. It takes a short sampling interval (top's refresh interval, a few seconds by default), sees how much of that time the CPU had productive work to do and how much it sat idle waiting for something to do, and gives you a percentage. Quite simple.
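To make that calculation concrete, here is a minimal Python sketch of roughly what top does for its %Cpu(s) line on Linux: read the cumulative tick counters from /proc/stat twice and compare the deltas. It assumes a Linux kernel that exposes at least the first eight counters on the aggregate "cpu" line; the function names are hypothetical, and top's own accounting breaks the busy time into more categories (us, sy, st, and so on).

```python
def read_cpu_times(path="/proc/stat"):
    """Return the first eight counters on the aggregate "cpu" line of /proc/stat:
    user, nice, system, idle, iowait, irq, softirq, steal (in clock ticks)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(field) for field in line.split()[1:9]]
    raise ValueError(f"no aggregate cpu line in {path}")


def busy_percent(before, after):
    """Percentage of CPU time spent doing work between two samples.

    Roughly 100 - id - wa from top's %Cpu(s) line: user, nice, system,
    irq, softirq and steal all count as busy here.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    not_working = deltas[3] + deltas[4]  # idle + iowait: nothing was running
    return 100.0 * (total - not_working) / total
```

Usage: take `first = read_cpu_times()`, wait a second or so, take `second = read_cpu_times()`, then call `busy_percent(first, second)`. The counters only ever grow, so the length of the wait between samples is the sampling interval the percentage describes.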

A run queue is how many threads are running or waiting their turn to get into the CPU for processing. And, like anything, this changes every nanosecond, so the number is averaged over a period of time: the three load average figures cover roughly the last 1, 5, and 15 minutes.

So one is about how much work the CPU is actually doing. The other is about how much software is trying to get the CPU's attention.
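The averaging can be illustrated with a small Python simulation. It assumes the Linux scheme of an exponentially damped moving average updated about every 5 seconds, with decay factors exp(-5/60), exp(-5/300) and exp(-5/900) for the 1-, 5- and 15-minute figures; the kernel does this in fixed-point arithmetic, so this floating-point version is an approximation, and the names are hypothetical.

```python
import math

SAMPLE_SECONDS = 5.0             # Linux recomputes the load average roughly every 5 s
WINDOWS = (60.0, 300.0, 900.0)   # the 1-, 5- and 15-minute figures


def update(loads, active):
    """Fold one sample of 'active' tasks into each exponentially damped average."""
    new = []
    for load, window in zip(loads, WINDOWS):
        decay = math.exp(-SAMPLE_SECONDS / window)
        new.append(load * decay + active * (1.0 - decay))
    return tuple(new)


def simulate(samples, start=(0.0, 0.0, 0.0)):
    """Run a sequence of per-sample task counts through the averaging."""
    loads = start
    for active in samples:
        loads = update(loads, active)
    return loads
```

Starting from an idle system, `simulate([8] * 60)` (five minutes with 8 tasks running or waiting) gives roughly 7.95, 5.06 and 2.27: the 1-minute figure has nearly caught up while the 15-minute figure is still lagging, which is why the three numbers can disagree so much during a spike.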

scottalanmiller

The rule of thumb is that a load FACTOR below one is no problem. Load factor is the load average divided by the number of thread engines that you have. If you have four thread engines in your machine, then your load is always fine below four. Above four might be fine too, as long as the CPU isn't overly taxed.
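A minimal Python sketch of that rule of thumb, assuming a Unix-like system where `os.getloadavg()` is available and treating `os.cpu_count()` (logical CPUs) as the number of thread engines; the helper names are hypothetical.

```python
import os


def load_factor(load_average, thread_engines):
    """Load average divided by the number of thread engines (logical CPUs)."""
    return load_average / thread_engines


def current_load_factors():
    """1-, 5- and 15-minute load factors for this machine (Unix only)."""
    engines = os.cpu_count() or 1  # cpu_count() may return None
    return tuple(load_factor(avg, engines) for avg in os.getloadavg())
```

On the single-vCPU VM from the first post, the load factor equals the load average, so a load of 8.0 is a load factor of 8.0. The same 8.0 on a machine with four thread engines would be a load factor of 2.0.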

scottalanmiller

How do you get a high CPU % with no run queue? Easy, make a single process that counts from one to infinity... it will go as fast as the CPU can process it, forever, but will never need to load another process into the CPU. So that one thread can keep the CPU infinitely busy.
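That thought experiment can be tried with a short Python sketch: one function counts upward in a tight loop, another sleeps, and each is measured as CPU time used divided by wall-clock time. The names `spin`, `nap` and `cpu_share` are hypothetical, and the numbers depend on what else the machine is doing at the time.

```python
import time


def spin(seconds):
    """Count upward until the deadline: always runnable, never waits."""
    deadline = time.perf_counter() + seconds
    count = 0
    while time.perf_counter() < deadline:
        count += 1
    return count


def nap(seconds):
    """Sleep: the thread leaves the CPU and is not runnable."""
    time.sleep(seconds)


def cpu_share(work, seconds=0.5):
    """CPU time this process used, divided by wall-clock time, while work ran."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work(seconds)
    return (time.process_time() - cpu_start) / (time.perf_counter() - wall_start)
```

On an otherwise quiet machine, `cpu_share(spin)` comes out close to 1.0 (one CPU fully busy) even though only a single thread is involved, while `cpu_share(nap)` comes out close to 0.0.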

scottalanmiller

How do you get a deep run queue while the CPU is idle? You might have threads that are awaiting some other resource and cannot be loaded into the CPU yet, but have been placed in the queue. On Linux, for example, tasks in uninterruptible sleep (state D in top's S column, usually waiting on disk or network storage I/O) are counted in the load average. The CPU is available, but has no means of processing them yet, so they wait in the queue. So a deep queue can be okay if the CPU is also relatively idle. This tells you that the queue depth is not caused by an overloaded CPU, but by something else.
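One way to see which kind of queue you have on Linux is to count tasks by state: many R (runnable) tasks point at CPU demand, while many D (uninterruptible) tasks point at something like storage. This Python sketch assumes the standard /proc/&lt;pid&gt;/stat format and only looks at one entry per process, not every thread under /proc/&lt;pid&gt;/task, so it is an approximation of what the load average counts. The function names are hypothetical.

```python
import os


def task_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so locate the last ')' instead of splitting on whitespace.
    """
    return stat_line[stat_line.rindex(")") + 2]


def count_states(proc="/proc"):
    """Count processes by state letter (R = runnable, D = uninterruptible, ...)."""
    counts = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                state = task_state(f.read())
        except OSError:
            continue  # the process exited (or is hidden) while we were scanning
        counts[state] = counts.get(state, 0) + 1
    return counts
```

Taking a few snapshots with `count_states()` during a load spike and comparing `counts.get("R", 0)` with `counts.get("D", 0)` gives a rough idea of whether threads are queuing for the CPU or for something else.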

scottalanmiller

Even a taxed CPU, with a high CPU % and a deep queue, doesn't prove that the CPU is overloaded; it might just be "efficiently utilized." If your queue is extra high and the CPU is taxed, it is likely that the queue is high because the CPU is taxed. Knowing your system baselines helps you to understand what is going on when this happens.

scottalanmiller

If you have a CPU that is at a high percentage, say 98%, and your queue depth is small, you are not overloaded; you are simply running applications at the "speed of the machine." Think of a taxi that goes at full speed between airport and hotel: every load has people in it, but it never has to leave anyone at the taxi stand for the next run. That's a high CPU %: each run has someone, and no one is left behind to wait.

If you have a CPU that is overloaded, this means that there are threads that are needed but can't get into the CPU because it is busy. This is like that taxi going at the same speed, but there are too many people and some of them have to be left behind because they don't fit into the taxi on the first run. If the people keep coming at the same pace, the taxi will just get more and more backed up. That's overloaded.
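The taxi analogy can be turned into a toy Python simulation. It assumes passengers arrive at a steady number per trip and the taxi takes as many as it has seats; `simulate_taxi` is a hypothetical name and the model ignores everything else a real scheduler does.

```python
def simulate_taxi(arrivals_per_trip, seats, trips):
    """Return how many people are left waiting at the stand after each trip."""
    waiting = 0
    history = []
    for _ in range(trips):
        waiting += arrivals_per_trip      # new passengers join the line
        waiting -= min(waiting, seats)    # the taxi leaves with as many as fit
        history.append(waiting)
    return history
```

With four arrivals per trip and four seats, `simulate_taxi(4, 4, 5)` leaves nobody behind: the taxi is fully used but not overloaded. With five arrivals and four seats, `simulate_taxi(5, 4, 5)` gives `[1, 2, 3, 4, 5]`, a line that grows by one every trip, which is the overloaded case.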

scottalanmiller

There are two ways to deal with overload (other than reducing how much the CPU has to work on). One is to get a "faster" CPU. This is the same as raising the speed limit for our taxi. The taxi hauls the same car load each time, but at 75mph instead of 65mph. Over the course of the day, it can pick up about 15% more people, since the extra speed makes each round trip that much faster.

Or you can increase the size of the taxi, maybe replacing that Honda Accord with a Dodge Caravan. Now each trip is still at 65mph, but you can haul eight people at a time instead of just four. Twice the people, same speed. This is like going from four cores to eight cores.

And, of course, we can do both at the same time.

Increasing the speed helps every passenger, every time, by shortening their time in the taxi. This lowers latency: even if you only get a single passenger every tenth trip, that one passenger benefits. But speed-ups are often 5-10% tops, nothing huge.

Increasing the size of the load, that is, increasing cores, only helps when you have more passengers than you could previously fit in a single load, but the gains often jump by 25-100%.
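The arithmetic behind the speed-versus-size comparison, as a Python sketch. It assumes trip time scales with 1/speed and that loading time is negligible, so passengers moved per day are proportional to seats times speed; the function names are hypothetical.

```python
def relative_throughput(seats, speed_mph, base_seats=4, base_speed_mph=65):
    """Passengers moved per day relative to the baseline taxi.

    Assumes trip time scales with 1/speed and loading time is negligible,
    so throughput is proportional to seats * speed.
    """
    return (seats * speed_mph) / (base_seats * base_speed_mph)


def relative_ride_time(speed_mph, base_speed_mph=65):
    """Time each passenger spends in the taxi relative to the baseline.

    Seat count does not appear: a bigger taxi does not shorten the ride.
    """
    return base_speed_mph / speed_mph
```

Under those assumptions, `relative_throughput(4, 75)` is about 1.15 (the ~15% from the faster taxi), `relative_throughput(8, 65)` is 2.0 (the minivan), and `relative_throughput(8, 75)` is about 2.31 for both. Only the speed change helps an individual passenger: `relative_ride_time(75)` is about 0.87, while the bigger taxi leaves ride time at 1.0.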

scottalanmiller

@EddieJennings if you thought that a load of .55 meant 55%, what did you think that 8.0 meant?

EddieJennings

Great, useful information :D. Could this be a correct analogy for the deep run queue but low CPU percentage?

The taxi isn't moving (0% CPU) because there's some barrier preventing people from loading into the taxi (and this queue of people gets longer and longer).

scottalanmiller

@eddiejennings said in top -- What is it telling us?:

Great, useful information :D. Could this be a correct analogy for the deep run queue but low CPU percentage?

The taxi isn't moving (0% CPU) because there's some barrier preventing people from loading into the taxi (and this queue of people gets longer and longer).

The taxi goes every cycle without fail whether it has a load or not. The taxi never stops. Low CPU % means that there were no passengers to pick up so the taxi was running empty.

EddieJennings

@scottalanmiller said in top -- What is it telling us?:

@EddieJennings if you thought that a load of .55 meant 55%, what did you think that 8.0 meant?

By that logic 800%, which seems impossible; thus, my misunderstanding.

EddieJennings

@scottalanmiller said in top -- What is it telling us?:

The taxi goes every cycle without fail whether it has a load or not. The taxi never stops. Low CPU % means that there were no passengers to pick up so the taxi was running empty.

Ah, I see.

scottalanmiller

@eddiejennings said in top -- What is it telling us?:

By that logic 800%, which seems impossible; thus, my misunderstanding.

That's why I felt it was odd that you panicked: by that reading it couldn't be over 100%, so there was no reason to assume that 8 was a bad number.

scottalanmiller

Even if your machine has a high CPU % and a high load, you still have to test running applications and ask, "Is it fast enough?" If you have a perfectly planned system, you might easily have a busy CPU and lots of load and no issues at all. Generally you want your CPU % to be high; otherwise, it usually means that you didn't size your system correctly and bought something more expensive than you really needed.

DustinB3403

So I've got a VM here, with 1 vCPU and 2048 MB of RAM.

Here is top of that system.

top - 15:22:30 up  2:52,  1 user,  load average: 0.00, 0.01, 0.05
Tasks: 102 total,   1 running, 101 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1883884 total,   365780 free,   445932 used,  1072172 buff/cache
KiB Swap:   839676 total,   839676 free,        0 used.  1188012 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 1336 mysql     20   0 1102624 113756   9976 S  0.3  6.0   0:05.04 mysqld
    1 root      20   0  128164   6820   4060 S  0.0  0.4   0:02.47 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:00.12 ksoftirqd/0
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H

I truncated it for brevity. Based on that, the CPU is just too fast for the workload. I can't possibly give a VM half a core...

scottalanmiller

@dustinb3403 said in top -- What is it telling us?:

I truncated it for shortness. Based on that the CPU is just too fast for the workload. I can't possibly give a VM half a core. . .

It's wasted. If you had more control of the system, you could assign it less CPU. Hypervisors don't actually assign cores; that's a myth.

scottalanmiller

Hypervisors present "visible thread processors" which may or may not correlate to actual cores, or thread engines, under the hood. A key purpose of a hypervisor is to allow a workload to receive less than a single core, or thread engine, of processing capacity. The standard use case is for a VM to get far less than full cores or thread engines.

What the VM sees and what it is given are very different things. The hypervisor might only give 1/100th of a core, but tell the VM it has two.