top -- What is it telling us?
-
While I think I have an idea of what load and %cpu mean, methinks I'm mistaken. So hopefully this thread will help not only me, but others who might have a similar misunderstanding.
load average: 0.55, 0.52, 0.58
Since this is a single vCPU Vultr VM, top is telling me that over the last 5 minutes 55% of my CPU's capacity was being used. Over the last 10 minutes 52% of capacity was used, and over the last 15 minutes 58% of my CPU's capacity was being used.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5904 asterisk 20 0 1632188 61960 6052 S 0.0 6.1 0:17.64 asterisk
Here top is telling me that the asterisk process with PID 5904 is taking 0% of my CPU's resources and 6.1% of my RAM at that moment in time.
At one point today, my FreePBX test had load averages of 8.0. I panicked and restarted (have since learned the error of my ways), but I did notice upon running top before the restart that glancing down the %CPU column showed no set of values whose sum would've been close to 100%, which based on my likely incorrect understanding seems odd.
For reference, here is the result of top -b -n 1
sent to a text file.
top - 14:44:42 up 30 min, 1 user, load average: 0.55, 0.52, 0.58
Tasks: 110 total, 2 running, 108 sleeping, 0 stopped, 0 zombie
%Cpu(s): 9.0 us, 3.3 sy, 0.0 ni, 87.3 id, 0.2 wa, 0.0 hi, 0.0 si, 0.2 st
KiB Mem : 1016380 total, 156324 free, 578004 used, 282052 buff/cache
KiB Swap: 2097148 total, 2068624 free, 28524 used. 211464 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 43400 3112 2016 S 0.0 0.3 0:01.89 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.07 ksoftirqd/0
6 root 20 0 0 0 0 S 0.0 0.0 0:00.06 kworker/u2:0
7 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root 20 0 0 0 0 S 0.0 0.0 0:00.89 rcu_sched
10 root rt 0 0 0 0 S 0.0 0.0 0:00.01 watchdog/0
12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdevtmpfs
13 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 netns
14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khungtaskd
15 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 writeback
16 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kintegrityd
17 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 bioset
18 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kblockd
19 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 md
25 root 20 0 0 0 0 S 0.0 0.0 0:00.45 kswapd0
26 root 25 5 0 0 0 S 0.0 0.0 0:00.00 ksmd
27 root 39 19 0 0 0 S 0.0 0.0 0:00.34 khugepaged
28 root 20 0 0 0 0 S 0.0 0.0 0:00.00 fsnotify_mark
29 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 crypto
37 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kthrotld
39 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kmpath_rdacd
40 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kpsmoused
41 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 ipv6_addrconf
60 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 deferwq
95 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kauditd
246 root 20 0 0 0 0 S 0.0 0.0 0:00.22 kworker/0:3
265 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/u2:2
266 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 ata_sff
282 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_0
283 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 scsi_tmf_0
284 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_1
285 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 scsi_tmf_1
288 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 ttm_swap
360 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kdmflush
361 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 bioset
371 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kdmflush
372 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 bioset
386 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 xfsalloc
387 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 xfs_mru_cache
388 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 xfs-buf/dm-0
389 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 xfs-data/dm-0
390 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 xfs-conv/dm-0
391 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 xfs-cil/dm-0
392 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 xfs-reclaim/dm-
393 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 xfs-log/dm-0
394 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 xfs-eofblocks/d
395 root 20 0 0 0 0 S 0.0 0.0 0:00.73 xfsaild/dm-0
471 root 20 0 36116 4036 3800 S 0.0 0.4 0:00.67 systemd-journal
486 root 20 0 200816 7168 836 S 0.0 0.7 0:00.01 lvmetad
501 root 20 0 43612 1200 1000 S 0.0 0.1 0:00.06 systemd-udevd
519 root 20 0 0 0 0 S 0.0 0.0 0:00.02 hwrng
520 root 20 0 0 0 0 S 0.0 0.0 0:00.00 vballoon
582 root 20 0 0 0 0 S 0.0 0.0 0:00.00 jbd2/vda1-8
583 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 ext4-rsv-conver
597 root 16 -4 55412 1304 1236 S 0.0 0.1 0:00.06 auditd
617 root 20 0 217936 3976 3724 S 0.0 0.4 0:00.36 rsyslogd
622 root 20 0 24188 1548 1296 S 0.0 0.2 0:00.25 systemd-logind
625 root 20 0 27100 596 596 S 0.0 0.1 0:00.00 xinetd
627 avahi 20 0 27976 976 908 S 0.0 0.1 0:00.03 avahi-daemon
629 root 20 0 14988 864 828 S 0.0 0.1 0:00.00 incrond
637 avahi 20 0 27976 24 0 S 0.0 0.0 0:00.00 avahi-daemon
638 root 20 0 6964 232 188 S 0.0 0.0 0:00.00 mdadm
639 nobody 20 0 15540 872 868 S 0.0 0.1 0:00.00 dnsmasq
642 root 20 0 541040 22900 14524 S 0.0 2.3 0:00.33 httpd
643 polkitd 20 0 527508 4404 2112 S 0.0 0.4 0:00.18 polkitd
650 dbus 20 0 24544 1364 1136 S 0.0 0.1 0:00.52 dbus-daemon
655 chrony 20 0 115844 1176 1000 S 0.0 0.1 0:00.03 chronyd
670 root 20 0 553148 11584 2508 S 0.0 1.1 0:00.25 tuned
671 root 20 0 105476 2540 2460 S 0.0 0.2 0:00.01 sshd
698 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 cfg80211
717 root 20 0 25840 916 716 S 0.0 0.1 0:00.00 atd
724 root 20 0 126252 1644 964 S 0.0 0.2 0:00.06 crond
737 root 20 0 110032 812 688 S 0.0 0.1 0:00.00 agetty
744 mysql 20 0 113252 1560 1280 S 0.0 0.2 0:00.02 mysqld_safe
748 mongodb 20 0 482916 15728 2072 R 0.0 1.5 0:04.99 mongod
782 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 dio/dm-0
992 asterisk 20 0 739892 28808 12044 S 0.0 2.8 0:00.29 httpd
994 asterisk 20 0 744500 48900 29736 S 0.0 4.8 0:03.28 httpd
997 asterisk 20 0 744936 45172 23484 S 0.0 4.4 0:00.98 httpd
1003 asterisk 20 0 658320 46432 26168 S 0.0 4.6 0:01.47 httpd
1018 mysql 20 0 1172020 138964 5032 S 0.0 13.7 0:14.98 mysqld
1144 root 20 0 89032 1952 944 S 0.0 0.2 0:00.01 master
1151 postfix 20 0 89136 3508 2508 S 0.0 0.3 0:00.00 pickup
1152 postfix 20 0 89204 3500 2492 S 0.0 0.3 0:00.00 qmgr
1211 root 20 0 112876 12608 124 S 0.0 1.2 0:00.00 dhclient
1215 asterisk 20 0 195328 6384 2236 S 0.0 0.6 0:00.04 pnp_server
5881 root 20 0 185668 6808 2024 S 0.0 0.7 0:00.09 fail2ban-server
5888 root 20 0 394884 18704 10032 S 0.0 1.8 0:01.64 php
5898 root 20 0 115240 816 568 S 0.0 0.1 0:00.00 safe_asterisk
5904 asterisk 20 0 1632188 61960 6052 S 0.0 6.1 0:17.64 asterisk
6939 asterisk 20 0 922860 23244 6524 S 0.0 2.3 0:02.23 PM2 v2.7.1: God
7782 asterisk 20 0 414292 40116 10536 S 0.0 3.9 0:00.30 php
8639 asterisk 20 0 1082328 31412 6604 S 0.0 3.1 0:02.38 node /var/www/h
8739 asterisk 20 0 647284 20756 9604 S 0.0 2.0 0:00.51 httpd
8773 asterisk 20 0 656384 41708 21416 S 0.0 4.1 0:00.63 httpd
8774 asterisk 20 0 649608 22596 9024 S 0.0 2.2 0:03.78 httpd
9228 asterisk 20 0 1273280 73748 6740 S 0.0 7.3 0:08.26 letschat
9330 asterisk 20 0 649604 23316 9756 S 0.0 2.3 0:03.90 httpd
9389 asterisk 20 0 656468 43516 23380 S 0.0 4.3 0:01.69 httpd
15485 asterisk 20 0 649120 22596 9540 S 0.0 2.2 0:00.14 httpd
15879 root 20 0 0 0 0 S 0.0 0.0 0:01.12 kworker/0:0
16191 root 20 0 148384 5872 4528 S 0.0 0.6 0:00.31 sshd
16193 root 20 0 115380 2060 1676 S 0.0 0.2 0:00.03 bash
17097 root 0 -20 0 0 0 S 0.0 0.0 0:00.02 kworker/0:2H
17817 root 20 0 0 0 0 S 0.0 0.0 0:00.66 kworker/0:1
18071 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:1H
18953 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
19124 root 20 0 157568 2036 1500 R 0.0 0.2 0:00.00 top
-
Paging @scottalanmiller
-
Load is unrelated to CPU utilization. Load has to do with the run queue depth. A run queue can be very deep while the processor might be idle. And the CPU could be extremely busy without there being a run queue. Two very different aspects of your processing.
-
CPU % is a reference to how much work your CPU is doing. It takes an amount of time, generally about one second, and sees how many cycles within that second it had productive work to do and how many it was just awaiting something to do and gives you a percentage. Quite simple.
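As a rough illustration of that busy-versus-idle calculation, here is a minimal sketch that works from two samples of the aggregate "cpu" line in /proc/stat on Linux. The function name and the sample numbers are made up for illustration, and counting iowait as idle time is an assumption of this sketch, not necessarily how any given tool does it.

```python
def cpu_busy_percent(before, after):
    """Busy share between two 'cpu' lines sampled from /proc/stat on Linux."""
    # Fields after the "cpu" label are tick counters, in this order:
    # user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice
    a = [int(x) for x in before.split()[1:]]
    b = [int(x) for x in after.split()[1:]]
    deltas = [y - x for x, y in zip(a, b)]
    # Only the first eight counters are summed; guest time is already
    # included in user time, so adding it would count it twice.
    total = sum(deltas[:8])
    # Assumption: treat idle + iowait as "nothing productive to do".
    idle = deltas[3] + deltas[4]
    return 100.0 * (total - idle) / total

# Two made-up samples taken about one second apart.
sample_1 = "cpu 1000 0 500 8000 100 0 0 0 0 0"
sample_2 = "cpu 1009 0 503 8087 101 0 0 0 0 0"
print(cpu_busy_percent(sample_1, sample_2))  # 12.0
```

On a live Linux box the two strings would come from reading the first line of /proc/stat twice with a short pause in between.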
A run queue is how many threads are waiting their turn to get into the CPU for processing. And, like anything, these change every nanosecond, so the number is an average over a period of time; top reports it over the last one, five, and fifteen minutes.
So one is about how much work the CPU has to do. The other is about how much software is trying to get the CPU's attention.
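To make the distinction concrete, here is a small sketch that pulls both numbers out of the summary lines from the original post. The helper name parse_summary is made up for this example; the regular expressions assume the summary layout shown in that post and may not match other versions of top.

```python
import re

# Summary lines copied from the top output in the original post.
SUMMARY = """\
top - 14:44:42 up 30 min, 1 user, load average: 0.55, 0.52, 0.58
%Cpu(s): 9.0 us, 3.3 sy, 0.0 ni, 87.3 id, 0.2 wa, 0.0 hi, 0.0 si, 0.2 st
"""

def parse_summary(text):
    """Return (load_averages, cpu_fields) parsed from top's summary lines."""
    load_match = re.search(r"load average: ([\d.]+), ([\d.]+), ([\d.]+)", text)
    loads = tuple(float(x) for x in load_match.groups())
    cpu_line = re.search(r"%Cpu\(s\):(.*)", text).group(1)
    # Each entry looks like "9.0 us"; build {"us": 9.0, "sy": 3.3, ...}.
    cpu = {name: float(value)
           for value, name in re.findall(r"([\d.]+) (\w+)", cpu_line)}
    return loads, cpu

loads, cpu = parse_summary(SUMMARY)
busy_percent = round(100.0 - cpu["id"], 1)

print("load averages (1, 5, 15 min):", loads)
print("CPU busy during the sample (%):", busy_percent)  # 12.7
```

The load average of 0.55 and the roughly 12.7% busy CPU come from the same snapshot, which is a decent hint that they are measuring different things.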
-
The rule of thumb is that a load FACTOR below one is no problem. Load factor is the load average divided by the number of thread engines that you have. If you have four thread engines in your machine, then your load is always fine below four. Above four might be fine too, as long as the CPU is not overly taxed.
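The arithmetic is simple enough to sketch. The function name load_factor is just a label for this example, and the four-thread machine is hypothetical; the 0.55 and 8.0 figures are the ones mentioned in this thread for a single vCPU VM.

```python
import os

def load_factor(load_average, thread_engines):
    """Load average divided by the number of thread engines (logical CPUs)."""
    return load_average / thread_engines

# Figures from this thread: a single vCPU VM at 0.55 and at 8.0.
print(load_factor(0.55, 1))  # 0.55
print(load_factor(8.0, 1))   # 8.0

# Hypothetical four-thread machine with a load average of 3.2.
print(load_factor(3.2, 4))

# On a live Unix-like system the inputs can be read directly.
# os.getloadavg() is not available everywhere, hence the guard.
if hasattr(os, "getloadavg"):
    try:
        one_min, five_min, fifteen_min = os.getloadavg()
        print(load_factor(one_min, os.cpu_count() or 1))
    except OSError:
        print("load average not available on this system")
```

Note that os.cpu_count() reports the logical CPUs the operating system sees, which inside a VM is whatever the hypervisor presents.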
-
How do you get a high CPU % with no run queue? Easy, make a single process that counts from one to infinity... it will go as fast as the CPU can process it, forever, but will never need to load another process into the CPU. So that one thread can keep the CPU infinitely busy.
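A small experiment along those lines, as a sketch only: it counts for a fraction of a second instead of forever, then compares the CPU time consumed against a sleep of the same length. The helper names are made up, and the exact numbers will vary by machine.

```python
import time

def measure(work, seconds):
    """Return (cpu_seconds, wall_seconds) consumed while running work(seconds)."""
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    work(seconds)
    return time.process_time() - cpu_start, time.perf_counter() - wall_start

def count_upward(seconds):
    """Count as fast as possible until the deadline; never waits on anything."""
    deadline = time.perf_counter() + seconds
    n = 0
    while time.perf_counter() < deadline:
        n += 1
    return n

busy_cpu, busy_wall = measure(count_upward, 0.2)
idle_cpu, idle_wall = measure(time.sleep, 0.2)

print(f"counting: {busy_cpu:.2f}s of CPU in {busy_wall:.2f}s of wall time")
print(f"sleeping: {idle_cpu:.2f}s of CPU in {idle_wall:.2f}s of wall time")
```

On an otherwise quiet machine the counting loop should use CPU time close to its wall time, while the sleep uses almost none, yet both are a single thread that never makes anything else wait behind it.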
-
How do you get a deep run queue while the CPU is idle? You might have threads that are awaiting some other resource and cannot be loaded into the CPU yet, but have been placed in the queue. The CPU is available, but has no means of processing them yet, so they wait in the queue. So a deep queue can be okay, if the CPU is also relatively idle. This tells you that the queue depth is not caused by an overloaded CPU, but by something else.
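On Linux specifically, the load average counts tasks in uninterruptible sleep (state D in top's S column, usually waiting on disk or similar) as well as runnable tasks (state R). Here is a minimal sketch that tallies that column. The mongod row is from the output posted earlier; the other rows, including the rsync and tar processes, are hypothetical and only there to show the idea.

```python
from collections import Counter

def count_states(rows):
    """Tally the S (state) column, the 8th field of each top process row."""
    return Counter(row.split()[7] for row in rows)

rows = [
    # From the output posted earlier in the thread:
    "748 mongodb 20 0 482916 15728 2072 R 0.0 1.5 0:04.99 mongod",
    # Hypothetical rows: three tasks blocked on I/O and one sleeping.
    "2001 backup 20 0 10240 900 700 D 0.0 0.1 0:00.10 rsync",
    "2002 backup 20 0 10240 900 700 D 0.0 0.1 0:00.08 rsync",
    "2003 backup 20 0 10240 900 700 D 0.0 0.1 0:00.07 tar",
    "1 root 20 0 43400 3112 2016 S 0.0 0.3 0:01.89 systemd",
]

states = count_states(rows)
# Tasks that count toward the Linux load average at this instant.
contributing = states["R"] + states["D"]

print(dict(states))
print("tasks contributing to load:", contributing)  # 4
```

In this made-up snapshot every %CPU value is 0.0, yet four tasks would be counted toward load, which is the "deep queue, idle CPU" case.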
-
Even a taxed CPU with a high CPU % number and a deep queue doesn't prove that the CPU is overloaded; it might just be "efficiently utilized." It is likely, if your queue is extra high and the CPU is taxed, that the queue is high because the CPU is taxed. Knowing your system baselines helps you to understand what is going on when this happens.
-
If you have a CPU that is at a high percentage, say 98%, and your queue depth is small, you are not overloaded, you are simply running applications at the "speed of the machine." Think of a taxi that goes at full speed between airport and hotel, every load has people in it, but it never has to leave anyone at the taxi stand for the next run. That's a high CPU %, each run has someone, no one left behind to wait.
If you have a CPU that is overloaded, this means that there are threads that are needed but can't get into the CPU because it is busy. This is like that taxi going at the same speed, but there are too many people and some of them have to be left behind because they don't fit into the taxi on the first run. If the people keep coming at the same pace, the taxi will just get more and more backed up. That's overloaded.
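The taxi picture can be played out with a toy simulation. This is only a sketch of the analogy, not a model of a real scheduler, and the function and parameter names are made up.

```python
def simulate_taxi(arrivals_per_trip, seats, trips):
    """Return how many people are left at the stand after each round trip."""
    waiting = 0
    backlog = []
    for _ in range(trips):
        waiting += arrivals_per_trip    # new passengers join the stand
        waiting -= min(waiting, seats)  # the taxi takes as many as fit
        backlog.append(waiting)
    return backlog

# Full on every run, nobody left behind: busy but not overloaded.
print(simulate_taxi(arrivals_per_trip=4, seats=4, trips=5))  # [0, 0, 0, 0, 0]

# More people than seats on every run: the line keeps growing.
print(simulate_taxi(arrivals_per_trip=6, seats=4, trips=5))  # [2, 4, 6, 8, 10]
```

In both runs the taxi is 100% full on every trip; only the second one has a queue that never drains.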
-
There are two ways to deal with overload (other than reducing how much the CPU has to work on.) One is to get a "faster" CPU. This is the same as raising the speed limit for our taxi. The taxi hauls the same car load each time, but at 75mph instead of at 65mph. Over the course of the day, it can pick up about 15% more people from the extra speed making each round trip that much faster.
Or you can increase the size of the taxi, maybe replacing that Honda Accord with a Dodge Caravan. Now each trip is still at 65mph, but you can haul eight people at a time instead of just four. Twice the people, same speed. This is like going from four cores to eight cores.
And, of course, we can do both at the same time.
Increasing the speed helps every passenger, every time by making the time in the taxi less. This lowers latency and even if you only get a single passenger every tenth trip, that one passenger benefits. But speed ups are often 5-10% tops, nothing huge.
Increasing the size of the load, increasing cores, only helps when you have more passengers than you could get in a single load previously, but the capacity increase is often much larger, in the 25-100% range.
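The numbers in the taxi example work out as follows. This is just the arithmetic from the analogy, with made-up helper names.

```python
def percent_gain(old, new):
    """Relative improvement from old to new, as a percentage."""
    return 100.0 * (new - old) / old

def carried(waiting, seats):
    """Passengers that actually ride on one trip."""
    return min(waiting, seats)

speed_gain = percent_gain(65, 75)  # 65mph -> 75mph
seat_gain = percent_gain(4, 8)     # four seats -> eight seats

print(round(speed_gain, 1))  # 15.4
print(seat_gain)             # 100.0

# Extra seats only matter when more people are waiting than fit before.
print(carried(1, 4), carried(1, 8))  # 1 1
print(carried(7, 4), carried(7, 8))  # 4 7
```

The lone passenger gets nothing from the bigger vehicle but would still get a quicker ride from the faster one.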
-
@EddieJennings if you thought that a load of .55 meant 55%, what did you think that 8.0 meant?
-
Great, useful information :D. Could this be a correct analogy for the deep run queue but low CPU percentage?
The taxi isn't moving (0% CPU) because there's some barrier preventing people from loading into the taxi (and this queue of people gets longer and longer).
-
@eddiejennings said in top -- What is it telling us?:
Great, useful information :D. Could this be a correct analogy for the deep run queue but low CPU percentage?
The taxi isn't moving (0% CPU) because there's some barrier preventing people from loading into the taxi (and this queue of people gets longer and longer).
The taxi goes every cycle without fail whether it has a load or not. The taxi never stops. Low CPU % means that there were no passengers to pick up so the taxi was running empty.
-
@scottalanmiller said in top -- What is it telling us?:
@EddieJennings if you thought that a load of .55 meant 55%, what did you think that 8.0 meant?
By that logic 800%, which seems impossible; thus, my misunderstanding.
-
@scottalanmiller said in top -- What is it telling us?:
@eddiejennings said in top -- What is it telling us?:
Great, useful information :D. Could this be a correct analogy for the deep run queue but low CPU percentage?
The taxi isn't moving (0% CPU) because there's some barrier preventing people from loading into the taxi (and this queue of people gets longer and longer).
The taxi goes every cycle without fail whether it has a load or not. The taxi never stops. Low CPU % means that there were no passengers to pick up so the taxi was running empty.
Ah, I see.
-
@eddiejennings said in top -- What is it telling us?:
@scottalanmiller said in top -- What is it telling us?:
@EddieJennings if you thought that a load of .55 meant 55%, what did you think that 8.0 meant?
By that logic 800%, which seems impossible; thus, my misunderstanding.
That's why I felt it was odd that you panicked, given that it couldn't be over 100%, so no reason to assume that 8 was a bad number.
-
Even if your machine has a high CPU % and a high load, you still have to test running applications and ask "is it fast enough?" If you have a perfectly planned system, you might easily have a busy CPU and lots of load and no issues at all. Generally you want your CPU % to be high, otherwise it generally means that you didn't size your system correctly and bought something more expensive than you really needed.
-
So I've got a VM here, with 1 vCPU and 2048 MB of RAM.
Here is top of that system.
top - 15:22:30 up 2:52, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 102 total, 1 running, 101 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1883884 total, 365780 free, 445932 used, 1072172 buff/cache
KiB Swap: 839676 total, 839676 free, 0 used. 1188012 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1336 mysql 20 0 1102624 113756 9976 S 0.3 6.0 0:05.04 mysqld
1 root 20 0 128164 6820 4060 S 0.0 0.4 0:02.47 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.12 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
I truncated it for shortness. Based on that, the CPU is just too fast for the workload. I can't possibly give a VM half a core...
-
@dustinb3403 said in top -- What is it telling us?:
So I've got a VM here, with 1 vCPU and 2048 MB of RAM.
Here is top of that system.
top - 15:22:30 up 2:52, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 102 total, 1 running, 101 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1883884 total, 365780 free, 445932 used, 1072172 buff/cache
KiB Swap: 839676 total, 839676 free, 0 used. 1188012 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1336 mysql 20 0 1102624 113756 9976 S 0.3 6.0 0:05.04 mysqld
1 root 20 0 128164 6820 4060 S 0.0 0.4 0:02.47 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.12 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
I truncated it for shortness. Based on that, the CPU is just too fast for the workload. I can't possibly give a VM half a core...
It's wasted. If you had more control of the system, you could assign it less CPU. Hypervisors don't actually assign cores, that's a myth.
-
Hypervisors present "visible thread processors" which may or may not correlate to actual cores, or thread engines, under the hood. A key purpose of a hypervisor is to allow a workload to receive less than a single core, or thread engine, of processing capacity. The standard use case is for a VM to get far less than full cores or thread engines.
What the VM sees and what it is given are very different things. The hypervisor might only give 1/100th of a core, but tell the VM it has two.