ScreenConnect High CPU Usage
-
We have a ScreenConnect instance on CentOS 7.2 on Digital Ocean that has been running well for quite some time. We've seen some large process spikes (in the mono process on which SC depends) after a reboot this morning. I was not part of the reboot or the update that @gjacobse ran, so I will let him fill in the historical info. Here is what I see in SAR. Before the reboot, we never saw less than about 95% idle.
09:46:39 AM       LINUX RESTART

09:50:02 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
10:00:01 AM     all     87.50      0.00      1.25      0.22      2.14      8.89
10:10:07 AM     all     89.69      0.00      3.23      0.73      2.74      3.61
10:20:01 AM     all     86.53      0.00      3.85      0.71      2.17      6.74
10:30:02 AM     all     24.12      0.00      0.50      0.06      0.89     74.43
10:40:01 AM     all     46.27      0.00      0.65      0.17      1.37     51.54
10:50:01 AM     all      7.89      0.00      0.33      0.07      0.54     91.17
11:00:01 AM     all      5.37      0.00      0.47      0.20      0.83     93.13
11:10:01 AM     all      6.00      0.00      0.36      0.09      0.74     92.81
11:20:01 AM     all     37.83      0.00      0.72      0.33      1.33     59.78
11:30:02 AM     all     27.93      0.00      1.37      0.24      0.96     69.49
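For anyone who wants to pull the same view, that is just the standard sysstat collection; on CentOS 7 the history lands under /var/log/sa (a quick sketch, assuming the stock sysstat cron job - the sa14 filename is only an example day):

# sar -u
# sar -u -f /var/log/sa/sa14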
Memory is fine, plenty of space there. Same with the disks.
-
And here is the output of top. As you can see, mono is running super hard; nothing else on the box is doing any real work at all.
top - 11:42:15 up  1:55,  1 user,  load average: 0.42, 0.92, 1.23
Tasks:  78 total,   2 running,  76 sleeping,   0 stopped,   0 zombie
%Cpu0  : 42.2 us,  0.7 sy,  0.0 ni, 56.1 id,  0.0 wa,  0.0 hi,  0.0 si,  1.0 st
KiB Mem :  1016920 total,    88500 free,   567288 used,   361132 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   313664 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9663 root      20   0 1216128 518188  18096 S  44.9 51.0   7:58.40 mono
  878 nginx     20   0  123036   3896   1008 S   0.3  0.4   0:03.33 nginx
 8582 root      20   0       0      0      0 S   0.3  0.0   0:00.52 kworker/0:2
10359 root      20   0  157564   2144   1512 R   0.3  0.2   0:00.03 top
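If we want to see whether this is one hot thread or many inside mono, top can break the process out per thread (a sketch, using the PID from the capture above):

# top -H -p 9663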
-
Yeah, I reported an issue with my SC session hosted by NTG last week. I noticed that it was back to the slowness I had a few months ago. Between SW and my vacation, I haven't reached out to him yet this week.
-
@Dashrender said in ScreenConnect High CPU Usage:
Yeah, I reported an issue with my SC session hosted by NTG last week. I noticed that it was back to the slowness I had a few months ago. Between SW and my vacation, I haven't reached out to him yet this week.
We are trying to determine whether the CPU spike is part of the update that he ran this morning - something that just took a while to run and is calming down now - or whether it is something else. The ongoing performance issues, we think, are the DDoS attack on the Internet messing with traffic, as nothing else has changed.
-
OIC. What I don't know is - is my SC session running on that same VM, or its own? It's not supposed to be part of the main NTG group of SC users.
-
@Dashrender said in ScreenConnect High CPU Usage:
OIC. What I don't know is - is my SC session running on that same VM, or its own? It's not supposed to be part of the main NTG group of SC users.
You have your own SC system; this is ours that I'm looking at. This isn't a thread about you.
-
@Dashrender said in ScreenConnect High CPU Usage:
It's not supposed to be part of the main NTG group of SC users.
I don't understand. Are you seeing NTG users on your system?
-
Okay, the CPU has dropped on its own. Maybe this was an update process that was running.
11:40:01 AM     all     55.06      0.00      1.17      0.23      1.41     42.13
11:50:01 AM     all      9.44      0.00      0.26      0.10      0.47     89.73
Keeping an eye on it now to make sure that it does not spike back up on its own.
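For the watching part, sar can also sample live instead of reading the history file (a small sketch - prints a CPU line every 60 seconds until you Ctrl-C it):

# sar -u 60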
-
There is a steady stream of writes coming from the SC process - something significant is going on. Maybe it is some sort of database update process?
 VIRT   RES  CPU% MEM%    PID USER     NI S    TIME+  IOR/s  IOW/s NAME
 1.2G  501M  12.6 50.5   9663 root      0 S  8:44.64      0   121K mono
That IOW/s column is a byte rate, not IOPS - 121 KB/s of sustained writes from a single process isn't trivial.
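pidstat, from the same sysstat package, gives a second opinion on per-process disk rates (a sketch, sampling every 5 seconds against the PID from above):

# pidstat -d -p 9663 5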
-
@scottalanmiller said in ScreenConnect High CPU Usage:
@Dashrender said in ScreenConnect High CPU Usage:
OIC. What I don't know is - is my SC session running on that same VM, or its own? It's not supposed to be part of the main NTG group of SC users.
You have your own SC system; this is ours that I'm looking at. This isn't a thread about you.
Good to have that confirmed, but it does seem like a semi-related problem - they are both having performance issues.
-
@scottalanmiller said in ScreenConnect High CPU Usage:
@Dashrender said in ScreenConnect High CPU Usage:
It's not supposed to be part of the main NTG group of SC users.
I don't understand. Are you seeing NTG users on your system?
Well, yes, frankly - I know Gene has an account, and I'm fine with that. But that wasn't my point. I was under the impression, though I'd never had it confirmed, that my SC was in fact on its own VM - are you confirming that now?
-
@Dashrender said in ScreenConnect High CPU Usage:
@scottalanmiller said in ScreenConnect High CPU Usage:
@Dashrender said in ScreenConnect High CPU Usage:
OIC. What I don't know is - is my SC session running on that same VM, or its own? It's not supposed to be part of the main NTG group of SC users.
You have your own SC system; this is ours that I'm looking at. This isn't a thread about you.
Good to have that confirmed, but it does seem like a semi-related problem - they are both having performance issues.
I don't believe that yours was updated, though. This appears to be an issue with the update.
Everything on the Internet is having performance problems right now, so the fact that the two share that during a national Internet outage isn't too telling.
-
@Dashrender said in ScreenConnect High CPU Usage:
Well, yes, frankly - I know Gene has an account, and I'm fine with that. But that wasn't my point. I was under the impression, though I'd never had it confirmed, that my SC was in fact on its own VM - are you confirming that now?
Yes, you have a totally unique, independent SC system.
-
@scottalanmiller said in ScreenConnect High CPU Usage:
@Dashrender said in ScreenConnect High CPU Usage:
Well, yes, frankly - I know Gene has an account, and I'm fine with that. But that wasn't my point. I was under the impression, though I'd never had it confirmed, that my SC was in fact on its own VM - are you confirming that now?
Yes, you have a totally unique, independent SC system.
Thanks.
-
I'm an idiot - I should have grabbed the block device report straight away. Here is the disk activity:
09:46:39 AM       LINUX RESTART

09:50:02 AM       tps      rtps      wtps   bread/s   bwrtn/s
10:00:01 AM     10.39      0.05     10.34      0.73    297.16
10:10:07 AM     95.03     85.88      9.15  16396.11    276.51
10:20:01 AM    155.68    146.07      9.61  28524.76    260.43
10:30:02 AM      2.29      0.07      2.22      1.03     50.41
10:40:01 AM      5.55      0.00      5.55      0.00    162.94
10:50:01 AM      2.00      0.04      1.96      7.22     42.68
11:00:01 AM      5.77      0.00      5.77      0.00    158.67
11:10:01 AM      3.17      1.45      1.72     19.73     41.24
11:20:01 AM      6.14      0.03      6.11      0.38    167.57
11:30:02 AM     25.66     24.06      1.61   1020.60     48.93
11:40:01 AM     16.00      5.80     10.20    350.94    265.18
11:50:01 AM      1.43      0.07      1.36     20.54     31.69
12:00:01 PM      6.53      1.02      5.51     18.00    150.42
Average:        24.83     19.44      5.39   3387.89    147.87
That's some crazy load even for a RAID 10 SSD array - mostly reads, judging by the bread/s spikes. No wonder it is slowing down. Something major is hitting the disks.
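That table is sar's systemwide IO summary; splitting it out per block device is one more flag (a sketch - the -p just pretty-prints device names):

# sar -b
# sar -d -p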
-
@scottalanmiller said in ScreenConnect High CPU Usage:
@Dashrender said in ScreenConnect High CPU Usage:
@scottalanmiller said in ScreenConnect High CPU Usage:
@Dashrender said in ScreenConnect High CPU Usage:
OIC. What I don't know is - is my SC session running on that same VM, or its own? It's not supposed to be part of the main NTG group of SC users.
You have your own SC system; this is ours that I'm looking at. This isn't a thread about you.
Good to have that confirmed, but it does seem like a semi-related problem - they are both having performance issues.
I don't believe that yours was updated, though. This appears to be an issue with the update.
Everything on the Internet is having performance problems right now, so the fact that the two share that during a national Internet outage isn't too telling.
Is the slowness coming from latency introduced by the attacks? Is that why you say they could be related?
-
@Dashrender said in ScreenConnect High CPU Usage:
Is the slowness coming from latency introduced by the attacks? Is that why you say they could be related?
That's our guess. No other changes, nothing visible on the system.
The system here had updates run, twice, with an obvious and immediate system impact after each one. And looking at the reports, the updates were immediately followed by massive disk activity. So the guess is that the system is running a database compression process or something similar, and that it is using a massive amount of disk IO.
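One way to test that guess would be to list what mono actually has open and look for database or log files being written (a sketch, PID from the earlier top output):

# lsof -p 9663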
-
And loads of disk IO leads to IOWaits.
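Easy to watch live, too (a sketch - the await and %util columns in the extended iostat view will climb right along with the IO):

# iostat -x 5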
-
Since @Dashrender brought it up, for comparison this is his ScreenConnect instance during the same window:
07:20:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
07:30:01 AM     all      0.68      0.00      0.16      0.03      0.08     99.06
07:40:01 AM     all      0.36      0.00      0.14      0.03      0.06     99.41
07:50:01 AM     all      0.60      0.00      0.20      0.04      0.09     99.08
08:00:01 AM     all      0.44      0.00      0.18      0.03      0.07     99.29
08:10:01 AM     all      0.80      0.00      0.23      0.03      0.09     98.85
08:20:01 AM     all      0.47      0.00      0.15      0.04      0.07     99.27
08:30:01 AM     all      0.74      0.00      0.22      0.04      0.10     98.90
08:40:01 AM     all      1.21      0.00      0.21      0.04      0.11     98.44
08:50:01 AM     all      2.02      0.00      0.30      0.05      0.23     97.40
09:00:01 AM     all      0.70      0.00      0.17      0.04      0.10     98.99
09:10:01 AM     all      1.31      0.00      0.34      0.06      0.12     98.18
09:20:01 AM     all      1.30      0.00      0.21      0.04      0.11     98.33
09:30:02 AM     all      2.37      0.00      0.38      0.06      0.27     96.92
09:40:01 AM     all      1.16      0.00      0.24      0.03      0.15     98.41
09:50:01 AM     all      1.11      0.00      0.22      0.04      0.12     98.51
10:00:01 AM     all      0.67      0.00      0.19      0.03      0.08     99.04
10:10:02 AM     all      1.29      0.00      0.30      0.05      0.08     98.29
10:20:01 AM     all      0.66      0.00      0.17      0.03      0.06     99.08
10:30:01 AM     all      1.57      0.00      0.57      0.05      0.13     97.68
10:40:01 AM     all      1.12      0.00      0.57      0.05      0.13     98.13
10:50:01 AM     all      1.48      0.00      0.58      0.07      0.16     97.72
11:00:01 AM     all      1.00      0.00      0.34      0.04      0.11     98.51
11:10:01 AM     all      1.25      0.00      0.30      0.05      0.10     98.31
11:20:01 AM     all      0.88      0.00      0.20      0.04      0.08     98.80
11:30:01 AM     all      1.10      0.00      0.19      0.04      0.10     98.57
11:40:01 AM     all      0.70      0.00      0.17      0.04      0.10     99.00
11:50:01 AM     all      1.04      0.00      0.24      0.06      0.11     98.55
12:00:01 PM     all      0.70      0.00      0.20      0.04      0.09     98.98
Average:        all      0.68      0.01      0.21      0.03      0.07     99.00
And here are the disks:
07:20:01 AM       tps      rtps      wtps   bread/s   bwrtn/s
07:30:01 AM      1.01      0.00      1.01      0.00     22.10
07:40:01 AM      0.75      0.00      0.75      0.00     13.15
07:50:01 AM      1.06      0.00      1.06      0.00     22.80
08:00:01 AM      0.84      0.00      0.84      0.00     15.65
08:10:01 AM      1.32      0.00      1.32      0.00     25.53
08:20:01 AM      0.86      0.00      0.86      0.00     16.36
08:30:01 AM      1.11      0.00      1.11      0.00     25.73
08:40:01 AM      1.09      0.01      1.08      0.08     22.73
08:50:01 AM      1.18      0.00      1.18      0.00     28.18
09:00:01 AM      1.14      0.00      1.14      0.03     20.68
09:10:01 AM      1.66      0.31      1.35     13.68     31.35
09:20:01 AM      0.99      0.01      0.98      0.05     20.73
09:30:02 AM      1.22      0.01      1.21      0.55     28.42
09:40:01 AM      0.93      0.00      0.93      0.03     19.95
09:50:01 AM      1.43      0.00      1.43      0.00     31.65
10:00:01 AM      0.96      0.00      0.96      0.00     20.60
10:10:02 AM      1.42      0.16      1.26      6.77     30.39
10:20:01 AM      1.25      0.09      1.16      4.04     22.91
10:30:01 AM      1.80      0.00      1.79      0.09     47.44
10:40:01 AM      1.64      0.00      1.64      0.00     39.45
10:50:01 AM      1.77      0.00      1.77      0.00     46.79
11:00:01 AM      1.38      0.00      1.38      0.00     30.81
11:10:01 AM      1.32      0.06      1.26     10.12     29.75
11:20:01 AM      1.05      0.00      1.05      0.00     22.53
11:30:01 AM      1.17      0.00      1.17      0.00     28.66
11:40:01 AM      1.13      0.00      1.13      0.00     22.02
11:50:01 AM      1.19      0.00      1.19      0.00     28.63
12:00:01 PM      1.00      0.00      1.00      0.00     20.82
Average:         1.05      0.06      0.99      5.29     20.66
As you can see, totally different performance. But same OS, host, VM configuration, etc.
-
Both systems are on a similar system update and reboot schedule, only hours off from each other. Both are at identical patch levels right now:
# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
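If we want to prove patch parity beyond the release string, diffing the full package lists between the two boxes is quick (a sketch - the file names are just examples):

# rpm -qa | sort > /tmp/packages-$(hostname).txt
# diff /tmp/packages-boxA.txt /tmp/packages-boxB.txt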