@NetworkNerd
1. Identify the Cause of High Disk IO and CPU Wait
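MariaDB Activity: Since MariaDB is the process showing high IO during the problem window, the first step is to identify the queries behind that load. Enable the slow query log to capture queries that run longer than a threshold: set slow_query_log = 1 and a low long_query_time (for example 1 second), either in the server config file or at runtime with SET GLOBAL, and note the file named by slow_query_log_file. Setting log_queries_not_using_indexes = 1 also catches queries that scan whole tables without being individually slow.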
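Scheduled Tasks: Check for scheduled jobs that run around 5 AM CST: user crontabs (crontab -l for each user, including root), /etc/crontab, /etc/cron.d/, and systemd timers (systemctl list-timers). These could be system tasks, WordPress cron jobs, or database maintenance tasks. Check the server's time zone first with timedatectl: cloud VMs often run in UTC, in which case 5 AM CST is 11:00 UTC (10:00 UTC while Central Daylight Time is in effect).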
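To check crontab entries against that hour, the following is a rough sketch that reports which lines could fire during a given hour. It assumes standard five-field cron syntax and deliberately ignores the day-of-month, month and day-of-week fields, so it can over-report. It does not read systemd timers; systemctl list-timers --all already shows their next and last run times. field_values, fires_during_hour, local_hour_to_utc and entries_in_hour are hypothetical helper names.

```python
# Hour field implied by the cron shorthands that run at a fixed time of day.
SPECIAL_HOUR = {"@hourly": "*", "@daily": "0", "@midnight": "0", "@weekly": "0",
                "@monthly": "0", "@yearly": "0", "@annually": "0"}


def field_values(field, lo, hi):
    """Expand one cron field ('*', '5', '1-5', '*/15', '0,30', '10-12/2') into a set of ints."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = int(part)
            end = hi if step > 1 else start  # 'N/step' is treated as 'N-hi/step'
        values.update(range(start, end + 1, step))
    return values


def fires_during_hour(line, hour):
    """True if a crontab line's hour field lets it run during `hour` (0-23)."""
    fields = line.split()
    if not fields or fields[0].startswith("#") or "=" in fields[0]:
        return False  # blank line, comment, or VAR=value assignment
    if fields[0].startswith("@"):
        if fields[0] not in SPECIAL_HOUR:
            return False  # e.g. @reboot
        hour_field = SPECIAL_HOUR[fields[0]]
    elif len(fields) >= 6:
        hour_field = fields[1]
    else:
        return False
    return hour in field_values(hour_field, 0, 23)


def local_hour_to_utc(hour, utc_offset):
    """Convert a wall-clock hour at a fixed UTC offset (CST is -6, CDT is -5) to the UTC hour."""
    return (hour - utc_offset) % 24


def entries_in_hour(crontab_text, hour):
    """Lines of a crontab whose schedule allows them to run during `hour`."""
    return [line.strip() for line in crontab_text.splitlines() if fires_during_hour(line, hour)]
```

Paste in the output of crontab -l -u <user> or the contents of /etc/crontab and call entries_in_hour(text, local_hour_to_utc(5, -6)) if the server runs in UTC, or entries_in_hour(text, 5) if it runs on Central time.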
2. systemd-journald Failure
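The failure of systemd-journal-flush.service, which flushes the journal from /run/log/journal to persistent storage under /var/log/journal, may mean journald could not complete its disk writes in time. That fits the high IO load, but the journal itself should confirm it. Check for errors and warnings around that time with journalctl -p warning plus --since and --until for the problem window, and look at the service's own messages with journalctl -u systemd-journal-flush.service.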
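To narrow a busy journal down to the problem window, journalctl -o json emits one JSON object per line with fields such as __REALTIME_TIMESTAMP (microseconds since the Unix epoch), PRIORITY, _SYSTEMD_UNIT and MESSAGE. The following is a minimal sketch that filters that output for warnings or worse. journal_problems is a hypothetical helper name, and the window is compared in UTC because the timestamp is epoch-based.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WARNING = 4  # syslog priorities: 0 emerg, 1 alert, 2 crit, 3 err, 4 warning, 5 notice, 6 info, 7 debug


def journal_problems(json_lines, start, end, max_priority=WARNING):
    """Return (utc_time, unit, message) for entries at max_priority or more severe in [start, end)."""
    problems = []
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        when = EPOCH + timedelta(microseconds=int(entry["__REALTIME_TIMESTAMP"]))
        priority = int(entry.get("PRIORITY", 6))  # treat a missing priority as info
        if start <= when < end and priority <= max_priority:
            unit = entry.get("_SYSTEMD_UNIT") or entry.get("SYSLOG_IDENTIFIER", "?")
            # MESSAGE can be a list of byte values for non-UTF-8 text; passed through as-is here.
            problems.append((when, unit, entry.get("MESSAGE", "")))
    return problems
```

The input can come from subprocess.run(["journalctl", "-o", "json", "--since", ..., "--until", ...], capture_output=True, text=True).stdout.splitlines(). Grouping the result by unit shows quickly whether journald, MariaDB or something else complained first.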
3. Review WordPress Plugins and Activities
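Plugin Behavior: Even though plugins like UpdraftPlus are scheduled for other times, they may still start background work during the problem window, for example a job that resumes or retries after its scheduled start. Check each plugin's settings and logs (UpdraftPlus keeps its own backup logs) for activity at that time.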
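WordPress Cron: WordPress has its own scheduler, WP-Cron (wp-cron.php), which runs due events when a page is requested rather than at a fixed clock time. Any request, including one from a bot, can therefore set off a resource-intensive task, and events that came due overnight run together on the first request. List the scheduled events with WP-CLI (wp cron event list) or a plugin such as WP Crontrol.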
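Plugin background jobs usually show up as WP-Cron events too, so one listing covers both points above. The following is a minimal sketch that reads the JSON from wp cron event list --fields=hook,next_run_gmt,recurrence --format=json and reports which hooks are due in a UTC hour range and which are already overdue (and will therefore fire on the next page request). load_events, events_in_utc_hours and overdue are hypothetical helper names; times are compared as naive UTC values.

```python
import json
from datetime import datetime

WP_TIME = "%Y-%m-%d %H:%M:%S"  # format of next_run_gmt in the WP-CLI output


def load_events(wp_cli_json):
    """Parse the WP-CLI JSON list and attach a datetime for each event's next run (GMT)."""
    events = json.loads(wp_cli_json)
    for event in events:
        event["next_run"] = datetime.strptime(event["next_run_gmt"], WP_TIME)
    return events


def events_in_utc_hours(events, first_hour, last_hour):
    """Hooks whose next run falls between first_hour and last_hour (UTC), inclusive."""
    return [e["hook"] for e in events if first_hour <= e["next_run"].hour <= last_hour]


def overdue(events, now_utc):
    """Hooks already past due; WP-Cron runs these on the next page request."""
    return [e["hook"] for e in events if e["next_run"] < now_utc]
```

For a 5 AM CST window on a UTC server, events_in_utc_hours(events, 10, 11) covers the hour before and the hour of the spike.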
4. Server and Database Optimization
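Database Optimization: Check and optimize the MariaDB tables, for example with mariadb-check --all-databases --check and then --optimize (mysqlcheck on older releases). Tables with frequent deletes and updates, such as wp_options and wp_postmeta, accumulate unused space and can slow down over time; OPTIMIZE TABLE rebuilds them. The rebuild is itself IO-heavy, so run it outside the problem window.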
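To decide which tables are worth rebuilding rather than optimizing everything, information_schema.TABLES reports DATA_LENGTH, INDEX_LENGTH and DATA_FREE (allocated but unused space) per table. The following is a sketch that ranks tables by reclaimable space. The 10 MB and 20% thresholds are arbitrary assumptions, and optimize_candidates is a hypothetical helper name. With innodb_file_per_table turned off, DATA_FREE may reflect the shared tablespace rather than the individual table, so treat the numbers as a rough guide.

```python
MB = 1024 * 1024

# Table sizes as MariaDB reports them; DATA_FREE is allocated but unused space.
FRAGMENTATION_SQL = """
SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH, DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = %s
"""


def optimize_candidates(rows, min_free_mb=10.0, min_ratio=0.2):
    """Return (table, free_mb, free_ratio) for tables with much reclaimable space, largest first."""
    candidates = []
    for name, data_length, index_length, data_free in rows:
        data_length, index_length, data_free = data_length or 0, index_length or 0, data_free or 0
        total = data_length + index_length + data_free
        if total == 0:
            continue
        free_mb = data_free / MB
        ratio = data_free / total
        if free_mb >= min_free_mb and ratio >= min_ratio:
            candidates.append((name, round(free_mb, 1), round(ratio, 2)))
    return sorted(candidates, key=lambda c: c[1], reverse=True)
```

The rows can come from a DB-API cursor such as PyMySQL's, e.g. cursor.execute(FRAGMENTATION_SQL, ("wordpress",)) followed by optimize_candidates(cursor.fetchall()), where "wordpress" stands in for your schema name.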
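Upgrade Resources: An e2-micro instance is very limited: it has 1 GB of memory and a shared-core CPU that sustains only a quarter of a vCPU, with short bursts above that. When memory runs short, the kernel drops page cache and swaps (if swap is configured), and both turn into extra disk reads and CPU wait. If monitoring confirms that memory or CPU is the bottleneck, consider moving to a larger machine type such as e2-small or e2-medium.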
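One quick way to test the memory-pressure explanation is to capture /proc/meminfo during the problem window. The following is a minimal sketch that summarizes it. memory_snapshot and read_memory_snapshot are hypothetical helper names, and what counts as "too little" available memory is a judgment call that the code does not make.

```python
def memory_snapshot(meminfo_text):
    """Summarize /proc/meminfo: total MB, % of memory still available, and swap in use (MB)."""
    kb = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            kb[key.strip()] = int(parts[0])  # values are in kB (or plain counts)
    total = kb["MemTotal"]
    available = kb.get("MemAvailable", kb.get("MemFree", 0))
    swap_used = kb.get("SwapTotal", 0) - kb.get("SwapFree", 0)
    return {
        "mem_total_mb": total // 1024,
        "available_pct": round(100 * available / total, 1),
        "swap_used_mb": swap_used // 1024,
    }


def read_memory_snapshot(path="/proc/meminfo"):
    """Read and summarize the live file (Linux only)."""
    with open(path) as f:
        return memory_snapshot(f.read())
```

A low available_pct, or swap use that climbs during the window, points toward a memory limit rather than a single bad query.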
5. Monitoring and Logs
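Enable Enhanced Monitoring: Tools like sar (from the sysstat package), iotop, and atop provide detailed per-process and system-wide metrics. Because the problem happens at 5 AM, make sure history is actually being recorded: sar and atop can log to disk at regular intervals (on Debian and Ubuntu, sysstat collection is switched on with ENABLED="true" in /etc/default/sysstat), while iotop only shows live data unless you run it in batch mode (iotop -b -o -t) and redirect the output to a file.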
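The iowait and steal percentages that sar -u reports come from the aggregate "cpu" line of /proc/stat, sampled twice. The following is a minimal sketch of that calculation. On a shared-core e2-micro, time the hypervisor withholds from the VM may show up as steal, so it is worth watching next to iowait. parse_cpu_line, cpu_breakdown and sample_cpu are hypothetical helper names.

```python
import time

# Columns of the aggregate "cpu" line in /proc/stat (in clock ticks). guest/guest_nice are
# already counted inside user/nice, so they are left out of the total.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Map the first eight counters of the 'cpu' line to their names."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_breakdown(before_line, after_line):
    """Percentage of CPU time spent in each state between two /proc/stat samples."""
    before, after = parse_cpu_line(before_line), parse_cpu_line(after_line)
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    return {name: round(100 * ticks / total, 1) for name, ticks in delta.items()} if total else {}


def sample_cpu(interval=5.0):
    """Take two samples `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = f.readline()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.readline()
    return cpu_breakdown(before, after)
```

Run from a system cron job every minute or so around 5 AM, sample_cpu() leaves a record even when nobody is watching.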
Access and Error Logs: Review the NGINX, PHP-FPM, and MariaDB logs for errors and warnings during the problem window, such as PHP-FPM reporting that it reached pm.max_children, or MariaDB restarting.
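For the PHP-FPM side, the following is a minimal sketch that pulls WARNING-or-worse lines for the window out of the FPM error log. It assumes the default "[15-Jan-2024 05:01:02] LEVEL: message" timestamp format; adjust the regular expression if your build logs differently. fpm_problems is a hypothetical helper name, and times are naive, in the server's local time.

```python
import re
from datetime import datetime

# Assumed PHP-FPM error-log line shape (default timestamp format), for example:
# [15-Jan-2024 05:01:02] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
FPM_LINE_RE = re.compile(r"^\[(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})\] ([A-Z]+): (.*)$")
FPM_TIME = "%d-%b-%Y %H:%M:%S"


def fpm_problems(lines, start, end, levels=("WARNING", "ERROR", "ALERT")):
    """Return (time, level, message) for matching lines in [start, end), server local time."""
    problems = []
    for line in lines:
        match = FPM_LINE_RE.match(line.strip())
        if not match:
            continue  # continuation lines or a different log format
        when = datetime.strptime(match.group(1), FPM_TIME)
        if match.group(2) in levels and start <= when < end:
            problems.append((when, match.group(2), match.group(3)))
    return problems
```

Repeated pm.max_children warnings in the window would suggest PHP requests were queuing, which fits a burst of traffic or WP-Cron work more than a purely database-side problem.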
6. External Factors
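Traffic Spikes: Jetpack Stats shows low traffic, but it only counts visitors whose browsers run its tracking script and it filters out bots, so crawler and bot traffic will not appear there. Check the NGINX access log for request spikes during the problem window.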
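The following is a minimal sketch that turns an access log into requests per minute plus the top user agents and client IPs for the window. It assumes NGINX's predefined "combined" log format; a custom log_format needs a different regular expression. traffic_profile is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

# NGINX's predefined "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def traffic_profile(lines, start, end, top=5):
    """Requests per minute, top user agents and top client IPs inside [start, end).

    start and end must be timezone-aware, since $time_local carries a UTC offset.
    """
    per_minute, agents, clients = Counter(), Counter(), Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if not match:
            continue
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        if not start <= when < end:
            continue
        per_minute[when.strftime("%H:%M")] += 1
        agents[match["agent"]] += 1
        clients[match["ip"]] += 1
    return per_minute, agents.most_common(top), clients.most_common(top)
```

A minute with far more requests than its neighbours, dominated by one user agent or IP, is a good candidate for the trigger; requests to wp-login.php or xmlrpc.php are common bot targets.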
Network Analysis: Use tools such as ss, iftop, or nethogs to see open connections and per-process bandwidth during the problem window. Unexpected external connections might be contributing to the load.
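A snapshot of ss -tn taken during the window can be summarized to show who the server is talking to. The following is a minimal sketch that counts established TCP connections by remote IP and by local port. It assumes the usual column order (State, Recv-Q, Send-Q, local address:port, peer address:port) with IPv6 addresses in brackets; connections_by_peer is a hypothetical helper name.

```python
from collections import Counter


def connections_by_peer(ss_output, top=5):
    """Count established TCP connections by remote IP and by local port from `ss -tn` output."""
    peers, local_ports = Counter(), Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Columns: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port [Process]
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue  # header line or a non-established state
        local_port = fields[3].rsplit(":", 1)[1]
        peer_ip = fields[4].rsplit(":", 1)[0].strip("[]")
        peers[peer_ip] += 1
        local_ports[local_port] += 1
    return peers.most_common(top), local_ports.most_common(top)
```

Many connections from one outside address to port 443, or any connection to the MariaDB port from outside the VM, would be worth a closer look.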
7. Testing and Isolation
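Isolate Components: Temporarily disable individual plugins or services (for example with wp plugin deactivate) across the problem window and check whether the issue persists. Since each test takes a full day, keep a note of exactly what was disabled each night.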
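Because the problem only shows up once a day, disabling plugins one at a time can take weeks. A binary search is faster: enable half of the candidates, see whether the spike happens, and keep whichever half contains the culprit. The following is a sketch under two assumptions: exactly one plugin is responsible, and the spike reproduces reliably. find_culprit is a hypothetical helper, and problem_occurs stands in for one overnight test.

```python
from typing import Callable, FrozenSet, Sequence


def find_culprit(plugins: Sequence[str], problem_occurs: Callable[[FrozenSet[str]], bool]) -> str:
    """Halve the candidate list each test until one plugin remains.

    problem_occurs(active) stands for one overnight test with only `active` enabled.
    Assumes exactly one plugin causes the problem and that it reproduces reliably.
    """
    candidates = list(plugins)
    if not candidates:
        raise ValueError("no plugins to test")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if problem_occurs(frozenset(half)):
            candidates = half  # the culprit is in the enabled half
        else:
            candidates = candidates[len(candidates) // 2 :]  # otherwise it is in the other half
    return candidates[0]
```

With 20 plugins this needs at most 5 nights instead of 20. If the spike depends on a combination of plugins, or still happens with all of them disabled, the single-culprit assumption does not hold and the search will give a misleading answer.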
Test in a Staging Environment: If possible, replicate the setup in a staging environment to test without affecting the live site.