Halloween Outage 2015
-
We just had a major outage this morning and are still investigating. We are unsure what happened, but shortly before 7 AM the entire server became inaccessible. For a short time the Rackspace system was also inaccessible, and once it was available again, console access to our server was not.
We rebooted at 9:56 AM EDT and the system came up but MangoLassi did not. At 9:59 AM EDT the system rebooted on its own, perhaps from a power cycle on the RS side as they have been known to do without authorization, and at that time the site came up automatically.
-
Here is some quick SAR info for the record:
12:55:01 PM     all      1.98      0.00      0.29      0.04      0.04     97.64
12:56:01 PM     all      4.92      0.00      0.41      0.05      0.04     94.57
12:57:01 PM     all      1.12      0.00      0.18      0.03      0.04     98.62
12:58:01 PM     all      1.57      0.00      0.19      0.02      0.04     98.18
12:59:01 PM     all      3.96      0.00      0.31      0.03      0.04     95.67
Average:        all      5.36      0.00      0.46      0.08      0.05     94.05

01:56:46 PM       LINUX RESTART      (2 CPU)

01:57:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
01:58:01 PM     all     21.08      0.00      0.90      0.60      0.08     77.33
Average:        all     21.08      0.00      0.90      0.60      0.08     77.33

01:59:47 PM       LINUX RESTART      (2 CPU)

02:00:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
02:01:01 PM     all     31.24      0.00      1.26      1.55      0.08     65.87
02:02:01 PM     all      3.73      0.00      0.53      0.33      0.06     95.36
02:03:01 PM     all      7.35      0.00      0.46      0.23      0.05     91.91
02:04:01 PM     all      3.06      0.00      0.33      0.09      0.05     96.47
02:05:01 PM     all      4.07      0.00      0.44      0.18      0.04     95.26
-
The restart at 1:56 PM UTC was caused by me doing a power cycle on the RS console. The one at 1:59 PM UTC was not authorized and we do not yet know the cause.
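For anyone lining the SAR timestamps up against the EDT times I mentioned earlier, here is a minimal sketch of the conversion. It assumes the SAR clock is UTC (as noted above) and that the date is October 31, 2015, when US Eastern time was still UTC-4. The names `SAR_LINES` and `restart_times` are just made up for the illustration, and it relies on the default English AM/PM locale.

```python
from datetime import datetime, timedelta, timezone

# Excerpt of the SAR output above (timestamps are UTC on this server).
SAR_LINES = [
    "12:59:01 PM all 3.96 0.00 0.31 0.03 0.04 95.67",
    "01:56:46 PM LINUX RESTART (2 CPU)",
    "01:58:01 PM all 21.08 0.00 0.90 0.60 0.08 77.33",
    "01:59:47 PM LINUX RESTART (2 CPU)",
    "02:01:01 PM all 31.24 0.00 1.26 1.55 0.08 65.87",
]

# Assumption: the outage date is October 31, 2015 (Halloween),
# so Eastern time is EDT, a fixed UTC-4 offset.
OUTAGE_DATE = "2015-10-31"
EDT = timezone(timedelta(hours=-4), "EDT")


def restart_times(lines):
    """Return the LINUX RESTART timestamps as timezone-aware UTC datetimes."""
    restarts = []
    for line in lines:
        if "LINUX RESTART" not in line:
            continue
        clock = " ".join(line.split()[:2])  # e.g. "01:56:46 PM"
        stamp = datetime.strptime(f"{OUTAGE_DATE} {clock}", "%Y-%m-%d %I:%M:%S %p")
        restarts.append(stamp.replace(tzinfo=timezone.utc))
    return restarts


for utc_time in restart_times(SAR_LINES):
    local = utc_time.astimezone(EDT)
    print(utc_time.strftime("%I:%M:%S %p UTC"), "->", local.strftime("%I:%M:%S %p %Z"))
```

The two restarts land at 9:56 AM and 9:59 AM EDT, which matches the manual reboot and the unexplained one three minutes later.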
-
Here is what the outage looks like in views. Sadly, it hit on a super busy day when we are pushing for a new site record.
-
Ah ha, we have confirmation, it was Rackspace. The underlying hardware failed and they had to move the workload. Nothing on our end, thank goodness. Hardware failure happens, so this was relatively minor. Now the question becomes - are we so busy that we should be talking about high availability options for the site? Our outages are small, but there is a real possibility that, with continued growth, the impact of an outage will get bigger and bigger. It could easily be time to consider a database cluster, multiple application servers and a load balancer!
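To make the load balancer idea a bit more concrete, here is a rough sketch of the core behavior: rotate requests across application servers and skip any that are marked down. This is only an illustration under assumptions, not what we run or plan to run. The class name `RoundRobinBalancer` and the server names `app1` and `app2` are hypothetical, and a real setup would use an actual load balancer product with its own health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round-robin picker that skips backends marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        # A health check (not shown here) would call this on failure.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call, in rotation order.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy application servers")


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["app1", "app2"])
print([balancer.pick() for _ in range(4)])  # alternates between the two

balancer.mark_down("app1")  # simulate a hardware failure like today's
print([balancer.pick() for _ in range(3)])  # traffic goes only to app2
```

With a single server, losing the underlying hardware means the whole site is down. With two or more behind something like this, the surviving server keeps answering while the failed one is moved or rebuilt.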
-
Here is the RS confirmation:
-
OMG Rackspace must be out of business
-
I think it was a ghost.
-
@hubtechagain said:
OMG Rackspace must be out of business
Definitely. Their Twitter hasn't had a post since Tuesday, and the latest reply was 6 min ago. What kind of scam is this?