A Public Post Mortem of An Outage
-
$10K isn't exactly chump change, but it's not really a lot either. A 3-day outage and it didn't equate to $10K in losses? What was on that server?
-
@Dashrender said in A Public Post Mortem of An Outage:
$10K isn't exactly chump change, but it's not really a lot either. A 3-day outage and it didn't equate to $10K in losses? What was on that server?
System downtime does not directly translate into a complete loss of revenue, as most businesses try to claim. It is more often related to a slower revenue stream, which significantly extends the time that things can be down.
-
Of course I understand that. Each business is different.
When we self-hosted our EHR, if it was down for a day, we could literally have to cancel clinics until it was fixed. While many of those patients would be rescheduled, we'd be paying staff to be onsite cleaning up, etc., and those costs add up while no income is coming in.
-
@Dashrender said in A Public Post Mortem of An Outage:
$10K isn't exactly chump change, but it's not really a lot either. A 3-day outage and it didn't equate to $10K in losses? What was on that server?
It was more than $10K in losses; $10K is how much cheaper it was to take the loss rather than to pay to mitigate it.
Most SMBs can have their servers down for a bit without major impact. AD, for example, will have near zero impact on a normal business because of cached creds.
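To put rough numbers on the "cheaper to take the loss" point, here's a minimal sketch of the trade-off. The absolute figures are made-up assumptions; only the $10K difference echoes what was said above.

```python
# Rough sketch of the "take the loss vs. pay to mitigate" decision.
# The dollar amounts below are hypothetical placeholders, not figures from the actual outage.

estimated_outage_loss = 15_000   # expected net loss from riding out the repair
cost_to_mitigate = 25_000        # e.g., replacing the gear outright instead of repairing it

savings_from_waiting = cost_to_mitigate - estimated_outage_loss
if savings_from_waiting > 0:
    print(f"Ride it out: taking the loss is ${savings_from_waiting:,} cheaper")
else:
    print(f"Mitigate now: it is ${-savings_from_waiting:,} cheaper than eating the outage")
```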
-
@JaredBusch said in A Public Post Mortem of An Outage:
@Dashrender said in A Public Post Mortem of An Outage:
$10K isn't exactly chump change, but it's not really a lot either. A 3-day outage and it didn't equate to $10K in losses? What was on that server?
System downtime does not directly translate into a complete loss of revenue, as most businesses try to claim. It is more often related to a slower revenue stream, which significantly extends the time that things can be down.
Exactly. They didn't have email on that server, nor phones, so their communications didn't go down. And AD was cached, so it wasn't impacted except for users moving from one desktop to another, which they don't do (or only very rarely), and all of their applications work offline. They were certainly impacted, but it definitely didn't bring the business to its knees, either.
-
@Dashrender said in A Public Post Mortem of An Outage:
When we self-hosted our EHR, if it was down for a day, we could literally have to cancel clinics until it was fixed. While many of those patients would be rescheduled, we'd be paying staff to be onsite cleaning up, etc., and those costs add up while no income is coming in.
A major factor for a lot of businesses is the rubber band effect of work - only companies already running with their production "backs against the wall" can't experience it. What happens is that the staff gets time to "rest" while nothing is happening. They might take time off, have a "lazy day", or catch up on other things... cleaning the office, rearranging the furniture, physical filing, whatever. The chance that this time has zero value is very low, almost nil. Then, when the systems return, they are better prepared to work more intensely and can often catch up either partially or fully. Rarely do you see a total productivity loss, and rarely a total recovery; normally it lands somewhere in between, as staff work faster and more productively to make up the lost time.
Since a normal business isn't doing as much work as it could possibly do (the only exceptions being those that don't do sales and can't take on new customers without more resources), it can normally catch up to some degree. This doesn't work for businesses like a 911 call center, of course. But a typical business can catch up, at least partially.
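As a rough illustration of that rubber band effect, here's a minimal sketch: the net hit is the deferred revenue minus whatever gets recovered once systems return. The daily revenue, outage length, and catch-up fraction are all hypothetical assumptions, not numbers from this outage.

```python
def effective_outage_cost(daily_revenue, outage_days, catch_up_fraction):
    """Net revenue impact of an outage when part of the deferred work is
    recovered after systems return (0.0 = total loss, 1.0 = full recovery)."""
    deferred = daily_revenue * outage_days
    recovered = deferred * catch_up_fraction
    return deferred - recovered

# Hypothetical example: $2,000/day, a 5-day outage, and 70% of the work caught up
# afterwards leaves roughly a $3,000 net hit, far less than the naive
# days-times-revenue figure of $10,000.
print(effective_outage_cost(2_000, 5, 0.7))
```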
-
Good topic.
One question, how long was the outage?
-
@DustinB3403 said in A Public Post Mortem of An Outage:
One question, how long was the outage?
Nearly a week.
-
@scottalanmiller said in A Public Post Mortem of An Outage:
@DustinB3403 said in A Public Post Mortem of An Outage:
One question, how long was the outage?
Nearly a week.
Wow, that is a rather long time.
-
@DustinB3403 said in A Public Post Mortem of An Outage:
Wow, that is a rather long time.
Yup, parts were very hard to get, and getting the server physically moved before diagnostics could even begin ate up huge amounts of time. The cost of speeding things up would have been huge - replacing the gear instead of repairing it. And since the vendor could not diagnose the hardware issue (their own error messages were ones they had no documentation for), that complicated things greatly.