Burned by Eschewing Best Practices
-
@Dashrender said:
Great question - I'm guessing that MS doesn't tell us the lowly users where the actual fault is, therefore the only thing we can say is... it's an Exchange outage if we can't use the service.
That's the issue, though, do we define it as "when Exchange is down"? Or "when the vendor encapsulated service is unavailable to everyone?" Or "when the vendor service is unavailable to an account, region, country, product?" Or "when it is down in a specific place?"
To me, you can call it an Office 365 outage when the customer does not have the option of a workaround. That is why our Azure outage, to me, was as clear an outage as any can be. The fault was in Azure, and Azure was unavailable to people for whom there was no possible workaround. Only the Azure support team had any ability to bring it back up.
-
@Dashrender said:
Same goes if the outage is caused by Azure... I'm happy to put the blame where it belongs.. but rarely do cloud service providers tell us that information.
But we know when we can work around or when we cannot. At least generally speaking.
-
@scottalanmiller said:
@Dashrender said:
I did this morning ask the question, why not just go to one single super awesome box... the answer 'HA.'
Pretty sure that HA was listed as a requirement. We definitely covered what I said above several times in stating that the NAS was not delivering any benefit and not meeting goals which is all it takes to get all that is stated above. Once the NAS is known to not have delivered HA, we know that all of the money around that was wasted.
HA was listed by the OP, not necessarily by his boss. I say this because the OP indicated that his boss gets pissy every time there is an outage. Of course we can't just go guessing what's really needed, we do have to take some things at face value until we are shown that they were wrong, then change direction based on new information.
-
@scottalanmiller said:
@Dashrender said:
Same goes if the outage is caused by Azure... I'm happy to put the blame where it belongs.. but rarely do cloud service providers tell us that information.
But we know when we can work around or when we cannot. At least generally speaking.
In a cloud solution like O365, you do?
-
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
Same goes if the outage is caused by Azure... I'm happy to put the blame where it belongs.. but rarely do cloud service providers tell us that information.
But we know when we can work around or when we cannot. At least generally speaking.
In a cloud solution like O365, you do?
Generally. While it is possible to not be the case, can you think of any circumstance where a cloud provider is not down but you cannot reach them to verify while also not knowing that you have a more significant outage?
-
You lost me again.
I think you're blending Cloud provider outages with my local onsite outages.
If O365 is having an outage - other than I can't access their services, it has nothing to do with my local services.
-
Perhaps you're saying... I'm on the local helpdesk and I get a call - hey Exchange isn't responding.. is it down?
I tell them I have to look into it and get back to them.
I dig around and discover that my local ISP is down.
When I call the user back I'm not going to say, oh yeah. Exchange is down.. oh and yeah, everything else on the internet is also...
nah - instead I'm going to say - hey our connection to the internet is down. Anything we access through online services is inaccessible because of that.
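The order of checks in that helpdesk scenario can be sketched in a few lines of Python. This is only an illustration: the two boolean inputs stand in for whatever probes you actually run (a ping to a known host, a test login to the mail service), and the names `Diagnosis` and `triage` are made up for this example.

```python
from enum import Enum


class Diagnosis(Enum):
    LOCAL_CONNECTIVITY = "our connection to the internet is down"
    PROVIDER_SERVICE = "the hosted service appears to be down"
    NO_FAULT_FOUND = "nothing appears to be down"


def triage(internet_reachable: bool, service_reachable: bool) -> Diagnosis:
    """Decide what to tell the user, checking the local link first."""
    if not internet_reachable:
        # With the local ISP down we cannot see the provider at all, so we
        # have no grounds to say anything about the provider's health.
        return Diagnosis.LOCAL_CONNECTIVITY
    if not service_reachable:
        # The internet works but the service does not answer.
        return Diagnosis.PROVIDER_SERVICE
    return Diagnosis.NO_FAULT_FOUND


# The ISP-down case from the post: the service is unreachable too,
# but the answer to the user is about the local connection.
print(triage(internet_reachable=False, service_reachable=False).value)
```

The point of the ordering is that a failed service probe only means something once the local link is known to be good.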
-
@Dashrender said:
If O365 is having an outage - other than I can't access their services, it has nothing to do with my local services.
right, but you asked if I knew that or not. And I do, in nearly all cases.
-
@Dashrender said:
Perhaps you're saying... I'm on the local helpdesk and I get a call - hey Exchange isn't responding.. is it down?
I tell them I have to look into it and get back to them.
I dig around and discover that my local ISP is down.
When I call the user back I'm not going to say, oh yeah. Exchange is down.. oh and yeah, everything else on the internet is also...
nah - instead I'm going to say - hey our connection to the internet is down. Anything we access through online services is inaccessible because of that.
Exactly. In a case like that you would know that you have no reason to suspect that Exchange is down. You can also hop on your phone and determine, almost certainly, that it is up.
-
@scottalanmiller said:
@Dashrender said:
Perhaps you're saying... I'm on the local helpdesk and I get a call - hey Exchange isn't responding.. is it down?
I tell them I have to look into it and get back to them.
I dig around and discover that my local ISP is down.
When I call the user back I'm not going to say, oh yeah. Exchange is down.. oh and yeah, everything else on the internet is also...
nah - instead I'm going to say - hey our connection to the internet is down. Anything we access through online services is inaccessible because of that.
Exactly. In a case like that you would know that you have no reason to suspect that Exchange is down. You can also hop on your phone and determine, almost certainly, that it is up.
But why does this matter? No one is blaming Exchange for being down in this case. Can you restate the question you think I'm asking?
-
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
Perhaps you're saying... I'm on the local helpdesk and I get a call - hey Exchange isn't responding.. is it down?
I tell them I have to look into it and get back to them.
I dig around and discover that my local ISP is down.
When I call the user back I'm not going to say, oh yeah. Exchange is down.. oh and yeah, everything else on the internet is also...
nah - instead I'm going to say - hey our connection to the internet is down. Anything we access through online services is inaccessible because of that.
Exactly. In a case like that you would know that you have no reason to suspect that Exchange is down. You can also hop on your phone and determine, almost certainly, that it is up.
But why does this matter? No one is blaming Exchange for being down in this case. Can you restate the question you think I'm asking?
In most cases where people state the risk of Exchange being down, it's in a position where "some people can still access it" and we are not clear where the momentary outage is.
-
@scottalanmiller said:
In most cases where people state the risk of Exchange being down, it's in a position where "some people can still access it" and we are not clear where the momentary outage is.
aww OK.
Well from what I've seen around here, if MS is having an outage in O365, it normally affects an entire small company (large businesses due to large geo-diversity might have some parts be up while others are down), not just a few users.
-
@Dashrender said:
@scottalanmiller said:
In most cases where people state the risk of Exchange being down, it's in a position where "some people can still access it" and we are not clear where the momentary outage is.
aww OK.
Well from what I've seen around here, if MS is having an outage in O365, it normally affects an entire small company (large businesses due to large geo-diversity might have some parts be up while others are down), not just a few users.
Oh yes, typically it is large. But it can be hard to tell. Outages can be account, datacenter, region, ISP, total, etc. Total has never, TTBOMK, happened. Lots of outages have happened that are MS' fault. But determining when it is can get to be a little complicated.
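A rough Python sketch of why the scope can be hard to pin down from the outside. It assumes you have collected reachability reports from several observers, and it models only three of the scopes listed above (account, region, total). The `Report` fields and the `consistent_scopes` function are hypothetical names for this example, not anything a provider exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    account: str
    region: str
    reachable: bool


def consistent_scopes(reports: list[Report]) -> set[str]:
    """Return every outage scope that the collected reports do not rule out.

    An empty set means either that no failure was observed or that none
    of the modeled scopes explains the pattern.
    """
    failed = [r for r in reports if not r.reachable]
    working = [r for r in reports if r.reachable]
    if not failed:
        return set()
    if not working:
        # Nobody we heard from can get in, so nothing is ruled out.
        return {"account", "region", "total"}
    scopes = set()
    # Someone can still get in, so "total" is ruled out from here on.
    if {r.account for r in failed}.isdisjoint({r.account for r in working}):
        scopes.add("account")
    if {r.region for r in failed}.isdisjoint({r.region for r in working}):
        scopes.add("region")
    return scopes


reports = [
    Report("acct-a", "east", reachable=False),
    Report("acct-b", "west", reachable=True),
]
# Two reports are not enough to separate an account problem
# from a regional one, so both scopes remain possible.
print(consistent_scopes(reports))
```

With few observers, more than one scope usually stays on the table, which is the "little complicated" part.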
-
@scottalanmiller said:
@Dashrender said:
@scottalanmiller said:
In most cases where people state the risk of Exchange being down, it's in a position where "some people can still access it" and we are not clear where the momentary outage is.
aww OK.
Well from what I've seen around here, if MS is having an outage in O365, it normally affects an entire small company (large businesses due to large geo-diversity might have some parts be up while others are down), not just a few users.
Oh yes, typically it is large. But it can be hard to tell. Outages can be account, datacenter, region, ISP, total, etc. Total has never, TTBOMK, happened. Lots of outages have happened that are MS' fault. But determining when it is can get to be a little complicated.
But those outages are all Exchange outages because from the outside, we have no idea what caused the outage - unless there are reports telling us (there might be, I've never looked) what happened.
-
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
@scottalanmiller said:
In most cases where people state the risk of Exchange being down, it's in a position where "some people can still access it" and we are not clear where the momentary outage is.
aww OK.
Well from what I've seen around here, if MS is having an outage in O365, it normally affects an entire small company (large businesses due to large geo-diversity might have some parts be up while others are down), not just a few users.
Oh yes, typically it is large. But it can be hard to tell. Outages can be account, datacenter, region, ISP, total, etc. Total has never, TTBOMK, happened. Lots of outages have happened that are MS' fault. But determining when it is can get to be a little complicated.
But those outages are all Exchange outages because from the outside, we have no idea what caused the outage - unless there are reports telling us (there might be, I've never looked) what happened.
I agree. But I've been yelled at for calling some of them outages here on the community before, you see.
-
@scottalanmiller said:
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
@scottalanmiller said:
In most cases where people state the risk of Exchange being down, it's in a position where "some people can still access it" and we are not clear where the momentary outage is.
aww OK.
Well from what I've seen around here, if MS is having an outage in O365, it normally affects an entire small company (large businesses due to large geo-diversity might have some parts be up while others are down), not just a few users.
Oh yes, typically it is large. But it can be hard to tell. Outages can be account, datacenter, region, ISP, total, etc. Total has never, TTBOMK, happened. Lots of outages have happened that are MS' fault. But determining when it is can get to be a little complicated.
But those outages are all Exchange outages because from the outside, we have no idea what caused the outage - unless there are reports telling us (there might be, I've never looked) what happened.
I agree. But I've been yelled at for calling some of them outages here on the community before, you see.
Really? I'd ask by whom, but that's not important -
Ohh.. they were saying, well I'm not down.. so you're just crazy Scott. gotcha.
yeah I'm on your side here.
The really tricky part comes when it's just a single user - is it an outage if a single user can't access, but everyone else can? I'd say no.
-
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
@scottalanmiller said:
In most cases where people state the risk of Exchange being down, it's in a position where "some people can still access it" and we are not clear where the momentary outage is.
aww OK.
Well from what I've seen around here, if MS is having an outage in O365, it normally affects an entire small company (large businesses due to large geo-diversity might have some parts be up while others are down), not just a few users.
Oh yes, typically it is large. But it can be hard to tell. Outages can be account, datacenter, region, ISP, total, etc. Total has never, TTBOMK, happened. Lots of outages have happened that are MS' fault. But determining when it is can get to be a little complicated.
But those outages are all Exchange outages because from the outside, we have no idea what caused the outage - unless there are reports telling us (there might be, I've never looked) what happened.
I agree. But I've been yelled at for calling some of them outages here on the community before, you see.
Really? I'd ask by whom, but that's not important -
Ohh.. they were saying, well I'm not down.. so you're just crazy Scott. gotcha.
yeah I'm on your side here.
The really tricky part comes when it's just a single user - is it an outage if a single user can't access, but everyone else can? I'd say no.
Yeah, the argument was that although the outage came from the provider, with no means of working around it, and was a REAL outage with services 100% gone, because it was isolated by account I couldn't say that they had an outage. But the services were gone and they could not even bring them back themselves.
It wasn't a single user, not even a single company, but a single class of companies.
-
@Dashrender by me of course.
Because the way it was phrased was clickbaity and implying more than it was.
I never said it wasn't an outage. I only said it wasn't an Exchange outage. It was an account outage. Semantics. But important to be clear on exactly where the outage occurred.
Something like that can easily be called an Exchange outage initially, but once things are known to be an account outage, then that needs to be specified.
On the other hand I have also argued with vendors that say they do not have an outage because their upstream provider has an outage. To me, the user, it is my vendor's outage. I am not a client of the upstream provider.
This is different because it is unrelated to my account in any way. Unlike the instance @scottalanmiller was referring to.
Also, I want my cake and will eat it too.
-
@JaredBusch said:
@Dashrender by me of course.
Because the way it was phrased was clickbaity and implying more than it was.
I never said it wasn't an outage. I only said it wasn't an Exchange outage. It was an account outage. Semantics. But important to be clear on exactly where the outage occurred.
Something like that can easily be called an Exchange outage initially, but once things are known to be an account outage, then that needs to be specified.
On the other hand I have also argued with vendors that say they do not have an outage because their upstream provider has an outage. To me, the user, it is my vendor's outage. I am not a client of the upstream provider.
This is different because it is unrelated to my account in any way. Unlike the instance @scottalanmiller was referring to.
Also, I want my cake and will eat it too.
I mostly agree here, but the issue, to us, was that while there was an account outage, that triggered an account-localized Exchange and Azure outage. Exchange and Azure services were unavailable to a group of customers (we know of several others affected too, not just our account). So by any normal reasoning yes, there was an account outage, but that's a little like the vendor saying that their ISP failed. To us, as a customer, Exchange and Azure had failed, and it was something technical on the vendor's side (the account is not technical; that they could not fix the account was technical) that they took days to fix in one instance and are still, months on, trying to fix in the other.
If we use the criteria of "are the services no longer available to the customer(s)" then the answer was yes. The cause was "technical glitches in the account management system" rather than "vendor's ISP failed", but that's after the point that Exchange and Azure had outages. The end result was Exchange and Azure being down to customer(s).
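That criterion, combined with the "no workaround" test from earlier in the thread, can be written down as a tiny Python check. This is just a sketch of the reasoning, not anyone's official definition; the `Incident` fields and the `is_outage_for_customer` name are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    service: str
    available_to_customer: bool
    customer_has_workaround: bool
    root_cause: str  # kept for the post-mortem, deliberately unused below


def is_outage_for_customer(incident: Incident) -> bool:
    """Customer-side test: the service is gone and the customer cannot fix it.

    The root cause (account system, upstream ISP, datacenter) does not
    enter into the decision; it only matters for explaining it afterwards.
    """
    return (not incident.available_to_customer
            and not incident.customer_has_workaround)


account_glitch = Incident(
    service="Exchange",
    available_to_customer=False,
    customer_has_workaround=False,
    root_cause="technical glitches in the account management system",
)
print(is_outage_for_customer(account_glitch))
```

Under this test the account-triggered case and the "vendor's ISP failed" case come out the same way, which is the argument being made: the customer sees the service down either way.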
-
I had a conversation on Wednesday evening with someone that was just so crazy it locked up my mind and I could not create a coherent argument. The beer count prior to this conversation also had something to do with it, I am sure.
I was out drinking with an old high school buddy that runs a local PC shop in my hometown. He does not do commercial work in general. Basically he spends all day everyday removing viruses and recovering basic stuff for home users. Great guy, knows the limits of what he wants to learn and do, etc.
Anyway, some local guy that is recently into doing some basic consulting work sees us (my friend mostly) and comes and joins us. Basic conversation on various IT stuff ensues, then this guy suddenly starts digging into my buddy, wondering if he has any KVM experience (he does not) and if he wants to maybe get in on a deal with him to put a rack in his office and set up servers for people.
A little questioning from me and he is like: MS sucks. VMware is crap and no way would I use Citrix. All that stuff is too expensive. I set up everything myself in Linux and use KVM.
I just locked up... I mean WTF and you have actual clients? Apparently he does. Currently all hosted out of his house. I am fairly certain that violates the ToS on his internet service.
He seriously had no answer for why he was not using something with any kind of support. I mean, I have nothing against KVM, but damn, if I am going to pay for my shit to be someplace, I want it someplace with something behind it.
I told the guy, if you want KVM, buy Scale gear. If you want free, use XenServer or Hyper-V in that order. Either way do something that approaches some kind of industry standard that you can get supported.