Domain Controller Down (VM)
-
@Dashrender said in Domain Controller Down (VM):
@John-Nicholson said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@John-Nicholson said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
At least if the person on the other end knew what he needed, he could get some help. But now he might cancel his subscription and go somewhere else (which I believe is what they are trying to avoid). I can't imagine the number of "IT Pros" who contact them looking for support for issues like that.
In the same vein, how many avoid them because they don't provide ANY reasonable support options? I'm never asking anyone to support everything, but everyone needs to support something serious.
Right, and they do. VMware.
Oh okay, well that's fine then. Not the BEST option, but acceptable. And by BEST I don't mean that VMware is or isn't the best; I mean that ONLY supporting that one is not as good as supporting a few options.
Ya, this whole thing started because Dustin said he should drop them since they don't support anything else. That's ridiculous.
I see. Yeah, that's going too far. That's lacking variety and options, but not lacking an enterprise deployment option. You have to factor the costs associated with VMware into the product's costs when making decisions, but that's about it. VMware is very, very enterprise. It's a bit crappy that they don't offer ANY lower-cost options for companies like this, where VMware is way out of their league, and crazy that they allow 100 Mb/s Synology iSCSI but require VMware ESXi... so they have some clear problems in their thinking and requirements, but VMware itself is just fine.
To be clear, requiring VMware ESXi in a supported configuration is at odds with allowing 100 Mb/s for vMotion and iSCSI (VMware does NOT support this abomination of a configuration).
I thought that I was stating that... that they had a mismatch: going for the biggest, baddest, most expensive enterprise hypervisor and then not caring whether it is set up in a viable way.
To be clear, he has Essentials Plus, which is only $6K up front and $1,200 a year for 24/7 support and free upgrades. In ongoing support cost, this is the CHEAPEST hypervisor option that gets you 24/7 support for 6 sockets plus a central management and monitoring solution. (Citrix for XenServer and Red Hat cost more. Microsoft costs crazy more for System Center VMM and a support agreement.)
I guess the difference there, at least with MS, is that you don't expect to get your expert support from MS directly; instead you get it from companies like NTG or others who know it.
But the bigger fail is: did they really need Essentials Plus in the first place? Could they afford near-zero downtime? Seems unlikely, unless they are a location that's open 24/7.
I can't disagree with you more.
It's a medical facility that has beds occupied 24/7, so yes. There is a bizarre assumption (one I used to be guilty of) that because you are small you don't need 24/7 availability, and that is just changing. An increasing number of SMBs operate 3 shifts, or have customers who expect 24/7 availability. It's true you can have maintenance windows and things, and maybe we should blame Google for it, but the game has changed. More mission-critical systems have gone from pen and paper to computers.
Spending $6K so you can get 24/7 support (which isn't even an option with Essentials) costs less per day than my wife's Starbucks addiction. That's not a big fail, and nothing anyone should be shamed over, especially someone with no training and no backup (that's a bigger fail, but not a replacement for vendor support).
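To put the per-day comparison in numbers, here is a rough Python sketch using the figures quoted in the thread ($6K up front, $1,200 a year). The amortization periods, and the assumption that the $1,200 applies every year including the first, are mine, not the poster's.

```python
"""Rough per-day cost of the Essentials Plus figures quoted in the thread.

Assumptions (not from the thread): the $6,000 is a one-time purchase, the
$1,200 support fee recurs every year including year one, and the total is
spread evenly over `years` 365-day years.
"""

UP_FRONT_USD = 6_000          # quoted up-front cost
SUPPORT_PER_YEAR_USD = 1_200  # quoted yearly 24/7 support + upgrades fee


def cost_per_day(years: int, up_front: float = UP_FRONT_USD,
                 support_per_year: float = SUPPORT_PER_YEAR_USD) -> float:
    """Total cost of ownership over `years`, divided evenly per day."""
    if years <= 0:
        raise ValueError("years must be positive")
    total = up_front + support_per_year * years
    return total / (365 * years)


if __name__ == "__main__":
    for years in (1, 3, 5):
        print(f"{years} year(s): ${cost_per_day(years):.2f}/day")
```

The longer the assumed service life, the more the up-front cost fades and the closer the daily figure gets to the support fee alone.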
MSPs are NOT a replacement for a support agreement (in fact, most REQUIRE that you have one). If there is a driver issue, someone has to stay on the phone and deal with it. Most MSPs worth a damn are going to charge you ~$150-250 per host for 24/7 support of a hypervisor. So the MSP's support costs for his 3 hosts would actually be higher, even if he went with free Hyper-V. Given that the MSP would need to manage patching, the overtime cost of forcing it to be done disruptively after hours would likely negate any savings from going with local storage and no vMotion for patching.
I advocate having both. In-house steady-state IT should NOT be running outages by themselves without the opportunity for a shift change. Also, MSPs see every kind of outage and know how to isolate and react to them. In this case, any normal MSP would have...
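A similar back-of-the-envelope comparison for the MSP point, in Python. The thread doesn't say how often the $150-250 per-host fee is billed; this sketch assumes monthly, which is a guess, and it ignores that the MSP would usually still require the vendor agreement on top.

```python
"""Compare a per-host MSP hypervisor-support fee with the quoted vendor contract.

Assumption (not stated in the thread): the $150-250 per-host MSP fee is
billed monthly. Change MONTHS_BILLED if your MSP bills differently.
"""

HOSTS = 3
MSP_FEE_RANGE_USD = (150, 250)   # per host, per billing period
MONTHS_BILLED = 12               # assumed monthly billing
VENDOR_SUPPORT_PER_YEAR_USD = 1_200


def msp_annual_cost(fee_per_host: float, hosts: int = HOSTS,
                    periods_per_year: int = MONTHS_BILLED) -> float:
    """Yearly MSP charge for covering `hosts` hypervisor hosts."""
    return fee_per_host * hosts * periods_per_year


if __name__ == "__main__":
    low, high = (msp_annual_cost(f) for f in MSP_FEE_RANGE_USD)
    print(f"MSP hypervisor support: ${low:,.0f}-${high:,.0f} per year")
    print(f"Vendor 24/7 support:    ${VENDOR_SUPPORT_PER_YEAR_USD:,} per year")
```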
- Never agreed to support this environment. They wouldn't have signed a contract after discovery until this storage/networking mess was fixed.
- Mandated remote monitoring (SNMP/Syslog) of the switch, detected the fault, and isolated it. This would have cut the outage to 1/3 of its length.
- If an HA cluster had been deployed, 2 switches would have been deployed, so only a single one would have failed (no outage).
- Would be regularly patching the environment so he was on a mainstream supported release of vSphere.
- Would have demanded a replacement of the Synology.
- Would be actively managing proper backups (and not using an ancient version of ArcServe).
- Been on the phone handling the issue, handling updates to management to keep them out of the way, and bringing in specialists as needed (networking, storage, hypervisor) as well as used their partner relationships with the vendors involved (Cisco, HP, VMware) to get escalated tickets opened and tracked as needed.
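As a sketch of what the switch monitoring in the second bullet might catch: a small Python parser that flags interface link-down messages in syslog. It assumes Cisco IOS-style %LINK/%LINEPROTO UPDOWN messages and uses made-up sample lines; a real setup would feed it from a syslog collector or use an existing monitoring tool.

```python
"""Flag interface link-down events in switch syslog messages.

Assumptions: the switch emits Cisco IOS-style %LINK-3-UPDOWN /
%LINEPROTO-5-UPDOWN messages and forwards them to a collector. The sample
lines below are hypothetical, not logs from this incident.
"""

import re
from typing import Iterable, List, Tuple

UPDOWN_RE = re.compile(
    r"%(?:LINK|LINEPROTO)-\d-UPDOWN: (?:Line protocol on )?"
    r"Interface (?P<iface>[^,\s]+), changed state to (?P<state>up|down)"
)

# Hypothetical example messages.
SAMPLE_LINES = [
    "Mar  1 02:13:07: %LINK-3-UPDOWN: Interface FastEthernet0/12, changed state to down",
    "Mar  1 02:13:08: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/12, changed state to down",
    "Mar  1 02:20:41: %LINK-3-UPDOWN: Interface FastEthernet0/12, changed state to up",
]


def link_down_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (interface, raw_line) for every 'changed state to down' message."""
    events = []
    for line in lines:
        match = UPDOWN_RE.search(line)
        if match and match.group("state") == "down":
            events.append((match.group("iface"), line.strip()))
    return events


if __name__ == "__main__":
    for iface, raw in link_down_events(SAMPLE_LINES):
        print(f"ALERT {iface}: {raw}")
```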
A proper MSP is like having an enterprise support army in your back pocket for less than the cost of an FTE. Honestly, as an SMB you shouldn't hire an in-house resource before you hire an MSP, and any shop that doesn't want to pay for an MSP but will pay for an FTE is a GIANT red flag that it lacks any level of competence in IT governance, budgeting, or common sense.
-
I didn't know what kind of medical facility @wirestyle22 was in.
OK, since the place is 24/7, he needs a higher than normal amount of uptime - fine. But real HA? Really? I know XenServer and Hyper-V can both do storage motion while the system is running, so no shared storage is needed (granted, XS is super slow, sooooo), so you don't need HA to do patches, you just need the storage motion options. I don't know if that's available in ESXi Essentials or not.
If HA is fully thought out and is felt to be needed (don't forget about the power situation, cooling, etc. - remember, HA isn't a product, it's a process), then they should fully realize it. I'm guessing from the fact that the switches were 100 Mb that it really wasn't fully thought out; instead, someone in a position of authority thought it sounded good and they tossed in whatever they had on hand.
As for the rest, I generally agree with you. It shows the real costs of DOING IT RIGHT - but as most of us know, few SMBs are really willing to do what's right in IT.
Hell, just look at all of the threads on SW about print shops that couldn't upgrade their XP machines because their 10K+ printers didn't support anything newer. It's a never-ending problem of knowing the real costs of doing something right.
-
@Dashrender said in Domain Controller Down (VM):
OK, since the place is 24/7, he needs a higher than normal amount of uptime - fine. But real HA? Really? I know XenServer and Hyper-V can both do storage motion while the system is running, so no shared storage is needed (granted, XS is super slow, sooooo), so you don't need HA to do patches, you just need the storage motion options. I don't know if that's available in ESXi Essentials or not.
I can't disagree more. I've seen someone try to do this in an SMB and they got fired.
It is available in ESXi (it's a bit faster in 5.5, where ESXi has a proper I/O mirror driver, so you don't have helper snapshots in a never-ending catch-up process to handle the I/O happening during the snapshot merge).
- Doing shared-nothing migrations impacts performance (seriously, look at the disk latency the next time you do it). Telling management "well, we kicked off the migration 7 hours ago and we can't really stop it" is a great way to get shown the door.
- This doesn't scale, and it can very quickly make patch windows take DAYS. No one would seriously consider this for monthly patching.
- If you have high enough I/O and are using a hypervisor that lacks a mirror driver, you end up with a never-ending series of snapshot merges.
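The "never-ending catch-up" in the last bullet can be illustrated with a toy model in Python. It assumes constant copy and write rates and that everything written during one pass must be re-copied in the next; real hypervisors are more nuanced (rewrites of the same blocks, throttling), so treat this as an intuition pump, not a model of ESXi internals.

```python
"""Toy model: snapshot-based migration catch-up vs. a mirror driver.

Simplifying assumptions (not from the thread): the guest writes new data at
a constant rate, the migration copies at a constant rate, and every write
made during a pass has to be re-copied in the next pass.
"""

from typing import Optional


def snapshot_catchup_passes(disk_gb: float, copy_gb_per_min: float,
                            write_gb_per_min: float,
                            cutover_gb: float = 1.0,
                            max_passes: int = 50) -> Optional[int]:
    """Passes until the remaining delta is small enough to cut over.

    Returns None if it never gets there within max_passes, which is what
    happens whenever the guest writes as fast as the migration copies.
    """
    remaining = disk_gb
    for passes in range(1, max_passes + 1):
        minutes = remaining / copy_gb_per_min
        remaining = write_gb_per_min * minutes  # new delta written during this pass
        if remaining <= cutover_gb:
            return passes
    return None


def mirror_mode_minutes(disk_gb: float, copy_gb_per_min: float) -> float:
    """With a mirror driver, new writes land on both copies, so one pass suffices."""
    return disk_gb / copy_gb_per_min


if __name__ == "__main__":
    print("light writes:", snapshot_catchup_passes(500, 10, 2))   # converges
    print("heavy writes:", snapshot_catchup_passes(500, 10, 10))  # never converges
    print("mirror mode minutes:", mirror_mode_minutes(500, 10))
```

The delta shrinks by the ratio of write rate to copy rate on each pass, so once writes keep pace with the copy it never shrinks at all; mirroring sidesteps that by never letting a delta build up.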
-
@Dashrender said in Domain Controller Down (VM):
OK, since the place is 24/7, he needs a higher than normal amount of uptime - fine. But real HA? Really? I know XenServer and Hyper-V can both do storage motion while the system is running, so no shared storage is needed (granted, XS is super slow, sooooo), so you don't need HA to do patches, you just need the storage motion options. I don't know if that's available in ESXi Essentials or not.
Storage motion is not for production hours. That's great if you have a green zone, but if you have that, you don't need storage motion. Storage motion is mostly for migrations and one-time, unavoidable events. It's not something you do during production time unless you have no choice (dying storage system).
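To see why shared-nothing storage moves don't fit in a production window, a rough transfer-time estimate helps. This Python sketch assumes decimal units and that the network link is the bottleneck; the 100 Mb/s case mirrors the switches discussed in this thread, and the `efficiency` knob is a hypothetical fudge factor for overhead and competing traffic.

```python
"""Back-of-the-envelope time for a shared-nothing (storage) migration.

Assumptions: decimal units (1 TB = 10**12 bytes, 1 Mb/s = 10**6 bit/s), the
link is the bottleneck, and `efficiency` folds in protocol overhead and
competing production traffic (1.0 = ideal line rate, which is optimistic).
"""


def migration_hours(data_tb: float, link_mbps: float,
                    efficiency: float = 1.0) -> float:
    """Hours needed to push `data_tb` terabytes over a `link_mbps` link."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and 0 < efficiency <= 1")
    bits = data_tb * 10**12 * 8
    seconds = bits / (link_mbps * 10**6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for mbps in (100, 1_000, 10_000):
        print(f"1 TB over {mbps:>6,} Mb/s: {migration_hours(1, mbps):6.1f} h")
```

Even at ideal line rate, 1 TB over 100 Mb/s takes most of a day; multiply by every host you need to evacuate and the "patch windows take DAYS" point follows.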
-
@Dashrender said in Domain Controller Down (VM):
If HA is fully thought out and is felt to be needed (don't forget about the power situation, cooling, etc. - remember, HA isn't a product, it's a process), then they should fully realize it. I'm guessing from the fact that the switches were 100 Mb that it really wasn't fully thought out; instead, someone in a position of authority thought it sounded good and they tossed in whatever they had on hand.
It's as simple as "there was no HA and no attempt made at it."
-
@Dashrender said in Domain Controller Down (VM):
I didn't know what kind of medical facility @wirestyle22 was in.
If HA is fully thought out and is felt to be needed (don't forget about the power situation, cooling, etc. - remember, HA isn't a product, it's a process), then they should fully realize it. I'm guessing from the fact that the switches were 100 Mb that it really wasn't fully thought out; instead, someone in a position of authority thought it sounded good and they tossed in whatever they had on hand.
Medical facilities with beds have generators and fuel. HVAC for something this small can be made redundant with a spot cooler (I have one in my own house for my lab, so if I can afford it, you have to be a tiny outfit not to be able to afford it). I agree it's a process, and the biggest piece is having an MSP to back you up and having 24/7 dispatched resources to help you with the persistent layer. Not having redundancy at the people level is the biggest issue to address. While I normally advocate some kind of offsite, ready-to-fire DR, in the case of a facility like this it's not actually as important (beyond BC reasons), because if the whole facility blows up, the need for the system goes with it. Still, there are a bazillion Veeam/VCAN partners who can cover this piece cheaply, so why not?
-
@scottalanmiller said in Domain Controller Down (VM):
It's as simple as "there was no HA and no attempt made at it."
It would take me about 5 minutes to explain to a 3rd grader why the non-redundant system he has is bad. The fact that it continues to exist shows that either...
- Management has less intellectual capacity than a 3rd grader (possible).
- No one explained in non-jargon English how bad this configuration was (more likely).
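The "3rd grader" explanation of why a single point of failure is bad can also be put in numbers. This Python sketch uses the standard series/parallel availability formulas, assuming independent failures and a hypothetical 99.9% per-switch availability, not measured values from this environment.

```python
"""Illustrative availability math for single vs. redundant components.

Assumptions: failures are independent, and the availability figures are
hypothetical placeholders, not measurements of this environment's gear.
"""

from math import prod
from typing import Sequence

HOURS_PER_YEAR = 8_760


def series(avails: Sequence[float]) -> float:
    """Everything in the chain must be up (e.g. host -> single switch -> NAS)."""
    return prod(avails)


def parallel(avails: Sequence[float]) -> float:
    """At least one redundant copy must be up (e.g. a pair of switches)."""
    return 1 - prod(1 - a for a in avails)


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    switch = 0.999  # hypothetical per-switch availability
    one = series([switch])
    two = parallel([switch, switch])
    print(f"one switch:   {downtime_hours_per_year(one):.2f} h/yr down")
    print(f"two switches: {downtime_hours_per_year(two):.4f} h/yr down")
```

Independence is the big caveat: two switches on the same power strip or the same bad config fail together, which is the "HA is a process, not a product" point from earlier in the thread.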
-
@John-Nicholson said in Domain Controller Down (VM):
It's a medical facility that has beds occupied 24/7, so yes.
That doesn't mean that. We can equally say they didn't have 24x7 IT staff so they don't need it. What they need, we have no way of knowing. If we read back what we know about their environment, it tells us that they didn't think that they needed HA in any way whatsoever. But that's all we have to go on. They operate around the clock, but that isn't an HA concern. And they implemented something so far from HA that it is laughable. So all we know is that they implemented anti-HA and spent a lot to do it. That's it. We have no indication that HA is warranted in any way.
Just because a shop is 24x7 medical doesn't tell us that a specific system is needed 24x7 or that it needs to be available at all times. Those are very different requirements.
-
@Dashrender said in Domain Controller Down (VM):
As for the rest, I generally agree with you. It shows the real costs of DOING IT RIGHT - but as most of us know, few SMBs are really willing to do what's right in IT.
Hell, just look at all of the threads on SW about print shops that couldn't upgrade their XP machines because their 10K+ printers didn't support anything newer. It's a never-ending problem of knowing the real costs of doing something right.
The real cost of doing IT right is lower. Simply not having an onsite FTE and having an MSP manage this stuff is likely cheaper (FTEs are expensive!). This outage might have been embarrassing enough for them to lose a patient or two (or worse, someone dies and they get hit with a million-dollar wrongful-death lawsuit that spikes their premiums). Doing IT RIGHT includes understanding the capex and opex costs, and the associated risks and external costs, of doing IT right or wrong.
Doing IT Wrong means wasting tons of money and getting an output that causes other costs. IT budgets do NOT exist in a vacuum, separate from the rest of the operations and their output (especially in 2016!).
-
@John-Nicholson said in Domain Controller Down (VM):
A proper MSP is like having an enterprise support army in your back pocket for less than the cost of an FTE. Honestly, as an SMB you shouldn't hire an in-house resource before you hire an MSP, and any shop that doesn't want to pay for an MSP but will pay for an FTE is a GIANT red flag that it lacks any level of competence in IT governance, budgeting, or common sense.
I agree. Anyone going into an FTE role in an SMB should probably ask what the MSP support ecosystem is like BEFORE accepting a position. That's something we never talk about, but it's a great idea. They should either have a great answer (and the MSP should likely be part of the interview process) or they should say, "That's why we are bringing you in, to help us find those good resources."
-
@John-Nicholson said in Domain Controller Down (VM):
The real cost of doing IT right is lower. Simply not having an onsite FTE and having an MSP manage this stuff is likely cheaper (FTEs are expensive!). This outage might have been embarrassing enough for them to lose a patient or two (or worse, someone dies and they get hit with a million-dollar wrongful-death lawsuit that spikes their premiums). Doing IT RIGHT includes understanding the capex and opex costs, and the associated risks and external costs, of doing IT right or wrong.
Doing IT Wrong means wasting tons of money and getting an output that causes other costs. IT budgets do NOT exist in a vacuum, separate from the rest of the operations and their output (especially in 2016!).
"IT Right" isn't even a thing. IT is just part of the business. It's "running the business right."
-
We actually did a video on that last night, it is being edited right now.
-
@scottalanmiller said in Domain Controller Down (VM):
That doesn't mean that. We can equally say they didn't have 24x7 IT staff so they don't need it. What they need, we have no way of knowing. If we read back what we know about their environment, it tells us that they didn't think that they needed HA in any way whatsoever. But that's all we have to go on. They operate around the clock, but that isn't an HA concern. And they implemented something so far from HA that it is laughable. So all we know is that they implemented anti-HA and spent a lot to do it. That's it. We have no indication that HA is warranted in any way.
Just because a shop is 24x7 medical doesn't tell us that a specific system is needed 24x7 or that it needs to be available at all times. Those are very different requirements.
EMRs are on the system. I've yet to meet a medical facility with beds staffed 24/7 whose SLA accepts a 12-hour outage for that, and ultimately there are two things that drive medical standards and outcomes:
1. In America, juries in wrongful-death cases are the arbiters of what was acceptable in medical spending and outcomes (lawsuits drive medical standards).
2. The federal government's willingness to reimburse you for spending. Any patient care administered while the system was down and not recording is not paid for, and while people have downtime procedures, the risk of missing out on some juicy procedures or pills or other things means this can add up quickly.
-
-
@John-Nicholson said in Domain Controller Down (VM):
EMRs are on the system. I've yet to meet a medical facility with beds staffed 24/7 whose SLA accepts a 12-hour outage for that, and ultimately there are two things that drive medical standards and outcomes:
1. In America, juries in wrongful-death cases are the arbiters of what was acceptable in medical spending and outcomes (lawsuits drive medical standards).
2. The federal government's willingness to reimburse you for spending. Any patient care administered while the system was down and not recording is not paid for, and while people have downtime procedures, the risk of missing out on some juicy procedures or pills or other things means this can add up quickly.
That's fine, BUT the ONLY thing we know for certain is what they were willing to implement previously. We don't know what kind of medicine they work in, what risks there are, or what EMR dependencies there are. Sure, they can't bill for twelve hours, but that might cost them nothing while uptime costs something. It all depends. What we DO know is that they didn't have the hardware, planning, documentation, staff, or support organizations for anything other than what they got. So based on the only information we have, we can't assume that their business believes in uptime. Even during the outage, they made it VERY clear that getting it fixed was not a priority, but that status updates, conversations, and even other IT needs were.
We have a pretty uniform picture that uptime on this system is not perceived as important by the business decision makers, even during the panic fire of a real outage.
-
-
I totally understand that there are medical situations where high availability and high uptime are considered necessary and make sense in a business context. And I totally agree that this has the potential to be one of them. I'm only saying that it being possible doesn't make it so and that all indications from reading back their previous decisions, investments and behaviour suggest that they do not agree with that assessment.
-
@John-Nicholson said in Domain Controller Down (VM):
EMRs are on the system. I've yet to meet a medical facility with beds staffed 24/7 whose SLA accepts a 12-hour outage for that, and ultimately there are two things that drive medical standards and outcomes:
1. In America, juries in wrongful-death cases are the arbiters of what was acceptable in medical spending and outcomes (lawsuits drive medical standards).
2. The federal government's willingness to reimburse you for spending. Any patient care administered while the system was down and not recording is not paid for, and while people have downtime procedures, the risk of missing out on some juicy procedures or pills or other things means this can add up quickly.
1 - I'll take your word for it at this point.
2 - What prevents you from documenting on paper and then entering it when the system comes back up? Everyone I know operates this way, and they do get paid for the things that are transcribed into the electronic record after the fact.
-
@scottalanmiller said in Domain Controller Down (VM):
That's fine, BUT the ONLY thing we know for certain is what they were willing to implement previously. We don't know what kind of medicine they work in, what risks there are, or what EMR dependencies there are. Sure, they can't bill for twelve hours, but that might cost them nothing while uptime costs something. It all depends. What we DO know is that they didn't have the hardware, planning, documentation, staff, or support organizations for anything other than what they got. So based on the only information we have, we can't assume that their business believes in uptime. Even during the outage, they made it VERY clear that getting it fixed was not a priority, but that status updates, conversations, and even other IT needs were.
We have a pretty uniform picture that uptime on this system is not perceived as important by the business decision makers, even during the panic fire of a real outage.
This type of argument is something I see you make all the time. Just because the system didn't perform in the manner that they wanted/needed doesn't mean that they weren't trying to obtain it just the same. What it does mean is that whoever they hired to accomplish that goal lied to them (assuming that really was the goal).
If you're the business owner and don't know squat about IT, so you hire George the IT consultant, how is the owner supposed to know whether George did the job right or wrong? Unless you're telling me that the owner should hire a second consultant to look over George's work to make sure it was what the owner really wanted?
-
-
@scottalanmiller said in Domain Controller Down (VM):
I totally understand that there are medical situations where high availability and high uptime are considered necessary and make sense in a business context. And I totally agree that this has the potential to be one of them. I'm only saying that it being possible doesn't make it so and that all indications from reading back their previous decisions, investments and behaviour suggest that they do not agree with that assessment.
Again, read my previous post. Assuming the owners aren't IT personnel, how are they SUPPOSED to know? It was like John asking why WS didn't refresh the iSCSI connection instead of rebooting the whole switch: if he's never done it before, how's he supposed to know? All they can do is trust those they hire to do what was asked.
-
@Dashrender said in Domain Controller Down (VM):
This type of argument is something I see you make all the time. Just because the system didn't perform in the manner that they wanted/needed doesn't mean that they weren't trying to obtain it just the same. What it does mean is that whoever they hired to accomplish that goal lied to them (assuming that really was the goal).
I didn't say that it did. I said that it was the only information we have and that every decision, both in planning and during triage, pointed to the same conclusion: that they don't care about uptime. That's it, period. ANYTHING other than this is someone here injecting personal opinion into the mix, pushing HA where no HA is suggested. We have no reason to suspect that they ever felt HA was going to happen. That's an assumption based on nothing at all.
That doesn't mean that they didn't, it only means that there is zero evidence to suggest it. All evidence that we have points away. It's that simple. They took no actions towards HA, they didn't state that they wanted HA, they didn't provide documentation as to why HA would be needed, they didn't behave in an HA way.
-
-
@Dashrender said in Domain Controller Down (VM):
If you're the business owner and don't know squat about IT, so you hire George the IT consultant, how is the owner supposed to know whether George did the job right or wrong? Unless you're telling me that the owner should hire a second consultant to look over George's work to make sure it was what the owner really wanted?
You are only making an argument for why the evidence that they don't want HA is not very strong. I never said that it was. You are not even slightly making an argument that they wanted HA, only that we can't conclude much from the evidence, based on the assumption that the CEO is a moron and can't do his job. Other than that being a moderately safe assumption, as it is generally the case in SMBs, it tells us nothing. I never stated anything to the contrary, so pointing this out doesn't dispute my point.