Domain Controller Down (VM)
-
@scottalanmiller said in Domain Controller Down (VM):
@John-Nicholson said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
At least if the other end knew what he needed he could get some help. But now he might cancel his subscription and go somewhere else (which I believe is what they are trying to avoid). I can't imagine the amount of "IT Pros" that contact them looking for support for issues like that.
Same vein, how many avoid them because they don't provide ANY reasonable support options? I'm never asking anyone to support everything, but everyone needs to support something serious.
Right, and they do. VMware.
Oh okay, well that's fine then. Not the BEST option, but acceptable. And by BEST I don't mean that VMware is or isn't the best, I mean ONLY supporting that one is not as good as supporting a few options.
Ya, this whole thing started because Dustin said he should drop them since they don't support anything else. That's ridiculous.
I see. Yeah that's going too far. That's lacking variety and options, but not lacking an enterprise deployment option. You have to figure the costs associated with VMware into the product's costs when decision making, but that's about it. VMware is very, very enterprise. It's a bit crappy that they don't offer ANY lower cost options for companies like this where VMware is way out of their league and crazy that they allow 100Mb/s Synology iSCSI but require VMware ESXi... so they have some clear problems in their thinking and requirements, but VMware itself is just fine.
To be clear, requiring VMware ESXi in a supported configuration is at odds with the 100Mb/s for vMotion and iSCSI (VMware does NOT support this abomination of a configuration).
I thought that I was stating that... that they had a mismatch, going for the biggest, baddest, most expensive enterprise hypervisor and then... don't care if it is set up in a viable way.
To be clear, he has Essentials Plus, which is only 6K up front and $1200 a year for 24/7 support and free upgrades. This is the CHEAPEST hypervisor in terms of ongoing cost for 24/7 support of 6 sockets plus a central management and monitoring solution. (Citrix for XenServer and Red Hat cost more. Microsoft costs crazy more for SCCM-VMM and a support agreement.)
-
@John-Nicholson said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@John-Nicholson said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
At least if the other end knew what he needed he could get some help. But now he might cancel his subscription and go somewhere else (which I believe is what they are trying to avoid). I can't imagine the amount of "IT Pros" that contact them looking for support for issues like that.
Same vein, how many avoid them because they don't provide ANY reasonable support options? I'm never asking anyone to support everything, but everyone needs to support something serious.
Right, and they do. VMware.
Oh okay, well that's fine then. Not the BEST option, but acceptable. And by BEST I don't mean that VMware is or isn't the best, I mean ONLY supporting that one is not as good as supporting a few options.
Ya, this whole thing started because Dustin said he should drop them since they don't support anything else. That's ridiculous.
I see. Yeah that's going too far. That's lacking variety and options, but not lacking an enterprise deployment option. You have to figure the costs associated with VMware into the product's costs when decision making, but that's about it. VMware is very, very enterprise. It's a bit crappy that they don't offer ANY lower cost options for companies like this where VMware is way out of their league and crazy that they allow 100Mb/s Synology iSCSI but require VMware ESXi... so they have some clear problems in their thinking and requirements, but VMware itself is just fine.
To be clear, requiring VMware ESXi in a supported configuration is at odds with the 100Mb/s for vMotion and iSCSI (VMware does NOT support this abomination of a configuration).
I thought that I was stating that... that they had a mismatch, going for the biggest, baddest, most expensive enterprise hypervisor and then... don't care if it is set up in a viable way.
To be clear, he has Essentials Plus, which is only 6K up front and $1200 a year for 24/7 support and free upgrades. This is the CHEAPEST hypervisor in terms of ongoing cost for 24/7 support of 6 sockets plus a central management and monitoring solution. (Citrix for XenServer and Red Hat cost more. Microsoft costs crazy more for SCCM-VMM and a support agreement.)
I guess the difference there, at least with MS, is that you don't expect to get your expert support from MS directly, instead you get it from companies like NTG or those who know it.
But the bigger fail is - did they really need Essentials Plus in the first place? Could they afford near zero downtime? Seems unlikely, unless they are a location that's open 24/7.
-
@Dashrender said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
At least if the other end knew what he needed he could get some help. But now he might cancel his subscription and go somewhere else (which I believe is what they are trying to avoid). I can't imagine the amount of "IT Pros" that contact them looking for support for issues like that.
Same vein, how many avoid them because they don't provide ANY reasonable support options? I'm never asking anyone to support everything, but everyone needs to support something serious.
Right, and they do. VMware.
Oh okay, well that's fine then. Not the BEST option, but acceptable. And by BEST I don't mean that VMware is or isn't the best, I mean ONLY supporting that one is not as good as supporting a few options.
Ya, this whole thing started because Dustin said he should drop them since they don't support anything else. That's ridiculous.
I see. Yeah that's going too far. That's lacking variety and options, but not lacking an enterprise deployment option. You have to figure the costs associated with VMware into the product's costs when decision making, but that's about it. VMware is very, very enterprise. It's a bit crappy that they don't offer ANY lower cost options for companies like this where VMware is way out of their league and crazy that they allow 100Mb/s Synology iSCSI but require VMware ESXi... so they have some clear problems in their thinking and requirements, but VMware itself is just fine.
This is another breakdown at the vendor end, most likely. The vendor probably only said - we only support ESXi as a hypervisor. Beyond that they probably don't say what server hardware they support/require, or the NICs or the switches, or the SAN.
What they should be providing is minimum requirements in things like RAM and IOPS, then say - you must supply these, we really don't care how. Clearly if that had been done, it's likely that the Synology SANs and the 100 Mb switches would have failed that test and other options would have had to be implemented.
To be fair, it's implied that you have at least GigE for iSCSI/NFS/vMotion. The implementer had to have been either an idiot or greedy to deploy this. I got asked one time to do something like this and I just walked out and told sales to refund their money when they refused to get real gear. I couldn't risk my company name and professional reputation being attached to such clown car stuff.
Even personally (working in house IT) you have to put your foot down at some point, because otherwise the users will talk about how shitty your IT is, and it will impact your ability to get a job elsewhere when others hear about all the outages and performance issues. Even if you can tell in an interview why it was that bad, no one wants to hire someone who worked in a clown car for 3-5 years.
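As a rough illustration of the "publish minimums and let the customer meet them however they like" idea from the quoted post, here is a minimal Python sketch. The field names and every threshold value are hypothetical; the thread only says that GigE is implied for iSCSI/NFS/vMotion and that the 100 Mb switches would likely have failed such a test.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    ram_gb: int             # RAM available to the application
    storage_iops: int       # sustained IOPS the storage can deliver
    storage_link_mbps: int  # link speed used for iSCSI/NFS/vMotion


# Hypothetical vendor minimums; only the 1000 Mb/s (GigE) figure
# reflects something stated in the thread.
MINIMUMS = Environment(ram_gb=64, storage_iops=2000, storage_link_mbps=1000)


def shortfalls(env, minimums=MINIMUMS):
    """Return a list of human-readable problems, empty if env qualifies."""
    problems = []
    for field in ("ram_gb", "storage_iops", "storage_link_mbps"):
        have, need = getattr(env, field), getattr(minimums, field)
        if have < need:
            problems.append(f"{field}: have {have}, need at least {need}")
    return problems


# An environment with plenty of RAM and IOPS but 100 Mb switches
for problem in shortfalls(Environment(ram_gb=128, storage_iops=5000,
                                      storage_link_mbps=100)):
    print(problem)
```

The point of checking capabilities rather than brand names is that the customer stays free to pick the hardware, while a setup like the one in this thread gets flagged before deployment.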
-
@Dashrender said in Domain Controller Down (VM):
unless they are a location that's open 24/7.
ya they are
-
@Dashrender said in Domain Controller Down (VM):
@John-Nicholson said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@John-Nicholson said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@stacksofplates said in Domain Controller Down (VM):
At least if the other end knew what he needed he could get some help. But now he might cancel his subscription and go somewhere else (which I believe is what they are trying to avoid). I can't imagine the amount of "IT Pros" that contact them looking for support for issues like that.
Same vein, how many avoid them because they don't provide ANY reasonable support options? I'm never asking anyone to support everything, but everyone needs to support something serious.
Right, and they do. VMware.
Oh okay, well that's fine then. Not the BEST option, but acceptable. And by BEST I don't mean that VMware is or isn't the best, I mean ONLY supporting that one is not as good as supporting a few options.
Ya, this whole thing started because Dustin said he should drop them since they don't support anything else. That's ridiculous.
I see. Yeah that's going too far. That's lacking variety and options, but not lacking an enterprise deployment option. You have to figure the costs associated with VMware into the product's costs when decision making, but that's about it. VMware is very, very enterprise. It's a bit crappy that they don't offer ANY lower cost options for companies like this where VMware is way out of their league and crazy that they allow 100Mb/s Synology iSCSI but require VMware ESXi... so they have some clear problems in their thinking and requirements, but VMware itself is just fine.
To be clear, requiring VMware ESXi in a supported configuration is at odds with the 100Mb/s for vMotion and iSCSI (VMware does NOT support this abomination of a configuration).
I thought that I was stating that... that they had a mismatch, going for the biggest, baddest, most expensive enterprise hypervisor and then... don't care if it is set up in a viable way.
To be clear, he has Essentials Plus, which is only 6K up front and $1200 a year for 24/7 support and free upgrades. This is the CHEAPEST hypervisor in terms of ongoing cost for 24/7 support of 6 sockets plus a central management and monitoring solution. (Citrix for XenServer and Red Hat cost more. Microsoft costs crazy more for SCCM-VMM and a support agreement.)
I guess the difference there, at least with MS, is that you don't expect to get your expert support from MS directly, instead you get it from companies like NTG or those who know it.
But the bigger fail is - did they really need Essentials Plus in the first place? Could they afford near zero downtime? Seems unlikely, unless they are a location that's open 24/7.
I can't disagree with you more.
It's a medical facility that has beds occupied 24/7 so yes. There is a bizarre assumption (that I used to be guilty of) that because you are small you don't need 24/7 availability, and that is just changing. An increasing number of SMBs operate 3 shifts, or have customer expectations of availability 24/7. It's true you can have maintenance windows and things, and maybe we should blame Google for it, but the game has changed. More mission critical systems have gone from pen and paper to the computers.
Spending 6K so you can get 24/7 support (that's not even an option with Essentials) costs less per day than my wife's Starbucks addiction. That's not a big fail and nothing anyone should be shamed over, especially someone with no training and no backup (that's a bigger fail, but not a replacement for vendor support).
MSPs are NOT a replacement for a support agreement (in fact most REQUIRE you to have one). If there is a driver issue someone has to stay on the phone and deal with it. Most MSPs worth a damn are going to charge you ~$150-250 per host for 24/7 support of a hypervisor. So the support costs for his 3 hosts from the MSP would actually be more even if you went with free Hyper-V. Given the MSP would need to manage patching, the overtime costs of forcing it to be done disruptively after hours would likely negate any savings from going local storage with no vMotion for patching.
I advocate having both. In house steady state IT should NOT be running outages by themselves without the opportunity for a shift change. Also MSPs see every kind of outage and know how to isolate and react to them. In this case any normal MSP would have...
- Never agreed to support this environment. They wouldn't have signed a contract after the discovery until this storage/networking mess was fixed.
- Mandated remote monitoring (SNMP/Syslog) of the switch, detected the fault, and isolated it. This would have cut the outage to 1/3 of its length.
- If an HA cluster was used, deployed 2 switches so that only a single one would have failed (no outage).
- Would be regularly patching the environment so he was on a mainstream supported release of vSphere.
- Would have demanded a replacement of the Synology.
- Would be actively managing proper backups (and not using an ancient version of ArcServe).
- Been on the phone handling the issue, handling updates to management to keep them out of the way, and bringing in specialists as needed (networking, storage, hypervisor) as well as used their partner relationships with the vendors involved (Cisco, HP, VMware) to get escalated tickets opened and tracked as needed.
A proper MSP is like having an enterprise support army in your back pocket for less than the cost of an FTE. Honestly as an SMB you shouldn't hire an in house resource before you hire an MSP, and any shop that doesn't want to pay for an MSP but will pay for an FTE is a GIANT red flag that they lack any level of competence in IT governance, budgeting, or common sense.
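The cost comparison in this post can be checked with some quick arithmetic. The sketch below uses the figures quoted in the thread (6K up front, $1200 a year, 3 hosts, $150-250 per host) and makes three assumptions that the thread does not state: a three-year ownership period, support billed every year including the first, and the MSP range being a monthly fee.

```python
# Figures quoted in the thread
ESSENTIALS_PLUS_UPFRONT = 6000   # USD, one-time
ESSENTIALS_PLUS_SUPPORT = 1200   # USD per year for 24/7 support
HOSTS = 3

# Assumptions (NOT stated in the thread)
YEARS = 3                            # assumed ownership period
MSP_FEE_PER_HOST_MONTH = (150, 250)  # assumes the quoted range is per month


def licence_total(years):
    """Up-front purchase plus support, assuming support is billed every year."""
    return ESSENTIALS_PLUS_UPFRONT + ESSENTIALS_PLUS_SUPPORT * years


def licence_per_day(years):
    """Average daily cost of licence plus support over the period."""
    return licence_total(years) / (365 * years)


def msp_total(years, fee_per_host_month):
    """MSP hypervisor support for all hosts over the same period."""
    return fee_per_host_month * HOSTS * 12 * years


print(f"Essentials Plus over {YEARS} years: ${licence_total(YEARS)}")
print(f"Essentials Plus per day: ${licence_per_day(YEARS):.2f}")
low, high = (msp_total(YEARS, fee) for fee in MSP_FEE_PER_HOST_MONTH)
print(f"MSP support for {HOSTS} hosts over {YEARS} years: ${low} to ${high}")
```

Under these assumptions the license and vendor support come to about $8.77 per day over three years, while MSP coverage for the three hosts would run $16,200 to $27,000 over the same period. Change `YEARS` or the fee interpretation and the gap moves, but the vendor support line stays small either way.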
-
I didn't know what kind of medical facility @wirestyle22 was in..
OK since the place is 24/7, he needs a higher than normal amount of uptime - fine. But real HA? really? I know XenServer and Hyper-V can both do storage motion while the system is running, so no shared storage is needed (granted XS is super slow, sooooo) so you don't need HA to do patches, you just need the storage motion options - I don't know if that's available in ESXi Essentials or not.
If HA is fully thought out and is felt to be needed (don't forget about the power situation, and cooling, etc, etc, etc, - remember HA isn't a product, it's a process) then they should fully realize it. I'm guessing by the fact that the switches were 100 Mb that it really wasn't fully thought out, instead someone in a place of authority thought it sounded good and they tossed in what they have today.
As for the rest, I generally agree with you. It shows the real costs of DOING IT RIGHT - but as most of us know - few SMBs are really willing to do what's right in IT.
Hell, just look at all of the threads in SW talking about print shops that couldn't upgrade their XP machines because their 10K+ printers didn't support anything newer. It's a never ending problem of knowing the real costs of doing something right.
-
@Dashrender said in Domain Controller Down (VM):
OK since the place is 24/7, he needs a higher than normal amount of uptime - fine. But real HA? really? I know XenServer and Hyper-V can both do storage motion while the system is running, so no shared storage is needed (granted XS is super slow, sooooo) so you don't need HA to do patches, you just need the storage motion options - I don't know if that's available in ESXi Essentials or not.
I can't disagree more. I've seen someone try to do this in an SMB and they got fired.
It is available in ESXi (it's a bit faster in 5.5; ESXi has a proper IO mirror driver so you don't have helper snapshots in a never ending catch up process to handle the IO happening during the merge of snapshots).
- Doing shared nothing migrations impacts performance. (Seriously, look at the disk latency the next time you do it.) Telling management "well we kicked off the migration 7 hours ago and we can't really stop it" is a great way to get shown the door.
- This doesn't scale, and can make patch windows take DAYS very quickly. No one would seriously consider this for monthly patching.
- If you have high enough IO and are using a hypervisor that lacks a mirror driver you end up with a never ending amount of snapshot merges.
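To put a number on why shared nothing migrations stretch patch windows, here is a back-of-the-envelope Python estimate of how long it takes just to copy a VM's disks across the wire. The 500 GB VM size and the 80% link efficiency are assumptions for illustration, not figures from the thread, and the estimate ignores the extra load from production IO during the copy.

```python
def transfer_hours(size_gb, link_mbps, efficiency=0.8):
    """Hours to copy size_gb (decimal gigabytes) over a link of link_mbps.

    efficiency is the assumed fraction of line rate actually achieved.
    """
    bits = size_gb * 1e9 * 8
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 3600


VM_SIZE_GB = 500  # hypothetical VM size

for label, mbps in (("100 Mb/s", 100), ("1 Gb/s", 1000)):
    print(f"{label}: {transfer_hours(VM_SIZE_GB, mbps):.1f} hours")
```

With these assumed numbers a single VM takes roughly 14 hours on a 100 Mb/s link and well under 2 hours on GigE. Multiply by every VM on a host that needs to be evacuated and a monthly patch cycle turns into days.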
-
@Dashrender said in Domain Controller Down (VM):
OK since the place is 24/7, he needs a higher than normal amount of uptime - fine. But real HA? really? I know XenServer and Hyper-V can both do storage motion while the system is running, so no shared storage is needed (granted XS is super slow, sooooo) so you don't need HA to do patches, you just need the storage motion options - I don't know if that's available in ESXi Essentials or not.
Storage motion is not for production hours. That's great if you have a greenzone, but if you have that you don't need the storage motion. Storage motion is mostly for migrations and one time, unavailable events. It's not something you do during production time unless you have no choice (dying storage system.)
-
@Dashrender said in Domain Controller Down (VM):
If HA is fully thought out and is felt to be needed (don't forget about the power situation, and cooling, etc, etc, etc, - remember HA isn't a product, it's a process) then they should fully realize it. I'm guessing by the fact that the switches were 100 Mb that it really wasn't fully thought out, instead someone in a place of authority thought it sounded good and they tossed in what they have today.
It's as simple as "there was no HA and no attempt made at it."
-
@Dashrender said in Domain Controller Down (VM):
I didn't know what kind of medical facility @wirestyle22 was in..
If HA is fully thought out and is felt to be needed (don't forget about the power situation, and cooling, etc, etc, etc, - remember HA isn't a product, it's a process) then they should fully realize it. I'm guessing by the fact that the switches were 100 Mb that it really wasn't fully thought out, instead someone in a place of authority thought it sounded good and they tossed in what they have today.
Medical facilities with beds have generators and fuel. HVAC for something this small can be covered for redundancy with a spot cooler (I have this in my own house for my lab, so if I can afford it, you have to be a tiny outfit to not be able to afford it). I agree it's a process, and the biggest piece is having an MSP to back you up, and having 24/7 dispatched resources to help you with the persistent layer. Not having redundancy at the people level is the biggest issue to address. While I normally advocate some kind of offsite ready to fire off DR, in the case of a facility like this it's not actually as important (beyond BC reasons) because if the whole facility blows up the need for the system goes with it. Still there are a bazillion Veeam/VCAN partners who can cover this piece for cheap so why not.
-
@scottalanmiller said in Domain Controller Down (VM):
@Dashrender said in Domain Controller Down (VM):
If HA is fully thought out and is felt to be needed (don't forget about the power situation, and cooling, etc, etc, etc, - remember HA isn't a product, it's a process) then they should fully realize it. I'm guessing by the fact that the switches were 100 Mb that it really wasn't fully thought out, instead someone in a place of authority thought it sounded good and they tossed in what they have today.
It's as simple as "there was no HA and no attempt made at it."
It would take me about 5 minutes to explain to a 3rd grader why the system he has isn't redundant and why that is bad. The fact that it continues to exist shows that either...
- Management has intellectual capacity below that of a 3rd grader (possible).
- No one explained in non-jargon English how bad this configuration was (more likely).
-
@John-Nicholson said in Domain Controller Down (VM):
It's a medical facility that has beds occupied 24/7 so yes.
That doesn't mean that. We can equally say they didn't have 24x7 IT staff so they don't need it. What they need, we have no way of knowing. If we read back what we know about their environment, it tells us that they didn't think that they needed HA in any way whatsoever. But that's all we have to go on. They operate around the clock, but that isn't an HA concern. And they implemented something so far from HA that it is laughable. So all we know is that they implemented anti-HA and spent a lot to do it. That's it. We have no indication that HA is warranted in any way.
Just because a shop is 24x7 medical doesn't tell us that a specific system is needed 24x7 or that it needs to be available at all times. Those are very different requirements.
-
@Dashrender said in Domain Controller Down (VM):
As for the rest, I generally agree with you. It shows the real costs of DOING IT RIGHT - but as most of us know - few SMBs are really willing to do what's right in IT.
Hell, just look at all of the threads in SW talking about print shops that couldn't upgrade their XP machines because their 10K+ printers didn't support anything newer. It's a never ending problem of knowing the real costs of doing something right.
The real cost of doing IT right is cheaper. Simply not having an onsite FTE, and having an MSP manage this stuff is likely cheaper (FTEs are expensive!). This outage might have been embarrassing enough for them to lose a patient or two (or worse, someone dies and they get hit with a million-dollar wrongful death lawsuit that spikes their premiums). Doing IT RIGHT includes understanding the capex and opex costs, and associated risks and external costs of doing IT right or wrong.
Doing IT Wrong means wasting tons of money and getting an output that causes other costs. IT budgets do NOT exist in a vacuum to the rest of the operations and their output (Especially in 2016!).
-
@John-Nicholson said in Domain Controller Down (VM):
A proper MSP is like having an enterprise support army in your back pocket for less than the cost of an FTE. Honestly as an SMB you shouldn't hire an in house resource before you hire an MSP, and any shop that doesn't want to pay for an MSP but will pay for an FTE is a GIANT red flag that they lack any level of competence in IT governance, budgeting, or common sense.
I agree. Anyone going into an FTE role in an SMB should probably ask what their MSP ecosystem of support is like BEFORE accepting a position. That's something that we never talk about but is a great idea. They should either have a great answer (and the MSP should likely be part of the interview process) or they should be like "that's why we are bringing you in, to help us find those good resources."
-
@John-Nicholson said in Domain Controller Down (VM):
@Dashrender said in Domain Controller Down (VM):
As for the rest, I generally agree with you. It shows the real costs of DOING IT RIGHT - but as most of us know - few SMBs are really willing to do what's right in IT.
Hell, just look at all of the threads in SW talking about print shops that couldn't upgrade their XP machines because their 10K+ printers didn't support anything newer. It's a never ending problem of knowing the real costs of doing something right.
The real cost of doing IT right is cheaper. Simply not having an onsite FTE, and having an MSP manage this stuff is likely cheaper (FTEs are expensive!). This outage might have been embarrassing enough for them to lose a patient or two (or worse, someone dies and they get hit with a million-dollar wrongful death lawsuit that spikes their premiums). Doing IT RIGHT includes understanding the capex and opex costs, and associated risks and external costs of doing IT right or wrong.
Doing IT Wrong means wasting tons of money and getting an output that causes other costs. IT budgets do NOT exist in a vacuum to the rest of the operations and their output (Especially in 2016!).
"IT Right" isn't even a thing. IT is just part of the business. It's "running the business right."
-
We actually did a video on that last night, it is being edited right now.
-
@scottalanmiller said in Domain Controller Down (VM):
@John-Nicholson said in Domain Controller Down (VM):
It's a medical facility that has beds occupied 24/7 so yes.
That doesn't mean that. We can equally say they didn't have 24x7 IT staff so they don't need it. What they need, we have no way of knowing. If we read back what we know about their environment, it tells us that they didn't think that they needed HA in any way whatsoever. But that's all we have to go on. They operate around the clock, but that isn't an HA concern. And they implemented something so far from HA that it is laughable. So all we know is that they implemented anti-HA and spent a lot to do it. That's it. We have no indication that HA is warranted in any way.
Just because a shop is 24x7 medical doesn't tell us that a specific system is needed 24x7 or that it needs to be available at all times. Those are very different requirements.
The EMR's on the system. I've yet to meet a medical facility with 24/7 manned beds whose SLA accepts a 12 hour outage for that, and ultimately it's two things that drive medical standards and outcomes.
- In America, a jury in a wrongful death situation is the arbiter of what was acceptable or not in medical spending and outcomes. (Lawsuits drive medical standards.)
- The federal government's willingness to reimburse you for spending. Any patient care administered while the system was down and not recording is not paid for, and while people have downtime procedures, the risk of missing out on some juicy procedures or pills or other things means this can add up quickly.
-
@John-Nicholson said in Domain Controller Down (VM):
@scottalanmiller said in Domain Controller Down (VM):
@John-Nicholson said in Domain Controller Down (VM):
It's a medical facility that has beds occupied 24/7 so yes.
That doesn't mean that. We can equally say they didn't have 24x7 IT staff so they don't need it. What they need, we have no way of knowing. If we read back what we know about their environment, it tells us that they didn't think that they needed HA in any way whatsoever. But that's all we have to go on. They operate around the clock, but that isn't an HA concern. And they implemented something so far from HA that it is laughable. So all we know is that they implemented anti-HA and spent a lot to do it. That's it. We have no indication that HA is warranted in any way.
Just because a shop is 24x7 medical doesn't tell us that a specific system is needed 24x7 or that it needs to be available at all times. Those are very different requirements.
The EMR's on the system. I've yet to meet a medical facility with 24/7 manned beds whose SLA accepts a 12 hour outage for that, and ultimately it's two things that drive medical standards and outcomes.
- In America, a jury in a wrongful death situation is the arbiter of what was acceptable or not in medical spending and outcomes. (Lawsuits drive medical standards.)
- The federal government's willingness to reimburse you for spending. Any patient care administered while the system was down and not recording is not paid for, and while people have downtime procedures, the risk of missing out on some juicy procedures or pills or other things means this can add up quickly.
That's fine, BUT the ONLY thing we know for certain is what they were willing to implement previously. We don't know what kind of medicine they work in, what risks there are, what EMR dependencies there are. Sure they can't bill for twelve hours, but that might cost them nothing while uptime costs something. All depends. What we DO know is that they didn't have the hardware, planning, documentation, staff or support organizations for anything other than what they got. So based on the sole information that we have, we can't assume that their business believes in uptime. Even during the outage, they made it VERY clear that getting it fixed was not a priority but that status updates, conversations and even other IT needs were the priority.
We have a pretty uniform picture that uptime on this system is not perceived as important by the business decision makers, even during the panic fire of a real outage.
-
-
I totally understand that there are medical situations where high availability and high uptime are considered necessary and make sense in a business context. And I totally agree that this has the potential to be one of them. I'm only saying that it being possible doesn't make it so and that all indications from reading back their previous decisions, investments and behaviour suggest that they do not agree with that assessment.
-
@John-Nicholson said in Domain Controller Down (VM):
The EMR's on the system. I've yet to meet a medical facility with 24/7 manned beds whose SLA accepts a 12 hour outage for that, and ultimately it's two things that drive medical standards and outcomes.
- In America, a jury in a wrongful death situation is the arbiter of what was acceptable or not in medical spending and outcomes. (Lawsuits drive medical standards.)
- The federal government's willingness to reimburse you for spending. Any patient care administered while the system was down and not recording is not paid for, and while people have downtime procedures, the risk of missing out on some juicy procedures or pills or other things means this can add up quickly.
1 - take your word for it at this point
2 - what prevents you from documenting on paper and then entering it when the system comes up - everyone I know operates this way, and they do get paid for those things that are transposed to electronic after the fact.
-