ZFS Based Storage for Medium VMware Workload

scottalanmiller

But that is completely unnecessary if you move to Xen (or is it XenServer - still confused) or Hyper-V

If he moves to Hyper-V or XenServer he would still need proprietary replicated local storage options at his node count. But it would be free at the platform layer (saving $10K at least) and far cheaper at the storage layer (saving many thousands more).

donaldlandru

@scottalanmiller said:

It's important to recognize that it is a SPOF. But being a SPOF is not the core issue, believe it or not, just the one that causes the biggest emotional reaction. If you were to buy a super high end active/active EMC or HDS device for this (mainframe class storage, start around $50K for the smallest possible units) the fact that it was a SPOF would be heavily mitigated. The whole mainframe concept is built around making a SPOF that is unlikely to fail.

But your issues are bigger. Here are the big issues that you are left with in both of your scenarios:

Single point of failure on which everything rests (the thing most likely to fail causes EVERYTHING to fail.)

No risk mitigation for the other layers in the dependency chain. This isn't a 3-2-1 as traditionally described but actually a (1/1/1-1) meaning ANY server failure results in unmitigated (literally) failure AND any storage failure results in total failure. You have a dramatic increase in failure risk with this design, not just a small or moderate increase like most people see (because most people are confused and heavily mitigate risk at one or two but not all three layers.) So it is very important to realize that this is at least one full order of magnitude more risky than a traditional inverted pyramid of doom.

The single point of failure that you have is actually a pretty fragile one. Probably more fragile than the servers themselves. So not only is the risk of failure doubled by having two completely separate places for things to fail, but the single point of failure that impacts everything is the most fragile piece of all.

This has the highest cost both today AND going into the future.

Ok, if we split this into two separate topics, the only unmitigated failure point in operations is the single SAN. Two options to mitigate the risk are:

Add a second SAN that replicates with the first (HP MSA easy to do, not so nice price tag)
Move to local storage and create redundant servers for items that can't be down (split-scope DHCP, second Exchange server). I'm not sure how to mitigate the risk of SharePoint being offline since it is the free version, plus the SQL server would be another single point of failure.

When dealing with the Microsoft licensing needed to create the redundancy and obtain the reliability the business wants, I think we are coming in at around the same price. Going with local storage here would reduce the complexity, and if I can convince the organization to go with Office 365 we actually have a lot lower risk here and wouldn't need to create a bunch of highly available services.

The second topic (scope) is the development environments, and you are 100% correct: even if we have active/active SAN clusters, the failure will always be at the server level. The lack of vMotion in this "cluster" and the lack of available resources to do a failover make the compute layer the biggest problem. If we lose a compute node, those servers are offline until replaced. The business accepts that risk as long as we have a fast way of spinning down VMs and bringing up the VMs the team is working on. This is much easier with shared storage than local, in my opinion.

So I do have multiple problems to solve, with different sets of requirements.
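
To put rough numbers on the dependency-chain point quoted above, here is a minimal sketch of series versus mirrored availability. It assumes failures are independent, and the 99.9% figures are made up for illustration; nobody in the thread quoted availability numbers.

```python
from math import prod


def series_availability(*layers: float) -> float:
    """Availability of a chain in which every layer must be up."""
    return prod(layers)


def parallel_availability(single: float, copies: int) -> float:
    """Availability of redundant copies where any one copy is enough."""
    return 1 - (1 - single) ** copies


# Hypothetical per-layer availability, NOT measured values from this environment.
server = 0.999
san = 0.999

# One server depending on one SAN: both must be up.
single_chain = series_availability(server, san)

# Same server, but storage mirrored across two SANs.
mirrored_storage = series_availability(server, parallel_availability(san, 2))

print(f"server + single SAN : {single_chain:.6f}")
print(f"server + mirrored SAN: {mirrored_storage:.6f}")
```

With equal layers, the chained unavailability is roughly twice that of one layer alone, which is the "risk of failure doubled" effect. Mirroring only the storage brings the chain back to about the availability of the server, so the unmitigated server layer still dominates.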

scottalanmiller

If you were going to go with RLS, which is completely crazy given the scenario and historically accepted risk, then the best investment would be to do the following:

Replace all nodes with adequately sized nodes built on the HP DL380 G9 platform or the Dell R730xd platform. These have enough compute to replace several of your nodes in one, enough memory to handle all of your needs and more than 600% greater per node storage capacity!
Move to either Hyper-V + Starwind or XenServer + DRBD (HA-Lizard)
Make two clusters of two servers each keeping every software piece free and simple

scottalanmiller

Going the XenServer HA route, the guy who actually makes HA-Lizard is here in the community. That is a big deal: not only do you have XS resources here, you have *the* XS HA resource.

scottalanmiller

@donaldlandru said:

Ok, if we split this into two separate topics, the only unmitigated failure point in operations is the single SAN. Two options to mitigate the risk are:

Not currently. You had said that your nodes do not have the tools or the overhead to absorb the load from a failed node, correct? That makes the risk of those nodes failing unmitigated as well. You only have enough nodes to handle your capacity, not enough to use them for failure mitigation.

donaldlandru

My next biggest concern, like with any technology, is how do I get there from here. I have enough budget for a storage node, and we are going to run out of space within the next 60 days. I do not have, and will not receive, additional funding this year for new servers. So some form of "in-place" style of upgrade has to occur. Obviously, this is a take-a-server-down, convert-the-VMs, bring-it-back-up type of process that has an unknown LoE.

Trying to not paint a picture of a rock and a hard place, but realistically where else am I at right now?

Dashrender

@scottalanmiller said:

If you were going to go with RLS, which is completely crazy given the scenario and historically accepted risk, then the best investment would be to do the following:

Replace all nodes with adequately sized nodes built on the HP DL380 G9 platform or the Dell R730xd platform. These have enough compute to replace several of your nodes in one, enough memory to handle all of your needs and more than 600% greater per node storage capacity!

Move to either Hyper-V + Starwind or XenServer + DRBD (HA-Lizard)

Make two clusters of two servers each keeping every software piece free and simple

That would cost a lot more than his current $14,000 budget (assuming that number was a budget number).

scottalanmiller

@donaldlandru said:

Add a second SAN that replicates with the first (HP MSA easy to do, not so nice price tag)

I've never seen someone do this successfully. That doesn't suggest that it doesn't work, but are you sure that the MSA series will do SAN mirroring with fault tolerance? I'm not confident that that is a feature (but certainly not confident that it isn't.) Double check that to be sure as I talk to MSA users daily and no one has ever led me to believe that this was even an option.

I know that Dell's MD series cannot do this, only the EQL series.

donaldlandru

@scottalanmiller said:

@donaldlandru said:

Ok, if we split this into two separate topics, the only unmitigated failure point in operations is the single SAN. Two options to mitigate the risk are:

Not currently. You had said that your nodes do not have the tools or the overhead to absorb the load from a failed node, correct? That makes the risk of those nodes failing unmitigated as well. You only have enough nodes to handle your capacity, not enough to use them for failure mitigation.

In Operations, the two-node cluster, I said they do have the necessary resources to absorb the other node failing. It is the development "cluster that isn't a cluster" that cannot absorb a failure.

scottalanmiller

@Dashrender said:

That would cost a lot more than his current $14,000 budget (assuming that number was a budget number).

Yes, but it would cost far less than what he was proposing. My original recommendations were to lower his cost while improving reliability. Then he leapt to the Ferrari scenario, so I proposed another solution that beats that one, keeping the Ferrari features while spending only a fraction as much money.

scottalanmiller

@donaldlandru said:

In Operations, the two-node cluster, I said they do have the necessary resources to absorb the other node failing. It is the development "cluster that isn't a cluster" that cannot absorb a failure.

Oh okay. So mitigated where it matters, I assume, and unmitigated where it doesn't matter so much. That I was not clear about.

Dashrender

@scottalanmiller said:

Going to VSAN, Starwind, DRBD, etc. would be an "orders of magnitude leap" that is not warranted. It just can't make sense. What you have today and what you are talking about moving to are insanely "low availability." Crazy low. And no one had any worries or concerns about that, right?

That's just it - the company probably thinks they have that super high level of availability and the fact that they've never had a failure feeds that fire of belief.

This has always been the issue I've had when I try to redesign/spend more money on a new solution. I get the push back - "well, we did it that cheap way before and it worked for 8 years, why do I suddenly now need to use this other way, clearly the old cheap way works."

donaldlandru

@Dashrender said:

@scottalanmiller said:

Going to VSAN, Starwind, DRBD, etc. would be an "orders of magnitude leap" that is not warranted. It just can't make sense. What you have today and what you are talking about moving to are insanely "low availability." Crazy low. And no one had any worries or concerns about that, right?

That's just it - the company probably thinks they have that super high level of availability and the fact that they've never had a failure feeds that fire of belief.

This has always been the issue I've had when I try to redesign/spend more money on a new solution. I get the push back - "well, we did it that cheap way before and it worked for 8 years, why do I suddenly now need to use this other way, clearly the old cheap way works."

This -- all day long! It worked for the 7 years before you got here, it will keep working long after. I fight this fight every day.

dafyre

@scottalanmiller From previous posts, it sounds like they are most concerned with the Dev environment right now, since the Ops cluster appears to be ok.

coliver

@donaldlandru said:

@Dashrender said:

@scottalanmiller said:

Going to VSAN, Starwind, DRBD, etc. would be an "orders of magnitude leap" that is not warranted. It just can't make sense. What you have today and what you are talking about moving to are insanely "low availability." Crazy low. And no one had any worries or concerns about that, right?

That's just it - the company probably thinks they have that super high level of availability and the fact that they've never had a failure feeds that fire of belief.

This has always been the issue I've had when I try to redesign/spend more money on a new solution. I get the push back - "well, we did it that cheap way before and it worked for 8 years, why do I suddenly now need to use this other way, clearly the old cheap way works."

This -- all day long! It worked for the 7 years before you got here, it will keep working long after. I fight this fight every day.

All you need is one good outage and the tune changes. Or at least that's what I experienced at my last position.

donaldlandru

@scottalanmiller said:

@donaldlandru said:

Add a second SAN that replicates with the first (HP MSA easy to do, not so nice price tag)

I've never seen someone do this successfully. That doesn't suggest that it doesn't work, but are you sure that the MSA series will do SAN mirroring with fault tolerance? I'm not confident that that is a feature (but certainly not confident that it isn't.) Double check that to be sure as I talk to MSA users daily and no one has ever led me to believe that this was even an option.

I know that Dell's MD series cannot do this, only the EQL series.

In real life I am not sure if it works; on paper it does. It may be a false sense of security, but the MSA does have active/active controllers built in (10Gb iSCSI), redundant power supplies, and of course the disks are in a RAID. The risks that are not mitigated by the single chassis are:

Chassis failure (I am sure it can happen, but the only part in the chassis is the backplane and some power routing)
Software bug -- most likely failure to occur
Human error (oops I just unplugged the storage chassis)

All in all I think the operations is pretty well protected, minus the three risks listed above. It is two nodes that can absorb either node failing, it is on redundant 10gig top of rack switches and redundant 1gig switches. Also, backups are done and tested as well with Veeam. Am I missing something here?

Unless I am mistaken, and Scott please correct me if I am, it is the three node development cluster that is in sorry shape.

dafyre

In your Dev environment, you have 3 servers... with 288GB of RAM, 64GB of RAM, and 16GB of RAM... Assume RAM compatibility... What happens if you balance out those three servers and get them at least close to having the same amount of RAM?

Does that help you at all? If that is a good idea, then why not look at converting them to XenServer and switching to Local Storage? You could then replicate the VMs to each of the three hosts, or you could set up HA-Lizard.
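
Just to sketch the arithmetic behind that question: the RAM sizes below are the ones from this thread, but the host names and the 60% utilization figure are made up for illustration, and the check only looks at RAM (not CPU, storage, or whether the DIMMs can actually be swapped).

```python
# RAM per dev host in GB. Sizes are from the thread; host names are hypothetical.
current_gb = {"dev-big": 288, "dev-mid": 64, "dev-small": 16}

total_gb = sum(current_gb.values())
balanced_gb = {host: total_gb / len(current_gb) for host in current_gb}


def survives_host_loss(capacity_gb: dict, used_gb: dict) -> bool:
    """True if, for every host, the other hosts have enough free RAM to take its load."""
    for failed in capacity_gb:
        spare = sum(
            capacity_gb[h] - used_gb[h] for h in capacity_gb if h != failed
        )
        if used_gb[failed] > spare:
            return False
    return True


# ASSUMPTION: every host runs at 60% RAM utilization. Not a measured number.
UTILIZATION = 0.6
used_current = {h: gb * UTILIZATION for h, gb in current_gb.items()}
used_balanced = {h: gb * UTILIZATION for h, gb in balanced_gb.items()}

print(f"total RAM: {total_gb} GB, balanced: {total_gb / 3:.1f} GB per host")
print("current layout survives any one host failing: ",
      survives_host_loss(current_gb, used_current))
print("balanced layout survives any one host failing:",
      survives_host_loss(balanced_gb, used_balanced))
```

Under that assumed load, the lopsided layout cannot absorb the loss of the big host, while three equal hosts can absorb the loss of any one. With three equal hosts that only holds while utilization stays at or below about two thirds.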

dafyre

If I am not mistaken, for planned outages you can actually migrate VMs from one XenServer host to another (this is also true for Hyper-V, IIRC) without having shared storage... So for maintenance, you can live migrate from one XenServer host to another (it would copy the storage too)... Whether or not that is feasible depends on the size of your VMs and the speed of your network... among other things.
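
For a back-of-the-envelope feel for "size of your VMs and speed of your network", here is a rough sketch. The 200 GB VM size and the 70% link efficiency are made-up illustration values, and it ignores disk throughput and any data re-sent while the VM keeps running.

```python
def transfer_seconds(vm_size_gb: float, link_gbps: float,
                     efficiency: float = 0.7) -> float:
    """Estimate time to copy a VM's storage over the network.

    vm_size_gb: data to move, in decimal gigabytes (hypothetical input).
    link_gbps:  nominal link speed in gigabits per second.
    efficiency: assumed fraction of the link usable for the copy.
    """
    bits_to_move = vm_size_gb * 8e9
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_move / usable_bits_per_second


# Hypothetical 200 GB VM over the two link speeds mentioned in the thread.
for gbps in (1, 10):
    minutes = transfer_seconds(200, gbps) / 60
    print(f"{gbps:>2} Gb/s link: about {minutes:.1f} minutes")
```

Under those assumptions the copy is on the order of 40 minutes per VM on 1 gig and a few minutes on 10 gig, so the link you migrate over matters a lot for a maintenance window.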

Dashrender

@coliver said:

@donaldlandru said:

@Dashrender said:

@scottalanmiller said:

Going to VSAN, Starwind, DRBD, etc. would be an "orders of magnitude leap" that is not warranted. It just can't make sense. What you have today and what you are talking about moving to are insanely "low availability." Crazy low. And no one had any worries or concerns about that, right?

That's just it - the company probably thinks they have that super high level of availability and the fact that they've never had a failure feeds that fire of belief.

This has always been the issue I've had when I try to redesign/spend more money on a new solution. I get the push back - "well, we did it that cheap way before and it worked for 8 years, why do I suddenly now need to use this other way, clearly the old cheap way works."

This -- all day long! It worked for the 7 years before you got here, it will keep working long after. I fight this fight every day.

All you need is one good outage and the tune changes. Or at least that's what I experienced at my last position.

Of course, that part is pretty obvious. That outage also helps the company come to a more realistic understanding of its uptime needs, and what types of outages it can really handle. But many companies have never had to deal with one, so we are stuck where we are.

donaldlandru

@dafyre said:

In your Dev environment, you have 3 servers... with 288GB of RAM, 64GB of RAM, and 16GB of RAM... Assume RAM compatibility... What happens if you balance out those three servers and get them at least close to having the same amount of RAM?

Does that help you at all? If that is a good idea, then why not look at converting them to XenServer and switching to Local Storage? You could then replicate the VMs to each of the three hosts, or you could set up HA-Lizard.

The two smaller servers pre-date my time with the company and were likely back of truck specials. Both of these are slated to be replaced next year with a single server with similar specs to the big server. The smallest one is already maxed out and the other one doesn't make sense to upgrade just to retire.

I also don't need HA on these (I don't have HA today on these), so I think this is an opportunity to move to a different platform.