Burned by Eschewing Best Practices
-
@Carnival-Boy said:
I'm not disagreeing, but if the OP says money is no object then you should treat that as fact.
I don't agree. Knowing someone is wrong, confused, or doesn't understand something is exactly when they need help the most, not the least. A huge part of what we do in IT is recognizing when people don't know what they need to know and helping them. In a case like this, where we know they have to be wrong and don't understand what they are doing, should we really help them hurt themselves?
I totally get that this goes against my "always give people the benefit of the doubt" theory about never hurting the innocent to protect the guilty, but money is never no object. It's simply not true, and it means someone desperately needs help and doesn't understand what they don't know.
-
@Carnival-Boy said:
Or is forced to spend a certain budget regardless of whether he needs it or not. Who knows, that's not the point.
That would actually make money the ONLY object. Budget would be the whole concern, not just part of it.
-
Well, ok, but the OP isn't actually on ML, so it's a moot point. What I'm really interested in is what the problem is with his solution (ignoring the financial cost) and why it is one of your inverted pyramid thingies. I'm not arguing, I just don't understand and want to learn.
-
An IVPD looks stable and reliable when you view it from the top: you have a bunch of equipment that supposedly fails over between the devices.
But what isn't obvious is that if you look at it from the side you have individual layers of equipment, each dependent on everything above or below itself.
So in the simplest example, 3-2-1, you have one NAS (or SAN), two switches, and three servers.
The name refers to three (this is a soft point, it is often two or more) redundant virtualization host servers connected to two (or potentially more) redundant switches connected to a single storage device, normally a SAN (but DAS or NAS are valid here as well.) It’s an inverted pyramid because the part that matters, the virtualization hosts, depend completely on the network which, in turn, depends completely on the single SAN or alternative storage device. So everything rests on a single point of failure device and all of the protection and redundancy is built more and more on top of that fragile foundation. Unlike a proper pyramid with a wide, stable base and a point on top, this is built with all of the weakness at the bottom. (Often the ‘unicorn farts’ marketing model of “SANs are magic and can’t fail because of dual controllers” comes out here as people try to explain how this isn’t a single point of failure, but it is a single point of failure in every sense.)
What this means is that there are many potential points of failure, and that in the most basic 3-2-1 approach the "reliability" isn't reliable at all. It is only as reliable as your weakest link, which is often the NAS (or SAN).
If any part of that chain breaks, the whole system can, and likely will, come crashing down. Here's a really good explanation from the one and only SAM:
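The "weakest link" point above can be made concrete with some simple probability. The sketch below models a 3-2-1 IPOD: redundant layers multiply out their failure chances, but everything still sits in series behind the single SAN, so system availability can never exceed the SAN's. All the availability figures here are invented for illustration, not vendor numbers.

```python
# Rough availability model of a 3-2-1 inverted pyramid.
# All availability figures below are illustrative assumptions.

def parallel(*avail):
    """Availability of redundant components (any one being up suffices)."""
    unavail = 1.0
    for a in avail:
        unavail *= (1.0 - a)
    return 1.0 - unavail

def series(*avail):
    """Availability of a dependency chain (every layer must be up)."""
    total = 1.0
    for a in avail:
        total *= a
    return total

host = 0.99      # assumed availability of one virtualization host
switch = 0.99    # assumed availability of one switch
san = 0.999      # assumed availability of the single SAN

hosts = parallel(host, host, host)     # three redundant hosts
switches = parallel(switch, switch)    # two redundant switches
system = series(hosts, switches, san)  # everything still depends on the SAN

print(f"Host layer:   {hosts:.6f}")
print(f"Switch layer: {switches:.6f}")
print(f"Whole system: {system:.6f}")   # always capped below the SAN's 0.999
```

However much redundancy you pile on top, the whole system's availability stays strictly below that of the single storage device at the bottom.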
-
In addition, and outside of what was brought up in the SW topic, there were likely many other Best Practices that were not followed by the OP on SW, which led to him getting burned regardless of what hypervisor his employer uses.
The reason I say this is that the SW OP has stated he was already burned by Citrix Support, which seems very odd, as support didn't design the system to fail; they are trying to recover a failed system.
In summary, the SW OP has a system that was improperly set up (likely by Eschewing Best Practices) for the benefit of quick deployment, without understanding how and why he got burned.
It has nothing to do with Citrix, unless Citrix saw the state of their system and how things were configured and said "Nope Nope Nope, we can't help you as everything you've setup is completely ignoring Best Practice Recommendations in its configuration, it has to be rebuilt." And "We won't support the system in this configuration."
Which is probably how the conversation went.
-
But isn't this 2-2-2 and not 3-2-1? I'm still not getting it.....
"The name refers to three (this is a soft point, it is often two or more) redundant virtualization host servers connected to two (or potentially more) redundant switches connected to a single storage device"
There is no single storage device here. Isn't it a "Tower of Redundancy" rather than a "Pyramid of Doom"? An expensive tower, but a tower. Or maybe a folly.
-
And what's the difference between "Inverted Pyramid of Doom" and the traditional term "Single Point of Failure (SPOF)", as in "a single SAN is a SPOF and therefore a bad solution. You need at least two for redundancy"?
-
This is still an IVPD, because the servers are dependent on the NAS(s). It's an improved IVPD (if such a thing could exist), but there are still many points that can fail.
That makes it an overly complicated solution which, by design, reduces the reliability of the system as a whole, including its recoverability and stability.
-
A Single Point of Failure by itself won't bring the entire organization down.
Only that point, and what it hosts, is unavailable until it's fixed.
-
The best way to think of a SPOF is to take any single server and unplug it, with no other backup servers for its functions to migrate to.
That is a SPOF: a system or server that runs alone, hosting whatever it might be. And when it's down, it and only it is down until the problem is repaired.
-
@DustinB3403 said:
That is a SPOF: a system or server that runs alone, hosting whatever it might be. And when it's down, it and only it is down until the problem is repaired.
That's not my understanding of SPOF. In the context of the OP, the "system" contains various pieces of hardware (hosts, switches & SANs). If he lacks redundancy in one area of this system (for example, by only having one switch), then that piece of non-redundant hardware is a SPOF. In the pyramid analogy, it is the '1' in 3-2-1 that represents a non-redundant component and the '1' is the SPOF.
-
It still represents the same single point of failure. Any device (including a network switch, NAS, server, or network cable) that doesn't have a redundant "fail-safe" is a SPOF.
-
The trick when building a "system" of anything... is to always be searching for things that have become an SPOF. So let's start with 3 servers and 2 x SANs (Network RAID-1, redundant, automatic failover, etc, etc), and 1 x Switch all in the same building connected to the same power grid and circuits...
The first SPOF is the Network switch. How do we fix it? Add another Network switch (this is assuming that every part of this system is in the same data center / rack).
The next is the fact that they are all on the same circuit. Have the electrician separate them out.
What happens if the power blips? You need UPSes for each circuit.
What happens if there's an extended power outage? You need a good generator capable of running for hours or days as necessary. What about cooling? That goes on its own circuit and hopefully is also connected to the generator...
The list could go on forever. The reason so many folks warn about the complexity is that once you've built this giant system... it is extremely complex... and the more redundant you try to make it, the more complicated (and costly) it gets. The more moving parts you have, the greater the risk of missing something that is obviously another SPOF.
The idea is to find the balance of increased redundancy / automatic failover / reduced downtime, cost, and complexity for your organization. It might not get you to the five nines, but it might get you, say, three nines (99.9%) of uptime...
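For anyone unsure what "three nines" actually buys you, the conversion from an availability percentage to allowed downtime is just arithmetic, as this small sketch shows:

```python
# Convert an availability target ("nines") into allowed downtime
# per year, useful when weighing redundancy cost against benefit.

SECONDS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years

def downtime_per_year(availability):
    """Seconds of downtime per year at the given availability."""
    return SECONDS_PER_YEAR * (1.0 - availability)

for a in (0.99, 0.999, 0.9999, 0.99999):
    secs = downtime_per_year(a)
    print(f"{a:.3%} uptime -> {secs / 3600:.2f} hours/year down")
```

Three nines works out to roughly 8.76 hours of downtime a year; each extra nine cuts that by a factor of ten, and the cost of getting there tends to climb much faster than that.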
This will also involve playing nicely with the bean counters. They will suffer sticker shock when you show them the price tag for what you want (regardless of whether your organization can afford it). Work with them and explain how you came up with the system design and how it can save money in the long run. It would also be worth bringing them in at the start to find out exactly what the cost of downtime is, so you're not spending half a million dollars to prevent $20 worth of downtime.
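That downtime-cost comparison can be sketched as a quick expected-value check. Every dollar figure and uptime number below is a made-up example, not a real quote:

```python
# Back-of-the-envelope check for the bean counters: is the redundancy
# spend justified by the downtime it prevents? Figures are invented.

def downtime_cost_per_year(hours_down, cost_per_hour):
    """Annual cost of downtime at a given hourly business cost."""
    return hours_down * cost_per_hour

# ~99.9% uptime is about 8.76 hours down per year; ~99.99% is about 0.88
current = downtime_cost_per_year(8.76, 500)   # assumed $500/hour of outage
improved = downtime_cost_per_year(0.88, 500)
savings = current - improved

print(f"Annual downtime cost now:  ${current:,.2f}")
print(f"After the upgrade:         ${improved:,.2f}")
print(f"Savings per year:          ${savings:,.2f}")
# If the upgrade costs $500,000, payback would take over a century --
# exactly the "half a million to prevent $20" scenario above.
```

Running the numbers this way, with the accountants in the room, is what keeps the design proportionate to what downtime actually costs the business.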
-
Yes, and it seems to me that two hosts, two switches and two SANs (2-2-2) offers a decent level of redundancy without over-complicating the system. That's why I'm not seeing where the "doom" is coming from.
-
@Carnival-Boy What happens if both SANs go down? Or both switches? Don't laugh -- I've seen it happen... (and admittedly, I was the cause of it once or twice...)
-
Same as in any redundant system. Two paired disks could fail in a mirrored disk array, two paired power supplies could fail on a host. Redundancy is only about reducing risk, not eliminating it.
And in a lot of cases the risk of dual failure is higher than people would think, because the two components aren't completely independent of one another, so failure of one could bring down the other. Human error would be a big cause of this: if you screw up one, there is a good chance you will screw up the other.
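The point about correlated failure is easy to underestimate, so here's a small Monte Carlo sketch. The failure and cascade probabilities are invented purely for illustration; the takeaway is how much a modest chance of one failure knocking out its twin inflates the dual-failure rate compared to the naive independent p-squared estimate.

```python
# Sketch of how correlated failures raise dual-failure risk.
# All probabilities here are invented for illustration only.
import random

random.seed(42)

p_fail = 0.01      # assumed chance either unit fails in some window
p_cascade = 0.25   # assumed chance a failure takes out its partner
                   # (shared circuit, shared config, same human error)
trials = 1_000_000

independent_both = 0
correlated_both = 0
for _ in range(trials):
    a = random.random() < p_fail
    b = random.random() < p_fail
    if a and b:                      # purely independent double failure
        independent_both += 1
    # correlated model: a failure may also cascade to the partner
    if (a and (b or random.random() < p_cascade)) or \
       (b and (a or random.random() < p_cascade)):
        correlated_both += 1

print(f"Independent dual failure: {independent_both / trials:.6f}")  # ~ p^2
print(f"Correlated dual failure:  {correlated_both / trials:.6f}")   # far higher
```

With these toy numbers the correlated rate comes out dozens of times higher than the independent one, which is why "we have two of everything" is a much weaker guarantee than it sounds.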
-
I think the reason Scott uses the IPOD analogy is to make sure people aren't going into a SAN purchase, for instance, with two eyes blind. Most folks think "Oh, we have two of them, it won't EVER go down"...
-
After rereading the thread... the IPOD / IVPD (inverted pyramid of doom)... Comes from any single thing that can completely bring down the "system".
As you mention the idea is to reduce risk of downtime, and a poorly implemented system does not actually reduce it, but increases it due to the complexity.
Several folks here would recommend, for two hosts, using replication of VMs from HostA to HostB instead of a SAN, because local disk will almost always be faster than disk on the network. You take the 10 VMs on HostA and replicate them to HostB, and the 10 VMs on HostB and replicate them to HostA.
Depending on your hypervisor, you have yourself a nicely recoverable system that is less complex than a 2-2-2 system because you are eliminating the complexities of the SAN. You are also saving a good chunk of money. The downside to this sort of replication is that failures can cause data loss between replications (e.g. if VM1 is replicated every 5 minutes and Host A dies 4 minutes and 45 seconds after the last replication, you lose nearly 5 minutes' worth of data). But your only downtime will be the time it takes VM1 to boot on HostB.
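The trade-off described above is what backup folks call the recovery point objective (RPO): with periodic replication, the worst case is losing one full interval of changes. A trivial sketch of the arithmetic, with the same illustrative 5-minute interval:

```python
# RPO arithmetic for host-to-host VM replication, as described above.
# The 5-minute interval is just the example from the post.

def worst_case_loss_minutes(interval_minutes):
    """With periodic replication, data loss is capped at one interval."""
    return interval_minutes

def loss_at_failure(interval_minutes, minutes_since_last_sync):
    """Actual data lost if the source host dies mid-interval."""
    return min(minutes_since_last_sync, interval_minutes)

# VM replicated every 5 minutes; host dies 4 min 45 s after the last sync
print(loss_at_failure(5, 4.75))    # 4.75 minutes of changes lost
print(worst_case_loss_minutes(5))  # 5-minute worst case
```

Shrinking the interval shrinks the worst-case loss, at the cost of more replication traffic, so the interval is another knob to tune against what the data is worth.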
-
@dafyre said:
After rereading the thread... the IPOD / IVPD (inverted pyramid of doom)... Comes from any single thing that can completely bring down the "system".
No, that would just be a SPOF. IPOD specifically refers to an architecture where the SPOF is the critical "base" of the system, the one part you can least afford to leave fragile, combined with the top layer(s) being broad and redundant. The design of an IPOD is to be cheap(ish) to build and confusing, so that it is easy to sell to management, who tend to look from the "top down" and see a big redundant system, while not actually providing any safety and, in fact, putting the client at risk. People who confuse redundancy with reliability (which is nearly everyone) are easily duped by this because it comes with the "Is it redundant? Yes" answer that people look for. They forget that redundancy doesn't matter, only reliability does. And the answer to "Is it reliable?" is "No, not compared to better, cheaper, easier options."
An IPOD is a very specific thing: redundancy where it doesn't matter, to fool the casual observer, and cost cutting where people don't look or understand and hope that "magic" will keep them safe.
-
@dafyre said:
I think the reason Scott uses the IPOD analogy is to make sure people aren't going into a SAN purchase, for instance, with two eyes blind. Most folks think "Oh, we have two of them, it won't EVER go down"...
In an IPOD, there aren't two SANs; other things are redundant, but not the storage. Sometimes, to try to fool the people who know enough to point out that there is only one SAN, they will try other tricks, like saying the SAN itself is "fully redundant" because it has two controllers in it. That is something known to be risky and pointless, which is why servers don't do it until you are pushing into full active/active.