Burned by Eschewing Best Practices

Carnival Boy

But isn't this 2-2-2 and not 3-2-1? I'm still not getting it.....

"The name refers to three (this is a soft point, it is often two or more) redundant virtualization host servers connected to two (or potentially more) redundant switches connected to a single storage device"

There is no single storage device here. Isn't it a "Tower of Redundancy" rather than a "Pyramid of Doom"? An expensive tower, but a tower. Or maybe a folly.

Carnival Boy

And what's the difference between "Inverted Pyramid of Doom" and the traditional term "Single Point of Failure (SPOF)", as in "a single SAN is a SPOF and therefore a bad solution. You need at least two for redundancy"?

DustinB3403

This is still an IVPD, because the servers are dependent on the NAS(es). It's an improved IVPD (if such a thing could exist), but there are many points that can fail.

That makes it an overly complicated solution which, by design, reduces the reliability of the system as a whole, including its recoverability and stability.

DustinB3403

A Single Point of Failure by itself won't bring the entire organization down.

Only that point, and what it hosts, is unavailable until it's fixed.

DustinB3403

The best way to think of a SPOF is to take any single server and unplug it, without any other backup servers for its functions to migrate to.

That is a SPOF. A system or server, that runs alone, hosting whatever it might be. And when it's down, it and only it are down until the problem is repaired.

Carnival Boy

@DustinB3403 said:

That is a SPOF. A system or server, that runs alone, hosting whatever it might be. And when it's down, it and only it are down until the problem is repaired.

That's not my understanding of SPOF. In the context of the OP, the "system" contains various pieces of hardware (hosts, switches & SANs). If he lacks redundancy in one area of this system (for example, by only having one switch), then that piece of non-redundant hardware is a SPOF. In the pyramid analogy, it is the '1' in 3-2-1 that represents a non-redundant component and the '1' is the SPOF.

DustinB3403

It still represents the same single point of failure. Any device (including a network switch, NAS, server, or network cable) that doesn't have a redundant "fail-safe" is a SPOF.

dafyre

The trick when building a "system" of anything... is to always be searching for things that have become an SPOF. So let's start with 3 servers and 2 x SANs (Network RAID-1, redundant, automatic failover, etc, etc), and 1 x Switch all in the same building connected to the same power grid and circuits...

The first SPOF is the Network switch. How do we fix it? Add another Network switch (this is assuming that every part of this system is in the same data center / rack).

The next is the fact that they are all on the same circuit. Have the electrician separate them out.

What happens if the power blips? Need UPSes for each circuit.
What happens if there's an extended power outage? Need a good generator capable of running for hours or days as necessary.

What about cooling? That goes on its own circuit and hopefully is also connected to the generator...

The list could go on and on forever. The reason so many folks warn about the complexity is that once you've built this giant system... it is extremely complex... and the more redundant you try to make it, the more complicated (and costly) it gets... The more moving parts you have, the more risk you run of missing something that is obviously another SPOF.

The idea is to find the balance of increased redundancy / automatic failover / reduced downtime, cost, and complexity for your organization. It might not get you to the 5 nines. But it might get you say... 3 nines (99.9%, right?) of uptime....
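To put numbers on the "nines", here is a minimal sketch that converts a count of nines into allowed downtime per year. The function name is made up for illustration, and a 365-day year is assumed.

```python
def downtime_hours_per_year(nines: int) -> float:
    """Allowed downtime per year, in hours, for a given number of nines."""
    hours_per_year = 365 * 24            # 8760, ignoring leap years (assumption)
    unavailability = 10.0 ** (-nines)    # 3 nines -> 0.001 (i.e. 99.9% uptime)
    return hours_per_year * unavailability


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        hours = downtime_hours_per_year(n)
        print(f"{n} nines: {hours:.4f} hours/year ({hours * 60:.1f} minutes)")
```

Three nines works out to about 8.76 hours of downtime a year, while five nines allows only a little over 5 minutes.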

This will also involve playing nicely with the bean counters. They will suffer from sticker shock when you show them the price tag for what you want (regardless of whether your organization can afford it). Work with them and explain how you came up with the system design and how it can save money in the long run. It would also be worth bringing them in at the start to find out exactly what the cost of downtime is, so you're not spending half a million dollars to prevent $20 worth of downtime.

Carnival Boy

Yes, and it seems to me that two hosts, two switches and two SANs (2-2-2) offers a decent level of redundancy without over-complicating the system. That's where I'm not getting where the "doom" is coming from.

dafyre

@Carnival-Boy What happens if both SANs go down? Or both switches? Don't laugh -- I've seen it happen... (and admittedly, I was the cause of it once or twice...)

Carnival Boy

Same as in any redundant system. Two paired disks could fail in a mirrored disk array, two paired power supplies could fail on a host. Redundancy is only about reducing risk, not eliminating it.

And in a lot of cases the risk of dual failure is higher than people would think because the two components aren't completely independent of one another, so a failure on one could bring down the other. Human error would be a big cause of this. If you screw up one, there is a good chance you will screw up the other.
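A rough way to see how much shared causes matter is a simplified beta-factor style model, sketched below. It is not anything from this thread: `p` is a hypothetical chance that one component fails in some period, and `beta` is the assumed share of those failures that come from a common cause (human error, shared firmware, shared power) and take out both at once.

```python
def both_fail_independent(p: float) -> float:
    """Chance that both components fail if failures are fully independent."""
    return p * p


def both_fail_common_cause(p: float, beta: float) -> float:
    """Chance that both fail when a fraction beta of failures is shared.

    The pair is lost if the common cause strikes, or if both components
    fail on their own. Simplified model, illustrative only.
    """
    common = beta * p              # one event that kills both components
    independent = (1 - beta) * p   # each component failing by itself
    return 1 - (1 - common) * (1 - independent ** 2)


if __name__ == "__main__":
    p, beta = 0.05, 0.10  # hypothetical numbers
    print("independent: ", both_fail_independent(p))
    print("common cause:", both_fail_common_cause(p, beta))
```

With these made-up numbers, letting just 10% of failures be shared makes a dual failure nearly three times as likely as the independent estimate.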

dafyre

I think the reason Scott uses the IPOD analogy is to make sure people aren't going into a SAN purchase, for instance, with two eyes blind. Most folks think "Oh, we have two of them, it won't EVER go down"...

dafyre

After rereading the thread... the IPOD / IVPD (inverted pyramid of doom)... Comes from any single thing that can completely bring down the "system".

As you mention, the idea is to reduce the risk of downtime, and a poorly implemented system does not actually reduce it, but increases it due to the complexity.

Several folks here would recommend, for 2 hosts, replicating VMs from HostA to HostB rather than using a SAN, because the local disk will almost always be faster than a disk on the network. You take the 10 VMs on HostA and replicate them to HostB, and then take the 10 VMs on HostB and replicate them to HostA.

Depending on your hypervisor, you have yourself a nicely recoverable system that is less complex than a 2-2-2 system because you are eliminating the complexities of the SAN. You are also saving a good chunk of money. The downside to this sort of replication is that a failure can lose whatever data changed since the last replication (e.g. VM1 is replicated every 5 minutes and HostA dies 4 minutes and 45 seconds after the last replication, so you lose nearly 5 minutes' worth of data). But your only downtime will be the amount of time it takes VM1 to reboot on HostB.
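The data-loss window for interval-based replication can be sketched like this. The function names are hypothetical, and the sketch assumes each replication completes instantly and that a failure is equally likely at any moment in the interval.

```python
def lost_seconds(seconds_since_last_replication: float, interval_s: float) -> float:
    """Changes lost (in seconds of work) when the source host dies."""
    if not 0 <= seconds_since_last_replication <= interval_s:
        raise ValueError("failure must fall inside one replication interval")
    # Everything written since the last completed replication is gone.
    return seconds_since_last_replication


def worst_case_loss(interval_s: float) -> float:
    """Host dies just before the next replication would have run."""
    return interval_s


def average_loss(interval_s: float) -> float:
    """Expected loss if the failure time is uniform over the interval."""
    return interval_s / 2


if __name__ == "__main__":
    interval = 5 * 60  # replicate every 5 minutes
    print(lost_seconds(4 * 60 + 45, interval))  # the 4 min 45 s case: 285 seconds
    print(worst_case_loss(interval), average_loss(interval))
```

Shortening the interval shrinks the window, at the price of more replication traffic between the two hosts.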

scottalanmiller

@dafyre said:

After rereading the thread... the IPOD / IVPD (inverted pyramid of doom)... Comes from any single thing that can completely bring down the "system".

No, that would just be a SPOF. IPOD is specifically a reference to an architecture where the SPOF is the critical "base" of the system, the part that most needs not to be fragile, combined with the top layer(s) being broad and redundant. The design of an IPOD is to be cheap(ish) to build and confusing, so that it is easy to sell to management, who tend to look from the "top down" and see a big redundant system, while not actually providing any safety and, in fact, putting the client at risk. People who confuse redundancy with reliability (which is nearly everyone) are easily duped by this because it comes with the "Is it redundant? Yes" answer that people look for. They forget that redundancy doesn't matter, only reliability does. And the answer to "Is it reliable?" is "No, not compared to better, cheaper, easier options."

IPOD is a very specific thing.... redundancy where it doesn't matter to fool the casual observer and cost cutting where people don't look or understand and hope that "magic" will keep them safe.

scottalanmiller

@dafyre said:

I think the reason Scott uses the IPOD analogy is to make sure people aren't going into a SAN purchase, for instance, with two eyes blind. Most folks think "Oh, we have two of them, it won't EVER go down"...

In an IPOD, there aren't two SANs; there are other things that are redundant, but not the storage. Sometimes, to try to fool the people who know enough to point out that there is only one SAN, they will try other tricks like saying the SAN itself is "fully redundant" because it has two controllers in it - something known to be risky and pointless, which is why servers don't do it until you are pushing into full active/active.

scottalanmiller

@Carnival-Boy said:

Same as in any redundant system. Two paired disks could fail in a mirrored disk array, two paired power supplies could fail on a host. Redundancy is only about reducing risk, not eliminating it.

Correct, redundancy is just a tool in the hopes of achieving reliability. Redundancy can reduce risk, it can also increase it.

A good example of where redundancy routinely reduces risk a lot is RAID 1 mirroring. It takes "almost certain to have data loss" of a single drive to "almost never have data loss" of a mirrored pair.
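As a back-of-the-envelope sketch of why a mirror helps so much: the pair only loses data if the surviving drive also dies before the failed one has been replaced and rebuilt. The numbers below are hypothetical, and the model assumes the two drives fail independently, which real drives from the same batch may not.

```python
def mirror_loss_probability(p_drive_year: float, rebuild_days: float) -> float:
    """Approximate yearly chance of data loss for a two-drive mirror.

    p_drive_year: assumed chance that a single drive fails within a year.
    rebuild_days: assumed time to replace the dead drive and resync.
    """
    # Chance that at least one of the two drives fails during the year.
    p_first_failure = 1 - (1 - p_drive_year) ** 2
    # Chance that the survivor fails inside the rebuild window.
    p_survivor_fails = 1 - (1 - p_drive_year) ** (rebuild_days / 365)
    return p_first_failure * p_survivor_fails


if __name__ == "__main__":
    p = 0.03  # hypothetical 3% yearly failure rate per drive
    print("single drive:", p)
    print("mirrored pair:", mirror_loss_probability(p, rebuild_days=2))
```

Under these assumptions the mirrored pair comes out thousands of times less likely to lose data than the single drive, and the result is very sensitive to how quickly the failed drive gets replaced.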

A good example of where redundancy itself routinely increases risk is dual SAN controllers in non-active/active arrays (most SANs that SMBs can afford) where each controller can fail and kill the other controller and almost never provide any protection during a real world failure.

scottalanmiller

@Carnival-Boy said:

Two SANs offers a high degree of redundancy. I'm not sure where 3-2-1 fits in with that? He doesn't have a SPOF does he. He has redundant switches, redundant controllers, redundant SANs.

This is not an IPOD (aka 3-2-1.) I believe that his intended architecture is a 3-2-2. This is not a good design, but not nearly as bad as an IPOD.

The issue here is that there is redundancy, yes, but there is redundancy only through adding points of failure that are not necessary. So while there isn't a SPOF, there is unnecessary complexity as well as extra failure domains - three instead of one. So while this design, if implemented well, can be very reliable, it can never be as reliable as not having the storage layer separate nor can it compete on cost. So it isn't risky, it is unnecessarily risky while wasting money, time and effort.
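To make the comparison concrete, here is a minimal sketch that treats devices within a layer as parallel (the layer is up if any one device is up) and the layers as a series chain (everything must be up). The per-device availability figures are invented for illustration, failures are assumed independent, and hosts with local storage are assumed to be as available as SAN-attached hosts.

```python
from math import prod


def layer_availability(device_availability: float, count: int) -> float:
    """A layer is up as long as at least one of its devices is up."""
    return 1 - (1 - device_availability) ** count


def chain_availability(layers: list[tuple[float, int]]) -> float:
    """Layers depend on each other, so their availabilities multiply."""
    return prod(layer_availability(a, n) for a, n in layers)


# Hypothetical per-device availability figures, not measured values.
HOST, SWITCH, SAN = 0.99, 0.999, 0.99

ipod = chain_availability([(HOST, 3), (SWITCH, 2), (SAN, 1)])           # 3-2-1
three_two_two = chain_availability([(HOST, 3), (SWITCH, 2), (SAN, 2)])  # 3-2-2
local_only = chain_availability([(HOST, 2)])  # two hosts, local storage

if __name__ == "__main__":
    print(f"3-2-1: {ipod:.6f}")
    print(f"3-2-2: {three_two_two:.6f}")
    print(f"2 hosts, local storage: {local_only:.6f}")
```

With these numbers the 3-2-1 is capped by its single SAN, the 3-2-2 is much better, and the two-host design with no separate storage layer still edges it out because it has fewer links in the chain.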

scottalanmiller

When would a 3-2-2 design actually make sense, since I said it wasn't horrible? When it is actually more like a 20-2-2. The point of this kind of design is for when reliability is important but nowhere near the top priority and cost savings at scale matters, which it almost always does in any large company. Once you get to enough physical servers attached to the SAN layer you start to see the ability to lower the cost of storage while making it "reliable enough" to make sense for the business at hand. So typically in an enterprise you might see hundreds or thousands of physical hosts in the "top" layer connected to many switches connected to a pair of big enterprise SANs (EMC VMAX for example.) This is never as reliable as not having the SANs at all, that just can't happen. But what it can be is quite a bit cheaper than not having the SANs and while not the best reliability, it can be pretty reliable to a point where that's not a problem.

The key is that at large scale this design can be cheap. That's why at small scale only local storage makes sense because not only is it the most reliable and the fastest, at small scale it is always the cheapest too.

scottalanmiller

@Carnival-Boy said:

Yeah, but cost isn't an issue as money is no object.

While I don't agree that this is ever true, even if cost is no object, SANs would never make sense since their only value is cost savings at large scale. If cost was never the goal or considered at all but only reliability and speed, that would drive us to bigger, better local storage only.

scottalanmiller

@DustinB3403 said:

What this means is that there are so many potential points for failure, and that in the most basic approach of the 3-2-1 the "reliability" isn't at all reliable, or is only as reliable as your weakest link, which is often the NAS (or SAN).

A better way to word and understand that is that in a dependency chain, which is what the dashes represent, you are always less reliable than your weakest link. It's not just that the SAN represents a weak point in the design, which certainly it does, you also have three failure domains. Two of them are much more reliable than the SAN, but they do present risk on their own and can fail. So your risk is not only the risk of the weakest point failing but of the combined risk of each of the layers.

Think of it this way: you have to roll a die three times (once for each domain.) If you don't get the number that you need, you lose your data. Ready.... go...

On the first roll, the SAN roll, you have to get a 4, 5 or 6. Basically you have a 50% chance of failure.

On the second roll and the third roll, you can get a 2, 3, 4, 5 or 6. You are still rolling and taking risk, but the risk of each roll is much less.

Just because a layer is very, very reliable doesn't mean there isn't risk in it and the risk of the layer is cumulative. So that is why adding layers, even when they are really reliable ones, introduces a negative value in regards to risk and why you only add them when there is a clear reason to do so (cost savings or whatever.)
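The three dice rolls can be worked through exactly. This sketch uses Python's `fractions` module and assumes the rolls are independent, as dice are.

```python
from fractions import Fraction

# Chance of surviving each roll, taken from the dice example above.
san_roll = Fraction(3, 6)     # need a 4, 5 or 6
second_roll = Fraction(5, 6)  # need anything but a 1
third_roll = Fraction(5, 6)   # need anything but a 1

# All three rolls must succeed, so the chances multiply.
survive_all = san_roll * second_roll * third_roll
lose_data = 1 - survive_all

if __name__ == "__main__":
    print("survive all three rolls:", survive_all)  # 25/72
    print("lose data:", lose_data)                  # 47/72
    print("SAN roll alone fails:", 1 - san_roll)    # 1/2
```

The SAN roll alone fails half the time, but the chain as a whole fails 47 times out of 72, roughly 65%. The two "safe" rolls still make the overall outcome worse than the weakest link by itself.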