How Does Local Storage Offer High Availability

scottalanmiller

It has a large ecosystem: I didn't really understand that one, with the market springing up to fix it. A large ecosystem to me would mean there are tons of different devices available for development on the platform, and tons of development going on. Like OpenStack would be something I would call a fairly large ecosystem...

A good example is VMware. It is often sold as its key value being "a big ecosystem, tons of vendors make stuff for it."

Exactly, they need third party vendors to make their RAID for them, their backups for them, their storage replication for them, etc.

XenServer is then derided for having a small ecosystem. Conveniently the people making money selling VMware leave out that XenServer doesn't need one since all of the things that people make for VMware are built in to XenServer (RAID, DRBD, backup, etc.) Why would someone remake something that is already included?

It's like saying that you should buy Windows because of all the AV vendors and ignore Linux because it doesn't need them.

bbigford

@scottalanmiller said:

@BBigford said:

It's Redundant: We're just talking about controllers still, right? (Thinking about clusters).

No, the term means "two of something". Nothing more. If you feel the term redundancy means something positive, there is a misunderstanding.

Redundancy isn't inherently bad, but it is also not inherently good. The term carries no such connotation.

Ok, just wanted to make sure. I was hearing "two of something", obviously buying two of something is more than buying one of something though.

scottalanmiller

@BBigford said:

Ok, just wanted to make sure. I was hearing "two of something", obviously buying two of something is more than buying one of something though.

Doesn't mean that you have a way for two of them to be better than one, though. Redundant bills, redundant outages... all bad things.

scottalanmiller

Imagine if we applied it to something really silly, like licenses. I'll sell you "redundant Windows Server licenses." Literally, twice as many as you can use.

Sound good? No, of course not. And yet... that's actually better than dual controllers in most cases, because double licenses are only a waste of money. But double controllers are often a waste of money AND they make the system more dangerous!

bbigford

@scottalanmiller said:

@BBigford said:

Ok, just wanted to make sure. I was hearing "two of something", obviously buying two of something is more than buying one of something though.

Doesn't mean that you have a way for two of them to be better than one, though. Redundant bills, redundant outages... all bad things.

Maybe I don't follow, but how would having a server cluster (just another redundant service on the network) be a bad thing? You're increasing your points of failure, sure, but you aren't relying on a single point...

scottalanmiller

@BBigford said:

Maybe I don't follow, but how would having a server cluster be a bad thing? You're increasing your points of failure, sure, but you aren't relying on a single point...

I didn't say that it was. I said that redundancy is neither good nor bad. You are trying to associate the value of the cluster with the concept of redundancy. Yes, in that case, redundancy was a tool used to get reliability. But I think you are confusing the means with the ends. It is the reliability improving that is what is good. That you used redundancy to get here is irrelevant.

bbigford

@scottalanmiller said:

@BBigford said:

Maybe I don't follow, but how would having a server cluster be a bad thing? You're increasing your points of failure, sure, but you aren't relying on a single point...

I didn't say that it was. I said that redundancy is neither good nor bad. You are trying to associate the value of the cluster with the concept of redundancy. Yes, in that case, redundancy was a tool used to get reliability. But I think you are confusing the means with the ends. It is the reliability improving that is what is good. That you used redundancy to get here is irrelevant.

Haha sorry Scott, I think that misconception you mentioned earlier is what is happening here. I am hearing you say one thing, and associating it with something else, causing confusion. Some things should be redundant (like clusters that house critical services), but to associate "redundant" as being important with *any* product/service is obviously ridiculous. Did I break that down alright?

scottalanmiller

@BBigford said:

@scottalanmiller said:

@BBigford said:

Maybe I don't follow, but how would having a server cluster be a bad thing? You're increasing your points of failure, sure, but you aren't relying on a single point...

I didn't say that it was. I said that redundancy is neither good nor bad. You are trying to associate the value of the cluster with the concept of redundancy. Yes, in that case, redundancy was a tool used to get reliability. But I think you are confusing the means with the ends. It is the reliability improving that is what is good. That you used redundancy to get here is irrelevant.

Haha sorry Scott, I think that misconception you mentioned earlier is what is happening here. I am hearing you say one thing, and associating it with something else, causing confusion. Some things should be redundant (like clusters that house critical services), but to associate "redundant" as being important with *any* product/service is obviously ridiculous. Did I break that down alright?

Right. For example... what you need is a certain level of reliability. If you get that from a redundant cluster, great. If you get it from a single machine that is more reliable, that's great too.

Think of it as you need something to hold up your coffee. You could choose all kinds of things, but a single brick is a pretty good one. Come back in 100 years, chances are, that brick will not have failed. Two pieces of wood, two bundles of straw, all kinds of redundant things would prove to not be as reliable as one brick.

scottalanmiller

Redundancy might be a means to reliability. It might not be. But what is important to remember is that reliability is the goal, redundancy is, at best, a potential means to that end.

dafyre

@scottalanmiller said:

Redundancy might be a means to reliability. It might not be. But what is important to remember is that reliability is the goal, redundancy is, at best, a potential means to that end.

Redundancy also means planning for failure. I would never build a single storage server without replication without much protesting and wailing / gnashing of teeth. Storage systems do fail. Servers do fail. The network does fail.

Having redundancy in those failure situations is a good thing when the failure happens and redundancy works. Ideally, in that situation, nobody but the IT staff should notice that something died.

scottalanmiller

@dafyre said:

Redundancy also means planning for failure. I would never build a single storage server without replication without much protesting and wailing / gnashing of teeth. Storage systems do fail. Servers do fail. The network does fail.

No, it does not. You missed the point of the conversation. Redundancy means two of something. It implies zero capability of that secondary device being able to protect something in the case of failure. Redundancy can just as easily mean inducing failure as protecting against it or just doing nothing (dual licenses.) The point here has been that IT pros are continuously using the term redundancy to mean something it does not which, in turn, confuses them and allows marketing people to say true things that IT people take to mean false ones - being self misleading.

That things fail and that having a way to protect against that is good is true, but you are mixing the needed result which is reliability with a tool that may help or may hurt or may be indifferent called redundancy.

scottalanmiller

@dafyre said:

Having redundancy in those failure situations is a good thing when the failure happens and redundancy works. Ideally, in that situation, nobody but the IT staff should notice that something died.

This misses the brick point. You are making the assumption that you can always make things safer with redundancy. That's not true. Many times redundancy can make it more dangerous (like with some storage) and the point is always reliability. What if someone makes a single product that is more reliable than two redundant devices can equal?

Once we start using redundancy as a proxy for reliability we start to be at risk of taking a current scenario (the server we have today is risky and needs a second node to be safe enough to use) and stop realizing when factors change. It's memorizing the solution pattern for a specific scenario and missing the factors that drive us to that pattern at the time.

We see this in SW constantly, people following a pattern based on the term redundancy but ignoring the reliability results, in fact often forgetting that reliability was the only reason that they considered redundancy in the first place. No business cares that they have two servers, they care that things work and that money isn't wasted. That's all. If that is achieved with redundancy, that's fine. But if it is achieved in another way, that's fine too.

In all cases, without exception, it is the resulting reliability, cost and risk that determine what is a good solution. Redundancy is commonly a part of a good solution, but never always. In many cases the cost of redundancy is higher than the risk it mitigates. In others, there are means to reliability that are more effective than redundancy. In others, redundancy is actually a negative and creates risk outright.
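
To put rough numbers on the cost-versus-risk point above, here is a minimal sketch. It assumes an outage can be summarised by an annual probability and a fixed cost per outage, which is a simplification. The function name and every figure are hypothetical, chosen only for illustration.

```python
def redundancy_pays_off(p_single: float, p_redundant: float,
                        outage_cost: float, redundancy_cost: float) -> bool:
    """Return True if the expected loss avoided exceeds what the redundancy costs.

    p_single        -- assumed annual outage probability without redundancy
    p_redundant     -- assumed annual outage probability with redundancy
                       (may be HIGHER than p_single if the redundancy adds risk)
    outage_cost     -- assumed cost of one outage
    redundancy_cost -- assumed annual cost of the extra hardware and upkeep
    """
    expected_loss_single = p_single * outage_cost
    expected_loss_redundant = p_redundant * outage_cost
    savings = expected_loss_single - expected_loss_redundant
    return savings > redundancy_cost


# Hypothetical scenarios, not measurements.
print(redundancy_pays_off(0.05, 0.01, 20_000, 5_000))    # cheap outage
print(redundancy_pays_off(0.05, 0.01, 500_000, 5_000))   # expensive outage
print(redundancy_pays_off(0.05, 0.08, 500_000, 5_000))   # redundancy adds risk
```

Under these made-up inputs the same second node is a waste in the first scenario, worthwhile in the second, and harmful in the third, which is the sense in which redundancy is "commonly a part of a good solution, but never always."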

scottalanmiller

Maybe this is a good example:

Q: Is RAID 0 Redundant?

A: Actually, yes it is. It is redundant in the way that the term is used meaning that the array has "more than one disk." Does it have mirroring, parity, erasure encoding or any other means of protecting against disk failure? No, but the term redundant doesn't imply that. That's why the term RAID is still used, even when RAID 0 makes things far less reliable than if we did not have RAID at all.
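
A rough way to see why RAID 0 is less reliable than a single disk is to work the probabilities. The sketch below assumes disks fail independently with the same annual probability, and it ignores rebuild windows and disk replacement. The 3% figure and the function names are hypothetical, used only for illustration.

```python
def raid0_loss_probability(p_disk: float, disks: int) -> float:
    """RAID 0 loses data if ANY member disk fails."""
    return 1 - (1 - p_disk) ** disks


def raid1_loss_probability(p_disk: float, disks: int = 2) -> float:
    """RAID 1 loses data only if EVERY mirrored disk fails (simplified model)."""
    return p_disk ** disks


P_DISK = 0.03  # assumed annual failure probability per disk, illustrative only

single = P_DISK
raid0 = raid0_loss_probability(P_DISK, 2)
raid1 = raid1_loss_probability(P_DISK, 2)

print(f"single disk: {single:.4f}")
print(f"RAID 0 (2 disks): {raid0:.4f}")
print(f"RAID 1 (2 disks): {raid1:.4f}")
```

Both arrays have two disks, so both are redundant in the "more than one" sense, yet under these assumptions one is roughly twice as likely to lose data as a single disk and the other far less likely.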

dafyre

@scottalanmiller said:

Maybe this is a good example:

Q: Is RAID 0 Redundant?

A: Actually, yes it is. It is redundant in the way that the term is used meaning that the array has "more than one disk." Does it have mirroring, parity, erasure encoding or any other means of protecting against disk failure? No, but the term redundant doesn't imply that. That's why the term RAID is still used, even when RAID 0 makes things far less reliable than if we did not have RAID at all.

This is where we will have to disagree. If you google "define redundancy" without quotes, the second definition is listed as:

the inclusion of extra components that are not strictly necessary to functioning, in case of failure in other components.

In case of using individual drives, RAID 0 is not redundant at all. Because the other drives are necessary for the RAID 0 to function. In the case of a single disk failure, RAID 0 becomes lost data.

I would suggest more to meet the definition of redundancy, RAID 1 would be a better suggestion. In RAID 1, a single disk failure becomes a need to replace the disk, but no lost data, and no down time (assuming hot swappable drives).

@scottalanmiller said:

What if someone makes a single product that is more reliable than two redundant devices can equal?

At that point, it becomes show me a device that won't fail when it experiences a hardware problem. Then we're back to "It has two controllers."

Adding more components into the same chassis does not make it more reliable. It makes it a larger SPOF.

Once we start using redundancy as a proxy for reliability we start to be at risk of taking a current scenario (the server we have today is risky and needs a second node to be safe enough to use) and stop realizing when factors change. It's memorizing the solution pattern for a specific scenario and missing the factors that drive us to that pattern at the time.

I agree that we have to be careful to not recommend solutions by rote memorization. That's how a lot of folks wind up with RAID 5, instead of RAID 6 or OBR10. We also have to be careful, as you say, to get the best bang for our buck when building solutions. There are times when the business simply can't afford two servers for the project, and you have to make a good, reliable one and have good backups.

I get what you are saying about reliability. I think we are talking about two different types of reliability. You are speaking of a single device reliability ( a single server). I am thinking of perceived reliability -- the reliability of the whole system.

I have used these examples with you before. My SAN cluster appears to improve reliability at the top layer. In reality, one node blew out two drives last week, and since it was RAID 5, one node was down until we got new drives in it.

If we had been running on a single SAN device, then we would have been totally dead. However, since we had two that were fully replicated with automagic failover, nobody noticed a thing, and therefore our reliability appears to have increased because of the redundancy. The individual device reliability did not get better or worse, but it did have a failure.

No business cares that they have two servers, they care that things work and that money isn't wasted. That's all. If that is achieved with redundancy, that's fine. But if it is achieved in another way, that's fine too.

^ This, I can definitely agree with.

In all cases, without exception, it is the resulting reliability, cost and risk that determine what is a good solution.

Assuming you mean perceived reliability of the entire system (whether that be one server or ten), then I can agree with that.

Redundancy is commonly a part of a good solution, but never always. In many cases the cost of redundancy is higher than the risk it mitigates.

This is where the IT department has to understand the business and the end-goal that they have been mandated to achieve, and like I said before -- with the best bang for the buck.

scottalanmiller

@dafyre said:

This is where we will have to disagree. If you google "define redundancy" without quotes, the second definition is listed as:
the inclusion of extra components that are not strictly necessary to functioning, in case of failure in other components.

That's true, that's a definition of redundancy, but not the one in use. Since redundancy can mean that but does not necessarily, it doesn't really matter.

Even using the extra definition, it becomes difficult to assess when something taking over from another component still doesn't improve reliability. The result remains the same, redundancy is about having extra things, and doesn't necessarily improve the end goal of reliability.

Just look at the HP MSA SAN devices. They are truly redundant by both definitions. Yet their redundancy loses reliability. So regardless of definition, redundancy on its own is never a goal, it's only a means to an end and never a given.

Here is the Cambridge Dictionary's definition of redundant. Nothing suggesting what you found on Google:

redundant

scottalanmiller

@dafyre said:

In case of using individual drives, RAID 0 is not redundant at all. Because the other drives are necessary for the RAID 0 to function. In the case of a single disk failure, RAID 0 becomes lost data.

In the case of drives it IS redundancy. You only need one drive. Now you have two or more. It is the DATA that is not redundant in RAID 0. The drives are very redundant. That's the difference. RAID refers to the drives explicitly, not the data on the drives.

scottalanmiller

@dafyre said:

I would suggest more to meet the definition of redundancy, RAID 1 would be a better suggestion. In RAID 1, a single disk failure becomes a need to replace the disk, but no lost data, and no down time (assuming hot swappable drives).

That's defining failover, not redundancy. And you are referring to the data, not the drives. If I have a RAID 0 failure and one drive dies, I still have a working drive. It's redundant at the drive level by either definition of redundant.

scottalanmiller

@dafyre said:

What if someone makes a single product that is more reliable than two redundant devices can equal?

At that point, it becomes show me a device that won't fail when it experiences a hardware problem. Then we're back to "It has two controllers."

The point is that you are stuck on redundancy when it doesn't matter. You say "when it fails." But that's not relevant. What we care about is always the reliability of the whole system. And what we care about is that the whole system doesn't fail. A brick is less likely to fail than two pieces of wood; redundancy doesn't overcome the weakness of the wood.

The thing that changes here is the chance of the thing failing at all. Sure, IF it fails it would fail. But it's not likely to fail.

If one thing is less likely to fail than two things are to both fail, the single thing is more reliable.
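
The comparison in the last sentence can be sketched numerically. This is a minimal model that assumes independent node failures and an optional, separately modelled chance that the failover mechanism itself causes an outage. The function name and all probabilities are hypothetical.

```python
def pair_outage_probability(p_node: float, p_failover_fault: float = 0.0) -> float:
    """Outage probability for two redundant nodes.

    The pair is down if both nodes fail, or if the failover mechanism
    itself causes an outage (assumed independent of node failures).
    """
    both_fail = p_node ** 2
    return 1 - (1 - both_fail) * (1 - p_failover_fault)


P_BRICK = 0.001  # assumed: one very reliable device
P_WOOD = 0.05    # assumed: each of two less reliable devices

pair_ideal = pair_outage_probability(P_WOOD)
pair_with_faulty_failover = pair_outage_probability(P_WOOD, p_failover_fault=0.01)

print(f"single reliable device:      {P_BRICK:.6f}")
print(f"redundant pair, ideal:       {pair_ideal:.6f}")
print(f"redundant pair, 1% failover: {pair_with_faulty_failover:.6f}")
```

With these made-up figures the single device comes out ahead even when failover is perfect, and the gap widens once the failover layer is allowed to fail. Different inputs can reverse the result, which is why the numbers matter more than the device count.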

scottalanmiller

@dafyre said:

Adding more components into the same chassis does not make it more reliable. It makes it a larger SPOF.

SPOF is a very dangerous term because it drives an emotional response and gets people to look away from the full system reliability. Adding more components into a single chassis may make it a bigger SPOF, it might make it more fragile or it might make it more reliable. Look at an EMC VMAX or an IBM z/90. They are SPOFs and you've never met someone who's had one die, ever. They run for decades. They use single device reliability instead of multiple device failover to achieve reliability.

That something is a SPOF isn't a problem. The only thing that matters is resulting reliability.

dafyre

@scottalanmiller said:

Just look at the HP MSA SAN devices. They are truly redundant by both definitions. Yet their redundancy loses reliability. So regardless of definition, redundancy on its own is never a goal, it's only a means to an end and never a given.

But see, you are going for device reliability, I'm not arguing that. I agree that here, having redundancy does nothing to help the reliability of the individual devices.

I am speaking here to the appearance of reliability. The redundant system takes over when the main one fails, thus the system, overall, appears more reliable.

Here is the Cambridge Dictionary's definition of redundant. Nothing suggesting what you found on Google:

Dictionary.com -- http://dictionary.reference.com/browse/redundant?s=t (check out part d)
