How Does Local Storage Offer High Availability

wirestyle22

@wirestyle22 said:

Semantics?

Semantics are one of the most important things in IT. This isn't a theoretical experiment in language; this is a real problem that plagues SMB IT every day. Go on Spiceworks and the average conversation around storage is someone being hoodwinked by this very bit of semantics. They request the wrong thing, they get what they asked for, and they end up paying a lot for a worse result.

I understand everything that you guys have said here, but you both agree. That is the confusing part of it for me.

Redundancy doesn't mean reliability.
Reliability doesn't mean redundancy.

I would rather have my users complain up and down, calling me the worst SysAdmin ever, and still have a better system overall. From what I have read and experienced, complication with no real reward is a huge problem in IT.

Take my opinion with a grain of salt though. I have never made incredible claims about my knowledge. I can only speak of my experiences.

scottalanmiller

@dafyre said:

Up until my experience with an almost fully virtualized infrastructure, I would rather have had reliable servers.

However, after my experience with virtualized infrastructure, my mindset changed.

It should not change. Resultant reliability is the only value.

dafyre

@scottalanmiller said:

If redundancy provides that reliability, no problem. If magic fairy dust does, that's fine too.

Where can I find 3 boxes of Magic fairy dust? My supplies are starting to run low, lol.

That's kinda been my whole point though. If redundancy doesn't provide a better perception of reliability, then why bother with it?

If I knew that redundancy wasn't going to help improve the perception of reliability, I'd much rather work on a single server that I knew was going to fail and restore it from backup when the failure happens.

I've been on both sides of that road.

wirestyle22

@dafyre said:

@scottalanmiller said:

If redundancy provides that reliability, no problem. If magic fairy dust does, that's fine too.

Where can I find 3 boxes of Magic fairy dust? My supplies are starting to run low, lol.

That's kinda been my whole point though. If redundancy doesn't provide a better perception of reliability, then why bother with it?

If I knew that redundancy wasn't going to help improve the perception of reliability, I'd much rather work on a single server that I knew was going to fail and restore it from backup when the failure happens.

I've been on both sides of that road.

Reality > Perception

I say this working at a place where uneducated perception is leaps and bounds the most annoying part of my job. I could write a book on it.

Dashrender

@scottalanmiller said:

@dafyre said:

To better meet the definition of redundancy, I would suggest RAID 1. In RAID 1, a single disk failure means replacing the disk, but with no lost data and no downtime (assuming hot-swappable drives).

That's defining failover, not redundancy. And you are referring to the data, not the drives. If I have a RAID 0 and one drive dies, I still have a working drive. It's redundant at the drive level by either definition of redundant.

I guess the term "independent" in RAID is what drives @scottalanmiller's point the most. Redundant Array of Independent Drives: at the drive-only level, the drives are independent, they are redundant, and they are in an array.

Wow - I've never looked at it this way before.
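To make the drive-level versus data-level distinction concrete, here is a minimal Python sketch under a deliberately simplified model: only RAID 0 and RAID 1 are covered, and the helper names `survivors` and `data_intact` are hypothetical. For a given set of failed drives, it reports how many drives still work and whether the array's data is intact.

```python
def survivors(num_drives: int, failed: set[int]) -> int:
    """Count drives that are still working (the drive-level view)."""
    return sum(1 for i in range(num_drives) if i not in failed)


def data_intact(level: str, num_drives: int, failed: set[int]) -> bool:
    """Whether the array's data survives (the data-level view).

    Simplified model: RAID 0 stripes data across every drive, so any failure
    loses the array's data; RAID 1 mirrors every drive, so the data survives
    while at least one mirror is still working.
    """
    alive = survivors(num_drives, failed)
    if level == "RAID 0":
        return alive == num_drives
    if level == "RAID 1":
        return alive >= 1
    raise ValueError(f"unsupported level: {level}")


failed = {0}  # one of two drives dies
for level in ("RAID 0", "RAID 1"):
    print(level, "working drives:", survivors(2, failed),
          "data intact:", data_intact(level, 2, failed))
```

With one of two drives dead, both levels still have a working drive (redundant at the drive level), but only RAID 1 keeps the data.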

dafyre

@scottalanmiller said:

@dafyre said:

Up until my experience with an almost fully virtualized infrastructure, I would rather have had reliable servers.

However, after my experience with virtualized infrastructure, my mindset changed.

It should not change. Resultant reliability is the only value.

Right. Mine changed because the single systems we had (on the budget that we had to work with) were not as reliable as they should have been.

Having two VMware servers with replicated storage increased the resultant reliability, because the perception was that the system was more reliable: things did not go down nearly as often as they had before.
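A common back-of-the-envelope way to see why a replicated pair like this can raise resultant reliability is the parallel-availability formula, A = 1 - (1 - a)^n, where a is one node's availability and n the number of nodes that can each carry the load. This sketch assumes independent node failures and instant, always-successful failover, which real clusters only approximate, and the 99% per-node figure is purely hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def parallel_availability(node_availability: float, nodes: int = 2) -> float:
    """Availability of N redundant nodes where any one can carry the load.

    Assumes independent failures and perfect, instant failover.
    """
    return 1 - (1 - node_availability) ** nodes


for n in (1, 2):
    a = parallel_availability(0.99, n)  # 0.99 is a hypothetical per-node figure
    downtime = (1 - a) * MINUTES_PER_YEAR
    print(f"{n} node(s): availability {a:.4f}, about {downtime:.0f} min/yr down")
```

Under these assumptions, going from one node to two cuts expected downtime by a factor of 100; correlated failures (shared storage, shared power, a bad cluster config) erode that quickly.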

Dashrender

@dafyre said:

@scottalanmiller said:

If redundancy provides that reliability, no problem. If magic fairy dust does, that's fine too.

Where can I find 3 boxes of Magic fairy dust? My supplies are starting to run low, lol.

That's kinda been my whole point though. If redundancy doesn't provide a better perception of reliability, then why bother with it?

If I knew that redundancy wasn't going to help improve the perception of reliability, I'd much rather work on a single server that I knew was going to fail and restore it from backup when the failure happens.

I've been on both sides of that road.

You keep using the term perception; what does perception have to do with anything?

Dashrender

@dafyre said:

@scottalanmiller said:

@dafyre said:

Up until my experience with an almost fully virtualized infrastructure, I would rather have had reliable servers.

However, after my experience with virtualized infrastructure, my mindset changed.

It should not change. Resultant reliability is the only value.

Right. Mine changed because the single systems we had (on the budget that we had to work with) were not as reliable as they should have been.

Having two VMware servers with replicated storage increased the resultant reliability, because the perception was that the system was more reliable: things did not go down nearly as often as they had before.

That's not perception - that's reality. You found one option, an option through redundancy that provided you with reliability.

The lack of redundancy does not mean lack of reliability. Your continued stance on perception seems to imply that not having redundancy would mean you would have less or no reliability.

I'd argue, in the case of virtualization, redundancy is often a major player in reliability, but not a sole requirement.
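Resultant reliability is something you can measure rather than perceive: take the outage log for a period and compute the fraction of time the service was actually up. The minimal sketch below does that with Python's `datetime.timedelta`; the outage durations are invented for illustration, and `measured_availability` is a hypothetical helper name. Note that the number says nothing about *how* the uptime was achieved, redundancy or a single reliable box.

```python
from datetime import timedelta


def measured_availability(period: timedelta, outages: list[timedelta]) -> float:
    """Fraction of the period the service was actually up."""
    downtime = sum(outages, timedelta())
    if downtime > period:
        raise ValueError("total outage time exceeds the measurement period")
    return 1 - downtime / period  # timedelta / timedelta yields a float


# Hypothetical outage logs for one 30-day month.
month = timedelta(days=30)
single_server = [timedelta(hours=4), timedelta(hours=2)]
redundant_pair = [timedelta(minutes=10)]

print(f"single server:  {measured_availability(month, single_server):.4%}")
print(f"redundant pair: {measured_availability(month, redundant_pair):.4%}")
```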

scottalanmiller

@dafyre said:

@scottalanmiller said:

@dafyre said:

Up until my experience with an almost fully virtualized infrastructure, I would rather have had reliable servers.

However, after my experience with virtualized infrastructure, my mindset changed.

It should not change. Resultant reliability is the only value.

Right. Mine changed because the single systems we had (on the budget that we had to work with) were not as reliable as they should have been.

Having two VMware servers with replicated storage increased the resultant reliability, because the perception was that the system was more reliable: things did not go down nearly as often as they had before.

I'm confused, though. Sure, you improved reliability (I'm confused about the perception bit too) but why did this make you change your mindset versus a single reliable server? Since you didn't use a single reliable server for comparison, what changed the mindset?

scottalanmiller

@Dashrender said:

I'd argue, in the case of virtualization, redundancy is often a major player in reliability, but not a sole requirement.

I'd argue that virtualization is a red herring. It's good and we should always have it, and high-availability systems always have (going back to the 1960s). But it's not a factor here.

Redundancy is the most common means of getting reliability, but it is definitely not the sole means.

scottalanmiller

@Dashrender said:

I guess the term "independent" in RAID is what drives @scottalanmiller's point the most. Redundant Array of Independent Drives: at the drive-only level, the drives are independent, they are redundant, and they are in an array.

Wow - I've never looked at it this way before.

I think the "it must mean the data" perception probably comes from the fact that many people state that RAID is about improving reliability. But it isn't. That's a big reason that people choose it, but RAID is about increasing speed, capacity and/or reliability by using cheap Winchester drives rather than some other drive type. Reliability is only one of the three.

So when we look at it that way, RAID 0 has both redundancy (meaning more than one disk) AND redundancy (meaning something can fail and something else takes over) in two of three instances.

If we need a cache that is faster than a single drive and we have a five-disk RAID 0, then when one disk fails, we just drop down to a four-disk RAID 0. Not as fast as before, but still faster than a single drive.
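The five-disk cache example can be sketched numerically. This is a rough model only: it assumes RAID 0 throughput scales linearly with disk count (real arrays scale somewhat less than linearly), uses a hypothetical 150 MB/s per drive, and assumes the cache contents are disposable, so the array is simply recreated on the surviving disks. `raid0_throughput` and `recreate_after_failure` are illustrative names, not any tool's API.

```python
PER_DISK_MBPS = 150  # hypothetical sequential throughput of one drive, in MB/s


def raid0_throughput(disks: int, per_disk: int = PER_DISK_MBPS) -> int:
    """Idealized RAID 0 throughput: striping scales linearly with disk count."""
    if disks < 1:
        raise ValueError("need at least one disk")
    return disks * per_disk


def recreate_after_failure(disks: int) -> int:
    """A stateless cache array is rebuilt from scratch on the surviving disks."""
    return disks - 1


before = raid0_throughput(5)
after = raid0_throughput(recreate_after_failure(5))
single = raid0_throughput(1)
print(before, after, single)
```

Under this model the degraded four-disk array still delivers four times the throughput of a single drive, which is the sense in which the RAID 0 remains redundant for a speed goal.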

wirestyle22

@scottalanmiller said:

@Dashrender said:

I guess the term "independent" in RAID is what drives @scottalanmiller's point the most. Redundant Array of Independent Drives: at the drive-only level, the drives are independent, they are redundant, and they are in an array.

Wow - I've never looked at it this way before.

I think the "it must mean the data" perception probably comes from the fact that many people state that RAID is about improving reliability. But it isn't. That's a big reason that people choose it, but RAID is about increasing speed, capacity and/or reliability by using cheap Winchester drives rather than some other drive type. Reliability is only one of the three.

So when we look at it that way, RAID 0 has both redundancy (meaning more than one disk) AND redundancy (meaning something can fail and something else takes over) in two of three instances.

If we need a cache that is faster than a single drive and we have a five-disk RAID 0, then when one disk fails, we just drop down to a four-disk RAID 0. Not as fast as before, but still faster than a single drive.

That is definitely an interesting way to look at it.

Dashrender

@scottalanmiller said:

So when we look at it that way, RAID 0 has both redundancy (meaning more than one disk) AND redundancy (meaning something can fail and something else takes over) in two of three instances.

If we need a cache that is faster than a single drive and we have a five-disk RAID 0, then when one disk fails, we just drop down to a four-disk RAID 0. Not as fast as before, but still faster than a single drive.

That may be so, but who would care? In RAID 0, if you lose any drive, all of your data is gone, so being redundant is pointless in that case; the only thing you care about with RAID 0 is the array for performance, not reliability.

Dashrender

@scottalanmiller

Is it possible to have a system failover to another system with zero actual failure?

Of course I know the answer is yes; we've seen this in a video where a laptop is playing a video streaming from one VM, that VM is moved/failed over to another server, and the video either never stops or has a small pause, but no actual failure.

scottalanmiller

@Dashrender said:

That may be so, but who would care? In RAID 0, if you lose any drive, all of your data is gone, so being redundant is pointless in that case; the only thing you care about with RAID 0 is the array for performance, not reliability.

You are stuck on the idea that your array always carries stateful data. That's an incorrect assumption. RAID 0 arrays can be perfectly functional when degraded if they are not used for stateful data. So the redundancy remains fully useful.
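The difference between the two views can be put in numbers. For stateful data, a RAID 0 of n disks keeps its data only if every disk survives, so the probability is (1 - p)^n; for a stateless cache, service continues as long as at least one disk survives, 1 - p^n. This minimal sketch assumes independent disk failures and a hypothetical 5% per-disk failure probability over some period; the function names are illustrative.

```python
def raid0_data_survives(disks: int, p_fail: float) -> float:
    """Probability a RAID 0 array keeps its data: every disk must survive."""
    return (1 - p_fail) ** disks


def some_disk_survives(disks: int, p_fail: float) -> float:
    """Probability at least one disk survives, which is what a stateless
    cache needs to keep serving (as a smaller, recreated RAID 0)."""
    return 1 - p_fail ** disks


p = 0.05  # hypothetical per-disk failure probability over the period
print(f"stateful data survives: {raid0_data_survives(5, p):.4f}")
print(f"cache service survives: {some_disk_survives(5, p):.7f}")
```

Same array, same failure rate: as a home for stateful data the five-disk RAID 0 is fragile, while as a disposable cache the redundancy of its drives keeps it serving.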

Dashrender

@scottalanmiller said:

I'm confused, though. Sure, you improved reliability (I'm confused about the perception bit too) but why did this make you change your mindset versus a single reliable server? Since you didn't use a single reliable server for comparison, what changed the mindset?

I agree with Scott.

Just to keep this going, @dafyre, please tell us what the old failing system looked like. Was it 10 servers, each with internal disks? What was failing?

scottalanmiller

@Dashrender said:

Of course I know the answer is yes; we've seen this in a video where a laptop is playing a video streaming from one VM, that VM is moved/failed over to another server, and the video either never stops or has a small pause, but no actual failure.

There can be zero pause, but the cost gets higher and higher to do that stuff. And there are other penalties. For example, IBM, HP and Oracle all make systems that let you rip CPUs out of them while they are running. No blips. But they introduce some latency for all operations to make this possible.

wirestyle22

@scottalanmiller said:

@Dashrender said:

Of course I know the answer is yes; we've seen this in a video where a laptop is playing a video streaming from one VM, that VM is moved/failed over to another server, and the video either never stops or has a small pause, but no actual failure.

There can be zero pause, but the cost gets higher and higher to do that stuff. And there are other penalties. For example, IBM, HP and Oracle all make systems that let you rip CPUs out of them while they are running. No blips. But they introduce some latency for all operations to make this possible.

Even the fact that this is possible is amazing to me.

scottalanmiller

@Dashrender said:

Just to keep this going, @dafyre, please tell us what the old failing system looked like. Was it 10 servers, each with internal disks? What was failing?

And it doesn't mean that the old system was "bad", it could have just been normal.

Two HP ProLiant DL380 servers in a cluster (if the clustering is good) are way more reliable than a single ProLiant DL380.

But are two of them as reliable as a single HP Integrity Superdome? Not likely. Those things never go down. Never. It's unheard of.

Now which is more cost-effective? Buying 100 ProLiants instead of one Superdome, of course. Which is more powerful? One Superdome.
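The "if the clustering is good" caveat can be folded into a simple active/passive model: the pair is up if the primary is up, or if the primary is down, the secondary is up, and failover actually works. The sketch below compares that against a single, more reliable box. Every availability and failover-success figure here is hypothetical, chosen only to show the shape of the trade-off, and is not a published figure for any HP product; independent node failures are assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def clustered_pair(node_availability: float, failover_success: float) -> float:
    """Availability of an active/passive pair with imperfect failover.

    Up if the primary is up, or if the primary is down, the secondary is up,
    and failover succeeds. Assumes independent node failures.
    """
    a = node_availability
    return a + (1 - a) * a * failover_success


# Hypothetical figures, for illustration only.
options = {
    "one commodity server": 0.999,
    "two clustered, good failover": clustered_pair(0.999, 0.99),
    "two clustered, poor failover": clustered_pair(0.999, 0.50),
    "one high-end server": 0.99999,
}
for name, avail in options.items():
    downtime = (1 - avail) * MINUTES_PER_YEAR
    print(f"{name:30} {avail:.6f}  about {downtime:7.1f} min/yr down")
```

With these made-up numbers, a well-clustered commodity pair beats one commodity server by a wide margin, but still falls just short of the single high-end box, and a flaky failover mechanism gives back most of the gain.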

scottalanmiller

@wirestyle22 said:

Even the fact that this is possible is amazing to me

Ever seen an HP Integrity withstand an artillery round? There is a video of an HP Integrity doing that (easily ten years old) and another one of an HP 3PAR SAN taking one (more recent; the video was actually made by @HPEStorageGuy, who is here in the community). The HP 3PAR is basically HP's "minicomputer" class of storage (the same class the HP Integrity is in for servers).

In both cases, they fired an artillery round into the chassis of a running HP system (bolted down, of course, or the thing would have gone flying), and in both cases the system stayed up and running and didn't lose a ping.