
    How Does Local Storage Offer High Availability

    IT Discussion
    Tags: storage, replicated local storage, rls, high availability, san, risk
    95 Posts 7 Posters 36.0k Views
    • DashrenderD
      Dashrender
      last edited by

      @scottalanmiller

      Is it possible to have a system failover to another system with zero actual failure?

      Of course I know the answer is yes. We've seen this in videos where a laptop is playing a video streamed from one VM, that VM is moved or failed over to another server, and the video either never stops or has a brief pause, but there is no actual failure.

      • scottalanmillerS
        scottalanmiller @Dashrender
        last edited by

        @Dashrender said:

        That may be so, but who would care? In your RAID 0, if you lose any drive, all of your data is gone, so being redundant is pointless in that case. The only thing you care about with RAID 0 is performance, not reliability.

        You are stuck on the idea that your array always carries stateful data. That's an incorrect assumption. RAID 0 arrays can be perfectly functional when degraded if they are not used for stateful data. So the redundancy remains fully useful.

        • DashrenderD
          Dashrender @scottalanmiller
          last edited by

          @scottalanmiller said:

          I'm confused, though. Sure, you improved reliability (I'm confused about the perception bit too) but why did this make you change your mindset versus a single reliable server? Since you didn't use a single reliable server for comparison, what changed the mindset?

          I agree with Scott.

          Just to keep this going, @dafyre please tell us what the old failing system looked like. Was it 10 servers, each with internal disks? What was failing?

          • scottalanmillerS
            scottalanmiller @Dashrender
            last edited by

            @Dashrender said:

            Of course I know the answer is yes. We've seen this in videos where a laptop is playing a video streamed from one VM, that VM is moved or failed over to another server, and the video either never stops or has a brief pause, but there is no actual failure.

            There can be zero pause, but the cost gets higher and higher to do that stuff. And there are other penalties. For example, IBM, HP and Oracle all make systems that will allow you to rip CPUs out of them while they are running. No blips. But they introduce some latency for all operations to make this possible.

            • wirestyle22W
              wirestyle22 @scottalanmiller
              last edited by

              @scottalanmiller said:

              @Dashrender said:

              Of course I know the answer is yes. We've seen this in videos where a laptop is playing a video streamed from one VM, that VM is moved or failed over to another server, and the video either never stops or has a brief pause, but there is no actual failure.

              There can be zero pause, but the cost gets higher and higher to do that stuff. And there are other penalties. For example, IBM, HP and Oracle all make systems that will allow you to rip CPUs out of them while they are running. No blips. But they introduce some latency for all operations to make this possible.

              Even the fact that this is possible is amazing to me

              • scottalanmillerS
                scottalanmiller @Dashrender
                last edited by

                @Dashrender said:

                Just to keep this going, @dafyre please tell us what the old failing system looked like. Was it 10 servers, each with internal disks? What was failing?

                And it doesn't mean that the old system was "bad", it could have just been normal.

                Two HP ProLiant DL380 servers in a cluster (if the clustering is good) are way more reliable than a single ProLiant DL380.

                But are two of them as reliable as a single HP Integrity Superdome? Not likely. Those things never go down. Never. It's unheard of.

                Now which is more cost effective? Buying 100 ProLiants instead of one Superdome, of course. Which is more powerful? One Superdome.
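
To put rough numbers on that comparison, here is a minimal sketch, assuming independent node failures and a clustering layer with its own availability. Every figure below is hypothetical and chosen only for illustration, and `cluster_availability` is a name made up for this example, not a vendor formula.

```python
def cluster_availability(node: float, nodes: int, clustering: float = 1.0) -> float:
    """Availability of a cluster that is up while at least one node is up.

    Assumes node failures are independent (optimistic) and that the
    clustering/failover layer has its own availability, `clustering`.
    """
    at_least_one_node_up = 1.0 - (1.0 - node) ** nodes
    return clustering * at_least_one_node_up


def downtime_hours_per_year(availability: float) -> float:
    """Convert an availability fraction into expected hours down per year."""
    return (1.0 - availability) * 24 * 365


# Hypothetical inputs, not measured values for any real product.
commodity_node = 0.999    # one commodity server
clustering_layer = 0.9999 # failover software, shared storage, etc.
high_end_box = 0.99999    # one RAS-heavy system

pair = cluster_availability(commodity_node, 2, clustering_layer)

for label, value in [
    ("single commodity server", commodity_node),
    ("clustered pair", pair),
    ("single high-end system", high_end_box),
]:
    print(f"{label}: {downtime_hours_per_year(value):.2f} hours down per year")
```

With these made-up inputs the pair beats a single commodity server but still trails the high-end system, because the clustering layer itself caps what the pair can achieve. Different assumptions would give a different ordering.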

                • scottalanmillerS
                  scottalanmiller @wirestyle22
                  last edited by

                  @wirestyle22 said:

                  Even the fact that this is possible is amazing to me

                  Ever see an HP Integrity withstand an artillery round? There is a video of an HP Integrity doing that (easily ten years old) and another one of an HP 3PAR SAN taking one (more recent, actually the video was made by @HPEStorageGuy who is here in the community.) The HP 3PAR is basically HP's "mini computer" class of storage (same class as the HP Integrity is in servers).

                  In both cases, they fired an artillery round into the chassis of a running HP system (bolted to a surface of course as the thing would have gone flying) and in both cases the system stayed up and running, didn't lose a ping.

                  • wirestyle22W
                    wirestyle22 @scottalanmiller
                    last edited by

                    @scottalanmiller said:

                    @Dashrender said:

                    Just to keep this going, @dafyre please tell us what the old failing system looked like. Was it 10 servers, each with internal disks? What was failing?

                    And it doesn't mean that the old system was "bad", it could have just been normal.

                    Two HP ProLiant DL380 servers in a cluster (if the clustering is good) are way more reliable than a single ProLiant DL380.

                    But are two of them as reliable as a single HP Integrity Superdome? Not likely. Those things never go down. Never. It's unheard of.

                    Now which is more cost effective? Buying 100 ProLiants instead of one Superdome, of course. Which is more powerful? One Superdome.

                    Can you clarify what you mean? What gives them higher uptime than a ProLiant if both are configured correctly? Honest question.

                    • wirestyle22W
                      wirestyle22 @scottalanmiller
                      last edited by

                      @scottalanmiller said:

                      @wirestyle22 said:

                      Even the fact that this is possible is amazing to me

                      Ever see an HP Integrity withstand an artillery round? There is a video of an HP Integrity doing that (easily ten years old) and another one of an HP 3PAR SAN taking one (more recent, actually the video was made by @HPEStorageGuy who is here in the community.) The HP 3PAR is basically HP's "mini computer" class of storage (same class as the HP Integrity is in servers).

                      In both cases, they fired an artillery round into the chassis of a running HP system (bolted to a surface of course as the thing would have gone flying) and in both cases the system stayed up and running, didn't lose a ping.

                      That's wild. HP is doin' it right now.

                      • DashrenderD
                        Dashrender @scottalanmiller
                        last edited by

                        @scottalanmiller said:

                        @Dashrender said:

                        That may be so, but who would care? In your RAID 0, if you lose any drive, all of your data is gone, so being redundant is pointless in that case. The only thing you care about with RAID 0 is performance, not reliability.

                        You are stuck on the idea that your array always carries stateful data. That's an incorrect assumption. RAID 0 arrays can be perfectly functional when degraded if they are not used for stateful data. So the redundancy remains fully useful.

                        really? the array will stay active in a degraded state? I had no idea - I figured the RAID controller would basically just kill the array once a drive was lost. yep me and assuming = mistake...

                        • scottalanmillerS
                          scottalanmiller @wirestyle22
                          last edited by

                          @wirestyle22 said:

                          Can you clarify what you mean? What gives them higher uptime than a ProLiant if both are configured correctly? Honest question.

                          So the HPE ProLiant line is a micro-computer line based on the PC architecture. They are, just for clarity, the industry reference standard for commodity servers (generally considered the best in the business going back to the Compaq ProLiant era in the mid-1990s). They are very good, but they are "commodity". They are basically no different (more or less) than any PC you could build yourself with parts you order online. This is not totally true: there is a tonne of HPE-unique engineering, they are tested like crazy, they have custom firmware and boards, they buy better parts than are available on the open market, they add some proprietary stuff like iLO, etc. But more or less, these are PCs. The DL380 is the best selling server in the world, from any vendor, in any category.

                          The HPE Integrity line is a mini-computer line. They are not PCs. Most of them (not all) are built on the IA64 EPIC architecture and have RAS [Reliability, availability and serviceability] features that the PC architecture does not support. For example, hot swappable memory and CPUs are standard. Things like redundant controllers are common. The overall build and design is less about cost savings and more about never failing (or being fixed without going down). It's a truly different class of device. They are also bigger devices; you don't put one in just to run your website. But you can fit more workloads on them, which makes it more sensible to invest in a single device that almost never fails.

                          • scottalanmillerS
                            scottalanmiller @wirestyle22
                            last edited by

                            @wirestyle22 said:

                            In both cases, they fired an artillery round into the chassis of a running HP system (bolted to a surface of course as the thing would have gone flying) and in both cases the system stayed up and running, didn't lose a ping.

                            That's wild. HP is doin' it right now.

                            HP has been doing this stuff for decades. This isn't new technology. You can get similar from IBM, Oracle and Fujitsu. Dell does not dabble in the mini and mainframe market.

                            From IBM this would be the i and z series (i is the mini and z is the mainframe). From Oracle this is the M series. Fujitsu makes the M series for Oracle (they co-design it and Fujitsu manufactures it) and also sells it under its own branding. I don't know that branding, as it is not sold in America; here you just buy the Oracle branded ones.

                            • wirestyle22W
                              wirestyle22 @scottalanmiller
                              last edited by

                              @scottalanmiller said:

                              @wirestyle22 said:

                              Can you clarify what you mean? What gives them higher uptime than a ProLiant if both are configured correctly? Honest question.

                              So the HPE ProLiant line is a micro-computer line based on the PC architecture. They are, just for clarity, the industry reference standard for commodity servers (generally considered the best in the business going back to the Compaq ProLiant era in the mid-1990s). They are very good, but they are "commodity". They are basically no different (more or less) than any PC you could build yourself with parts you order online. This is not totally true: there is a tonne of HPE-unique engineering, they are tested like crazy, they have custom firmware and boards, they buy better parts than are available on the open market, they add some proprietary stuff like iLO, etc. But more or less, these are PCs. The DL380 is the best selling server in the world, from any vendor, in any category.

                              The HPE Integrity line is a mini-computer line. They are not PCs. Most of them (not all) are built on the IA64 EPIC architecture and have RAS [Reliability, availability and serviceability] features that the PC architecture does not support. For example, hot swappable memory and CPUs are standard. Things like redundant controllers are common. The overall build and design is less about cost savings and more about never failing (or being fixed without going down). It's a truly different class of device. They are also bigger devices; you don't put one in just to run your website. But you can fit more workloads on them, which makes it more sensible to invest in a single device that almost never fails.

                              Interesting. Thank you for the information.

                              • scottalanmillerS
                                scottalanmiller @Dashrender
                                last edited by

                                @Dashrender said:

                                really? the array will stay active in a degraded state? I had no idea - I figured the RAID controller would basically just kill the array once a drive was lost. yep me and assuming = mistake...

                                Oh there will be a blip, the array has to re-initialize. It's not a transparent fail like a RAID 1 would be. But it can be automatic and very, very fast. Few people do this, but you can, no problem. You could probably get downtime to a couple of seconds.

                                • DashrenderD
                                  Dashrender @scottalanmiller
                                  last edited by

                                  @scottalanmiller said:

                                  @Dashrender said:

                                  really? the array will stay active in a degraded state? I had no idea - I figured the RAID controller would basically just kill the array once a drive was lost. yep me and assuming = mistake...

                                  Oh there will be a blip, the array has to re-initialize. It's not a transparent fail like a RAID 1 would be. But it can be automatic and very, very fast. Few people do this, but you can, no problem. You could probably get downtime to a couple of seconds.

                                  Interesting, I had no idea - what would it be good for?

                                  • scottalanmillerS
                                    scottalanmiller @Dashrender
                                    last edited by

                                    @Dashrender said:

                                    Interesting, I had no idea - what would it be good for?

                                    Primarily caching or cache-like databases.

                                    If you were doing a large proxy cache, for example, this would be a great way to handle it. Don't invest too much money where it isn't needed, and in the case of a drive loss, the delay as you flush the cache and reload isn't too tragic.

                                    The need for this is rapidly going away because of SSDs. So I'm not about to run out and do this today, mind you. But five years ago or, far more likely, twenty years ago, this would have been absolutely viable and completely obvious if you had the use case for it. Today, meh, who needs RAID 0 for speed?

                                    Or any kind of read only caching system, even on a database.

                                    Or a short term use database where the data isn't useful after, say, a day. On Wall St. we had a lot of systems that took trillions of transactions per day and then, at the end of the day.... dropped them. With a RAID 0, you might just accept losing a few hours of data, recreate and be underway again because the streaming writes are more important than any potential reliability.
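
The caching case can be illustrated with a toy model, assuming the origin server holds the authoritative copy and the array holds nothing that cannot be refetched. `StripedCache` and its methods are hypothetical names for this sketch; plain dicts stand in for drives, and no real RAID tooling is involved.

```python
from typing import Callable, Dict, List


class StripedCache:
    """Toy cache spread across 'drives' (plain dicts), RAID 0 style.

    There is no redundancy: losing one drive invalidates the whole array.
    That is acceptable here because the origin, not the cache, holds the
    authoritative data.
    """

    def __init__(self, drive_count: int, fetch_from_origin: Callable[[str], str]):
        self.fetch_from_origin = fetch_from_origin
        self.drives: List[Dict[str, str]] = [{} for _ in range(drive_count)]

    def _drive_for(self, key: str) -> Dict[str, str]:
        # Deterministic placement of a key on one of the current drives.
        return self.drives[sum(key.encode()) % len(self.drives)]

    def get(self, key: str) -> str:
        drive = self._drive_for(key)
        if key not in drive:
            # Cache miss: go back to the origin and remember the answer.
            drive[key] = self.fetch_from_origin(key)
        return drive[key]

    def lose_drive(self) -> None:
        """Simulate a drive failure by recreating an empty array on the survivors."""
        survivors = len(self.drives) - 1
        if survivors < 1:
            raise RuntimeError("no drives left to rebuild on")
        self.drives = [{} for _ in range(survivors)]


origin_hits: List[str] = []


def origin(key: str) -> str:
    """Stand-in for the slow, authoritative source behind the cache."""
    origin_hits.append(key)
    return f"content of {key}"


cache = StripedCache(4, origin)
cache.get("/index.html")  # miss, fetched from origin
cache.get("/index.html")  # hit, served from the cache
cache.lose_drive()        # array is recreated empty
cache.get("/index.html")  # miss again, refetched from origin
print(f"origin was contacted {len(origin_hits)} times")
```

The drive loss costs one extra trip to the origin per key and nothing else, which is the sense in which a degraded, recreated RAID 0 stays useful for non-stateful data.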

                                    • dafyreD
                                      dafyre
                                      last edited by

                                      In a conversation with a few others, someone brought up the point that all anyone cares about is being able to access the service.

                                      And I agree with this. If the users are able to access the services consistently (reliably, perhaps?) and without data loss, that is the ultimate end goal.

                                      • scottalanmillerS
                                        scottalanmiller @dafyre
                                        last edited by

                                        @dafyre said:

                                        In a conversation with a few others, someone brought up the point that all anyone cares about is being able to access the service.

                                        And I agree with this. If the users are able to access the services consistently (reliably, perhaps?) and without data loss, that is the ultimate end goal.

                                        That's the cat. More than one way to skin it 🙂

                                        Although it should be noted that the ultimate end goal is doing that cost effectively. In IT, cost is always a factor. So even if we can improve uptime, it has to have an ROI that makes sense as well. Otherwise we'd all be running redundant mainframes for everything.
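
The ROI point can be made concrete with back-of-the-envelope arithmetic. This is a minimal sketch with hypothetical dollar figures; `ha_net_benefit` is an illustrative helper, and a real analysis would also weigh data loss risk, staffing, and added complexity.

```python
def ha_net_benefit(
    downtime_hours_avoided_per_year: float,
    cost_per_downtime_hour: float,
    extra_ha_cost_per_year: float,
) -> float:
    """Yearly savings from avoided downtime minus the yearly cost of the HA setup.

    A positive result means the extra availability pays for itself.
    """
    savings = downtime_hours_avoided_per_year * cost_per_downtime_hour
    return savings - extra_ha_cost_per_year


# Hypothetical scenarios: same HA spend, different cost of an hour of downtime.
small_office = ha_net_benefit(8, 2_000, 25_000)
busy_webshop = ha_net_benefit(8, 10_000, 25_000)

print(f"small office net benefit: {small_office}")
print(f"busy webshop net benefit: {busy_webshop}")
```

The same eight hours of avoided downtime is a loss in one scenario and a clear win in the other, which is why the uptime improvement alone does not settle the question.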

                                        • scottalanmillerS
                                          scottalanmiller @wirestyle22
                                          last edited by

                                          @wirestyle22 in a similar vein to the RAS features, some systems like the IBM z series also have a computational verification that is unheard of in lesser platforms. The z series offers the ability to do everything (literally, every clock cycle) twice using two different processors. That way if a CPU fails, memory fails or there is a gamma radiation hit or sun spot or whatever and a bit flips, the system will catch the discrepancy and run the operation again.
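
As a loose software analogy for the behavior described here (not IBM's actual hardware mechanism, which the post does not detail), the sketch below runs the same operation on two simulated units, compares the results, and reruns on a mismatch. `run_verified` and the simulated bit flip are assumptions made for illustration.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def run_verified(unit_a: Callable[[], T], unit_b: Callable[[], T], max_attempts: int = 3) -> T:
    """Run the same operation on two units and accept only matching results.

    On a discrepancy the operation is simply run again, up to max_attempts.
    """
    for _ in range(max_attempts):
        result_a, result_b = unit_a(), unit_b()
        if result_a == result_b:
            return result_a
    raise RuntimeError("units disagreed on every attempt")


calls: Dict[str, int] = {"flaky": 0}


def healthy_unit() -> int:
    return 2 + 2


def flaky_unit() -> int:
    """Same computation, but the first call suffers a simulated bit flip."""
    calls["flaky"] += 1
    result = 2 + 2
    if calls["flaky"] == 1:
        result ^= 1  # flip the lowest bit once to mimic a transient fault
    return result


print(run_verified(healthy_unit, flaky_unit))
```

The first attempt disagrees and is thrown away; the second attempt matches and is accepted. The price is doing every operation twice, which fits the earlier point in the thread that this class of reliability adds cost and latency.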

                                          • scottalanmillerS
                                            scottalanmiller
                                            last edited by

                                            I forgot about this topic and found it mentioned in a conversation. This thread was a great resource that never got linked anywhere useful. Now to figure out how to make it more referenceable.
