    Where to find "best practice" for any given IT scenario

    • scottalanmillerS
      scottalanmiller @Dashrender
      last edited by

      @Dashrender said:

      @scottalanmiller said:

      Best practice is to simply remove it from consideration to clarify the remaining choices.

      This reminds me of Darren's talk at SpiceWorld - only give the CEO/CFO the choices that you approve. Never provide them one that you don't want, they'll always pick that one.

      By providing it, you are presenting it as an option. Basically meaning you approved it. You can tell them your top choices, but if you include it in the list, it's approved to some degree or the conversation is confused.

      What surprises me is how often IT people will present completely unreasonable options as options to management. If your car got a flat, would you offer to 1) fix the flat or 2) set the car on fire? No, you would not offer something ridiculous that isn't reasonable. But IT often does this to management.

      • DustinB3403D
        DustinB3403
        last edited by DustinB3403

        Discussing the ones you don't want simply means others will get confused. Leaving them out is a great way to avoid unnecessary conversations.

        • C
          Carnival Boy @scottalanmiller
          last edited by

          @scottalanmiller said:

          @Carnival-Boy said:

          RAID 5 works. I'd consider it not to be generally best practice on the grounds that it is slightly less reliable than RAID 10.

          That's not at all why I consider it not a best practice. The reasons are complicated, far more than can be distilled to a quick statement. Also, RAID 5 does not work by most business definitions - it fails to provide the level of protection assumed or required. That's one piece.

          It isn't that RAID 5 is slightly less reliable than RAID 10; it is that it is a full order of magnitude less reliable than RAID 6, which, in turn, is less reliable than RAID 10. But that isn't it either.

          There are business scenarios where RAID 5, risky as it is, is certainly "safe enough". The issue becomes that when RAID 5 is "safe enough" there are other factors. It becomes too costly to make it safe enough, it fails to scale and remain safe, it isn't fast enough, etc. Whichever of the business factors you optimize for (safety, speed, capacity or cost), RAID 5 never comes out as the right one. It can be functional, but it can't ever be the right choice.

          This, in turn, means that, knowing this, having RAID 5 in our decision matrix just increases the chances that we, as emotional humans, will get confused by the extra choice and make bad decisions. We do our best decision making when known bad choices are removed from consideration. If a computer were factoring in all of the issues and making a determination, having RAID 5 in the mix would not matter. But for humans, it really matters.

          Isn't that pretty much what I said - it's slightly less reliable than RAID 10?

          • DustinB3403D
            DustinB3403 @Carnival Boy
            last edited by

            @Carnival-Boy It's not just a little bit, it's tens if not hundreds of times less reliable in a recovery situation.

            The likelihood of recovering from a RAID 5 failure vs a RAID 10 failure is apples vs whales.

            • scottalanmillerS
              scottalanmiller @Carnival Boy
              last edited by

              @Carnival-Boy said:

              Isn't that pretty much what I said - it's slightly less reliable than RAID 10?

              I would not use slightly when "order of magnitude" is involved 🙂

              • C
                Carnival Boy
                last edited by Carnival Boy

                OK, take two typical SMB servers, each with 12 x 300GB disks. One is configured with RAID 10 and one is configured with RAID 5.

                One of the disks in each machine fails and is replaced. What is the probability in each case that the array will not rebuild successfully? Roughly speaking.

                • DashrenderD
                  Dashrender @Carnival Boy
                  last edited by

                  @Carnival-Boy said:

                  OK, take two typical SMB servers, each with 12 x 300GB disks. One is configured with RAID 10 and one is configured with RAID 5.

                  One of the disks in each machine fails and is replaced. What is the probability in each case that the array will not rebuild successfully? Roughly speaking.

                  I can see Scott in the corner right now doing the math (or just posting a link to where he's already done the math). From what I recall, 3.3 TB has like a 30% chance of hitting a URE, AKA total failure of the array. At something around 12TB there is statistically a 100% chance of hitting a URE (OK it might actually be 99.99%)
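To put rough numbers on the question, here is a minimal sketch of the usual back-of-the-envelope URE model. Everything in it beyond the 12 x 300GB layout is an assumption for illustration: a spec-sheet rate of one unrecoverable read error per 10^14 bits, errors that are independent per bit, a RAID 5 rebuild that must read all 11 surviving disks (which is where 3.3TB comes from), and a RAID 10 rebuild that reads only the failed disk's mirror partner. The function name `p_ure` is made up for this example.

```python
import math


def p_ure(bytes_read: float, ure_per_bit: float) -> float:
    """Probability of at least one URE while reading bytes_read bytes.

    Assumes every bit fails independently with probability ure_per_bit.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, written with log1p/expm1 so tiny p stays accurate.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


DISKS = 12
DISK_BYTES = 300e9      # 300GB, decimal
URE_PER_BIT = 1e-14     # assumed spec-sheet rate, not a measured value

raid5_read = (DISKS - 1) * DISK_BYTES   # all surviving disks: 3.3TB
raid10_read = DISK_BYTES                # only the mirror partner: 0.3TB

raid5_risk = p_ure(raid5_read, URE_PER_BIT)
raid10_risk = p_ure(raid10_read, URE_PER_BIT)
risk_at_12tb = p_ure(12e12, URE_PER_BIT)

print(f"RAID 5 rebuild, 3.3TB read:  {raid5_risk:.1%}")
print(f"RAID 10 rebuild, 0.3TB read: {raid10_risk:.1%}")
print(f"Any 12TB read:               {risk_at_12tb:.1%}")
```

Under these assumptions the RAID 5 rebuild comes out around 23% and the RAID 10 rebuild around 2.4%, roughly a tenfold gap from the rebuild read alone. The same model gives roughly 62% at 12TB rather than a near-certain failure. Real-world rates depend heavily on the drives, so treat these as illustrative, not measured.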

                  • DashrenderD
                    Dashrender @scottalanmiller
                    last edited by

                    @scottalanmiller said:

                    @Dashrender said:

                    @scottalanmiller said:

                    Best practice is to simply remove it from consideration to clarify the remaining choices.

                    This reminds me of Darren's talk at SpiceWorld - only give the CEO/CFO the choices that you approve. Never provide them one that you don't want, they'll always pick that one.

                    By providing it, you are presenting it as an option. Basically meaning you approved it. You can tell them your top choices, but if you include it in the list, it's approved to some degree or the conversation is confused.

                    What surprises me is how often IT people will present completely unreasonable options as options to management. If your car got a flat, would you offer to 1) fix the flat or 2) set the car on fire? No, you would not offer something ridiculous that isn't reasonable. But IT often does this to management.

                    The more likely scenario is that management will reject all provided solutions and ask why it can't be done cheaper. Of course it can be done cheaper, but with orders of magnitude more risk. What is the recommendation then?

                    • dafyreD
                      dafyre
                      last edited by

                      Using RAID 6 or RAID 10 which are both safer than RAID 5.

                      • C
                        Carnival Boy @Dashrender
                        last edited by

                        @Dashrender said:

                        I can see Scott in the corner right now doing the math (or just posting a link to where he's already done the math).

                        Cool. Facts are important here. A failure probability of 0.001% is 100 times higher than 0.00001%, so on those grounds it is two orders of magnitude less reliable. But both are such tiny numbers that they could be ignored. That's where 'slightly' more reliable would also apply.

                        To go back to your car analogy, a Fiat is (probably) an order of magnitude less reliable than a Honda, but both are so reliable that you wouldn't necessarily say buying a Honda is best practice. Importing a car from North Korea might, however, be considered bad practice.
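A quick way to keep the relative and absolute gaps apart is to compute both. This is only an arithmetic sketch using the two example percentages above; the helper name `orders_of_magnitude` is made up for illustration.

```python
import math


def orders_of_magnitude(higher: float, lower: float) -> float:
    """Number of powers of ten separating two positive values."""
    return math.log10(higher / lower)


p_high = 0.001 / 100     # 0.001% expressed as a probability
p_low = 0.00001 / 100    # 0.00001% expressed as a probability

relative_gap = orders_of_magnitude(p_high, p_low)
absolute_gap_pct = (p_high - p_low) * 100  # back to percentage points

print(f"Relative gap: {relative_gap:.1f} orders of magnitude")
print(f"Absolute gap: {absolute_gap_pct:.5f} percentage points")
```

The ratio of 100 is two orders of magnitude, while the absolute gap is under 0.001 percentage points, which is the distinction being drawn.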

                        • scottalanmillerS
                          scottalanmiller @Carnival Boy
                          last edited by

                          @Carnival-Boy said:

                          OK, take two typical SMB servers, each with 12 x 300GB disks. One is configured with RAID 10 and one is configured with RAID 5.

                          One of the disks in each machine fails and is replaced. What is the probability in each case that the array will not rebuild successfully? Roughly speaking.

                          Can you even buy 300GB drives today? 🙂

                          Reliability should never be considered from a point of "already failed"; that misses part of the big picture. A RAID 5 array is more likely to experience a drive failure than RAID 10 as a starting point. We need to think about the total reliability, not the reliability from a single scenario.

                          Imagine this question to demonstrate why this is important:

                          "Which is more likely to survive a front end collision of 20mph, a Volvo C70 or a Ford Pinto?" You'd say the Volvo C70, of course.

                          But that assumes both cars HAVE had that accident. What if that wasn't the whole scenario? Let's ask again...

                          "Which is more likely to injure its passengers, a Volvo C70 driving 50mph on the highway or a Ford Pinto sitting idle in a garage?"

                          Suddenly the tables turn, because while one is more likely to survive an accident, the other is safer by avoiding the accident which is even more effective.

                          • scottalanmillerS
                            scottalanmiller @Dashrender
                            last edited by

                            @Dashrender said:

                            @scottalanmiller said:

                            @Dashrender said:

                            @scottalanmiller said:

                            Best practice is to simply remove it from consideration to clarify the remaining choices.

                            This reminds me of Darren's talk at SpiceWorld - only give the CEO/CFO the choices that you approve. Never provide them one that you don't want, they'll always pick that one.

                            By providing it, you are presenting it as an option. Basically meaning you approved it. You can tell them your top choices, but if you include it in the list, it's approved to some degree or the conversation is confused.

                            What surprises me is how often IT people will present completely unreasonable options as options to management. If your car got a flat, would you offer to 1) fix the flat or 2) set the car on fire? No, you would not offer something ridiculous that isn't reasonable. But IT often does this to management.

                            The more likely scenario is that management will reject all provided solutions and ask why it can't be done cheaper. Of course it can be done cheaper, but with orders of magnitude more risk. What is the recommendation then?

                            You say that it cannot be done cheaper while meeting goals. Ask them what goal they want to drop to reduce cost.

                            • scottalanmillerS
                              scottalanmiller @Dashrender
                              last edited by

                              @Dashrender said:

                              @Carnival-Boy said:

                              OK, take two typical SMB servers, each with 12 x 300GB disks. One is configured with RAID 10 and one is configured with RAID 5.

                              One of the disks in each machine fails and is replaced. What is the probability in each case that the array will not rebuild successfully? Roughly speaking.

                              I can see Scott in the corner right now doing the math (or just posting a link to where he's already done the math). From what I recall, 3.3 TB has like a 30% chance of hitting a URE, AKA total failure of the array. At something around 12TB there is statistically a 100% chance of hitting a URE (OK it might actually be 99.99%)

                              Not that risky on the small SAS drives that are implied. But still riskier.

                              • scottalanmillerS
                                scottalanmiller @Carnival Boy
                                last edited by

                                @Carnival-Boy said:

                                @Dashrender said:

                                I can see Scott in the corner right now doing the math (or just posting a link to where he's already done the math).

                                Cool. Facts are important here. A failure probability of 0.001% is 100 times higher than 0.00001%, so on those grounds it is two orders of magnitude less reliable. But both are such tiny numbers that they could be ignored. That's where 'slightly' more reliable would also apply.

                                Easy way to think of it is.... RAID 10 you should expect to go a lifetime without hearing about anyone who has ever had this issue. RAID 5 you should expect multiple complete failures in your career.

                                RAID 10 failure rates are less than 1 in 80,000 array years. RAID 5 is closer to 1 in 20.

                                There are so many factors that go into this from drives being more likely to fail, longer time for rebuilds, risk during rebuild, rebuild causing other drives to fail, risk of memory issues, etc.
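One way to see why the two rates feel so different is to scale them up to a career. The sketch below takes the two per-array-year rates quoted above at face value and applies them to a hypothetical workload of 10 arrays over 30 years (300 array years). Those workload numbers, and the assumption that each array year is an independent trial, are illustrative only.

```python
def p_at_least_one(rate_per_array_year: float, array_years: float) -> float:
    """Chance of one or more failures across independent array years."""
    return 1.0 - (1.0 - rate_per_array_year) ** array_years


RAID5_RATE = 1 / 20        # quoted above: roughly 1 in 20 array years
RAID10_RATE = 1 / 80_000   # quoted above as an upper bound for RAID 10

ARRAYS = 10                # hypothetical number of arrays looked after
YEARS = 30                 # hypothetical career length
array_years = ARRAYS * YEARS

raid5_expected = RAID5_RATE * array_years
raid10_expected = RAID10_RATE * array_years
raid5_prob = p_at_least_one(RAID5_RATE, array_years)
raid10_prob = p_at_least_one(RAID10_RATE, array_years)

print(f"RAID 5:  {raid5_expected:.1f} expected failures, "
      f"{raid5_prob:.2%} chance of at least one")
print(f"RAID 10: {raid10_expected:.5f} expected failures, "
      f"{raid10_prob:.2%} chance of at least one")
```

With these made-up workload figures, the quoted rates work out to about 15 expected RAID 5 failures against well under one hundredth of a RAID 10 failure, which lines up with "multiple complete failures in your career" versus never hearing of one.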

                                • scottalanmillerS
                                  scottalanmiller
                                  last edited by

                                  Based on using the different RAID types, of course.

                                  • scottalanmillerS
                                    scottalanmiller
                                    last edited by

                                    Trying to eyeball the math, at 3.3TB of usable data, that RAID 5 array would fail way over 50% of the time with consumer class drives (like Red Pro). So with enterprise drives (like RE), which are 10x more reliable with regard to UREs, we would expect rebuild risk from URE alone to be 5% or higher.

                                    That is a one in twenty chance that the RAID 5 array would lose all of its data. This does not take into account secondary drive failure risk which is pretty big as well.

                                    I would not put a one in twenty or maybe one in ten chance of failure on the same playing field as "so reliable no study can measure it completely." The RAID 10 figure of 80,000 array years was only the known healthy rate; all that is known is that it is more reliable than that. Zero failures in 80,000 array years!
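The jump from "way over 50%" to "5% or higher" can be sanity-checked with a small sketch. It assumes the simple model where UREs are independent per bit, so that a drive with a URE rate k times lower has a rebuild survival probability approximately equal to the k-th root of the baseline survival probability. The function name `scaled_risk` and the sample baseline risks are hypothetical.

```python
def scaled_risk(baseline_risk: float, improvement: float) -> float:
    """Rebuild failure risk after the URE rate improves by `improvement`x.

    Under the independent-bit model, survival is about exp(-bits * rate),
    so dividing the rate by k takes the k-th root of the survival chance.
    """
    survival = 1.0 - baseline_risk
    return 1.0 - survival ** (1.0 / improvement)


IMPROVEMENT = 10  # enterprise drives taken as 10x better on URE, per the post

# Sample consumer-drive baselines standing in for "way over 50%".
results = {base: scaled_risk(base, IMPROVEMENT) for base in (0.5, 0.6, 0.7)}

for base, risk in results.items():
    print(f"consumer {base:.0%} -> enterprise {risk:.1%}")
```

With a consumer baseline of 50% to 70%, this gives about 7% to 11% for the 10x-better drives, which sits in the one in twenty to one in ten range. It covers URE risk only and says nothing about secondary drive failure.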

                                    • C
                                      Carnival Boy
                                      last edited by

                                      OK, RAID 5 isn't best practice. That's a relatively easy one. Give me some more examples where the term "best practice" might apply. I'm not convinced the term is that meaningful.

                                      I'm having an extension built on my house at the moment, and I hear the term used quite a bit by my builders. There are building regulations that are legally required and there are ones that are best practice. For example, a shaver point should be located at least 30cm from the sink. That's not a legal requirement, but it's best practice. Smoke detectors should be mains powered, not battery powered. Again, that's best practice rather than a legal requirement. These practices are pretty formal though - set either by the manufacturer or by the building regulators. I don't see much equivalence in the IT industry (sadly, as it would be super useful).

                                      • scottalanmillerS
                                        scottalanmiller
                                        last edited by scottalanmiller

                                        Best Practice: If data is valuable enough to be stored, it should be backed up.

                                        • scottalanmillerS
                                          scottalanmiller @Carnival Boy
                                          last edited by

                                          @Carnival-Boy said:

                                          OK, RAID 5 isn't best practice. That's a relatively easy one.

                                          Actually it is a hard one. While avoiding RAID 5 is a well-documented best practice among storage experts, the industry as a whole lacks that expertise and pushes RAID 5 heavily.

                                          • C
                                            Carnival Boy
                                            last edited by

                                            It's an easy one for anyone who hangs around the same forums you do 🙂
