ML
    • Recent
    • Categories
    • Tags
    • Popular
    • Users
    • Groups
    • Register
    • Login

    Where to find "best practice" for any given IT scenario

    Water Closet
    11
    58
    4.6k
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • C
      Carnival Boy
      last edited by Carnival Boy

      OK, take two typical SMB servers, each with 12 x 300GB disks. One is configured with RAID 10 and one is configured with RAID 5.

      One of the disks in each machine fails and is replaced. What is the probability in each case that the array will not rebuild successfully? Roughly speaking.

      DashrenderD scottalanmillerS 2 Replies Last reply Reply Quote 0
      • DashrenderD
        Dashrender @Carnival Boy
        last edited by

        @Carnival-Boy said:

        OK, take two typical SMB servers, each with 12 x 300GB disks. One is configured with RAID 10 and one is configured with RAID 5.

        One of the disks in each machine fails and is replaced. What is the probability in each case that the array will not rebuild successfully? Roughly speaking.

        I can see Scott in the corner right now doing the math (or just posting a link to where he's already done the math). From what I recall, 3.3 TB has like a 30% chance of hitting a URE, AKA total failure of the array. At something around 12TB there is statistically a 100% chance of hitting a URE (OK it might actually be 99.99%)

        C scottalanmillerS 2 Replies Last reply Reply Quote 0
        • DashrenderD
          Dashrender @scottalanmiller
          last edited by

          @scottalanmiller said:

          @Dashrender said:

          @scottalanmiller said:

          Best practice is to simply remove it from consideration to clarify the remaining choices.

          This reminds me of Darren's talk at SpiceWorld - only give the CEO/CFO the choices that you approve. Never provide them one that you don't want, they'll always pick that one.

          By providing it, you are presenting it as an option. Basically meaning you approved it. You can tell them your top choices, but if you include it in the list, it's approved to some degree or the conversation is confused.

          What surprises me is how often IT people will present completely unreasonable options as options to management. If your car got a flat, would you offer to 1) fix the flat or 2) set the car on fire? No, you would not offer something ridiculous that isn't reasonable. But IT often does this to management.

          The more likely scenario is that management will reject all provided solutions and ask why it can't be done cheaper. Of course it can be done cheaper, but with orders of magnitude more risk. What is the recommendation then?

          scottalanmillerS 1 Reply Last reply Reply Quote 0
          • dafyreD
            dafyre
            last edited by

            Using RAID 6 or RAID 10 which are both safer than RAID 5.

            1 Reply Last reply Reply Quote 0
            • C
              Carnival Boy @Dashrender
              last edited by

              @Dashrender said:

              I can see Scott in the corner right now doing the math (or just posting a link to where he's already done the math).

              Cool. Facts are important here. A failure probability of 0.001% is 100 times higher than 0.00001%, so on that grounds it is an order of magnitude less reliable. But both are such tiny numbers that they could be ignored. That's where 'slightly' more reliable would also apply.

              To go back to your car analogy, a Fiat is (probably) an order of magnitude less reliable than a Honda, but both are so reliable that you wouldn't necessarily say buying a Honda is best practice. Importing a car from North Korea might, however, be considered bad practice.

              scottalanmillerS 1 Reply Last reply Reply Quote 0
              • scottalanmillerS
                scottalanmiller @Carnival Boy
                last edited by

                @Carnival-Boy said:

                OK, take two typical SMB servers, each with 12 x 300GB disks. One is configured with RAID 10 and one is configured with RAID 5.

                One of the disks in each machine fails and is replaced. What is the probability in each case that the array will not rebuild successfully? Roughly speaking.

                Can you even buy 300GB drives today? 🙂

                Reliability should never be considered from a point of "already failed", that misses part of the big picture. A RAID 5 array is more likely to experience a drive failure than RAID 10 as a starting point. We need to think about the total reliability, not the reliability from a single scenario.

                Imagine this question to demonstrate why this is important:

                "Which is more likely to survive a front end collision of 20mph, a Volvo C70 or a Ford Pinto?" You'd say the Volvo C70, of course.

                But that assumes both cars HAVE had that accident. What if that wasn't the whole scenario? Let's ask again...

                "Which is more likely to injure its passengers, a Volvo C70 driving 50pmh on the highway or a Ford Pinto sitting idle in a garage?"

                Suddenly the tables turn, because while one is more likely to survive an accident, the other is safer by avoiding the accident which is even more effective.

                1 Reply Last reply Reply Quote 0
                • scottalanmillerS
                  scottalanmiller @Dashrender
                  last edited by

                  @Dashrender said:

                  @scottalanmiller said:

                  @Dashrender said:

                  @scottalanmiller said:

                  Best practice is to simply remove it from consideration to clarify the remaining choices.

                  This reminds me of Darren's talk at SpiceWorld - only give the CEO/CFO the choices that you approve. Never provide them one that you don't want, they'll always pick that one.

                  By providing it, you are presenting it as an option. Basically meaning you approved it. You can tell them your top choices, but if you include it in the list, it's approved to some degree or the conversation is confused.

                  What surprises me is how often IT people will present completely unreasonable options as options to management. If your car got a flat, would you offer to 1) fix the flat or 2) set the car on fire? No, you would not offer something ridiculous that isn't reasonable. But IT often does this to management.

                  The more likely scenario is that management will reject all provided solutions and ask why it can't be done cheaper. Of course it can be done cheaper, but with orders of magnitude more risk. What is the recommendation then?

                  You say that it cannot be done cheaper while meeting goals. Ask them what goal they want to drop to reduce cost.

                  1 Reply Last reply Reply Quote 0
                  • scottalanmillerS
                    scottalanmiller @Dashrender
                    last edited by

                    @Dashrender said:

                    @Carnival-Boy said:

                    OK, take two typical SMB servers, each with 12 x 300GB disks. One is configured with RAID 10 and one is configured with RAID 5.

                    One of the disks in each machine fails and is replaced. What is the probability in each case that the array will not rebuild successfully? Roughly speaking.

                    I can see Scott in the corner right now doing the math (or just posting a link to where he's already done the math). From what I recall, 3.3 TB has like a 30% chance of hitting a URE, AKA total failure of the array. At something around 12TB there is statistically a 100% chance of hitting a URE (OK it might actually be 99.99%)

                    Not that risky on the small SAS drives that are implied. But still riskier.

                    1 Reply Last reply Reply Quote 0
                    • scottalanmillerS
                      scottalanmiller @Carnival Boy
                      last edited by

                      @Carnival-Boy said:

                      @Dashrender said:

                      I can see Scott in the corner right now doing the math (or just posting a link to where he's already done the math).

                      Cool. Facts are important here. A failure probability of 0.001% is 100 times higher than 0.00001%, so on that grounds it is an order of magnitude less reliable. But both are such tiny numbers that they could be ignored. That's where 'slightly' more reliable would also apply.

                      Easy way to think of it is.... RAID 10 you should expect to go a lifetime without hearing about anyone who has ever had this issue. RAID 5 you should expect multiple complete failures in your career.

                      RAID 10 failure rates are less than 1 in 80,000 array years. RAID 5 is closer to 1 in 20.

                      There are so many factors that go into this from drives being more likely to fail, longer time for rebuilds, risk during rebuild, rebuild causing other drives to fail, risk of memory issues, etc.

                      1 Reply Last reply Reply Quote 0
                      • scottalanmillerS
                        scottalanmiller
                        last edited by

                        Based on using the different RAID types, of course.

                        1 Reply Last reply Reply Quote 0
                        • scottalanmillerS
                          scottalanmiller
                          last edited by

                          Trying to eyeball the math, at 3.3TB of usable data, that RAID 5 array would fail way over 50% of the time with consumer class drives (like Red Pro.) So enterprise drives (like RE) which are 10x more reliable in regards to URE we would expect rebuilt risk from URE alone to be 5% or higher.

                          That is a one in twenty chance that the RAID 5 array would lose all of its data. This does not take into account secondary drive failure risk which is pretty big as well.

                          I would not put a one in twenty or maybe one in ten chance of failure on the same playing field as "so reliable no study can measure it completely." RAID 10 failures at 80,000 array years was only the known healthy rate, all that is know is that it is more reliable than that. Zero failures at 80,000 array years!

                          1 Reply Last reply Reply Quote 0
                          • C
                            Carnival Boy
                            last edited by

                            OK, RAID 5 isn't best practice. That's a relatively easy one. Give me some more examples where the term "best practice" might apply. I'm not convinced the term is that meaningful.

                            I'm having an extension built on my house at the moment, and I hear the term used quite a bit by my builders. There's building regulations that are legally required and there's ones that are best practice. For example, a shaver point should be located at least 30cm from the sink. That's not a legal requirement, but it's best practice. Smoke detectors should be mains powered not battery powered. Again, that's best practice rather than a legal requirement. These practices are pretty formal though - either by the manufacturer, or by the building regulators. I don't see much equivalence in the IT industry (sadly, as it would be super useful).

                            scottalanmillerS 1 Reply Last reply Reply Quote 0
                            • scottalanmillerS
                              scottalanmiller
                              last edited by scottalanmiller

                              Best Practice: If data is valuable enough to be stored, it should be backed up.

                              1 Reply Last reply Reply Quote 1
                              • scottalanmillerS
                                scottalanmiller @Carnival Boy
                                last edited by

                                @Carnival-Boy said:

                                OK, RAID 5 isn't best practice. That's a relatively easy one.

                                Actually it is a hard one, while it is a well documented best practice among storage experts, the industry as a whole lacks that expertise and pushes it heavily.

                                1 Reply Last reply Reply Quote 0
                                • C
                                  Carnival Boy
                                  last edited by

                                  It's an easy one for anyone who hangs around the same forums you do 🙂

                                  1 Reply Last reply Reply Quote 2
                                  • scottalanmillerS
                                    scottalanmiller
                                    last edited by

                                    Another best practice: virtualize every workload (unless it is impossible to do so)

                                    dafyreD 1 Reply Last reply Reply Quote 0
                                    • dafyreD
                                      dafyre @scottalanmiller
                                      last edited by

                                      @scottalanmiller said:

                                      Another best practice: virtualize every workload (unless it is impossible to do so)

                                      What are some workloads it would be impossible to virtualize? With the exception of real-time, ulta-low latency requirements, I cannot think of anything.

                                      scottalanmillerS 1 Reply Last reply Reply Quote 0
                                      • scottalanmillerS
                                        scottalanmiller @dafyre
                                        last edited by

                                        @dafyre said:

                                        What are some workloads it would be impossible to virtualize? With the exception of real-time, ulta-low latency requirements, I cannot think of anything.

                                        Those and ones with very specific hardware requirements either technically or politically. That's about it. It is rare enough that it is effective to just say "never".

                                        1 Reply Last reply Reply Quote 0
                                        • C
                                          Carnival Boy
                                          last edited by

                                          Workloads that you can't get working virtualised for whatever reason. I couldn't get Hamachi to work virtualised. Googling suggested a common problem with Hamachi not liking the VMware network drivers or something.

                                          I've virtualised our firewall. I wonder if there's an argument that says I shouldn't because it means I have a hypervisor on a public facing host. Maybe? I dunno, could that be a security risk? It's not something I'm going to lose any sleep over.

                                          scottalanmillerS dafyreD 2 Replies Last reply Reply Quote 0
                                          • scottalanmillerS
                                            scottalanmiller @Carnival Boy
                                            last edited by

                                            @Carnival-Boy said:

                                            I've virtualised our firewall. I wonder if there's an argument that says I shouldn't because it means I have a hypervisor on a public facing host. Maybe? I dunno, could that be a security risk? It's not something I'm going to lose any sleep over.

                                            You can virtualize that without exposing the hypervisor in any way.

                                            1 Reply Last reply Reply Quote 0
                                            • 1
                                            • 2
                                            • 3
                                            • 3 / 3
                                            • First post
                                              Last post