    Hot Swap vs. Blind Swap

    Tags: storage, raid, hot swap, blind swap, cold swap
    • scottalanmiller @BRRABill

      @BRRABill said:

      Can you pull a hotplug drive out at any time, or is that dependent on server manufacturer and RAID card?

      Depends what you are asking.

      The hardware determines whether pulling a hot plug drive will cause a short. With hot plug hardware, yanking a drive at random will not cause an electrical issue.

      Hot plug software lets you tell the OS that a drive has been removed, and then have the OS add the replacement when you put it in.

      Blind swap lets you just walk up to an array, pull a drive without preparing anything, and put a new one in without needing to tell the system what you have done.
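
Editor's sketch: the software side of the distinction above can be illustrated with Linux software RAID. This is an assumption for illustration only; the thread itself concerns a hardware RAID controller, and the array and device names below are hypothetical. The functions only build the `mdadm` command lines (fail, remove, add) and do not run anything.

```python
from typing import List


def hot_swap_commands(array: str, failed: str, replacement: str) -> List[List[str]]:
    """Commands an admin would issue around a hot swap on Linux md RAID.

    The physical drive swap happens between the second and third command.
    """
    return [
        ["mdadm", array, "--fail", failed],      # mark the member as failed
        ["mdadm", array, "--remove", failed],    # detach it from the array
        ["mdadm", array, "--add", replacement],  # add the new drive, rebuild starts
    ]


def blind_swap_commands(array: str, failed: str, replacement: str) -> List[List[str]]:
    """With blind swap the system notices the swap itself, so nothing is issued."""
    return []


if __name__ == "__main__":
    # Hypothetical device names, printed as a dry run only.
    for cmd in hot_swap_commands("/dev/md0", "/dev/sdb1", "/dev/sdc1"):
        print(" ".join(cmd))
    print("blind swap steps:", blind_swap_commands("/dev/md0", "/dev/sdb1", "/dev/sdc1"))
```

The empty list for blind swap is the whole point of the comparison: the operator's only action is the physical pull and replace.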

      • BRRABill

        I ask because I had an issue yesterday on our DELL server. Which, admittedly, is very, very old. Experienced, I should say. No one likes to be called old.

        It's our main data server. One of two servers that really matter.

        We have 4 drives in a RAID5 array. (This is from the dark ages when that was considered OK.)

        I went into the server room for something else, and noticed one of the drives was blinking amber. I go from a 1 to a 5 on the 1 to 10 anxiety scale because that kind of stuff always makes me nervous. Anyway, no problem, I have spare drives on the shelf ready to go. I pull out the old drive. No problem. I put in the new drive, no problem. I go to log in to start rebuilding the array, and I notice that the server is rebooting. Hmm, that's odd. I look at the drive. Now TWO of the four are blinking amber. I've now gone to a 10, LOL.

        Turns out a second drive failed after I did the hot plug. I'm not sure if it was just random (which seems unlikely) or something weird happened during the hot plug.

        I spent a long, long time getting everything back to how it was.

        • scottalanmiller

          RAID 5 induces other failures when you go to rebuild. It's extremely common and just an artifact of that RAID level. Doesn't mean that it will always do it or even normally do it, but it is very common. Once you do a drive swap it immediately increases the load on the drives and makes them more likely to fail.

          • BRRABill

            Interesting. The second failed drive definitely sounded like it was dead...mechanical issue.

            I think that happened to me a long time ago on a server, which is why I'm always nervous doing it.

            THOUGH thanks to ML I'll never have another RAID 5 array, so no need to worry!

            It doesn't do that for any other RAID level?

            And I am assuming RAID 5 of SSDs wouldn't do that?

            • scottalanmiller @BRRABill

              @BRRABill SSDs do suffer from mechanically induced failed like Winchester drives.

              • scottalanmiller

                RAID 6 induces even more immediate wear and tear, so it is even more likely to kill off a second drive at the time of drive replacement, and it has one extra drive that can fail. But it can withstand losing one additional drive, so it is dramatically safer overall.
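
Editor's sketch: the trade-off described above can be made concrete with a simple binomial model. Everything numeric here is an assumption for illustration: the per-drive failure probabilities during the rebuild window are made up, drive failures are treated as independent, and the function name is hypothetical. The model asks how likely it is that more surviving drives fail during the rebuild than the remaining parity can absorb.

```python
from math import comb


def p_array_loss_during_rebuild(n_drives: int, p_fail: float, parity: int) -> float:
    """Probability of losing the array while rebuilding after ONE drive has failed.

    n_drives: total drives in the array before the failure
    p_fail:   assumed chance that any one surviving drive dies during the rebuild
    parity:   1 for RAID 5, 2 for RAID 6
    """
    survivors = n_drives - 1
    tolerated = parity - 1  # further failures the array can still absorb
    p_survive = sum(
        comb(survivors, k) * p_fail**k * (1 - p_fail) ** (survivors - k)
        for k in range(tolerated + 1)
    )
    return 1 - p_survive


if __name__ == "__main__":
    # Assumed numbers: RAID 6 gets one extra drive and a higher per-drive
    # failure chance to reflect the heavier rebuild load mentioned above.
    raid5 = p_array_loss_during_rebuild(n_drives=4, p_fail=0.05, parity=1)
    raid6 = p_array_loss_during_rebuild(n_drives=5, p_fail=0.08, parity=2)
    print(f"RAID 5 loss during rebuild: {raid5:.4f}")
    print(f"RAID 6 loss during rebuild: {raid6:.4f}")
```

Under these assumed inputs RAID 6 comes out well ahead even though each drive is modeled as more likely to fail, because the array only dies if two more drives fail rather than one.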

                • BRRABill @scottalanmiller

                  @scottalanmiller said:

                  @BRRABill SSDs do suffer from mechanically induced failed like Winchester drives.

                  Is the rate the same? Or is this a random (but common) thing?

                  • scottalanmiller @BRRABill

                    @BRRABill said:

                    @scottalanmiller said:

                    @BRRABill SSDs do suffer from mechanically induced failed like Winchester drives.

                    Is the rate the same? Or is this a random (but common) thing?

                    Sorry that was a typo. SSDs do NOT suffer mechanically induced failure.

                    • BRRABill

                      Oh. Phew.

                      What is the point of RAID if that happens?

                      That's it. I'm quitting IT.

                      I've had enough.

                      • scottalanmiller @BRRABill

                        @BRRABill not much point to RAID 5, that's what we've been saying for years. By 2009 it was so dangerous that it was actually worse in most cases than doing nothing at all.

                        • BRRABill

                          Well this server is from well before 2009.

                          It's a miracle nothing has happened yet.

                          • scottalanmiller

                            That is indeed pretty old.

                            • BRRABill

                              I'm not going to say exactly HOW old because I'm not sure I can take any more heads shaking at me this month. LOL.

                              • drewlander

                                @BRRABill said:

                                0 anxiety scale because that kind of stuff always makes me nervous. Anyway, no problem, I have spare drives on the shelf ready to go. I pull out the

                                In complete honesty I will admit that one time I was cold swapping a failed drive in a ProLiant DL360 G5 and replaced the wrong one. Fortunately the server wouldn't even boot and I was able to power it down, sort it out and bring it back up. Since then I will never run a server without the backplane kit and hot swappable drive caddies with the status indicator LED.

                                • drewlander @BRRABill

                                  @BRRABill Sounds like a situation I had to deal with last year where an organization was running Dell PowerEdge 2950 Gen II pizza boxes. I tried reasoning with them explaining that 9 year old servers should not be production machines for mission critical systems. They didn't seem to care about business continuity until they started failing.

                                  • BRRABill @drewlander

                                    @drewlander said:

                                    @BRRABill Sounds like a situation I had to deal with last year where an organization was running Dell PowerEdge 2950 Gen II pizza boxes. I tried reasoning with them explaining that 9 year old servers should not be production machines for mission critical systems. They didn't seem to care about business continuity until they started failing.

                                    This was a PowerEdge 2800. I've been kind of proud of the fact that I kept these things up and running for so long. And considering the low RAM and age, they still run awesome.

                                    BUT ... like I said it's a miracle that things haven't gone south quicker. The second drive that failed was a replacement drive, which of course was not new.

                                    Key point, as in anything, is to always have a good backup. 🙂

                                    • scottalanmiller

                                      We once had a set of Compaq Proliant 800s that made it a decade without failing. They were all retired effectively still healthy - just old and worthless.

                                      • BRRABill @scottalanmiller

                                        @scottalanmiller said:

                                        We once had a set of Compaq Proliant 800s that made it a decade without failing. They were all retired effectively still healthy - just old and worthless.

                                        That's about where we are. I've hung lucky mementos in there, and am hoping for the best. 🙂

                                        I actually have a construction paper good luck charm a vendor's wife once gave me a long time ago (before these servers even) that's actually hanging in there. It has done its job pretty well so far.

                                        • BRRABill

                                          True story. Right after I posted that last post, I went into the server room to take a picture of this paper good luck charm. On the way back down the hall, the building's power went out, and has been out the past 3 hours. This week is just AWESOME!

                                          Anyway, here is the picture:
                                          [Image: goodluckcharm.JPG]

                                          Note the failed DELL right below it.

                                          It did its job for many years, though. No complaints.

                                          • BRRABill

                                            P.S. If anyone can read that, and it DOESN'T say good luck, please don't let me know. 🙂
