Dell PERC Question (Server Down)
-
@todd-at-xByte said in Dell PERC Question (Server Down):
@BRRABill said in Dell PERC Question (Server Down):
@todd-at-xByte these were a new line of drives to begin with, right?
I think you just recently switched, right?
Not exactly. Edge rebranded the 960 Boost Pro Plus drives we were selling to E3 to consolidate their line. Edge stated that the 960 E3's were exactly the same as their 960 Boost Pro Plus.
How long had/have you been selling the Boost Pro Plus drives? And were there ever any issues with them?
-
So, the array went bye-bye again. Actually, it was just drive 1:0. (I'm still wondering if there is just something off with that drive.)
Anyway, I said, the hell with this, time to get these new DELL drives in there.
So the array came up degraded (just running on 1:1). I rebooted and went into the PERC config. I unplugged 1:0 (which was missing anyway), and plugged the DELL drive into 1:2. It instantly powered up, and the PERC config saw it. I added it as a hot spare, and it instantly started rebuilding. AWESOME!
So I rebooted the server. As soon as the server rebooted, the LED on the DELL drive started blinking. Hmm, that's odd, I think. Of course an error comes up, saying drives are missing. I look at the DELL drive, no LEDs. WTF.
I'll cut through the 2+ hours on support with DELL, trying everything. They basically said, the array is toast. Great.
I have 2 more of these DELL SSDs, so I think, WTH, let me try one of them. I plug it in, and reboot a few times with it outside the array. Comes back. So the big test, try it with the array. I do the same steps. But this time when it reboots, the array stays up.
AWESOME AWESOME AWESOME!
It is still currently rebuilding, so we shall see where we get with this. I wonder if the one drive was just a lemon. DELL says no, but I think the results say otherwise.
-
you rebooted after it started a rebuild? That wasn't wise.
-
@Dashrender said in Dell PERC Question (Server Down):
you rebooted after it started a rebuild? That wasn't wise.
That's what I have seen all the DELL techs do.
I double-checked tonight and that is definitely the supported way to go.
-
It may be allowable, but it seems like an unwise thing to do. For example, I would never do that on a RAID 5 array: all that math, and if one little bit gets messed up, the array is lost. Kinda like what happened to you.
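To make the "all that math" point concrete, here is a minimal sketch of the XOR parity idea behind a RAID 5 rebuild. It is a toy model, not how a PERC actually does it: the block contents, the three-data-block stripe, and the helper name `xor_blocks` are all made up for illustration. It shows that a missing block is recomputed from the surviving blocks plus parity, so a single flipped bit in a survivor ends up in the rebuilt data.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The drive holding data[1] is lost: rebuild it from the survivors and parity.
rebuilt = xor_blocks([data[0], data[2], parity])
clean_rebuild_ok = rebuilt == data[1]

# Same rebuild, but with one bit flipped in a surviving block.
damaged = bytes([data[0][0] ^ 0x01]) + data[0][1:]
bad_rebuild = xor_blocks([damaged, data[2], parity])
corrupt_rebuild_ok = bad_rebuild == data[1]

print(clean_rebuild_ok, corrupt_rebuild_ok)
```

In this toy model the damage is silent: the rebuild finishes either way, and nothing in plain XOR parity flags that the second result is wrong.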
-
@Dashrender said in Dell PERC Question (Server Down):
It may be allowable, but it seems like an unwise thing to do. For example, I would never do that on a RAID 5 array: all that math, and if one little bit gets messed up, the array is lost. Kinda like what happened to you.
Just on a few drives. For other drives, it worked fine.
If you had to wait the whole time for it to rebuild, it would take forever. (Think of the really long RAID 5 rebuild times @scottalanmiller has mentioned.)
-
of course, but normally you would have the OS up and running during that time, so the users don't see that long downtime.
-
The end of the night had me trying to add the third DELL disk in, which also failed in the same way as the first.
So I now have the array fully functional with the EDGE drive that didn't ever fail, and the 1 DELL drive that worked.
Ugh.
Please stay up tonight, gentle array.
-
@BRRABill said in Dell PERC Question (Server Down):
@Dashrender said in Dell PERC Question (Server Down):
you rebooted after it started a rebuild? That wasn't wise.
That's what I have seen all the DELL techs do.
I double-checked tonight and that is definitely the supported way to go.
I've been a Dell tech. They are not trained and are just random people grabbed from third-party staffing firms at the last second. Never use "Dell techs do it" as a guide to anything. It's the same as saying "random out of work guy did this".
-
@scottalanmiller said
I've been a Dell tech. They are not trained and are just random people grabbed from third-party staffing firms at the last second. Never use "Dell techs do it" as a guide to anything. It's the same as saying "random out of work guy did this".
Sorry, I didn't mean the techs who come out to your location. I mean their US-based phone technical support.
Is that who you mean? Is technical support safe to trust?
-
Also, what is your take? Safe to reboot on a rebuilding array if you have to configure it from the controller config in BIOS?
-
@BRRABill said in Dell PERC Question (Server Down):
Is that who you mean? Is technical support safe to trust?
Probably. But... that was a very reckless decision on their part. So... no.
-
@BRRABill said in Dell PERC Question (Server Down):
Also, what is your take? Safe to reboot on a rebuilding array if you have to configure it from the controller config in BIOS?
No, it's reckless and crazy. You don't induce a failure risk during a repair operation.
-
If it is rebuilding, why is configuration needed? And why is configuration limited to the BIOS?
-
@scottalanmiller said in Dell PERC Question (Server Down):
If it is rebuilding, why is configuration needed? And why is configuration limited to the BIOS?
Because when this happens, the drives totally disappear from the PERC config and the server crashes. You have to go in and clear the foreign config. There is nothing to rebuild.
It's not like a drive has failed, and the LED is orange and you can pull it. Nothing like that.
-
@scottalanmiller said
And why is configuration limited to the BIOS?
I mean the PERC configuration utility accessed during POST by hitting <CTRL>R.
-
@BRRABill said in Dell PERC Question (Server Down):
@scottalanmiller said
And why is configuration limited to the BIOS?
I mean the PERC configuration utility accessed during POST by hitting <CTRL>R.
Me too
-
@BRRABill said in Dell PERC Question (Server Down):
@scottalanmiller said in Dell PERC Question (Server Down):
If it is rebuilding, why is configuration needed? And why is configuration limited to the BIOS?
Because when this happens, the drives totally disappear from the PERC config and the server crashes. You have to go in and clear the foreign config. There is nothing to rebuild.
It's not like a drive has failed, and the LED is orange and you can pull it. Nothing like that.
That means that the PERC has failed. That's a different issue.
-
@scottalanmiller said
That means that the PERC has failed. That's a different issue.
Are we back to that?
I mean, it's definitely on the table. Just not sure why all of a sudden you think that's so likely.
-
@BRRABill said in Dell PERC Question (Server Down):
@scottalanmiller said
That means that the PERC has failed. That's a different issue.
Are we back to that?
I mean, it's definitely on the table. Just not sure why all of a sudden you think that's so likely.
Because you said that it was rebuilding. The PERC should remain up and viable even without disks attached to it. You can continue to manage it online through the iDRAC or a VM that isn't using those drives.