Dell PERC Question (Server Down)
-
If you (or someone else) could speak to why this happened, and if trying another set of SSDs might help, that would be helpful for me, and also for the ML community looking to buy SSDs in the future.
-
@JaredBusch Thanks!
-
We are reaching out to Edge so they can reply directly to this thread.
-
@ryan-from-xbyte said:
We are reaching out to Edge so they can reply directly to this thread.
Great.
Maybe it is "fixable" and I won't have to do anything. (Fingers crossed.)
-
Hello @BRRABill. My name is Justin Leskovsky and I work for EDGE Memory. After reading over the issue you described, I am inclined to agree with the Dell representative that this error was most likely just a fluke. That being said, I did have a couple of questions for you:
- You mentioned re-seating the drives in the system. Are the EDGE SSDs currently being recognized by the PERC H710 controller after you re-seated them?
- Assuming that the drives are currently recognized by the system, I saw that it was suggested that you go ahead and import the foreign configuration that was recognized by the controller. Did you attempt this import process? Was it successfully able to correct the issue?
-
@jleskovsky said:
Hello @BRRABill. My name is Justin Leskovsky and I work for EDGE Memory. After reading over the issue you described, I am inclined to agree with the Dell representative that this error was most likely just a fluke. That being said, I did have a couple of questions for you:
- You mentioned re-seating the drives in the system. Are the EDGE SSDs currently being recognized by the PERC H710 controller after you re-seated them?
- Assuming that the drives are currently recognized by the system, I saw that it was suggested that you go ahead and import the foreign configuration that was recognized by the controller. Did you attempt this import process? Was it successfully able to correct the issue?
I can kind of answer both the questions at the same time.
After importing the foreign configuration, the array came back up, and the drives were again recognized by the PERC.
They were NOT recognized at the hardware level (by iDRAC) until the config was re-imported.
xByte seemed to think it was a drive issue. The DELL rep, while he said it might have been a fluke, said that error almost always happens with faulty drives.
So, you think it was just a fluke?
-
BTW: welcome to MangoLassi!
-
Thank you for hopping in to answer questions @jleskovsky
-
Based on all the testing that I have done personally on Dell blade and rack servers here in house at EDGE, I've seen a similar issue only once before, and it was after a PERC controller firmware update on a 12th Gen R720xd. As in your situation, I was able to simply import the "foreign" configuration and the system was back to business as usual. It's actually still up and running Ubuntu now, months later, without giving any indication that the "error" ever even occurred.
Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!
-
@jleskovsky said:
Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!
OK, let's keep our fingers crossed and hope for the best. I will keep you (and ML) updated.
I did update the iDRAC on Friday and it did not require a reboot. Who knows ... maybe that did something.
-
@BRRABill said:
@jleskovsky said:
Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!
OK, let's keep our fingers crossed and hope for the best. I will keep you (and ML) updated.
I did update the iDRAC on Friday and it did not require a reboot. Who knows ... maybe that did something.
Updating iLO on my HP server caused the fans to spin up and down constantly. The only solution was to downgrade the iLO firmware.
-
@Dashrender said:
Updating iLO on my HP server caused the fans to spin up and down constantly. The only solution was to downgrade the iLO firmware.
I'm not blaming it, I guess. Just thought it was weird it didn't reboot the server. And in about 24 hours the issue happened.
I've run that SSD array up in testing for weeks with no problems.
Well, hopefully it was a fluke. @scottalanmiller said flukes happen all the time.
-
@BRRABill said:
@Dashrender said:
Updating iLO on my HP server caused the fans to spin up and down constantly. The only solution was to downgrade the iLO firmware.
I'm not blaming it, I guess. Just thought it was weird it didn't reboot the server. And in about 24 hours the issue happened.
I've run that SSD array up in testing for weeks with no problems.
Well, hopefully it was a fluke. @scottalanmiller said flukes happen all the time.
iLO and iDRAC are completely independent from the servers. They are designed to allow you access to the system regardless of the system's state. Through iLO and iDRAC you can mount an ISO from your desktop/laptop as if it were a DVD-ROM and boot the server from it, so you can install the system completely remotely, etc.
The general idea is that IT personnel generally stay out of DCs, and bench techs take care of the hardware, cabinets, etc. in the DC.
-
@BRRABill said:
@jleskovsky said:
Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!
OK, let's keep our fingers crossed and hope for the best. I will keep you (and ML) updated.
I did update the iDRAC on Friday and it did not require a reboot. Who knows ... maybe that did something.
It can't require a reboot; it is its own computer with its own CPU, memory, firmware, etc. If you reboot the server, the iLO or iDRAC does not reboot. They stay on even when the system is powered down.
-
@scottalanmiller said:
@BRRABill said:
@jleskovsky said:
Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!
OK, let's keep our fingers crossed and hope for the best. I will keep you (and ML) updated.
I did update the iDRAC on Friday and it did not require a reboot. Who knows ... maybe that did something.
It can't require a reboot; it is its own computer with its own CPU, memory, firmware, etc. If you reboot the server, the iLO or iDRAC does not reboot. They stay on even when the system is powered down.
Assuming the server has power.
-
Well, everything has been OK thus far. We shall see.
Gremlins, perhaps.
-
Is there any issue with updating any of the firmware/BIOS of the DELL server? Is there ever a chance that might mess with the co-operation between the EDGE drive and the server?
-
@BRRABill said:
Is there any issue with updating any of the firmware/BIOS of the DELL server? Is there ever a chance that might mess with the co-operation between the EDGE drive and the server?
While my case was not directly related to drives, sadly the answer is: of course. In my case, updating the iLO caused the fans in the system to spin up and down. But updating the firmware on a RAID card could add or remove a feature that the drive is unaware of, which could then cause an issue. Is that likely? Probably not; we don't generally see those kinds of problems on other systems when you update firmware on motherboards and drives.
-
Well, this happened again today.
I think it is time to yank those drives.