Dell PERC Question (Server Down)
-
@Dashrender said:
So I wonder - where is the issue with the Edge drives?
I think it is something with how the DELL servers talk to them.
The DELLs only like it when their drives are in there. The DELL tech I spoke with said that if the drives don't return exactly what the PERC is looking for, it can offline the array, and that the error I saw is almost always a drive issue.
I'm still waiting to hear back from my rep (Brad) at xByte on the specifics of why they think this did not work.
Who (whom?) is the main xByte contact here at ML? Maybe we can loop them in.
-
@Dashrender said:
If you were building a SuperMicro server I might suggest something like the Samsung 850 Pro drives. Under-provision them by 20% and you'll statistically be fine for the life of a standard server.
That was the first thing I was told. If you buy DELL, stick with the ecosystem. But I felt confident the EDGE drives would be OK.
-
@BRRABill said:
@Dashrender said:
If you were building a SuperMicro server I might suggest something like the Samsung 850 Pro drives. Under-provision them by 20% and you'll statistically be fine for the life of a standard server.
That was the first thing I was told. If you buy DELL, stick with the ecosystem. But I felt confident the EDGE drives would be OK.
Stick with ecosystem = yes. But - as I understand it - and frankly I can't believe no one from xByte has jumped in here yet - xByte had the EDGE drives built to answer the exact calls coming from the PERC cards so they basically look like DELL drives to the PERC cards. Is that not the case? Psst.. that's not for you to answer, that's for xByte to answer.
-
I'll ping my rep.
-
Hi @BRRABill , who is your rep at xByte? I am going to look into this.
-
Hi:
It is Brad. I just spoke with him on the phone. (We've been back and forth all morning.)
He took care of it from a customer service aspect 100% to my liking. (Full refund for the drives.) He said he was going to get someone else to jump on here to explain what happened.
-
@JaredBusch said:
@BradfromxByte is awesome.
Yeah, he has been awesome so far, in my limited dealings with xByte. Really above and beyond.
They are going to 100% refund the drives. So now I just have to decide what to upgrade to.
They offered to send out replacement SSDs, but I'm not sure if I trust that route.
I'll wait until someone techy from xByte pops on to describe what happened, and how they feel about trying another SSD.
-
If you (or someone else) could speak to why this happened, and whether trying another set of SSDs might help, that would be helpful for me, and also for the ML community looking to buy SSDs in the future.
-
@JaredBusch Thanks!
-
We are reaching out to Edge so they can reply directly to this thread.
-
@ryan-from-xbyte said:
We are reaching out to Edge so they can reply directly to this thread.
Great.
Maybe it is "fixable" and I won't have to do anything. (Fingers crossed.)
-
Hello @BRRABill. My name is Justin Leskovsky and I work for EDGE Memory. After reading over the issue you described, I am inclined to agree with the Dell representative that this error was most likely just a fluke. That being said, I did have a couple of questions for you:
- You mentioned re-seating the drives in the system. Are the EDGE SSDs currently being recognized by the PERC H710 controller after you re-seated them?
- Assuming that the drives are currently recognized by the system, I saw that it was suggested that you go ahead and import the foreign configuration that was recognized by the controller. Did you attempt this import process? Was it successfully able to correct the issue?
-
@jleskovsky said:
Hello @BRRABill. My name is Justin Leskovsky and I work for EDGE Memory. After reading over the issue you described, I am inclined to agree with the Dell representative that this error was most likely just a fluke. That being said, I did have a couple of questions for you:
- You mentioned re-seating the drives in the system. Are the EDGE SSDs currently being recognized by the PERC H710 controller after you re-seated them?
- Assuming that the drives are currently recognized by the system, I saw that it was suggested that you go ahead and import the foreign configuration that was recognized by the controller. Did you attempt this import process? Was it successfully able to correct the issue?
I can kind of answer both the questions at the same time.
After importing the foreign configuration, the array came back up, and the drives were again recognized by the PERC.
They were NOT recognized at the hardware level (by iDRAC) until the config was re-imported.
xByte seemed to think it was a drive issue. The DELL rep, while he said it might have been a fluke, said that error almost always happens with faulty drives.
So, you think it was just a fluke?
-
BTW: welcome to MangoLassi!
-
Thank you for hopping in to answer questions @jleskovsky
-
Based on all the testing that I have done personally on Dell blade and rack servers here in house @ EDGE, I've seen a similar issue only once before, and it was after a PERC controller firmware update on a 12th Gen R720xd. Like your situation, I was able to simply import the "foreign" configuration and the system was back to business as usual. It's actually still up and running Ubuntu now, months later, without giving any indication that the "error" ever even occurred.
Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!
-
@jleskovsky said:
Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!
OK, let's keep our fingers crossed and hope for the best. I will keep you (and ML) updated.
I did update the iDRAC on Friday and it did not require a reboot. Who knows ... maybe that did something.
-
@BRRABill said:
@jleskovsky said:
Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!
OK, let's keep our fingers crossed and hope for the best. I will keep you (and ML) updated.
I did update the iDRAC on Friday and it did not require a reboot. Who knows ... maybe that did something.
Updating iLO on my HP server caused the fans to spin up and down constantly... only solution - downgrade firmware on iLO.
-
@Dashrender said:
Updating iLO on my HP server caused the fans to spin up and down constantly... only solution - downgrade firmware on iLO.
I'm not blaming it, I guess. Just thought it was weird it didn't reboot the server. And in about 24 hours the issue happened.
I've run that SSD array up in testing for weeks with no problems.
Well, hopefully it was a fluke. @scottalanmiller said flukes happen all the time.