Dell PERC Question (Server Down)
-
@Dashrender said in Dell PERC Question (Server Down):
@scottalanmiller said in Dell PERC Question (Server Down):
@BRRABill said in Dell PERC Question (Server Down):
I know I've been told the iDRAC is out of band and has NOTHING to do with the server, but all of these problems started within 24 hours of enabling the iDRAC on my server. (This array was running for months with no issues.) And clearly it was having trouble getting the status info of the drives. There is no possible way the two things could be related?
That is correct insofar as you have nothing to do with the server. The DRAC is an outside actor, just like you are. You can make changes to the system as an outside actor, so can the DRAC. So the DRAC is not part of the server itself, but it does have access to manage it just like a person would.
So perhaps in his case, the DRAC was trying to take logs from the drives and caused them to crash?
Possible. Like maybe they had an issue and the DRAC just tried to read data from them and triggered the event. Probably would have done the same if accessed manually.
-
So (as usual) I misread/misunderstood what you guys were saying.
It IS possible that some bad code in either the iDRAC or the way it interfaces with the server could cause the server to crash?
-
@BRRABill said in Dell PERC Question (Server Down):
So (as usual) I misread/misunderstood what you guys were saying.
It IS possible that some bad code in either the iDRAC or the way it interfaces with the server could cause the server to crash?
Yes, it is possible that the iDRAC is triggering an issue. The interface, other than shorting things out electrically, should be protection against code issues. If code issues make it past the demarcation point, that's an error on the server side, not the iDRAC side. Even if it is triggered by bad code in the iDRAC, the iDRAC only gets to do as much damage as the server lets it do.
So in the same way that you could tell the server to do something bad (delete your configuration) or could simply pull out a taser and fry the motherboard, the iDRAC can, too. But the iDRAC's code can only make it become a bad actor like you might yourself after a weekend bender.
-
@scottalanmiller said
So in the same way that you could tell the server to do something bad (delete your configuration) or could simply pull out a taser and fry the motherboard, the iDRAC can, too. But the iDRAC's code can only make it become a bad actor like you might yourself after a weekend bender.
I feel my admin skills get better after a few beers, thank you!
-
How does my fan issue get caused by the iLO?
Would it be something like, the iLO software has a bug, when it tries to read the fans it uses the wrong API calls, which the fans read as an error, and when the fans read an error they spin up?
-
And I'm not blaming the iDRAC.
Like I said, I just figured it would need a reboot when it got licensed, but it did not. (That's actually the way things should ALWAYS work!)
And while I said it did not happen in the months prior to enabling the licensing on the iDRAC, that was also the weekend I installed my production mail server on an XS VM on this. So, there is likely a lot more array activity.
I have reached out to EDGE and xByte, and will try to work with them further on the issue and report back. They have not been on ML in a while so hopefully they will chime in. (Actually, I will send an e-mail to let them know.)
-
@scottalanmiller said
If code issues make it past the demarcation point, that's an error on the server side, not the iDRAC side. Even if it is triggered by bad code in the iDRAC, the iDRAC only gets to do as much damage as the server lets it do.
But the iDRAC has full access to the server.
I understand if I did something stupid, the server would let me, but I just don't see how using the iDRAC I should be expecting that kind of behavior.
Or that if the iDRAC told the server to do something bad we should be blaming anyone other than the iDRAC.
-
@Dashrender said in Dell PERC Question (Server Down):
How does my fan issue get caused by the iLO?
Would it be something like, the iLO software has a bug, when it tries to read the fans it uses the wrong API calls, which the fans read as an error, and when the fans read an error they spin up?
The iLO reads the temperature sensors; it might pass the sensor readings out and then the speed control back in.
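To make that loop concrete, here is a minimal Python sketch of the idea: a temperature reading goes in, a fan speed comes out, and a bad or missing reading fails safe to full speed. This is not HP's firmware or any real iLO API. The function name, thresholds, and duty-cycle values are all assumptions made up for illustration.

```python
from typing import Optional

MIN_DUTY = 20   # percent; assumed idle fan speed (hypothetical value)
MAX_DUTY = 100  # percent; full speed, also used as the fail-safe


def fan_duty(temp_c: Optional[float], low: float = 30.0, high: float = 70.0) -> int:
    """Map a temperature reading (Celsius) to a fan duty cycle in percent."""
    # A missing or implausible reading is treated as an error:
    # fail safe by running the fans at full speed.
    if temp_c is None or not (-20.0 <= temp_c <= 125.0):
        return MAX_DUTY
    if temp_c <= low:
        return MIN_DUTY
    if temp_c >= high:
        return MAX_DUTY
    # Linear ramp between the two thresholds.
    span = (temp_c - low) / (high - low)
    return round(MIN_DUTY + span * (MAX_DUTY - MIN_DUTY))
```

Under these assumptions, a management controller that botches the sensor read would look exactly like an overheating box from the outside: the fans spin up even though nothing is actually hot.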
-
@BRRABill said in Dell PERC Question (Server Down):
@scottalanmiller said
If code issues make it past the demarcation point, that's an error on the server side, not the iDRAC side. Even if it is triggered by bad code in the iDRAC, the iDRAC only gets to do as much damage as the server lets it do.
But the iDRAC has full access to the server.
I understand if I did something stupid, the server would let me, but I just don't see how using the iDRAC I should be expecting that kind of behavior.
Or that if the iDRAC told the server to do something bad we should be blaming anyone other than the iDRAC.
If the code in the iDRAC actually issues a call like "drop a drive", then the iDRAC is being a bad actor, just like you could be from the PERC console. If the issue is that the iDRAC is issuing gibberish and the PERC decides to drop the drive because of that gibberish, that's the PERC's fault for doing something it wasn't told to do.
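A rough Python sketch of that distinction, assuming a made-up controller with a tiny command set. None of this reflects the actual PERC or iDRAC interfaces: the `Command` values, the `Controller` class, and the drive names are all hypothetical.

```python
from enum import Enum


class Command(Enum):
    # Hypothetical command set; not the real PERC interface.
    GET_STATUS = "get_status"
    DROP_DRIVE = "drop_drive"


class Controller:
    """Toy RAID controller that validates input at its boundary."""

    def __init__(self, drives):
        self.online = set(drives)
        self.rejected = []  # raw inputs that failed validation

    def handle(self, raw: str, drive: str) -> str:
        try:
            cmd = Command(raw)  # raises ValueError for anything unrecognized
        except ValueError:
            # Gibberish from the management side: log it and do nothing.
            self.rejected.append(raw)
            return "rejected"
        if drive not in self.online:
            return "unknown drive"
        if cmd is Command.DROP_DRIVE:
            # An explicit, well-formed request: the caller is the bad actor.
            self.online.discard(drive)
            return "dropped"
        return "online"
```

In this sketch a well-formed `drop_drive` call takes the drive offline, which is on the caller. Garbage input is rejected and changes nothing; a controller that dropped a drive on garbage would be the one at fault.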
-
@BRRABill Thank you for reaching out! Even though I have set up an email notification for this post, I haven't been receiving them. Much appreciate you keeping me in the loop. This is beyond my basic IT knowledge (marketing gal here), but I will alert one of our engineers to see if they can chime in.
-
@Lyndsie_xByte said in Dell PERC Question (Server Down):
@BRRABill Thank you for reaching out! Even though I have set up an email notification for this post, I haven't been receiving them. Much appreciate you keeping me in the loop. This is beyond my basic IT knowledge (marketing gal here), but I will alert one of our engineers to see if they can chime in.
Email notices run out after about 5 days or less. I think ML is working on it, but it's a cost issue.
-
@BRRABill
I'm coming late into this thread and I'm having problems discerning exactly what the issue is right now. Please contact your xByte rep Brad and he will get a support request going. Our techs can assist directly and can get Edge officially involved instead of trying to rely on ML posts.
--Todd
-
@Dashrender said in Dell PERC Question (Server Down):
@Lyndsie_xByte said in Dell PERC Question (Server Down):
@BRRABill Thank you for reaching out! Even though I have set up an email notification for this post, I haven't been receiving them. Much appreciate you keeping me in the loop. This is beyond my basic IT knowledge (marketing gal here), but I will alert one of our engineers to see if they can chime in.
Email notices run out after about 5 days or less. I think ML is working on it, but it's a cost issue.
Not exactly. It is a choice to use a service with limits over sending directly and updating records.
-
@todd-at-xByte said
@BRRABill
I'm coming late into this thread and I'm having problems discerning exactly what the issue is right now. Please contact your xByte rep Brad and he will get a support request going. Our techs can assist directly and can get Edge officially involved instead of trying to rely on ML posts.
--Todd
Todd:
I reached out to Brad yesterday to open a case with your tech support. Though we already kind of went through them and they sent us to EDGE. I've been having problems with EDGE responding to me, which is why I reached back out to Lyndsie, who set that up the first time.
-
@JaredBusch said in Dell PERC Question (Server Down):
@Dashrender said in Dell PERC Question (Server Down):
@Lyndsie_xByte said in Dell PERC Question (Server Down):
@BRRABill Thank you for reaching out! Even though I have set up an email notification for this post, I haven't been receiving them. Much appreciate you keeping me in the loop. This is beyond my basic IT knowledge (marketing gal here), but I will alert one of our engineers to see if they can chime in.
Email notices run out after about 5 days or less. I think ML is working on it, but it's a cost issue.
Not exactly. It is a choice to use a service with limits over sending directly and updating records.
We tried sending directly and were blacklisted. We could try to get that to work but have tried this in the past and not had luck. We couldn't get even test messages to go out locally. If we switched to local, email would just stop for nearly everyone, all the time, completely.
-
@todd-at-xByte said in Dell PERC Question (Server Down):
@BRRABill
I'm coming late into this thread and I'm having problems discerning exactly what the issue is right now. Please contact your xByte rep Brad and he will get a support request going. Our techs can assist directly and can get Edge officially involved instead of trying to rely on ML posts.
--Todd
Why not get Edge responding here?
-
@todd-at-xByte said in Dell PERC Question (Server Down):
I'm coming late into this thread and I'm having problems discerning exactly what the issue is right now.
From what I could tell, the issue is that Edge does not respond.
-
@StrongBad said
From what I could tell, the issue is that Edge does not respond.
Yes, the tech who was working with me has not responded.
Now, in the past few weeks I have dealt with people on vacation, and people who were sick, and everything else. So I always like to give them the benefit of the doubt as to why they are not responding.
-
@BRRABill Any update to share?
-
This was the latest e-mail from earlier this afternoon:
"That information is good. I was hoping that your iDRAC log would shine some light on what the actual fault error was being recorded when the drive array is actually going down. I’m working on this now with one of our SSD engineers and I am hoping to have some additional information or potential resolutions about this issue today."