Strange Smart Array p410i problem
-
@scottalanmiller said:
Well, if the drive was having issues, but not ones that would make it fail, it might have been marked as healthy while still slowing down. It's possible. Any one drive being slow will definitely kill the performance of any RAID array, as they all have to wait on that one for each read or write operation. That it slowed things down isn't surprising; that's not uncommon ahead of an actual failure.
But, I have seen earlier on other DL380s that a drive will have an amber LED indicating that it is in "pre-failure" state, probably because of SMART errors. That didn't happen here, and I have not seen a RAID before that has been so slow and troubled by a bad disk... I don't have that much experience, though... It will be interesting to hear what HP or our dealer will say about this. I also asked them if they think the RAID card is faulty. The ease of having a controller guiding you is sort of not present anymore; I might as well buy an LSI HBA and go for ZFS. Then I would surely have been notified about this.
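To make the quoted point concrete, here is a minimal sketch of why one slow member drags down the whole array. It assumes a simplified model where a full-stripe operation finishes only when the slowest drive answers. The function name and the latency figures are made up for illustration and are not measurements from this server.

```python
def stripe_latency_ms(drive_latencies_ms):
    """Simplified model: a full-stripe operation completes only
    when the slowest member drive has responded."""
    return max(drive_latencies_ms)


# Hypothetical per-drive latencies in milliseconds (illustrative only).
healthy = [8, 8, 9, 8, 8, 9]
degraded = [8, 8, 9, 8, 8, 400]  # same array, one drive retrying internally

print("healthy array: ", stripe_latency_ms(healthy), "ms per stripe")
print("degraded array:", stripe_latency_ms(degraded), "ms per stripe")
```

In this model the five good drives do not help at all: the array is as slow as its worst member, even if that member still reports itself as healthy.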
-
@scottalanmiller said:
FreeBSD is a great OS but if the hpacucli is not available for it then it sucks a bit running it on the bare metal. If you use a hypervisor like ESXi, HyperV or XenServer you can get around that as the errors go to the hypervisor instead of to a guest OS. Then you can virtualize FreeBSD on top of that without any problems.
Perhaps I should try that? This incident got me thinking that perhaps I have to look closer at my ESX server also, since this machine is a DL385 with the same controller...
-
Interesting. This sounds like a similar issue I'm having with a G8 server.
-
I live in Norway and it's getting late here. I will return to this thread tomorrow!
-
@flomer said:
I live in Norway and it's getting late here. I will return to this thread tomorrow!
Well WELCOME...in Norwegian...
-
@flomer said:
But, I have seen earlier on other DL380s that a drive will have an amber LED indicating that it is in "pre-failure" state, probably because of SMART errors. That didn't happen here, and I have not seen a RAID before that has been so slow and troubled by a bad disk... I don't have that much experience, though... It will be interesting to hear what HP or our dealer will say about this. I also asked them if they think the RAID card is faulty. The ease of having a controller guiding you is sort of not present anymore; I might as well buy an LSI HBA and go for ZFS. Then I would surely have been notified about this.
Any chance that you are using third party drives instead of HP drives? Non-HP drives will not fully report to the SmartArray.
-
@flomer said:
Perhaps I should try that? This incident got me thinking that perhaps I have to look closer at my ESX server also, since this machine is a DL385 with the same controller...
In this era I would definitely strongly consider virtualizing even a dedicated storage device. The stability and flexibility are almost always worth it.
-
@thanksaj said:
@flomer said:
I live in Norway and it's getting late here. I will return to this thread tomorrow!
Well WELCOME...in Norwegian...
velkommen
-
Welcome to the MangoLassi Community! Great group here.
-
@Dashrender said:
Interesting. This sounds like a similar issue I'm having with a G8 server.
HP drives or third party? The G8 has even more firmware integration than in the past.
-
Welcome. You will find some very smart people around here.
-
@flomer said:
I live in Norway and it's getting late here. I will return to this thread tomorrow!
Welcome to MangoLassi.
Glad you found this community.
-
Welcome to ML. I think that you'll like it around here.
-
@StrongBad said:
Welcome to ML. I think that you'll like it around here.
Sir, it's nice to have people from around the world. If people from the US are sleeping, at least someone from Europe is still around.
-
@Joyfano said:
@StrongBad said:
Welcome to ML. I think that you'll like it around here.
Sir, it's nice to have people from around the world. If people from the US are sleeping, at least someone from Europe is still around.
@Joyfano is all excited!
-
@scottalanmiller said:
@flomer said:
Perhaps I should try that? This incident got me thinking that perhaps I have to look closer at my ESX server also, since this machine is a DL385 with the same controller...
In this era I would definitely strongly consider virtualizing even a dedicated storage device. The stability and flexibility are almost always worth it.
But, will this not lead to slower operation? There must be some overhead? And this should be the only VM on that server, then?
-
@scottalanmiller said:
@flomer said:
But, I have seen earlier on other DL380s that a drive will have an amber LED indicating that it is in "pre-failure" state, probably because of SMART errors. That didn't happen here, and I have not seen a RAID before that has been so slow and troubled by a bad disk... I don't have that much experience, though... It will be interesting to hear what HP or our dealer will say about this. I also asked them if they think the RAID card is faulty. The ease of having a controller guiding you is sort of not present anymore; I might as well buy an LSI HBA and go for ZFS. Then I would surely have been notified about this.
Any chance that you are using third party drives instead of HP drives? Non-HP drives will not fully report to the SmartArray.
I have not touched the RAID since setting it up almost three years ago, and all parts are HP parts. See below for a snapshot of the error message. The RAID is RAID 6, not 60, by the way...
I still haven't heard from the reseller why the RAID controller didn't kick the drive out.
-
@flomer said:
But, will this not lead to slower operation? There must be some overhead? And this should be the only VM on that server, then?
Yes, but it would be shocking if you could measure it. There is effectively no overhead on the disk IO, and all of your bottlenecks are from spinning disks and such. You still get all of your available threads, nearly all of your memory, etc. The CPU hit is nominal and the disk IO hit is nominal. The benefits are huge and the caveats are things you probably can't even measure.
But yes, the only VM on the server most likely. Virtualization won't cause performance issues. Consolidation might. But test it, you might get a lot of consolidation out of it too. All depends on your workload, hardware changes, etc.
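If you want to test it yourself, something like the rough sketch below could be run once on bare metal and once inside the VM, against the same storage, to compare numbers. It only times a sequential write followed by an fsync, so it is a sanity check and not a real benchmark. The function name, file size and block size are arbitrary choices for illustration.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=64, block_kb=1024):
    """Write total_mb of random data into a temp file in `directory`,
    force it to disk with fsync, and return the rate in MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # include the time to reach the disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the test file
    return total_mb / elapsed


if __name__ == "__main__":
    # Point this at a directory on the array you want to test.
    rate = write_throughput_mb_s(tempfile.gettempdir())
    print(f"sequential write: {rate:.1f} MB/s")
```

Run it several times on each setup and compare the spread, not just a single figure, since caching and other activity on the box will move the result around.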
-
@flomer said:
I still haven't heard from the reseller why the RAID controller didn't kick the drive out.
There is a really good chance, and I really mean VERY GOOD chance, that the threshold of "too many errors" was not hit until the reboot. Rebooting a system will cause changes in drive activity that could easily trigger the difference between "too few" and "too many" errors. I have seen this a lot.
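As a toy illustration of that idea, assume the controller counts errors per time interval and only fails a drive when the count goes over some limit. That is an assumption made for the sake of the example; the real Smart Array criteria and numbers are not something I can quote here. Everything in the sketch is hypothetical.

```python
def first_interval_over_threshold(io_per_interval, errors_per_1000_io, threshold):
    """Return the index of the first interval whose error count exceeds
    `threshold`, or None if the limit is never crossed.
    Errors are modeled as proportional to the amount of IO."""
    for i, io_count in enumerate(io_per_interval):
        errors = io_count * errors_per_1000_io / 1000
        if errors > threshold:
            return i
    return None


# Hypothetical IO counts per interval (illustrative only).
steady = [2000, 2100, 1900, 2000]        # normal load on a marginal drive
with_reboot = [2000, 2100, 1900, 12000]  # last interval is the burst at boot

print(first_interval_over_threshold(steady, errors_per_1000_io=4, threshold=10))
print(first_interval_over_threshold(with_reboot, errors_per_1000_io=4, threshold=10))
```

Under steady load the marginal drive stays just below the limit indefinitely, while the burst of activity around the reboot pushes it over in a single interval.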
-
@scottalanmiller said:
@flomer said:
I still haven't heard from the reseller why the RAID controller didn't kick the drive out.
There is a really good chance, and I really mean VERY GOOD chance, that the threshold of "too many errors" was not hit until the reboot. Rebooting a system will cause changes in drive activity that could easily trigger the difference between "too few" and "too many" errors. I have seen this a lot.
Well, I actually rebooted the machine thinking it might help before I knew it was a failing drive. It seemed to help a little, but I guess I must just have imagined it getting better. Or at least the situation got just as bad after a little while.