Failed Global Drive
-
I have a client that has a RAID 5 array with 3 drives in the array and 1 Global Spare. The server suddenly started running very slowly so I checked the RAID array. I have a pre-fail condition on the Global Spare. Since the global spare is not in use right now, would that have an impact on performance?
-
Yes. It's not likely, but it is possible for this to affect performance. The failing drive may be spewing garbage onto the SCSI bus, forcing the controller to deal with that instead of servicing the array.
-
Wow, I've never seen that before - but then again, I've rarely dealt with hot spares.
-
It's going to be replaced in the next few hours... but wanted to see if I could deal with a slow server issue in the meantime.
-
Telling the RAID to remove the failed/failing drive from the system (unmount it) should remove any issues it's causing.
-
@ccwtech said in Failed Global Drive:
It's going to be replaced in the next few hours... but wanted to see if I could deal with a slow server issue in the meantime.
Yank it out now. It's already failing and you don't want a spare with a RAID 5 array, so I'd pull it out and not replace it.
-
How large is each drive?
-
600 GB
-
@scottalanmiller said in Failed Global Drive:
@ccwtech said in Failed Global Drive:
It's going to be replaced in the next few hours... but wanted to see if I could deal with a slow server issue in the meantime.
Yank it out now. It's already failing and you don't want a spare with a RAID 5 array, so I'd pull it out and not replace it.
Why is a spare bad?
-
@ccwtech said in Failed Global Drive:
@scottalanmiller said in Failed Global Drive:
@ccwtech said in Failed Global Drive:
It's going to be replaced in the next few hours... but wanted to see if I could deal with a slow server issue in the meantime.
Yank it out now. It's already failing and you don't want a spare with a RAID 5 array, so I'd pull it out and not replace it.
Why is a spare bad?
If you have a spare, why not go RAID 10 instead of RAID 5? Faster and safer.
-
@ccwtech said in Failed Global Drive:
@scottalanmiller said in Failed Global Drive:
@ccwtech said in Failed Global Drive:
It's going to be replaced in the next few hours... but wanted to see if I could deal with a slow server issue in the meantime.
Yank it out now. It's already failing and you don't want a spare with a RAID 5 array, so I'd pull it out and not replace it.
Why is a spare bad?
Because it is RAID 5 and the spare automates the highest risk of total data loss. RAID 5 is not a super safe RAID technology, so the way to use it is to take a backup before replacing a failed drive as it is the replacement process that creates ~98% of the array failures. If you have a hot spare on a RAID 5, it means that the process of triggering that full data loss risk is instant, rather than giving the customer time to take a fresh backup.
Also, a spare means that the disks were available for RAID 10, so both the reliability and the speed of RAID 10 are lost, but the cost is there.
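To put numbers on the "cost is there" point, here is a quick sketch using the 600 GB drive size from this thread. The function names are just made up for illustration, and it uses the textbook capacity formulas, ignoring controller overhead and formatting.

```python
def raid5_usable_gb(drives: int, size_gb: int) -> int:
    """Usable space in RAID 5: one drive's worth of capacity goes to parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drives - 1) * size_gb


def raid10_usable_gb(drives: int, size_gb: int) -> int:
    """Usable space in RAID 10: every drive is mirrored, so half the total."""
    if drives < 4 or drives % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return (drives // 2) * size_gb


SIZE_GB = 600  # drive size reported in this thread

# Same four physical drives, two different layouts.
raid5_with_spare = raid5_usable_gb(3, SIZE_GB)  # 3 active drives + 1 idle spare
raid10_all_four = raid10_usable_gb(4, SIZE_GB)  # all 4 drives active

print(f"RAID 5 (3 drives) + hot spare: {raid5_with_spare} GB usable")
print(f"RAID 10 (4 drives):            {raid10_all_four} GB usable")
```

Both layouts come out to the same usable space from the same four drives, which is why the spare buys nothing that RAID 10 would not already provide.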
-
Hot Spare or a Hot Mess, why spares never are used with RAID 5.
-
@ccwtech said in Failed Global Drive:
600 GB
At least the drives are fairly small, but there is still a noticeable chance of hitting a URE (unrecoverable read error) during a rebuild.
Your best bet would be to take an image of this system, wipe out the RAID 5, create a RAID 10 and restore the image.
You'll have a faster, safer system.
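For a rough feel of the URE risk at this drive size, here is a back-of-the-envelope sketch. Everything in it is an assumption: the URE rates are typical datasheet classes (not the specs of the drives in this server), errors are treated as independent per bit, the rebuild is assumed to read both surviving 600 GB drives end to end, and a single URE is treated as fatal to the rebuild. Real controllers and drives behave differently, so take it as an order-of-magnitude estimate only.

```python
import math


def p_ure_during_rebuild(surviving_drives: int, size_bytes: float,
                         ure_per_bit: float) -> float:
    """Chance of at least one URE while reading every surviving drive in full.

    Simplified model: each bit read fails independently with probability
    ure_per_bit, so P = 1 - (1 - ure_per_bit) ** bits_read.
    """
    bits_read = surviving_drives * size_bytes * 8
    # log1p/expm1 keep the result accurate for very small per-bit rates.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


SIZE_BYTES = 600e9  # 600 GB drives, as reported in this thread
SURVIVING = 2       # a 3-drive RAID 5 rebuild reads the other two drives

# Assumed datasheet-style rates: 1 error per 1e14, 1e15, 1e16 bits read.
for bits_per_error in (1e14, 1e15, 1e16):
    p = p_ure_during_rebuild(SURVIVING, SIZE_BYTES, 1 / bits_per_error)
    print(f"1 URE per {bits_per_error:.0e} bits: {p:.2%} chance during rebuild")
```

Under these assumptions the worst rate class works out to roughly 9%, and the better classes to about 1% or less, which fits the "noticeable but not horrible" read on drives this small.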
-
@dashrender said in Failed Global Drive:
@ccwtech said in Failed Global Drive:
600 GB
At least the drives are fairly small, but there is still a noticeable chance of hitting a URE (unrecoverable read error) during a rebuild.
Yes, the overall risk isn't too bad. But the improper setup remains. It's like driving erratically, but very slowly. The risk of a crash is decently low, but fixing the erratic driving is still wise.
-
@dashrender said in Failed Global Drive:
Your best bet would be to take an image of this system, wipe out the RAID 5, create a RAID 10 and restore the image.
You'll have a faster, safer system.
That is what I would do. Fix the setup rather than putting a replacement spare back into a bad configuration.
-
New server soon so I'm going to keep it the same, but I am a RAID 10 fan, big time.
-
@ccwtech said in Failed Global Drive:
New server soon so I'm going to keep it the same, but I am a RAID 10 fan, big time.
Test your backups then... Test, test, and test again.
As mentioned, RAID 5 at that size is not horrible, but definitely don't put the hot spare back in the system.
-
Ok, will do. Thanks!
-
Just put the new spare on top of the server or something, somewhere it can be swapped in the moment you need it, but only with intent.