Configure Software RAID 1 in CentOS
-
What was the point of this? If the RAID failed, you just put a new one in and rebuild it. The RAID controller/software will report when an array is degraded; there is no need to "check it" manually.
-
@thecreativeone91 said:
What was the point of this? If the RAID failed, you just put a new one in and rebuild it. The RAID controller/software will report when an array is degraded; there is no need to "check it" manually.
"Checking it" is what will break things, too. It's just pointless, it is actually reckless and self defeating. If you want to ensure that the data is good, you can't do it this way.
-
@scottalanmiller said:
you can't do it this way.
Ok, sir. I will explain this to my manager and let you know after that.
-
@scottalanmiller
Sir, please refer to this image. One person told me that a RAID 1 member can be booted by installing a bootloader. Is that possible, sir? I don't know about that; a small doubt came up after seeing this image.
-
Yes, in case of a broken RAID array you can install a bootloader, of course. But that doesn't change the fact that what you are doing should not be done. This is for cases where the other drive is broken, not when a manager asks you to damage the system.
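For completeness, recovering a bootable system from a single surviving member is roughly what that image describes: boot rescue media, start the degraded array, and put a bootloader back on the remaining disk. A rough sketch only, assuming CentOS 7, no LVM, root directly on /dev/md0, and /dev/sda as the survivor; real layouts vary:

    # From the rescue environment, start the mirror with only one member present
    mdadm --assemble --run /dev/md0 /dev/sda1

    # Mount it and reinstall GRUB onto the surviving disk's MBR
    mount /dev/md0 /mnt
    grub2-install --boot-directory=/mnt/boot /dev/sda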
-
@scottalanmiller Ok, I understood. I went to my manager and said that it cannot be done, that the RAID-configured device will lose its data, and that this is impossible. But he asked for the explanation behind this and told me to explain it briefly.
-
I'm a bit lost - is @Lakshmana using hardware RAID or software RAID inside CentOS?
If it's hardware RAID and he pulls a drive, won't the system always work, because both drives are supposed to be identical?
If it's software RAID, then I understand why he has failures.
-
@Dashrender It's software RAID 1.
-
@Lakshmana said:
@scottalanmiller Ok, I understood. I went to my manager and said that it cannot be done, that the RAID-configured device will lose its data, and that this is impossible. But he asked for the explanation behind this and told me to explain it briefly.
It's not that it is impossible, of course it can be done. The problem is that you break the RAID device, so the system is broken. So you are "checking" data that is no longer valid. You can verify that the RAID was working, but you've broken that RAID device and are starting over. So there was no value in checking it because now you have to rebuild it and.... check again? The act of checking it breaks the array.
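To make the "rebuild it" part concrete, recovery after a pulled disk looks roughly like this with Linux MD (a sketch, assuming the array is /dev/md0 and the returned or replacement partition is /dev/sdb1):

    # Put the partition back into the mirror; MD treats it as a new member and resyncs it
    mdadm --manage /dev/md0 --add /dev/sdb1

    # Watch the resync; until it finishes the array is still degraded
    watch cat /proc/mdstat

That resync window is exactly the period with no redundancy that makes "checking" this way self-defeating.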
-
@Dashrender said:
I'm a bit lost - is @Lakshmana using hardware RAID or software RAID inside CentOS?
Linux MD RAID.
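For anyone following along, it is easy to confirm which kind is in play (a quick sketch; device names are examples):

    # Any md devices listed here (md0, md127, ...) mean Linux software RAID is in use
    cat /proc/mdstat

    # A hardware controller would instead present one logical disk, which shows up here
    lsblk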
-
No wonder pulling a drive causes this to fail.
-
@Dashrender said:
If it's hardware RAID and he pulls a drive, won't the system always work, because both drives are supposed to be identical?
It's not hardware but hot swap that enables this. You can easily do this with MD RAID too, if done correctly. The problem is, the moment that you do this, the drive that you pull is out of date and no longer a valid member of the array. So anything that you "check" is now ruined AND you are running without RAID until you replace the drive and it rebuilds. So the thing that you check gets blown away the instant that you check it.
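The "if done correctly" part means telling MD about the removal instead of just yanking a live member. Roughly (a sketch, assuming /dev/md0 is mirrored across sda1 and sdb1):

    # Mark the member failed, then remove it from the array before physically pulling the disk
    mdadm --manage /dev/md0 --fail /dev/sdb1
    mdadm --manage /dev/md0 --remove /dev/sdb1

    # The array now runs degraded; confirm with
    mdadm --detail /dev/md0

Even done this cleanly, the pulled disk stops being a current, valid member the moment it leaves the array, which is the whole point.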
-
@scottalanmiller said:
@Dashrender said:
If it's hardware RAID and he pulls a drive, won't the system always work, because both drives are supposed to be identical?
It's not hardware but hot swap that enables this. You can easily do this with MD RAID too, if done correctly. The problem is, the moment that you do this, the drive that you pull is out of date and no longer a valid member of the array. So anything that you "check" is now ruined AND you are running without RAID until you replace the drive and it rebuilds. So the thing that you check gets blown away the instant that you check it.
I completely understand that - and maybe @Lakshmana does as well. So the act of testing this only proves the system will continue to function while the system is running, but probably won't survive if you reboot the system or power it down and back up. And once you reattach a drive, even if it's the original one you pulled for the test, it has to be completely rebuilt. But pulling the drive does prove whether the system is working or not.
-
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
If it's hardware RAID and he pulls a drive, won't the system always work, because both drives are supposed to be identical?
It's not hardware but hot swap that enables this. You can easily do this with MD RAID too, if done correctly. The problem is, the moment that you do this, the drive that you pull is out of date and no longer a valid member of the array. So anything that you "check" is now ruined AND you are running without RAID until you replace the drive and it rebuilds. So the thing that you check gets blown away the instant that you check it.
I completely understand that - and maybe @Lakshmana does as well. So the act of testing this only proves the system will continue to function while the system is running, but probably won't survive if you reboot the system or power it down and back up. And once you reattach a drive, even if it's the original one you pulled for the test, it has to be completely rebuilt. But pulling the drive does prove whether the system is working or not.
No, it proves that the system was working. It also proves that the system is not working anymore. The act of pulling the drive breaks the array. So yes, you can verify that you used to have things right. But it puts you into a degraded state and you now have to get the system to repair itself. Presumably, you'd want to check that too.... and the cycle of never having a working array begins. It is literally Schrodinger's cat. The act of observation changes the system.
-
@Dashrender When I reboot with two hard disks (sda, sdb) there is no issue. When I connect a new hard disk (sdc) after removing sdb and check the hard disk separately, the issue comes up as a kernel panic.
-
@Lakshmana said:
@Dashrender When I reboot with two hard disks (sda, sdb) there is no issue. When I connect a new hard disk (sdc) after removing sdb and check the hard disk separately, the issue comes up as a kernel panic.
That's likely because the one that was removed was the one with the bootloader on it. So replacing the bootloader is needed.
In theory, you can also keep the bootloader on another device, like a USB stick.
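The usual prevention is to put the bootloader on every member of the mirror right after the install, so that losing either disk still leaves something bootable. A sketch, assuming CentOS 7 with a two-disk mirror on sda and sdb (CentOS 6 uses grub-install rather than grub2-install):

    # Install GRUB to the MBR of both mirror members
    grub2-install /dev/sda
    grub2-install /dev/sdb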
-
@scottalanmiller said:
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
If it's hardware RAID and he pulls a drive, won't the system always work, because both drives are supposed to be identical?
It's not hardware but hot swap that enables this. You can easily do this with MD RAID too, if done correctly. The problem is, the moment that you do this, the drive that you pull is out of date and no longer a valid member of the array. So anything that you "check" is now ruined AND you are running without RAID until you replace the drive and it rebuilds. So the thing that you check gets blown away the instant that you check it.
I completely understand that - and maybe @Lakshmana does as well. So the act of testing this only proves the system will continue to function while the system is running, but probably won't survive if you reboot the system or power it down and back up. And once you reattach a drive, even if it's the original one you pulled for the test, it has to be completely rebuilt. But pulling the drive does prove whether the system is working or not.
No, it proves that the system was working. It also proves that the system is not working anymore. The act of pulling the drive breaks the array. So yes, you can verify that you used to have things right. But it puts you into a degraded state and you now have to get the system to repair itself. Presumably, you'd want to check that too.... and the cycle of never having a working array begins. It is literally Schrodinger's cat. The act of observation changes the system.
I understand the Schrodinger's cat reference and agree (mostly). But just because something says it's working, there are times when it isn't, yet nothing bad is reported to indicate as much. That said, pulling a drive from a RAID is not something I would do to test a RAID - instead, I'd be testing my backups. If you have to ensure that your system's ability to recover from a single drive failure is that good and provides no downtime, you probably really need to be using additional solutions such as clustering, etc.
-
@Lakshmana said:
@Dashrender When I reboot with two hard disks (sda, sdb) there is no issue. When I connect a new hard disk (sdc) after removing sdb and check the hard disk separately, the issue comes up as a kernel panic.
Do you shut down the server before you remove sdb, and install sdc before you power on? I'm guessing that Linux MD RAID does not support this.
From what I THINK Scott is saying, the only way you could test this system is by leaving it running 100% of the time and pulling a drive out while running, and putting a new drive in also, while the system is running.
Now for a question. @scottalanmiller, if my above example happens and he pulls sda (which holds the boot partition), and the re-mirroring (is it called resilvering in RAID 1 (10)?) is complete, there still won't be a boot partition, so if the server has to be rebooted it will fail, right?
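To put that scenario in concrete terms (MD's own term for the copy is a resync or rebuild rather than resilvering), a full replacement of a failed boot disk looks roughly like this. A sketch only, assuming MBR partitioning, /dev/sdb as the survivor, /dev/sdc as the new disk, and /dev/md0 as the array:

    # Copy the surviving disk's partition table to the new disk
    sfdisk -d /dev/sdb | sfdisk /dev/sdc

    # Add the new partition to the mirror and let it resync
    mdadm --manage /dev/md0 --add /dev/sdc1
    cat /proc/mdstat

    # Put a bootloader on the new disk too, or the next reboot can still fail
    grub2-install /dev/sdc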
-
@Dashrender said:
I understand the Schrodinger's cat reference and agree (mostly). But just because something says it's working, there are times when it isn't, yet nothing bad is reported to indicate as much.
Agreed, and you can do this one time to see if the process works conceptually when the device is not in production. Just attach the disk to another system and view the contents (sketched below). But you can't do it for running systems; it is not a sustainable process.
But this is a case where "it says it is working" is all that you get. If you don't trust your RAID, stop using it and find one you do trust. That mechanism is doing a real test and is the best possible indicator. If the best possible isn't good enough, stop using computers. There's no alternative.
There is also a certain value to... if it is good enough for Wall St., the CIA, NASA, Canary Wharf, the military, hospitals, nuclear reactors, and other high-demand scenarios, isn't it a bit silly not to trust it somewhere else?
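On the "attach the disk to another system and view the contents" point, that one-time check looks roughly like this (a sketch; the member partition and array name are examples):

    # See what array the pulled member believes it belongs to
    mdadm --examine /dev/sdb1

    # Assemble it as a degraded, read-only array and mount it to inspect the data
    mdadm --assemble --run --readonly /dev/md127 /dev/sdb1
    mount -o ro /dev/md127 /mnt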
-
@Dashrender said:
From what I THINK Scott is saying, the only way you could test this system is by leaving it running 100% of the time and pulling a drive out while running, and putting a new drive in also, while the system is running.
No, what I am saying is that RAID can never be tested in a live system by examining removed disks. Ever. It tells you the past, not the present or the future. So doing so puts you at risk without validating anything useful. It's a flawed concept to attempt.