Configure Software RAID 1 in CentOS
-
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
If it's hardware RAID, and he pulls a drive, won't the system always work? Because both drives are supposed to be identical?
It's not hardware but hot swap that enables this. You can easily do this with MD RAID too, if done correctly. The problem is, the moment that you do this, the drive that you pull is out of date and no longer a valid member of the array. So anything that you "check" is now ruined AND you are running without RAID until you replace the drive and it rebuilds. So the thing that you check gets blown away the instant that you check it.
I completely understand that - and maybe @Lakshmana does as well. So the act of testing this only proves the system will continue to function while the system is running, but probably won't survive if you reboot the system or power it down and back up. And once you reattach a drive, even if it's the original one you pulled for the test, it has to be completely rebuilt. But pulling the drive does prove whether the system is working or not.
No, it proves that the system was working. It also proves that it is not working anymore. The act of pulling the drive breaks the array. So yes, you can verify that you used to have things right. But it puts you into a degraded state and you now have to get the system to repair itself. Presumably, you'd want to check that too... and the cycle of never having a working array begins. It is literally Schrödinger's cat. The act of observation changes the system.
-
@Dashrender When I reboot with two hard disks (sda, sdb) there is no issue. When I connect a new hard disk (sdc) after removing sdb and check the hard disk separately, the issue that came up was a kernel panic.
-
@Lakshmana said:
@Dashrender When I reboot with two hard disks (sda, sdb) there is no issue. When I connect a new hard disk (sdc) after removing sdb and check the hard disk separately, the issue that came up was a kernel panic.
That's likely because the one that was removed was the one with the bootloader on it. So replacing the bootloader is needed.
In theory, you can also keep the bootloader on another device, like a USB stick.
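If you want each disk to be independently bootable, the bootloader has to be written to both RAID 1 members by hand. A minimal sketch, assuming CentOS 7 with GRUB2 on a BIOS/MBR layout and the sda/sdb names from this thread (CentOS 6 would use legacy grub-install instead):

    # write GRUB2 to the MBR of both mirror members so either disk can boot alone
    grub2-install /dev/sda
    grub2-install /dev/sdb

    # regenerate the GRUB configuration once
    grub2-mkconfig -o /boot/grub2/grub.cfg

After that, pulling either disk should still leave you with something bootable, even though the array itself will come up degraded.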
-
@scottalanmiller said:
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
If it's hardware RAID, and he pulls a drive, won't the system always work? Because both drives are supposed to be identical?
It's not hardware but hot swap that enables this. You can easily do this with MD RAID too, if done correctly. The problem is, the moment that you do this, the drive that you pull is out of date and no longer a valid member of the array. So anything that you "check" is now ruined AND you are running without RAID until you replace the drive and it rebuilds. So the thing that you check gets blown away the instant that you check it.
I completely understand that - and maybe @Lakshmana does as well. So the act of testing this only proves the system will continue to function while the system is running, but probably won't survive if you reboot the system or power it down and back up. And once you reattach a drive, even if it's the original one you pulled for the test, it has to be completely rebuilt. But pulling the drive does prove whether the system is working or not.
No, it proves that the system was working. It also proves that it is not working anymore. The act of pulling the drive breaks the array. So yes, you can verify that you used to have things right. But it puts you into a degraded state and you now have to get the system to repair itself. Presumably, you'd want to check that too... and the cycle of never having a working array begins. It is literally Schrödinger's cat. The act of observation changes the system.
I understand the Schrödinger's cat reference and agree (mostly), but just because something says it's working, there are times when it isn't and nothing bad is reported to indicate as much. That said, pulling a drive from a RAID is not something I would do to test a RAID - instead I'd be testing my backups. If you have to ensure your system can recover from a single drive failure with no downtime, you probably really need to be using additional solutions such as clustering, etc.
-
@Lakshmana said:
@Dashrender When I reboot with two hard disks (sda, sdb) there is no issue. When I connect a new hard disk (sdc) after removing sdb and check the hard disk separately, the issue that came up was a kernel panic.
Do you shut down the server before you remove sdb, and install sdc before you power on? I'm guessing that Linux MD RAID does not support this.
From what I THINK Scott is saying, the only way you could test this system is by leaving it running 100% of the time and pulling a drive out while running, and putting a new drive in also, while the system is running.
Now for a question. @scottalanmiller if my above example happens and he pulls sda (which holds the boot partition), and the re-mirroring (is it called resilvering in RAID 1 (10)?) is complete, there still won't be a boot partition, so if the server has to be rebooted it will fail, right?
-
@Dashrender said:
I understand the Schrödinger's cat reference and agree (mostly), but just because something says it's working, there are times when it isn't and nothing bad is reported to indicate as much.
Agreed, and you can do this one time to see if the process works conceptually when the device is not in production. Just attach the disk to another system and view the contents. But you can't do it for running systems, it is not a sustainable process.
But this is a case where "it says it is working" is all that you get. If you don't trust your RAID, stop using it and find one you do trust. That mechanism is doing a real test and is the best possible indicator. If the best possible isn't good enough, stop using computers. There's no alternative.
There is also a certain value to... if it is good enough for Wall St., the CIA, NASA, Canary Wharf, the military, hospitals, nuclear reactors and other high demand scenarios, isn't it a bit silly to not trust it somewhere else?
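For what it's worth, the non-destructive way to see what MD RAID itself reports is to query the array rather than pull disks. A minimal sketch, assuming the array is /dev/md0 (use whatever name /proc/mdstat actually shows on your box):

    # summary of all md arrays: members, state flags, rebuild progress
    cat /proc/mdstat

    # full detail for one array: state, failed/spare devices, sync status
    mdadm --detail /dev/md0

A healthy two-disk RAID 1 shows [UU] in /proc/mdstat; a degraded one shows [U_] or [_U].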
-
@Dashrender said:
From what I THINK Scott is saying, the only way you could test this system is by leaving it running 100% of the time and pulling a drive out while running, and putting a new drive in also, while the system is running.
No, what I am saying is that RAID can never be tested in a live system by examining removed disks. Ever. It tells you the past, not the current or the future. So doing so puts you at risk without validating anything useful. It's a flawed concept to attempt.
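If the worry is specifically "are the two copies really identical?", the kernel can verify that on the live array without removing anything. A minimal sketch, again assuming the array is md0:

    # ask md to read and compare both halves of the mirror in the background
    echo check > /sys/block/md0/md/sync_action

    # progress appears in /proc/mdstat while the check runs
    cat /proc/mdstat

    # mismatched sector count once the check finishes (ideally 0)
    cat /sys/block/md0/md/mismatch_cnt

On CentOS the mdadm package ships a raid-check cron job that runs this same scrub periodically.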
-
@Dashrender said:
Now for a question. @scottalanmiller if my above example happens and he pulls sda (which holds the boot partition), and the re-mirroring (is it called resilvering in RAID 1 (10)?) is complete, there still won't be a boot partition, so if the server has to be rebooted it will fail, right?
Correct, boot partitions need to be handled manually.
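To make that concrete, here is roughly what replacing a failed sdb with a new disk that appears as sdc looks like, assuming MBR partition tables, a single array md0 built on sda1/sdb1, and GRUB2 - names and layout are illustrative only (use sgdisk instead of sfdisk for GPT):

    # drop the dead member from the array if it has not already fallen out
    mdadm --manage /dev/md0 --fail /dev/sdb1
    mdadm --manage /dev/md0 --remove /dev/sdb1

    # clone the surviving disk's partition table onto the replacement
    sfdisk -d /dev/sda | sfdisk /dev/sdc

    # add the new partition; md rebuilds the mirror in the background
    mdadm --manage /dev/md0 --add /dev/sdc1

    # the rebuild never touches the MBR, so reinstall the bootloader by hand
    grub2-install /dev/sdc

Watch cat /proc/mdstat until the rebuild reaches 100% before trusting the array again.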
-
So the next question - Why are you using MD RAID instead of hardware RAID?
If I have to guess, it's because this is a test box, probably an old PC that doesn't have real RAID in it, so you can't test real RAID.
Testing MD RAID does not validate hardware RAID, so this test is also moot, assuming the production box will have hardware RAID.
-
@scottalanmiller said:
@Dashrender said:
From what I THINK Scott is saying, the only way you could test this system is by leaving it running 100% of the time and pulling a drive out while running, and putting a new drive in also, while the system is running.
No, what I am saying is that RAID can never be tested in a live system by examining removed disks. Ever. It tells you the past, not the current or the future. So doing so puts you at risk without validating anything useful. It's a flawed concept to attempt.
With hardware RAID, you can shut down the system and boot it from either drive. In a software solution like the one in question, that does not appear to be the case. This is what I was getting at. I wasn't talking at all about how useful this test would or wouldn't be.
-
@Dashrender said:
With hardware RAID, you can shut down the system and boot it from either drive. In a software solution like the one in question, that does not appear to be the case. This is what I was getting at. I wasn't talking at all about how useful this test would or wouldn't be.
In hardware RAID you would break the array and cause the same problems. Sure it would boot, but it would not test what you thought you were testing and it would leave you with a broken array. There is no value to the test but a lot of risk.
-
@Dashrender said:
So the next question - Why are you using MD RAID instead of hardware RAID?
If I have to guess, it's because this is a test box, probably an old PC that doesn't have real RAID in it, so you can't test real RAID.
Testing MD RAID does not validate hardware RAID, so this test is also moot, assuming the production box will have hardware RAID.
MD RAID is completely real and very enterprise. This isn't Windows, no reason to avoid MD RAID in production.
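Since the thread is about configuring it, a minimal sketch of standing up a two-disk MD RAID 1 on CentOS 7 - the md0 name, the spare partitions used here, and the XFS filesystem are all just illustrative choices, not a prescription:

    # build a two-member mirror from two spare partitions of equal size
    # (not the partitions your OS is already running from)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

    # record the array so it assembles automatically at boot
    mdadm --detail --scan >> /etc/mdadm.conf

    # put a filesystem on it and mount it like any other block device
    mkfs.xfs /dev/md0
    mount /dev/md0 /srv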
-
@Dashrender said:
So the next question - Why are you using MD RAID instead of hardware RAID?
If I have to guess, it's because this is a test box, probably an old PC that doesn't have real RAID in it, so you can't test real RAID.
Testing MD RAID does not validate hardware RAID, so this test is also moot, assuming the production box will have hardware RAID.
Hardware RAID vs. software RAID isn't as big a deal as it used to be. The only big issues are no hardware cache, a slight CPU performance hit, and possibly slightly longer rebuild times. None of which are a big deal.
-
@scottalanmiller said:
@Dashrender said:
With hardware RAID, you can shut down the system and boot it from either drive. In a software solution like the one in question, that does not appear to be the case. This is what I was getting at. I wasn't talking at all about how useful this test would or wouldn't be.
In hardware RAID you would break the array and cause the same problems. Sure it would boot, but it would not test what you thought you were testing and it would leave you with a broken array. There is no value to the test but a lot of risk.
What is it you think @Lakshmana is testing? Let's assume I asked this same question. The only things I would be testing are - A: can either disk boot back up to the previous state? B: do both drives have the same data as of the time I took them offline? I'm not sure what else I would be testing. If I saw that one drive didn't have any data on it, but the other did, I would know there was something wrong with the RAID system.
Now that said, I've personally NEVER tested a RAID system, hardware or software, to this degree. I just trust that it's working out of the box, and so far I've never been let down - one drive fails, I replace it, some time later another drive fails, I replace it, etc., and my server experiences no downtime.
But just because I trust the system doesn't mean everyone does. So doing this test on a system before it goes live in production (but never while in production) isn't unreasonable if the manager wants it.
-
@thecreativeone91 said:
Hardware RAID vs. software RAID isn't as big a deal as it used to be. The only big issues are no hardware cache, a slight CPU performance hit, and possibly slightly longer rebuild times. None of which are a big deal.
Actually rebuild times have been, on average, faster with software RAID since around 2001. The Pentium III was the first CPU where software RAID typically rebuilt faster than hardware RAID, because the main CPU was just so much faster than the offload RAID processing unit.
-
@Dashrender said:
What is it you think @Lakshmana is testing?
What they are trying to do is "look at the files" to see if they replicated. But this only tells them that they DID replicate on an old array that they've now blown away in order to test this.
What you CAN do, and this is not advised, is power down the system, remove the disk, attach it to a secondary system, observe it read only, replace it and power back on.
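If someone does go that route, the pulled member has to be assembled as a degraded, read-only md array on the second box rather than mounted directly. A minimal sketch, assuming the disk shows up there as sdb and md0 is free (names will differ):

    # start a degraded array read-only from the single pulled member
    mdadm --assemble --run --readonly /dev/md0 /dev/sdb1

    # mount read-only and inspect the contents
    mount -o ro /dev/md0 /mnt

    # when finished, unmount and stop the array before returning the disk
    umount /mnt
    mdadm --stop /dev/md0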
-
@scottalanmiller said:
@Dashrender said:
So the next question - Why are you using MD RAID instead of hardware RAID?
If I have to guess, it's because this is a test box, probably an old PC that doesn't have real RAID in it, so you can't test real RAID.
Testing MD RAID does not validate hardware RAID, so this test is also moot, assuming the production box will have hardware RAID.
MD RAID is completely real and very enterprise. This isn't Windows, no reason to avoid MD RAID in production.
Well in RAID 1/10 I suppose the added load today probably isn't an issue for the processor compared to, say, RAID 6. But have processors become so powerful that on SMB systems we no longer need to worry about performance drain from doing RAID 6?
-
@Dashrender said:
But just because I trust the system doesn't mean everyone does. So doing this test on a system before it goes live in production (but never while in production) isn't unreasonable if the manager wants it.
No, it is very unreasonable. Just because people lack trust doesn't mean that it is reasonable not to trust things. Literally millions of these are in use and have been for decades and work every day. Not trusting this is completely unreasonable and irrational. There are so many realistic places to put your worries. Spinning wheels trying to validate an irrational lack of faith in something so insanely well proven is completely unreasonable.
-
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
So the next question - Why are you using MD RAID instead of hardware RAID?
If I have to guess, it's because this is a test box, probably an old PC that doesn't have real RAID in it, so you can't test real RAID.
Testing MD RAID does not validate hardware RAID, so this test is also moot, assuming the production box will have hardware RAID.
MD RAID is completely real and very enterprise. This isn't Windows, no reason to avoid MD RAID in production.
Well in RAID 1/10 I suppose the added load today probably isn't an issue for the processor compared to, say, RAID 6. But have processors become so powerful that on SMB systems we no longer need to worry about performance drain from doing RAID 6?
Your load is likely less in the SMB. I think a lot of the fuss has been that some admins don't understand the tools of software RAID and how to use them, as it can be seen as more complex than just going into your HW RAID boot ROM and setting up a LUN.
Many SANs are using only software RAID.
-
@Dashrender said:
Well in RAID 1/10 I suppose the added load today probably isn't an issue for the processor compared to, say, RAID 6. But have processors become so powerful that on SMB systems we no longer need to worry about performance drain from doing RAID 6?
That was in 2001!! RAID 7, which uses way more processor power than anything else, is software only! There is no need for SMB to be a factor; RAID has the same load and impact no matter what the environment size. It is the array size that makes the difference, and these vary little between company sizes. You've not needed to worry about the "drain" of any RAID level for nearly a decade and a half. And fifteen years ago it was only small Windows-based systems on Intel Pentium II and lower that were an issue. Enterprise servers have always been pure software RAID, even twenty years ago.