Large or small Raid 5 with SSD

Donahue

I am considering one big raid 5 with SSDs. My question is this: with equal capacity, is there any real difference between a large number of smaller disks vs a smaller number of larger disks? Assume that everything else is equal with regard to the controller, etc. I am looking at a drive count between, say, 4 and 16 and ~10-14TB capacity.

Donahue

also, with larger drive count SSD arrays, is there a point at which I should be looking at raid 6?

scottalanmiller

@Donahue said in Large or small Raid 5 with SSD:

also, with larger drive count SSD arrays, is there a point at which I should be looking at raid 6?

Yes

scottalanmiller

@Donahue said in Large or small Raid 5 with SSD:

I am considering one big raid 5 with SSDs. My question is this: with equal capacity, is there any real difference between a large number of smaller disks vs a smaller number of larger disks? Assume that everything else is equal with regard to the controller, etc. I am looking at a drive count between, say, 4 and 16 and ~10-14TB capacity.

Generally with old spinning Winchesters, rebuild time is so dramatic that more, smaller drives are better for reliability. With SSDs, rebuilds are so fast that the opposite is often true. Fewer, larger drives means fewer items to fail.

Donahue

@scottalanmiller it seems like the trade off becomes something like this:

Larger disks mean fewer drive bays used and less risk because there are fewer disks, but a higher cost per effective TB and a higher cost of keeping a cold spare on the shelf.

Smaller disks mean more bays used (at some point this becomes important) and more risk because there are more risk sources, but a lower effective cost per TB and cheaper cold spares?

scottalanmiller

@Donahue said in Large or small Raid 5 with SSD:

@scottalanmiller it seems like the trade off becomes something like this:

Larger disks mean fewer drive bays used and less risk because there are fewer disks, but a higher cost per effective TB and a higher cost of keeping a cold spare on the shelf.

Smaller disks mean more bays used (at some point this becomes important) and more risk because there are more risk sources, but a lower effective cost per TB and cheaper cold spares?

The spares might be cheaper, but you consume them more often. Probably not cheaper overall.
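A rough way to see why cheaper spares may not end up cheaper overall is to multiply the expected number of replacements by the price of each one. The sketch below is only an illustration: the 5% annual failure rate, the five-year window, and both prices are made-up assumptions, and it assumes the larger drive costs exactly twice as much and fails at the same rate as the smaller one.

```python
def expected_spare_spend(n_drives, annual_failure_rate, years, price_per_drive):
    """Expected money spent on replacement drives over the period.

    Assumes independent drives with a constant annual failure rate.
    """
    expected_failures = n_drives * annual_failure_rate * years
    return expected_failures * price_per_drive


# All numbers below are hypothetical, chosen only for illustration.
AFR = 0.05           # assumed per-drive annual failure rate
YEARS = 5            # assumed service life
SMALL_PRICE = 100.0  # assumed price of one small drive
LARGE_PRICE = 200.0  # assumed price of one large drive (twice the small one)

few_large = expected_spare_spend(4, AFR, YEARS, LARGE_PRICE)
many_small = expected_spare_spend(8, AFR, YEARS, SMALL_PRICE)

print(f"4 large drives: expected spare spend {few_large:.2f}")
print(f"8 small drives: expected spare spend {many_small:.2f}")
```

Under these assumptions the two totals come out the same: each small spare costs half as much, but you expect to use twice as many of them. Real prices rarely scale exactly with capacity, so plug in actual quotes before drawing a conclusion.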

Donahue

@scottalanmiller said in Large or small Raid 5 with SSD:

@Donahue said in Large or small Raid 5 with SSD:

also, with larger drive count SSD arrays, is there a point at which I should be looking at raid 6?

Yes

is there a rule of thumb for this point?

Donahue

@scottalanmiller said in Large or small Raid 5 with SSD:

@Donahue said in Large or small Raid 5 with SSD:

@scottalanmiller it seems like the trade off becomes something like this:

Larger disks mean fewer drive bays used and less risk because there are fewer disks, but a higher cost per effective TB and a higher cost of keeping a cold spare on the shelf.

Smaller disks mean more bays used (at some point this becomes important) and more risk because there are more risk sources, but a lower effective cost per TB and cheaper cold spares?

The spares might be cheaper, but you consume them more often. Probably not cheaper overall.

interesting point

scottalanmiller

@Donahue said in Large or small Raid 5 with SSD:

@scottalanmiller said in Large or small Raid 5 with SSD:

@Donahue said in Large or small Raid 5 with SSD:

also, with larger drive count SSD arrays, is there a point at which I should be looking at raid 6?

Yes

is there a rule of thumb for this point?

Not really, it's a decently difficult calculation based on the value of uptime, data loss, cost of the extra drive, performance offsets, etc. Very hard to produce a RoT for that.

Because it really comes down to market prices, you tend to build out a RAID 5 and then just run the numbers to see the difference.

Of course you always do a RAID 6 before you consider a spare of any kind.
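For the capacity side of "running the numbers", a small sketch can help. RAID 5 gives up one drive's worth of capacity to parity and RAID 6 gives up two, so for a fixed usable target you can work out the drive size each layout needs and how much raw capacity you end up buying. The 12 TB target and the drive counts are taken from the range asked about in this thread; the function name is made up for this example, and real drives only come in fixed sizes, so you would round up.

```python
def drive_size_needed(target_usable_tb, n_drives, parity_drives):
    """Per-drive capacity (TB) needed to reach the usable target.

    parity_drives is 1 for RAID 5 and 2 for RAID 6.
    """
    data_drives = n_drives - parity_drives
    if data_drives < 1:
        raise ValueError("not enough drives for this RAID level")
    return target_usable_tb / data_drives


TARGET_TB = 12  # assumed usable capacity target

for n in (4, 8, 16):
    r5_size = drive_size_needed(TARGET_TB, n, parity_drives=1)
    r6_size = drive_size_needed(TARGET_TB, n, parity_drives=2)
    print(
        f"{n:2d} drives: RAID 5 needs {r5_size:.2f} TB each "
        f"({n * r5_size:.1f} TB raw), "
        f"RAID 6 needs {r6_size:.2f} TB each ({n * r6_size:.1f} TB raw)"
    )
```

The capacity premium for RAID 6 shrinks as the drive count grows: at 4 drives it means buying 24 TB raw instead of 16 TB, while at 16 drives the gap is under 1 TB. Multiply by current prices to get the cost difference.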

Dashrender

@scottalanmiller said in Large or small Raid 5 with SSD:

@Donahue said in Large or small Raid 5 with SSD:

also, with larger drive count SSD arrays, is there a point at which I should be looking at raid 6?

Yes

The number of drives can play a factor? not just the amount of storage? and if so, what is that number, and how is it determined?

Dashrender

@scottalanmiller said in Large or small Raid 5 with SSD:

Of course you always do a RAID 6 before you consider a spare of any kind.

Really? The RAID 6 penalty isn't high enough to warrant keeping a hot spare?

Donahue

@Dashrender said in Large or small Raid 5 with SSD:

@scottalanmiller said in Large or small Raid 5 with SSD:

Of course you always do a RAID 6 before you consider a spare of any kind.

Really? The RAID 6 penalty isn't high enough to warrant keeping a hot spare?

Scott, I assume that not having the drive bay for a spare is the exception?

scottalanmiller

@Dashrender said in Large or small Raid 5 with SSD:

@scottalanmiller said in Large or small Raid 5 with SSD:

Of course you always do a RAID 6 before you consider a spare of any kind.

Really? The RAID 6 penalty isn't high enough to warrant keeping a hot spare?

The difference in reliability is huge. The difference in write performance is trivial, especially in modern systems buffered by cache. Yes, there is write expansion to think about, but modern systems using parity RAID are not concerned with IOPS, you'd be on NVMe if that were the case, and you'd need RAID handled a completely different way.

scottalanmiller

@Dashrender said in Large or small Raid 5 with SSD:

@scottalanmiller said in Large or small Raid 5 with SSD:

@Donahue said in Large or small Raid 5 with SSD:

also, with larger drive count SSD arrays, is there a point at which I should be looking at raid 6?

Yes

The number of drives can play a factor? not just the amount of storage? and if so, what is that number, and how is it determined?

The number of drives is the primary factor in whether or not a device will fail. More drives = more risk.

Amount of storage is the primary factor in how long it will take for an array to recover.

scottalanmiller

@Donahue said in Large or small Raid 5 with SSD:

@Dashrender said in Large or small Raid 5 with SSD:

@scottalanmiller said in Large or small Raid 5 with SSD:

Of course you always do a RAID 6 before you consider a spare of any kind.

Really? The RAID 6 penalty isn't high enough to warrant keeping a hot spare?

Scott, I assume that not having the drive bay for a spare is the exception?

Correct, but that's a super rare case. It can happen, but normally in that case you should consider larger drives.

Donahue

So in general, an 8 drive raid 5 is more risky than a 4 drive raid 5, but how much so? I want to know how to calculate the tipping point between safety and cost.

scottalanmiller

@Donahue said in Large or small Raid 5 with SSD:

So in general, an 8 drive raid 5 is more risky than a 4 drive raid 5, but how much so? I want to know how to calculate the tipping point between safety and cost.

It's pretty close to, but not exactly, twice as likely to lose a drive. For loose calculations, just use double. If the four drive array is going to lose a drive once every five years, the eight drive array will lose two in the same period.
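The "close to, but not exactly, double" point can be checked with a quick sketch. It assumes every drive fails independently with the same annual failure rate; the 5% figure is an illustrative assumption, picked because it makes a four drive array lose about one drive every five years.

```python
def p_any_failure(n_drives, annual_failure_rate):
    """Probability that at least one of n independent drives fails in a year."""
    return 1.0 - (1.0 - annual_failure_rate) ** n_drives


def expected_failures(n_drives, annual_failure_rate, years):
    """Expected number of drive failures over the period."""
    return n_drives * annual_failure_rate * years


AFR = 0.05  # assumed per-drive annual failure rate, for illustration only

p4 = p_any_failure(4, AFR)
p8 = p_any_failure(8, AFR)

print(f"4 drives: P(at least one failure in a year) = {p4:.4f}")
print(f"8 drives: P(at least one failure in a year) = {p8:.4f}")
print(f"ratio = {p8 / p4:.3f}")
print(f"expected failures over 5 years: "
      f"{expected_failures(4, AFR, 5):.1f} vs {expected_failures(8, AFR, 5):.1f}")
```

The expected number of failures doubles exactly (1 versus 2 over five years here), while the chance of seeing at least one failure in a given year rises by a bit less than double (about 1.8x with these numbers). The lower the per-drive failure rate, the closer that ratio gets to 2.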

Donahue

Let me know if I am thinking about this correctly. If each drive in the 4 drive array was twice the price of the smaller drives, then the cost per year is basically the same. But what is different is you are twice as likely to have a second loss (with 8 drives) during a rebuild because there are twice as many primary failures, correct? So would this make a 4 drive raid 5 and an 8 drive raid 6 be similar in reliability?

Donahue

and rebuild times are more dependent on capacity, not on drive count? So with equal capacity, the rebuild should take the same amount of time?

scottalanmiller

@Donahue said in Large or small Raid 5 with SSD:

Let me know if I am thinking about this correctly. If each drive in the 4 drive array was twice the price of the smaller drives, then the cost per year is basically the same. But what is different is you are twice as likely to have a second loss (with 8 drives) during a rebuild because there are twice as many primary failures, correct? So would this make a 4 drive raid 5 and an 8 drive raid 6 be similar in reliability?

Your first theory is correct, drive failure during rebuild would make the primary failure mode roughly equal during that tiny window, assuming rebuilds are automated and instantaneous, which they are not.

The result, though, is incorrect. They would not be even remotely close in reliability. They would be orders of magnitude different. The eight drive RAID 6 would be thousands of times more reliable than the four drive RAID 5.

You can never isolate a failure mode, such as failure during a rebuild, and look at it in a vacuum to approximate a total failure rate. RAID reliability is the result of the interplay between several different failure modes.

Think of it like an equation... x * y * z = resulting reliability. You can't isolate y and have any idea what the result will be. If x and z skyrocket when y reduces, the result might still be much higher.
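One way to see that interplay is the classic textbook approximation for mean time to data loss (MTTDL). This is a swapped-in simplified model, not the calculation used in this thread: it assumes independent drives with constant failure rates, and it ignores unrecoverable read errors, controller faults, correlated failures, and human error. The MTTF and rebuild-time values are illustrative assumptions.

```python
def mttdl_raid5(n_drives, mttf_hours, mttr_hours):
    """Approximate MTTDL for RAID 5: data is lost when a second drive
    fails while the first is still being rebuilt."""
    return mttf_hours ** 2 / (n_drives * (n_drives - 1) * mttr_hours)


def mttdl_raid6(n_drives, mttf_hours, mttr_hours):
    """Approximate MTTDL for RAID 6: data is lost when a third drive
    fails while two are still being rebuilt."""
    return mttf_hours ** 3 / (
        n_drives * (n_drives - 1) * (n_drives - 2) * mttr_hours ** 2
    )


# Illustrative assumptions, not measured values.
MTTF_HOURS = 1_000_000  # assumed per-drive mean time to failure
MTTR_HOURS = 24         # assumed time to replace and rebuild a drive

r5_four = mttdl_raid5(4, MTTF_HOURS, MTTR_HOURS)
r6_eight = mttdl_raid6(8, MTTF_HOURS, MTTR_HOURS)

print(f"4 drive RAID 5 MTTDL: {r5_four:.3e} hours")
print(f"8 drive RAID 6 MTTDL: {r6_eight:.3e} hours")
print(f"RAID 6 / RAID 5 ratio: {r6_eight / r5_four:.0f}")
```

Doubling the drive count does raise the drive-count terms in the denominator, but RAID 6 gains an extra factor of MTTF divided by rebuild time, which is enormous by comparison. With these assumed inputs the eight drive RAID 6 comes out more than a thousand times better than the four drive RAID 5, and the gap grows as rebuilds get faster.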