Configuration for Open Source Operating Systems with the SAM-SD Approach

GotiTServicesInc

I apologize for the way this may be laid out; I'm putting my thoughts down the best I can (I'm a bit scatterbrained at the moment). Assume everything is a question.

So, assuming I'm using a Linux OS for the base with a dual controller SAS setup, what different ways do we have to configure the system for a SAM-SD setup?

To make things simple, let's not talk about how to make the cluster, but assume there are four servers, and each server has 100TB attached to each controller (ten 10TB drives; yes, I know they don't exist, but we'll use them for easy numbers).

Starting from the ground going up (before clustering), there are a couple of options for redundancy. Are all of these feasible? Are all of these feasible for HA?

JBOD on each controller, md for software RAID 0 (200TB Total Storage)
JBOD on each controller, md for software RAID 1 between controllers (100TB Total Storage)
RAID 5/6 on each controller, md for software RAID 0 between controllers (160-180TB Total Storage)
RAID 5/6 on each controller, md for software RAID 1 between controllers (80-90TB Total Storage)
RAID 1 Array on each controller, md for software RAID 0 between controllers (100TB Total Storage)
RAID 1 Array on each controller, md for software RAID 1 between controllers (50TB Total Storage)
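
For anyone who wants to check the arithmetic on these options, here is a rough sketch in Python. It assumes a hardware "RAID 1 array" on ten drives means mirrored pairs (half the raw space per controller) and it ignores filesystem and metadata overhead. The function names are made up for illustration.

```python
# Usable capacity for each controller layout combined with md RAID 0 or RAID 1.
# Assumptions (not stated in the thread): a hardware "RAID 1 array" on ten
# drives is treated as mirrored pairs (half the raw space), and filesystem
# and metadata overhead are ignored.

DRIVES_PER_CONTROLLER = 10
DRIVE_TB = 10
CONTROLLERS = 2


def controller_tb(local_layout: str) -> int:
    """Usable TB that one controller presents to the OS."""
    raw = DRIVES_PER_CONTROLLER * DRIVE_TB
    usable = {
        "jbod": raw,                  # no redundancy
        "raid5": raw - DRIVE_TB,      # one drive's worth of parity
        "raid6": raw - 2 * DRIVE_TB,  # two drives' worth of parity
        "raid1": raw // 2,            # mirrored pairs
    }
    return usable[local_layout]


def total_tb(local_layout: str, md_level: str) -> int:
    """Usable TB after md combines the two controllers."""
    per_controller = controller_tb(local_layout)
    if md_level == "raid0":
        return per_controller * CONTROLLERS  # striped: capacities add up
    if md_level == "raid1":
        return per_controller                # mirrored: one copy's worth
    raise ValueError(f"unsupported md level: {md_level}")


if __name__ == "__main__":
    for local in ("jbod", "raid5", "raid6", "raid1"):
        for md in ("raid0", "raid1"):
            print(f"{local:>5} per controller + md {md}: {total_tb(local, md)}TB")
```

Capacity is only half the picture: any layout with md RAID 0 between the controllers loses everything if either side is lost, so the larger numbers come with no cross-controller protection.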

I think this would be related to the number of SANs you have. For example, if you have 4 SANs in the cluster, JBOD with RAID 0 might be fine for you?

I feel like RAID 5/6 might be pointless? Sure, you can survive 1 to 2 failed drives in each hardware array, but considering you are supporting multiple software arrays, this might just be extra overhead with no real benefit? Or maybe the rebuild time on a big RAID 5/6 array makes it worth having this fail-safe in place?

Hardware RAID 1, with software RAID 1, in a cluster, seems super over redundant, or maybe not if the data is that important to you (which I can see it being that important).

Also, is the cluster for redundancy? Or for higher storage amounts? Or some kind of both? (Can you do that?)

Maybe a question should be what is the normal setup for a dual controller SAN?

Any input is welcome!

Dashrender

I thought most dual controller SANs had all drives on both controllers to cover a controller failure, not a RAID between controllers setup - but then again I've never personally used a SAN, so what do I know?

scottalanmiller

@GotiTServicesInc said:

So, assuming I'm using a Linux OS for the base with a dual controller SAS setup, what different ways do we have to configure the system for a SAM-SD setup?

Linux as the base is almost always the right choice, although it seems reasonable to step back and consider that an open question as well depending on your needs.

Dual controller is very uncommon. What is driving that as an assumption? (And I realize you mean dual SAS not dual hardware RAID in that statement. Pointing out for anyone who glossed over it.)

scottalanmiller

@GotiTServicesInc said:

I think this would be related to the number of SANs you have. For example, if you have 4 SANs in the cluster, JBOD with RAID 0 might be fine for you?

Might be, but the cluster itself matters there. Is that a big Network RAID 1 cluster? I know that you said to discount the clustering bit, but that info is necessary to answer this question. That they are clustered doesn't tell us anything specific that we could use to determine reliability or feasibility.

Dashrender

@scottalanmiller said:

@GotiTServicesInc said:

So, assuming I'm using a Linux OS for the base with a dual controller SAS setup, what different ways do we have to configure the system for a SAM-SD setup?

Linux as the base is almost always the right choice, although it seems reasonable to step back and consider that an open question as well depending on your needs.

Dual controller is very uncommon. What is driving that as an assumption? (And I realize you mean dual SAS not dual hardware RAID in that statement. Pointing out for anyone who glossed over it.)

Good point, I read dual SAS controller, but mentally read dual RAID controller setup - which didn't make sense.

Though, now, like Scott, I ask why the dual controller SAS setup instead of a single controller? Especially if you're considering clustering the setup with another SAM-SD.

Is there ever a case where you would cluster SAM-SDs and not do it for a RAID 1 like setup?

scottalanmiller

@GotiTServicesInc said:

I feel like RAID 5/6 might be pointless? Sure, you can survive 1 to 2 failed drives in each hardware array, but considering you are supporting multiple software arrays, this might just be extra overhead with no real benefit? Or maybe the rebuild time on a big RAID 5/6 array makes it worth having this fail-safe in place?

RAID 5 almost never makes sense unless these are SSDs. RAID 6 will often make sense because the cost of rebuilding a full node, depending on what your cluster is, might be horrific. So often that is done not to protect against data loss but to protect against performance or availability loss.

But again, cluster specifics are required.

GotiTServicesInc

I wasn't too sure. I figured your needs would dictate how many SANs go in a cluster and which RAID / non-RAID setup you'd go with.

GotiTServicesInc

In my research I may have confused dual channel RAID controllers with dual controllers. I know that current leads in our company are requesting a dual controller box, but since this box operates by itself, I'm assuming that with the additional controller they're buying a bit more redundancy.

scottalanmiller

@GotiTServicesInc said:

I wasn't too sure. I figured your needs would dictate how many SANs go in a cluster and which RAID / non-RAID setup you'd go with.

Typically SANs come in three forms:

Single SAN, no replication. Just a server sharing out block storage.
Dual SAN, replicated. If the master fails the slave takes over. Data is mirrored.
Scale Out SAN. RAID does not apply as you are using RAIN.

Those are really the only models. The idea of large SAN clusters using local RAID plus network RAID to replicate the storage, while theoretically possible, effectively does not exist because it would be impractical. You'd be limited to RAID 1, meaning every SAN you add would make the cluster more reliable, but you'd never gain capacity. Put in 10 SANs with 100TB each and you'd still only have 100TB to use. No matter how many SANs you add to the cluster, you'd never get more than 100TB to use.
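
As a quick sketch of that capacity point, assuming every node holds a full mirror of the data (network RAID 1) and all nodes are the same size. The function name and the returned keys are only illustrative.

```python
def mirrored_cluster(node_tb: int, nodes: int) -> dict:
    """Capacity summary for a cluster where every node holds a full copy."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return {
        "raw_tb": node_tb * nodes,               # what you paid for
        "usable_tb": node_tb,                    # one copy's worth, always
        "node_failures_survivable": nodes - 1,   # data survives while one copy remains
    }


if __name__ == "__main__":
    for n in (1, 2, 4, 10):
        print(n, mirrored_cluster(100, n))
```

Going from 2 to 10 nodes raises the raw capacity from 200TB to 1000TB while usable capacity stays at 100TB, which is why this model stops at a mirrored pair in practice.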

scottalanmiller

@GotiTServicesInc said:

In my research I may have confused dual channel RAID controllers with dual controllers.

Definitely different things. One is more or less just for SAS level channel throughput. The other is for redundancy at the controller level (in theory.)

GotiTServicesInc

@scottalanmiller said:

@GotiTServicesInc said:

In my research I may have confused dual channel RAID controllers with dual controllers.

Definitely different things. One is more or less just for SAS level channel throughput. The other is for redundancy at the controller level (in theory.)

Yeah, I think I just had a brain fart when going over that. I understand the differences between them but got lost in the sauce swimming between the two topics, heh.

GotiTServicesInc

So really, an HA SAN for an enterprise wouldn't be more than 2-ish SANs? And beyond that you would go for more of a RAIN setup?

I figured duplication over a network, even with 40Gb pipes, would still be painful, but I wasn't sure.
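
A back-of-the-envelope sketch of how long a full copy would take, assuming the pipe is a 40 gigabit-per-second link, decimal terabytes, and an `efficiency` factor I made up to stand in for protocol overhead and disk limits:

```python
def full_copy_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to push data_tb terabytes over a link_gbps link.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead and for disks that cannot keep the link full.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_tb * 1e12 * 8                      # decimal TB to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    print(f"line rate: {full_copy_hours(100, 40):.1f} hours")
    print(f"half rate: {full_copy_hours(100, 40, efficiency=0.5):.1f} hours")
```

Even at full line rate, 100TB works out to several hours, and that is before counting whatever the source and target disks can actually sustain.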

scottalanmiller

@GotiTServicesInc said:

I know that current leads in our company are requesting a dual controller box, but since this box operates by itself, I'm assuming that with the additional controller they're buying a bit more redundancy.

You don't do dual controllers for protection, not in the real world. RAID controllers do very bad things when you pair them up; that's why enterprise servers don't ship that way, ever. Not even $100K servers are like that. And that's why dual controllers in SMB range SANs are bad: they actually cause outages rather than protecting against them.

One of the most important reads here is: Understanding the Relationship of Reliability and Redundancy

Reliability is the goal, redundancy is a tool. In this case, your redundancy would not support your goal, so it isn't a viable option.

scottalanmiller

In your OP you said dual SAS controllers; that is how this is handled for high end enterprise servers, $50K and up. The high end doesn't use RAID controllers at all, only SAS. SAS controllers with software RAID can provide the redundancy that the RAID controllers on the market can't do well.

GotiTServicesInc

I assumed SAS has built-in RAID, which I realize now was a bad assumption. Dual controllers would just be used to increase the number of drives you could stick in a box then, which makes more sense.

scottalanmiller

@GotiTServicesInc said:

So really, an HA SAN for an enterprise wouldn't be more than 2-ish SANs? And beyond that you would go for more of a RAIN setup?

Correct, enterprise SANs have traditionally always been "scale up" or "vertical scaling" devices. Your size is determined by how big a single SAN can go. HA is provided by a combination of mainframe class design and features and mirroring to a second SAN. That's all we've ever had traditionally.

Moving to scale out systems is very new and very niche still. Using scale out with SAN is still relatively rare and problematic. The first vendors are just starting to get their footing on this and there are generally performance issues.

Dashrender

@scottalanmiller said:

In your OP you said dual SAS controllers; that is how this is handled for high end enterprise servers, $50K and up. The high end doesn't use RAID controllers at all, only SAS. SAS controllers with software RAID can provide the redundancy that the RAID controllers on the market can't do well.

So with the OP, would the question be more like: which RAID level should I use over all these drives in software, versus one controller RAIDed against the other controller?

And would a SAM-SD really look at skipping hardware RAID for software?

GotiTServicesInc

I did skim that but wasn't sure where the risk vs. reward scale crossed in regards to RAID

scottalanmiller

@GotiTServicesInc said:

I assumed SAS has built-in RAID, which I realize now was a bad assumption. Dual controllers would just be used to increase the number of drives you could stick in a box then, which makes more sense.

SAS and SATA are the protocols that storage uses to communicate. RAID controllers talk SAS and/or SATA, but most controllers do this without having hardware RAID.

You don't use multiple controllers for more drives. There isn't any normal server on the market that goes beyond what a single controller can handle. A single good controller will do hundreds of drives.

MattSpeller

@Dashrender well, if it's not running other apps on the OS why save the CPU with a RAID card?

edit: I just realized you were tl;dr'ing, sorry - this post was for OP