Gluster and RAID question

stacksofplates

@biggen said in Gluster and RAID question:

@JaredBusch Once the volume is up and running how the heck does one share it out? That's what I'm trying to do. I have a successful two node system running:
joe@glusternode1:/mnt$ sudo gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: ab19d123-eb34-4186-8a03-316a3fc790e3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: glusternode1:/data/xvdb1/brick
Brick2: glusternode2:/data/xvdb1/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
That volume must now be mounted "somewhere" to access it. How do I mount it so Windows clients can access it? Do I simply mount the share in one of the nodes under /mnt/big_ole_gluster_space and then share out that mount point via Samba from that same Gluster node?

The preferred way is to use the GlusterFS FUSE client. Last I knew it's the only one that automatically handles failover and HA.
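
For reference, here is a minimal sketch of what a FUSE mount of the `gv0` volume above might look like from a machine with the GlusterFS client package installed. The host and volume names are taken from the `gluster volume info` output; the mount point `/mnt/gv0` is an assumption. The helper only prints the command so it can be reviewed before being run with sudo. The `backup-volfile-servers` mount option names a second node to fetch the volume file from if the first node is down at mount time.

```shell
#!/bin/sh
# Sketch only: print the FUSE mount command for a replicated Gluster volume.
# Host and volume names come from the thread; the mount point is hypothetical.
gluster_mount_cmd() {
    primary="$1"; backup="$2"; volume="$3"; mountpoint="$4"
    printf 'mount -t glusterfs -o backup-volfile-servers=%s %s:/%s %s\n' \
        "$backup" "$primary" "$volume" "$mountpoint"
}

# Print the command for the two-node gv0 volume.
gluster_mount_cmd glusternode1 glusternode2 gv0 /mnt/gv0
```

Once mounted this way, the client talks to both bricks directly, which is why losing one node does not take the mount down.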

travisdh1

@biggen said in Gluster and RAID question:

@JaredBusch Once the volume is up and running how the heck does one share it out? That's what I'm trying to do. I have a successful two node system running:
joe@glusternode1:/mnt$ sudo gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: ab19d123-eb34-4186-8a03-316a3fc790e3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: glusternode1:/data/xvdb1/brick
Brick2: glusternode2:/data/xvdb1/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
That volume must now be mounted "somewhere" to access it. How do I mount it so Windows clients can access it? Do I simply mount the share in one of the nodes under /mnt and then share out that mount point via Samba?

If you want to experiment with it properly, you'll want to follow https://docs.gluster.org/en/latest/Administrator-Guide/Accessing-Gluster-from-Windows/

Creating the storage is just the first piece. If you want to share the storage and have it be fault tolerant, then there are a whole lot of other hoops to jump through. That is also why everyone is saying to just mount it on one of the Gluster boxes and create a normal Samba share.

biggen

This was the piece of the puzzle I was missing. It explains at the bottom how to configure a simple Samba share.

When one types "samba gluster" into Google, this unwieldy page is the very first hit. And since it's from the official Gluster docs, it makes it seem that it is the RIGHT way to do it. That was my confusion when I asked earlier about CTDB.

If one doesn't want to mess with CTDB, then sharing out a simple Samba share on one of the Gluster nodes is really easy, as I just found out. However, there is no fault tolerance as far as Samba goes, since you are only dealing with a single Samba server.
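
As a rough illustration of the simple, non-CTDB approach, the sketch below prints a share stanza for a directory where the Gluster volume is already FUSE-mounted on the node running Samba. The share name `gluster` and the path `/mnt/gv0` are assumptions, and the permissions are deliberately minimal; review the output before adding it to `/etc/samba/smb.conf`.

```shell
#!/bin/sh
# Sketch only: print an smb.conf share stanza for a directory that already
# has the Gluster volume mounted on it. Share name and path are hypothetical.
samba_share_stanza() {
    share="$1"; path="$2"
    printf '[%s]\n' "$share"
    printf '    path = %s\n' "$path"
    printf '    read only = no\n'
    printf '    browseable = yes\n'
}

# Print a stanza that shares the mounted volume to Windows clients.
samba_share_stanza gluster /mnt/gv0
```

From Samba's point of view this is an ordinary directory share; nothing in the stanza is Gluster-specific.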

scottalanmiller

@biggen said in Gluster and RAID question:

And since it's from the official Gluster docs, it makes it seem that it is the RIGHT way to do it.

It's a bit of a conceptual break. Gluster is the wrong place to be looking. That's a filesystem. In no other circumstance, ever, do you look at filesystem documentation (NTFS, XFS, EXT4, etc.) to ask about SMB networking.

So looking at Gluster in this way will be confusing because it doesn't really make any sense. Gluster is a filesystem. Samba is an SMB server. It just reads Gluster the same as any other filesystem if you want.

How do you share out from XFS, ZFS, NTFS, etc.? You do it the same way with Gluster. However you answer the first part is how you will normally answer the second part.

scottalanmiller

@biggen said in Gluster and RAID question:

There is no fault tolerance as far as Samba goes since you are only dealing with a single Samba connection however.

That's because you are "being weird" and acting like Gluster is replacing your hypervisors and virtualization. Since when do we build file servers without virtualizing them? Virtualize Samba and you solve it that way at the platform level. Or make Samba failover the way that Samba normally does.

Basically you are acting like Gluster is a special case, but it is not. Ignore that Gluster is the mechanism that you are using and everything gets really simple. Get fixated on Gluster, and you'll be looking for Gluster-specific answers to all the normal problems.

It's a bit like looking for a guide on how to drive a Ford. But you'll never find one. You'll just find guides to driving cars. The brand of car just doesn't matter, it's all the same. If you are convinced that you need a guide that is specific to steering a Ford, you'll be forever lost and confused thinking that it can't be done when, in reality, it's so simple that no guide exists outside of basic steering guides.

Dashrender

@scottalanmiller that's why I suggested that you make a post about the entire stack:

Hardware
hypervisor
storage (or vice versa with hypervisor)
VMs
storage inside VM
share from inside VM

biggen

I appreciate the explanation guys. Not being in the IT field (directly) for some time means I'm playing catch up with a lot of the stuff.

Let's say, as a hypothetical, one wanted to build out a 500TB Gluster cluster to be used as a backup target for VMs. It looks like you need at least 3 nodes to build out the Gluster cluster. Then, of course, you need an additional node for the hypervisor, so 4 nodes minimum.

On the three Gluster nodes, would you be installing a Linux OS directly to them (bare metal)? I know from reading here physical servers have fallen out of style. Is this a use case where a physical server still serves a purpose?

Once the Gluster volume is up and running, you could then connect the hypervisor to the cluster assuming the hypervisor had Gluster Client support and then you have the massive cluster attached to the hypervisor as a SR to be used appropriately.

I'm just wondering if something like this would work.

travisdh1

@biggen said in Gluster and RAID question:

I appreciate the explanation guys. Not being in the IT field (directly) for some time means I'm playing catch up with a lot of the stuff.

Let's say, as a hypothetical, one wanted to build out a 500TB Gluster cluster to be used as a backup target for VMs. It looks like you need at least 3 nodes to build out the Gluster cluster. Then, of course, you need an additional node for the hypervisor, so 4 nodes minimum.

On the three Gluster nodes, would you be installing a Linux OS directly to them (bare metal)? I know from reading here physical servers have fallen out of style. Is this a use case where a physical server still serves a purpose?

Once the Gluster volume is up and running, you could then connect the hypervisor to the cluster assuming the hypervisor had Gluster Client support and then you have the massive cluster attached to the hypervisor as a SR to be used appropriately.

I'm just wondering if something like this would work.

Would it work? Of course. It wouldn't be very efficient, though.

Gluster would make more sense as the storage for VMs. No matter what size, you really don't need 3 boxes of drives just for backups till your environment is absolutely gigantic!
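
To put a rough number on the overhead: in a purely replicated volume every file is stored once per replica, so the raw disk needed is the usable capacity times the replica count. A quick sketch, assuming replica 3 for the hypothetical 500TB target (the replica count is an assumption, not something stated in the thread):

```shell
#!/bin/sh
# Sketch only: raw capacity needed for a purely replicated Gluster volume.
# Arguments: usable capacity in TB, replica count (both whole numbers).
raw_capacity_tb() {
    usable_tb="$1"; replicas="$2"
    echo $(( usable_tb * replicas ))
}

# 500TB usable at an assumed replica count of 3.
echo "raw TB needed: $(raw_capacity_tb 500 3)"
```

That is a lot of spindles to buy for data that is already a second copy of something else.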

biggen

@travisdh1 Great thanks for that info. When you say storage for VMs are you speaking of a SAN? So your VMs are running off the Gluster?

Yeah I thought 3 nodes of storage + the hypervisor node sounded like a ton of equipment. I know you can buy single boxes that have 2 - 4 nodes inside of them to reduce the footprint.

travisdh1

@biggen said in Gluster and RAID question:

@travisdh1 Great thanks for that info. When you say storage for VMs are you speaking of a SAN? So your VMs are running off the Gluster?

Yeah I thought 3 nodes of storage + the hypervisor node sounded like a ton of equipment. I know you can buy single boxes that have 2 - 4 nodes inside of them to reduce the footprint.

Something like that. Basically Gluster would replace the SAN.

Those 2-4 nodes-in-a-box systems are just horrible solutions if you want fault tolerance. Basically, you still have a single point of failure, but now it takes down all 3 nodes instead of a single node.

Dashrender

@biggen said in Gluster and RAID question:

I appreciate the explanation guys. Not being in the IT field (directly) for some time means I'm playing catch up with a lot of the stuff.

Let's say, as a hypothetical, one wanted to build out a 500TB Gluster cluster to be used as a backup target for VMs. It looks like you need at least 3 nodes to build out the Gluster cluster. Then, of course, you need an additional node for the hypervisor, so 4 nodes minimum.

On the three Gluster nodes, would you be installing a Linux OS directly to them (bare metal)? I know from reading here physical servers have fallen out of style. Is this a use case where a physical server still serves a purpose?

Once the Gluster volume is up and running, you could then connect the hypervisor to the cluster assuming the hypervisor had Gluster Client support and then you have the massive cluster attached to the hypervisor as a SR to be used appropriately.

I'm just wondering if something like this would work.

Why would you need a fault-tolerant storage solution for your backups? I would think if it was that important, you'd more likely go to tape as part of your backups (D2D2T).

Dashrender

Question for those in the know: can Gluster run on the same boxes as the hypervisor, like in a hyperconverged setup? It seems crazy to have a solution like the one @biggen is suggesting, with 3 Gluster nodes and a single VM host using that Gluster cluster, i.e. a SPOF in that one VM host.
And as he mentioned, that's a ton of hardware.

travisdh1

@Dashrender said in Gluster and RAID question:

Question for those in the know: can Gluster run on the same boxes as the hypervisor, like in a hyperconverged setup? It seems crazy to have a solution like the one @biggen is suggesting, with 3 Gluster nodes and a single VM host using that Gluster cluster, i.e. a SPOF in that one VM host.
And as he mentioned, that's a ton of hardware.

Yes. Really easy if using Linux-based KVM. Just create your Gluster storage and mount it as your VM config and storage directory. I've not done a setup like this myself, so I'm probably missing some high points, but that's the basic idea.
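
As a rough illustration only, assuming libvirt's default image directory `/var/lib/libvirt/images` and the `gv0` volume from earlier in the thread, the mount on each KVM host could be made persistent with an fstab entry along these lines. The helper just prints the line so it can be checked before it goes anywhere near `/etc/fstab`; mounting from `localhost` assumes the host is itself a member of the Gluster pool.

```shell
#!/bin/sh
# Sketch only: print an fstab entry that mounts a Gluster volume over the
# directory a KVM host uses for VM images. Volume and directory names are
# assumptions for illustration.
vm_store_fstab_line() {
    volume="$1"; dir="$2"
    # _netdev delays the mount until networking is up.
    printf 'localhost:/%s %s glusterfs defaults,_netdev 0 0\n' "$volume" "$dir"
}

vm_store_fstab_line gv0 /var/lib/libvirt/images
```

With the same entry on every host, each hypervisor sees the same set of disk images.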

Dashrender

@biggen said in Gluster and RAID question:

On the three Gluster nodes, would you be installing a Linux OS directly to them (bare metal)? I know from reading here physical servers have fallen out of style. Is this a use case where a physical server still serves a purpose?

This seems to be a misunderstanding. There's nothing wrong with physical servers. Something has to run on the physical hardware to make it work. I don't know diddly squat about Gluster, but I imagine it works something like this:
A Linux OS is installed onto some smallish disk, possibly an SD card, and is used to set up a Gluster cluster.
KVM, or some other hypervisor, is installed into the Linux OS as well, and the hypervisor is pointed to the Gluster cluster for its SR.
VMs are made in that hypervisor.

Now I'm guessing this can't be done with Hyper-V, since that can't run inside Linux (as far as I know), so you'd be forced to have hypervisor hosts and storage hosts (i.e. SAN/NAS) for Hyper-V and other hypervisors.

I'm looking forward to someone shredding this post.

biggen

@travisdh1 said in Gluster and RAID question:

@biggen said in Gluster and RAID question:

@travisdh1 Great thanks for that info. When you say storage for VMs are you speaking of a SAN? So your VMs are running off the Gluster?

Yeah I thought 3 nodes of storage + the hypervisor node sounded like a ton of equipment. I know you can buy single boxes that have 2 - 4 nodes inside of them to reduce the footprint.

Something like that. Basically Gluster would replace the SAN.

Those 2-4 nodes-in-a-box systems are just horrible solutions if you want fault tolerance. Basically, you still have a single point of failure, but now it takes down all 3 nodes instead of a single node.

Yeah I've always wondered about that multiple nodes in one case setup. Especially since I'd imagine the PSU backplane is probably being shared between all the nodes inside in some fashion.

@Dashrender said in Gluster and RAID question:

@biggen said in Gluster and RAID question:

I appreciate the explanation guys. Not being in the IT field (directly) for some time means I'm playing catch up with a lot of the stuff.

Let's say, as a hypothetical, one wanted to build out a 500TB Gluster cluster to be used as a backup target for VMs. It looks like you need at least 3 nodes to build out the Gluster cluster. Then, of course, you need an additional node for the hypervisor, so 4 nodes minimum.

On the three Gluster nodes, would you be installing a Linux OS directly to them (bare metal)? I know from reading here physical servers have fallen out of style. Is this a use case where a physical server still serves a purpose?

Once the Gluster volume is up and running, you could then connect the hypervisor to the cluster assuming the hypervisor had Gluster Client support and then you have the massive cluster attached to the hypervisor as a SR to be used appropriately.

I'm just wondering if something like this would work.

Why would you need a fault-tolerant storage solution for your backups? I would think if it was that important, you'd more likely go to tape as part of your backups (D2D2T).

You probably wouldn't. I was just trying to dream up a use for a three-node Gluster cluster. Perhaps a VM SR would be a better scenario, OR perhaps a massive NAS storage Gluster cluster holding raw 4K footage for a production company. Again, it was a hypothetical. I have a hard time imagining any scenario where I would ever need this much storage unless I'm starting up my own YouTube of some sort. The guys over on Reddit in the r/Datahoarder sub are commonly collecting hundreds of TB of junk, but that is mostly on spare parts and cobbled-together machinery. I've never seen massive-scale storage done with my own eyes using production-level equipment and software, so I guess it's more curiosity on my part as to how it would work.

@travisdh1 said in Gluster and RAID question:

@Dashrender said in Gluster and RAID question:

Question for those in the know: can Gluster run on the same boxes as the hypervisor, like in a hyperconverged setup? It seems crazy to have a solution like the one @biggen is suggesting, with 3 Gluster nodes and a single VM host using that Gluster cluster, i.e. a SPOF in that one VM host.
And as he mentioned, that's a ton of hardware.

Yes. Really easy if using Linux-based KVM. Just create your Gluster storage and mount it as your VM config and storage directory. I've not done a setup like this myself, so I'm probably missing some high points, but that's the basic idea.

I know there are lots of ways to skin the cat, but wouldn't you still need three separate Gluster nodes? Gluster recommends at least three in order to avoid split brain. If you used a two physical node system I don't think they want you to do that without an arbiter which is something I have no idea about.

@Dashrender said in Gluster and RAID question:

@biggen said in Gluster and RAID question:

On the three Gluster nodes, would you be installing a Linux OS directly to them (bare metal)? I know from reading here physical servers have fallen out of style. Is this a use case where a physical server still serves a purpose?

This seems to be a misunderstanding. There's nothing wrong with physical servers. Something has to run on the physical hardware to make it work. I don't know diddly squat about Gluster, but I imagine it works something like this:
A Linux OS is installed onto some smallish disk, possibly an SD card, and is used to set up a Gluster cluster.
KVM, or some other hypervisor, is installed into the Linux OS as well, and the hypervisor is pointed to the Gluster cluster for its SR.
VMs are made in that hypervisor.

Now I'm guessing this can't be done with Hyper-V, since that can't run inside Linux (as far as I know), so you'd be forced to have hypervisor hosts and storage hosts (i.e. SAN/NAS) for Hyper-V and other hypervisors.

I'm looking forward to someone shredding this post.

I don't know anything about Gluster either, other than what I've gleaned in the last 24 hours. From what I toyed with, I spun up two Debian VMs and installed and configured the Gluster volume from those two VMs. Then I could (I didn't, though) install the GlusterFS client on xcp-ng in order to connect to the cluster, and then the hypervisor uses the cluster as an SR.

If you were talking about ONLY two physical nodes for everything, then what you say makes sense. I think you'd have to install your base OS (Debian, Cent, whatever...) on each node, configure the cluster, and install the hypervisor inside the same OS on both nodes in order to utilize the cluster.

There is a split brain issue with only using two nodes from what I've read though.

travisdh1

Yes. Really easy if using Linux-based KVM. Just create your Gluster storage and mount it as your VM config and storage directory. I've not done a setup like this myself, so I'm probably missing some high points, but that's the basic idea.

I know there are lots of ways to skin the cat, but wouldn't you still need three separate Gluster nodes? Gluster recommends at least three in order to avoid split brain. If you used a two physical node system I don't think they want you to do that without an arbiter which is something I have no idea about.

Yes, it would be a minimum of 3 hosts.
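
The reason two nodes are not enough comes down to quorum: after a network split, only the side that can still see a majority of the nodes should keep accepting writes. The sketch below uses a plain majority rule as a simplification; Gluster's actual quorum options (and the arbiter brick mentioned above) have more nuance than this.

```shell
#!/bin/sh
# Sketch only: simple-majority quorum arithmetic. This is a simplification
# for illustration, not Gluster's exact quorum implementation.
quorum_votes() {
    # Smallest number of nodes that is more than half of the total.
    echo $(( $1 / 2 + 1 ))
}

tolerated_failures() {
    # Nodes that can be lost while a majority is still reachable.
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3; do
    echo "$n nodes: need $(quorum_votes "$n") up, can lose $(tolerated_failures "$n")"
done
```

With two nodes a majority means both nodes, so losing either one leaves no safe way to tell which side is current. With three, any one node can drop out and the other two still agree.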

1337

@biggen said in Gluster and RAID question:

I know you can buy single boxes that have 2 - 4 nodes inside of them to reduce the footprint.

Yes, for instance the 2U 4-node servers from Supermicro.

Each node has 6 hot swap bays, dual CPUs, PCIe slot etc. So 4 complete servers in one.

Even if they're small they can be extremely powerful.

For instance each node can have 2 x 64 core AMD Epyc Rome CPU, 1TB RAM, 6x4TB SSD, NVMe Optane cache, 100 Gigabit ethernet.

With a price to match the specs....

scottalanmiller

@biggen said in Gluster and RAID question:

Let's say, as a hypothetical, one wanted to build out a 500TB Gluster cluster to be used as a backup target for VMs.

That's the first problem. Bottom line is: you don't.

And by that I don't mean that the tech is wrong. I mean the approach is wrong. Gluster is JUST a filesystem; it's not a target for VMs or anything like that. It's like you want to transport fruit from your farm to the market, and you are asking "okay, so I want to use these Goodyear tires, how do I do it?"

You just don't. You look at the job holistically: "How do I store backups of VMs?" Then you answer it at the high level "I store them on an SMB file server!"

Then in the process of answering "How do I host an SMB file server?" you come up with "On a hypervisor."

Eventually you get "under the hood enough" that maybe, MAYBE, the answer to the question "on what storage platform do I run my VMs?" becomes "Gluster". But the Gluster piece is not connected at all to the "backup target for VMs".

Just like the tires aren't connected to the fruit. Sure, there is a decent chance that the vehicle that hauls your fruit will use tires, and maybe even Goodyear tires, but it's an under the hood detail that has nothing directly to do with the fact that higher up the chain you are hauling fruit.

Gluster doesn't solve the kind of problem you are trying to solve. So the question doesn't make sense. And it is making you really confused.

scottalanmiller

@biggen said in Gluster and RAID question:

It looks like you need at least 3 nodes to build out the Gluster Cluster. Then, of course, you need an additional node for the hypervisor - so 4 nodes minimum.

No, the hypervisor would never be on a different node. It would almost always be on the same cluster as the Gluster storage. If you separate it out, you break the vast majority of the value.

You are trying to use Gluster as if it were a SAN. Gluster can be used underneath a SAN. But a SAN would have no role to play in this kind of setup.

I think you are trying to ask good questions, but are adding so many assumptions by accident that you are floundering.

Start with your goal: "How do I make a backup target for VMs that likely needs to be 500TB?"

And let's go from there. Absolutely nowhere should Gluster be involved until after loads and loads of other things have been figured out. And then, maybe, Gluster will come into the picture. But if Gluster is an option or not depends on lots of other things.

scottalanmiller

@biggen said in Gluster and RAID question:

So your VMs are running off the Gluster?

Gluster is generally used for that, yes. Because backup storage rarely can leverage the advantages of Gluster, it just doesn't make sense. But for VMs, that's Gluster's bread and butter.

VMs really "never" should be running off of a SAN. That's exactly the least likely option to make sense.