Because it provides an abstraction of the hardware, letting you replace, patch, or reboot things without even losing service (or without having to do it on a weekend, for instance)
Posts made by olivier
-
RE: Xenserver and Storage
-
RE: Xenserver and Storage
I consider myself an SMB (3 sockets!) and I need live migration; it's really useful. It's also used a LOT by our customers. Maybe that's a XenServer-user bias, but it's real.
-
RE: Xenserver and Storage
@kooler said in Xenserver and Storage:
@olivier said in Xenserver and Storage:
Local storage is good for performance, but you can't live-migrate without moving the disks, or use HA on another host.
I did a recap on local vs (non-hyperconverged) shared storage in XS:
Most of the "budget" SMB customers shouldn't care about that.
That's not my point of view. E.g. even for my small production setup, hosted in a DC, migrating big VMs on local SRs from one host to another to avoid service interruption is not trivial.
edit: I'm using XOSAN for my own production setup, which is the best way to sell a product
-
RE: Xenserver and Storage
Local storage is good for performance, but you can't live-migrate without moving the disks, or use HA on another host.
I did a recap on local vs (non-hyperconverged) shared storage in XS:
-
RE: Xenserver and Storage
That's a lot of interesting technical stuff, but that's not how it works when you build a product
Let me explain.
Market context
Here, we are in the hyperconvergence world. In this world, users want advantages over the traditional model (storage separated from compute). So the first question you need to answer before building a solution is: "what do users want?". We did some research and found that, in general, people want, in decreasing order:
- Cost/ROI (in short, simpler infrastructure to manage will reduce cost)
- HA features
- Ease of scaling and a decent level of performance (in short: they don't want to be blocked, nor to get performance so much worse than their existing solutions that it cancels out the advantages in cost, features, flexibility, and security).
These "priorities" came from our studies but also from Gartner.
Technical context
Then, we are addressing the XenServer world. When you build a hyperconverged solution there, you have some "limitations":
- a shared SR can only be used within a pool (so 1 to 16 hosts max)
- the more you modify the Dom0, the worse it is
So in this context, you won't scale beyond 16 hosts MAX.
Real life usage
So we decided to take a look with some benchmarks, and despite prioritizing something safe/flexible, we got pretty nice performance, as you can see in our multiple benchmarks.
In short: performance is acceptable. If it weren't, we would have stopped the project (or switched to another technology).
Regarding the "cluster goes boom" scenario: no, it goes RO for your VMs, so it won't erase/corrupt your data.
-
RE: Xenserver and Storage
The Gluster client is installed in the Dom0 (the client to access the data), but the Gluster servers are in VMs, so you get more flexibility.
If the node with the arbiter goes down, yes, you are in RO. But you won't enter a split-brain scenario (which is the worst case in a 2-node setup).
E.g. with DRBD on 2 nodes in multi-master mode, if you just lose the replication link while writing on both sides, you are basically f***ed (you'll need to discard the data on one node).
There is no miracle: you either play it defensive (RO if one node is down) or risky (split-brain). We chose the "intermediate" way: safe, with a 50% chance that the node you lose is the "right" one, in which case you don't even go RO.
Obviously, 3 nodes is the sweet spot when you decide to use hyperconvergence at small scale: the 3rd physical server previously dedicated to storage can now also be a "compute" node (hypervisor) with storage, and you can lose any one of the 3 hosts without going read-only (disperse 3).
edit: XOSAN allows going from 2 to 3 nodes while your VMs are running, i.e. without any service interruption. So you can start with 2 and extend later
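For illustration, a 3-node layout like this roughly corresponds to the following at the Gluster level (a sketch with hypothetical host names and brick paths; XOSAN automates all of this for you):

```shell
# Sketch: a dispersed volume across 3 hosts with redundancy 1, so any
# single host can go down without the volume turning read-only.
# Host names, volume name, and brick paths are hypothetical.
gluster volume create xosan disperse 3 redundancy 1 \
    host1:/bricks/xosan host2:/bricks/xosan host3:/bricks/xosan
gluster volume start xosan
```

With disperse 3 / redundancy 1, each host holds one encoded fragment, and any 2 of the 3 fragments are enough to reconstruct the data.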
-
RE: Xenserver and Storage
@matteo-nunziati This is why we have an extra arbiter VM in the 2-node setup. One node gets 2 VMs (1x normal and 1x arbiter), and the other one just a normal VM.
This way, if you lose the host with only one Gluster VM, everything still works, and you can't hit a split-brain scenario.
An arbiter node costs very few resources (it works only with metadata)
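At the Gluster level, such a 2-node + arbiter layout can be sketched as follows (hypothetical names and paths; the last brick is the metadata-only arbiter, hosted in a second VM on the first node):

```shell
# Sketch: replica 3 with 1 arbiter brick. The first two bricks hold full
# data; the last one (the arbiter) stores only metadata, enough to break
# ties and prevent split-brain. Names and paths are hypothetical.
gluster volume create xosan replica 3 arbiter 1 \
    node1-vm:/bricks/data node2-vm:/bricks/data node1-arbiter-vm:/bricks/arbiter
gluster volume start xosan
```

Since the arbiter brick sits on node 1, losing node 2 leaves 2 of 3 bricks up (data + arbiter) and the volume stays writable; losing node 1 leaves only 1 of 3, hence read-only.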
-
RE: Xenserver and Storage
@emad-r said in Xenserver and Storage:
Hmm, I see a lot of vSAN advice, which is the correct way to go, but I also wonder: can't he do a simple thing like a GlusterFS VM as well? Will that work in this case, and be a simpler route?
It's no simpler if it's not understood, or without a turnkey "layer" on top.
Gluster is not that complicated, but still, you need to grasp some concepts. In short, it's like Xen vs XenServer: the second is turnkey, so you don't have to assemble all the needed pieces yourself, unlike learning Xen "alone" on your distro.
-
RE: Xenserver and Storage
Gluster on 2 nodes won't be slow or problematic (which problems exactly?), just a bit complicated without a turnkey deployment method (i.e. XOSAN).
-
RE: Xenserver and Storage
It very likely means: replicated blocks live on 2 nodes, and the other pool members are then connected to this setup. It doesn't mean the storage is scaled across the local SRs of all hosts.
That's also something you can do with XOSAN: use a limited number of hosts to store data (from 2 to n). But you could also use all of them.
So in HA Lizard's case, it means that if you lose those 2 hosts in a 16-host pool, for example, your data is gone.
Hope it's clearer this way
-
RE: Xenserver and Storage
@danp I don't think it can do more than that, because it's block replication without a cluster FS.
-
RE: Xenserver and Storage
@jrc So HA Lizard could do the trick in your case (or you can set it up manually: basically DRBD between 2 hosts)
-
RE: Xenserver and Storage
Indeed, XOSAN could fill the gap and create a "VSAN"-like solution.
However, if you don't plan to grow beyond 2 hosts, you can also take a look at "HA Lizard" (which is basically DRBD block replication between 2 local storages).
Anyway, some extra questions:
- do you have RAID support in your XS host machines?
- would you like to add hosts in the future? (if you think the answer is yes, HA Lizard is out of the equation)
-
RE: XenServer hyperconverged
We validated our first provider: https://xen-orchestra.com/blog/xosan-on-10gbps-io/
Next? Probably a hardware provider
-
RE: NodeBB 1.6 is All About the Speed
Thanks for the reminder, upgraded it on xen-orchestra.com/forum
-
RE: XenServer hyperconverged
@fateknollogee said in XenServer hyperconverged:
What is the difference in performance between the two options?
Disperse requires more compute power because it uses a complex algorithm (based on Reed-Solomon erasure coding). So it's slower than replication, but that's not a big deal if you are using HDDs.
However, if you are using SSDs, disperse will be the bottleneck, so it's better to go with replication.
The ideal solution? Disperse for large storage space on HDDs, and replicated on SSDs… at the same time (using tiering, which will be available soon). Chunks that are read often will be promoted automatically to the replicated SSD storage (until it's almost full). If other chunks become hotter later, some chunks will be demoted to the "slower" tier and replaced by the new hot ones.
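As a rough sketch of what such tiering could look like at the Gluster level (Gluster shipped a tiering feature around 3.7; the exact syntax varies by version, and the names and paths here are hypothetical):

```shell
# Sketch: attach a replicated SSD hot tier on top of an existing
# dispersed HDD volume named "xosan". Frequently accessed chunks get
# promoted to the SSD bricks, cold ones demoted back to the HDD tier.
# Syntax is version-dependent; bricks and names are hypothetical.
gluster volume tier xosan attach replica 2 \
    host1:/ssd-bricks/xosan host2:/ssd-bricks/xosan
```

This is only an illustration of the concept, not necessarily how XOSAN will expose it.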
-
RE: XenServer hyperconverged
@dustinb3403 That's it, indeed
- first picture: you can lose up to 2 hosts (any of them)
- second picture: you can lose up to 3 hosts (1 by pair)
-
RE: XenServer hyperconverged
Here it is with improved XOSAN pics; I suppose it's clearer now:
What do you think?
-
RE: XenServer hyperconverged
@r3dpand4 That has nothing to do with deduplication. There are just chunks of files, replicated or distributed-replicated (or dispersed, in disperse mode).
By the way, nobody talks about this mode, but it's my favorite
Especially for large HDDs, it's perfect, thanks to the ability to lose any r of your n disks in the cluster. E.g. with 6 nodes:
This is disperse 6 with redundancy 2 (like RAID 6, if you prefer). Any 2 XenServer hosts can be destroyed and it will continue to work as usual:
And in this case (6 with redundancy 2), you'll be able to use 4/6ths of your total raw disk space!
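The usable fraction follows directly from the disperse parameters: with n bricks and redundancy r, only (n - r) bricks' worth of space holds data. A quick sketch of the arithmetic:

```shell
# Disperse n with redundancy r: (n - r) / n of the raw space is usable,
# since r bricks' worth of space is spent on erasure-coded parity.
n=6
r=2
usable_pct=$(( (n - r) * 100 / n ))
echo "disperse ${n} redundancy ${r}: ${usable_pct}% of raw space usable"
# → disperse 6 redundancy 2: 66% of raw space usable
```

Compare that with replica 3, where only 1/3 (33%) of the raw space is usable for the same 2-host fault tolerance.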
-
RE: XenServer hyperconverged
@r3dpand4 This is a good question. We made the choice to use "sharding", which means splitting your data into 512 MB blocks before replicating or dispersing it.
So the heal time is the time needed to fetch only the new/changed 512 MB blocks written since the node went down. It was pretty fast in the tests I've done.
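To make the benefit concrete, here is a back-of-the-envelope sketch (the 512 MB shard size comes from the post above; the disk size and the number of dirty shards are made-up example numbers):

```shell
# With sharding, only the shards written while a node was down need
# healing, not the whole virtual disk. Example figures are hypothetical.
shard_mb=512
disk_gb=100
dirty_shards=6   # shards modified while the node was offline
total_shards=$(( disk_gb * 1024 / shard_mb ))
heal_mb=$(( dirty_shards * shard_mb ))
echo "${total_shards} shards total; heal transfers ${heal_mb} MB instead of $(( disk_gb * 1024 )) MB"
# → 200 shards total; heal transfers 3072 MB instead of 102400 MB
```

Without sharding, the whole 100 GB disk image would be a single file to resync; with it, the heal traffic scales with what actually changed.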