XenServer hyperconverged

FATeknollogee

Are the hosts shown in your example using HW or software RAID?

What is preferred, HW or software RAID?

DustinB3403

@fateknollogee said in XenServer hyperconverged:

Are the hosts shown in your example using HW or software RAID?

What is preferred, HW or software RAID?

Based on the blog post, I'm guessing HW RAID.

olivier

@r3dpand4 said in XenServer hyperconverged:

@olivier We're talking about node failure where you're replacing hardware correct?
"On a 2 node setup, there is an arbiter VM that acts like the witness. If you lose the host with the 2x VMs (one arbiter and one "normal"), you'll go in read only."
So are Writes suspended until the 2nd node is brought back online and introduced to the XOSAN? Obviously that's going to be a lot longer than a few seconds even if you're talking about scaling outside of a 2 node cluster. Do you mean that Writes are suspended or cached in the event of a failure while Host is shutting down and then Writes resume as normal on the Active node? If this is the case when the new Host is introduced to the cluster the replication resumes, correct?

Nope, writes are suspended when a node goes down, for as long as the system needs to work out what to do. If there are enough nodes left to continue, writes resume after a pause of a few seconds. If there aren't enough nodes to continue, the volume goes read-only.

Let's imagine you have a 2x2 setup (distributed-replicated). You lose one XenServer host in the first mirror. After a few seconds, writes are back without any service having failed. Then, when you replace the faulty node, the fresh node will catch up on the missing data in the mirror, but your VMs won't notice it (healing status).
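For readers who want the two-node arbiter case quoted above spelled out, here is a minimal sketch. It assumes the cluster uses a simple majority vote among three voters (two data VMs plus the arbiter VM); the thread does not describe the actual quorum rule, and the names `HOST_VOTES`, `volume_state` and `state_after_losing` are made up for illustration.

```python
# Hypothetical model: each VM (data or arbiter) counts as one vote.
# host_a runs one data VM plus the arbiter VM, host_b runs one data VM.
HOST_VOTES = {"host_a": 2, "host_b": 1}


def volume_state(votes_alive: int, votes_total: int) -> str:
    """Return 'read-write' if a strict majority of voters is reachable."""
    if not 0 <= votes_alive <= votes_total:
        raise ValueError("votes_alive must be between 0 and votes_total")
    return "read-write" if votes_alive * 2 > votes_total else "read-only"


def state_after_losing(host: str) -> str:
    """State of the volume once the given host is down."""
    total = sum(HOST_VOTES.values())
    alive = sum(votes for name, votes in HOST_VOTES.items() if name != host)
    return volume_state(alive, total)


if __name__ == "__main__":
    for host in HOST_VOTES:
        print(host, "down ->", state_after_losing(host))
```

Under that assumption, losing the host that carries two of the three votes leaves the volume read-only, while losing the other host keeps it writable after the short pause.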

olivier

@fateknollogee said in XenServer hyperconverged:

Are the hosts shown in your example using HW or software RAID?

What is preferred, HW or software RAID?

@dustinb3403 said in XenServer hyperconverged:

@fateknollogee said in XenServer hyperconverged:

Are the hosts shown in your example using HW or software RAID?

What is preferred, HW or software RAID?

Based on the blog post, I'm guessing HW RAID.

It's not that easy to answer. Phase III will bring multi-disk capability on each host (and even tiering). That means you could use any number of disks on each host to build an inception-like scenario (replication at the host level plus at the cluster level). But obviously, hardware RAID is perfectly fine too.

DustinB3403

During an event where a host goes down, and for that brief time period where writes are paused, are those writes cached and then written once the system determines what to do?

Or are those writes lost?

R3dPand4

@olivier Thank you for clarifying. I'm assuming the same applies, at least in principle, to a 2 node cluster? One node goes down, writes are briefly suspended, writes resume on the active node, the failed node is replaced, then the rebuild/healing process runs on the new node. How long are you expecting rebuilds to take? I'm sure that's a loaded question because it's data dependent...

olivier

@dustinb3403 No writes are lost. It's handled at the VM level: the VM's OS waits for the "ack" from its virtual HDD, which isn't answering, so it waits. Basically, the cluster says: "write commands won't be answered until we've figured it out".

So it's safe

olivier

@r3dpand4 This is a good question. We chose to use "sharding", which means splitting your data into 512MB blocks that are then replicated or spread across nodes.

So the heal time is the time needed to fetch all the new or missing 512MB blocks written since the node went down. It's pretty fast in the tests I've done.
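To make the heal-time reasoning concrete, here is a rough sketch. It assumes "512MB" means 512 MiB, that a shard is re-fetched whole as soon as any byte in it changed, and that the network throughput is constant; none of these details are confirmed in the thread, and the function names are hypothetical.

```python
SHARD_SIZE = 512 * 1024**2  # assumed: 512 MiB per shard


def shards_to_heal(writes):
    """Return the set of shard indexes touched by (offset, length) writes.

    `writes` lists the byte ranges written while the node was down.
    """
    dirty = set()
    for offset, length in writes:
        if length <= 0:
            continue  # nothing was written
        first = offset // SHARD_SIZE
        last = (offset + length - 1) // SHARD_SIZE
        dirty.update(range(first, last + 1))
    return dirty


def heal_seconds(writes, throughput_bytes_per_s):
    """Estimate heal time: whole dirty shards divided by throughput."""
    return len(shards_to_heal(writes)) * SHARD_SIZE / throughput_bytes_per_s


if __name__ == "__main__":
    missed = [(0, 4096), (SHARD_SIZE - 1, 2), (5 * SHARD_SIZE, SHARD_SIZE)]
    print(sorted(shards_to_heal(missed)))
    print(heal_seconds(missed, 100 * 1024**2), "seconds at 100 MiB/s")
```

The point of the sketch is that heal time scales with the number of shards touched during the outage, not with the total size of the disk.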

R3dPand4

@olivier So essentially just deduplication?

olivier

@r3dpand4 That has nothing to do with deduplication. There are just chunks of files that are replicated or distributed-replicated (or even dispersed, in disperse mode).

By the way, nobody talks about this mode, but it's my favorite. It's perfect, especially for large HDDs, thanks to the ability to lose any n disks in your cluster. E.g. with 6 nodes:

This is disperse 6 with redundancy 2 (like RAID6 if you prefer). Any 2 XenServer hosts can be destroyed and it will continue to work as usual:

And in this case (6 with redundancy of 2), you'll be able to address 4/6th of your total disk space!
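The 4/6 figure follows from simple arithmetic: in a disperse volume, the equivalent of `redundancy` nodes holds parity and the rest holds data. A small sketch, where the 100 GB per node is only an illustrative value and the function names are hypothetical:

```python
from fractions import Fraction


def usable_fraction(nodes: int, redundancy: int) -> Fraction:
    """Share of raw capacity that is addressable in a disperse layout."""
    if not 0 < redundancy < nodes:
        raise ValueError("redundancy must be between 1 and nodes - 1")
    return Fraction(nodes - redundancy, nodes)


def usable_capacity_gb(nodes: int, redundancy: int, per_node_gb: int) -> int:
    """Addressable capacity, assuming every node contributes the same size."""
    if not 0 < redundancy < nodes:
        raise ValueError("redundancy must be between 1 and nodes - 1")
    return (nodes - redundancy) * per_node_gb


if __name__ == "__main__":
    print(usable_fraction(6, 2))          # 2/3, i.e. 4/6
    print(usable_capacity_gb(6, 2, 100))  # 400
```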

olivier

Here it is with improved pictures of XOSAN. I suppose it's clearer now:

0_1505215577248_8_DISPERSE_6(2).PNG

0_1505215604111_5_DISTRIB-REP 3x2.PNG

What do you think?

DustinB3403

@olivier That picture helps make it way more clear.

Each server provides 100GB, and the servers are either standalone systems (disperse) or paired (dist. repl).

olivier

@dustinb3403 That's it, indeed

First picture: you can lose up to 2 hosts (any of them)
Second picture: you can lose up to 3 hosts (one per pair)
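The difference between "any 2 hosts" and "up to 3 hosts, one per pair" can be checked with a short sketch. The host names and the pairing below are made up for illustration; the rules encoded are only the ones stated above.

```python
# Hypothetical 3x2 layout: three mirrored pairs of hosts.
PAIRS = [("h1", "h2"), ("h3", "h4"), ("h5", "h6")]


def disperse_survives(failed, redundancy=2):
    """Disperse 6(2): any set of failed hosts up to `redundancy` is fine."""
    return len(set(failed)) <= redundancy


def replicated_survives(failed, pairs=PAIRS):
    """Distributed-replicated: data is lost only if a whole pair is down."""
    failed = set(failed)
    return all(not set(pair) <= failed for pair in pairs)


if __name__ == "__main__":
    print(disperse_survives({"h1", "h2"}))          # any 2 hosts
    print(disperse_survives({"h1", "h3", "h5"}))    # 3 hosts is too many
    print(replicated_survives({"h1", "h3", "h5"}))  # one per pair
    print(replicated_survives({"h1", "h2"}))        # both halves of a mirror
```

So the replicated layout can tolerate more failures in the best case, but only 1 in the worst case (the second host of an already degraded pair), whereas disperse tolerates 2 regardless of which hosts fail.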

FATeknollogee

What is the difference in performance between the two options?

olivier

@fateknollogee said in XenServer hyperconverged:

What is the difference in performance between the two options?

Disperse requires more compute power because it relies on a complex algorithm (based on Reed-Solomon). So it's slower than replication, but that's not a big deal if you are using HDDs.

However, if you are using SSDs, disperse will be a bottleneck, so it's better to go with replicate.

Ideal solution? Disperse for large storage space on HDDs, and replicated on SSDs… at the same time (using tiering, which will be available soon). Chunks that are read often will be promoted to the replicated SSD storage automatically (until it's almost full). If other chunks become more frequently accessed later, some chunks will be demoted to the "slower" tier and replaced by the new hot ones.
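The promote/demote idea can be illustrated with a toy model. This is not how XOSAN tiering is implemented (the thread gives no details); it assumes a simple read counter per chunk, a fixed number of slots on the fast tier, and a swap whenever a slow chunk has been read more often than the coldest fast one. The class name `TieringSketch` is hypothetical.

```python
from collections import Counter


class TieringSketch:
    """Toy hot/cold tiering: the most-read chunks live on the fast tier."""

    def __init__(self, fast_slots: int):
        self.fast_slots = fast_slots  # assumed capacity of the SSD tier
        self.hits = Counter()         # read count per chunk
        self.fast = set()             # chunks currently on the fast tier

    def read(self, chunk: str) -> str:
        """Record a read and return the tier that served it."""
        self.hits[chunk] += 1
        if chunk in self.fast:
            return "fast"
        if len(self.fast) < self.fast_slots:
            self.fast.add(chunk)      # free slot: promote for later reads
            return "slow"
        coldest = min(self.fast, key=lambda c: self.hits[c])
        if self.hits[chunk] > self.hits[coldest]:
            self.fast.remove(coldest)  # demote the coldest chunk
            self.fast.add(chunk)       # promote the new hot one
        return "slow"


if __name__ == "__main__":
    tier = TieringSketch(fast_slots=2)
    for chunk in ["a", "a", "b", "c", "c"]:
        print(chunk, tier.read(chunk), sorted(tier.fast))
```

A real implementation would also have to account for the cost of moving chunks and for write traffic, which this sketch ignores.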

olivier

We validated our first provider: https://xen-orchestra.com/blog/xosan-on-10gbps-io/

Next? Probably a hardware provider

JaredBusch

@olivier said in XenServer hyperconverged:

We validated our first provider: https://xen-orchestra.com/blog/xosan-on-10gbps-io/

Next? Probably a hardware provider

Congrats