Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment

fabiorpn

Is anyone using @halizard in 2021? I am implementing it on two XCP-ng 8.2 hosts, and everything is going well, but the tests are not over yet.

However, I was unable to create the SR using XCP-ng Center. I get the error: the SR could not be connected because the driver gfs2 was not recognized. The package is available, though: the command yum search gfs2 finds it (it also lists glusterfs and gives the package's full name). The command yum install <full name I don't remember> --enablerepo=base,updates installs it from the CentOS base repository without permanently enabling that repository in the default XCP-ng configuration (the XCP-ng documentation walks you through this). Finally, the command mkfs.gfs2 -V confirms that it is installed, but it still doesn't work.

Based on various forums, I suspect it is a problem with XCP-ng Center, because creating and connecting the SR from the console directly on the server (or through xsconsole) worked perfectly. When I created it on the primary, the same SR appeared immediately on the secondary.
Now the war continues. After all the configuration is done on an open internet network, I need to change the IP of the management card to an IP on my internal network and perform some reboots to see how the system behaves. This part is causing some problems, but we will overcome them. I'll keep you posted.
Thank you all.

dbeato

Good topic to start here
https://xcp-ng.org/forum/topic/2434/ha-questions/4

Also, have you checked out XOSAN?
https://xen-orchestra.com/#!/xosan-features (Now that I read it, you would need XOSANv2, since v1 is not compatible with XCP-ng.)

fabiorpn

@dbeato Hi!
Did I post this in the wrong place? If so, I apologize; it's my first time here.

DustinB3403

@dbeato said in Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment:

Good topic to start here
https://xcp-ng.org/forum/topic/2434/ha-questions/4

Also, have you checked out XOSAN?
https://xen-orchestra.com/#!/xosan-features (Now that I read it, you would need XOSANv2, since v1 is not compatible with XCP-ng.)

XOSAN is a closed feature. You can build your own vSAN using XCP-ng, but you won't have the orchestration layer (which is what XOSAN is).

fabiorpn

@dbeato The idea is to use only 2 nodes sharing local storage, since we don't have the resources to invest in a SAN. That is the functionality HA-Lizard is designed to provide. The tests continue: we are currently simulating failures and documenting what to do when they happen.

When we change the IPs of both nodes from the open network to the closed network at the same time, the master works fine (after a restart), but the slave loses all network connectivity. After the master comes back up, we have to perform an emergency network reset on the slave and reboot it. When it comes back up, we have to recreate its bond (through XCP-ng Center), and then synchronization resumes. Even so, the VMs never stopped running on the master. We still plan to improve this procedure by doing it in maintenance mode, one node at a time.

dbeato

@fabiorpn No, I meant that the post would point you where you want to go. Even back when I used HA-Lizard on XenServer, it was subpar: very slow and error-prone.

fabiorpn

@dbeato Ah, thank you. I am indeed running into these errors, but I have read a lot about them, and we do not have the resources to invest in paid solutions. Our setup is small: one Windows Server 2019 (AD, DHCP, DNS...), an old Red Hat server, an Ubuntu server for an intranet page, and a Windows Server 2016 for some applications. All are accessed 24 hours a day, but by few users. So we are betting on this solution. We are hopeful.

dbeato

@fabiorpn So do you have any downtime for updates on the actual VMs?

DustinB3403

@fabiorpn said in Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment:

@dbeato Ah, thank you. I am indeed running into these errors, but I have read a lot about them, and we do not have the resources to invest in paid solutions. Our setup is small: one Windows Server 2019 (AD, DHCP, DNS...), an old Red Hat server, an Ubuntu server for an intranet page, and a Windows Server 2016 for some applications. All are accessed 24 hours a day, but by few users. So we are betting on this solution. We are hopeful.

So why not settle for Continuous Replication (a free feature if you use Xen Orchestra) to replicate between the two servers every 2 to 5 minutes?

@dbeato said in Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment:

@fabiorpn So do you have any downtime for updates on the actual VMs?

This by itself is worth investigating. I assume someone said "We need HA" without understanding what it actually implies.

DustinB3403

@fabiorpn said in Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment:

So we are betting on this solution. We are hopeful.

Quoting this again: you don't need high availability, you just need near-HA, say 99.9% uptime. Use Continuous Replication. I assume you've installed Xen Orchestra to administer these hypervisors, correct?

If not, see my GitHub.
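For a sense of scale, a 99.9% target still leaves a real downtime budget. Here is a minimal Python sketch (not something from this thread) that converts an availability percentage into allowed downtime, assuming a 365-day year and a 30-day month; the function and constant names are illustrative only.

```python
# Convert an availability target (e.g. 99.9%) into an allowed downtime budget.
# Assumes a 365-day year and a 30-day month; both are simplifications.

HOURS_PER_YEAR = 365 * 24   # 8760 hours
HOURS_PER_MONTH = 30 * 24   # 720 hours


def downtime_budget_hours(availability_pct: float, period_hours: float) -> float:
    """Return the hours of downtime allowed in a period at the given availability."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_hours * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        per_year = downtime_budget_hours(target, HOURS_PER_YEAR)
        per_month_min = downtime_budget_hours(target, HOURS_PER_MONTH) * 60
        print(f"{target}%: {per_year:.2f} h/year, {per_month_min:.1f} min/month")
```

By this arithmetic, 99.9% allows roughly 8.76 hours of downtime per year, or about 43 minutes per 30-day month.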

1337

@dustinb3403 said in Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment:

@fabiorpn said in Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment:

So we are betting on this solution. We are hopeful.

Quoting this again: you don't need high availability, you just need near-HA, say 99.9% uptime. Use Continuous Replication. I assume you've installed Xen Orchestra to administer these hypervisors, correct?

If not, see my GitHub.

I was thinking the same thing. The only thing is that there is no failover mechanism in Xen Orchestra for this, right?

So if one host fails, you have to start the replicated VMs on the other host manually. Is that correct?

DustinB3403

@pete-s said in Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment:

@dustinb3403 said in Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment:

@fabiorpn said in Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment:

So we are betting on this solution. We are hopeful.

Quoting this again: you don't need high availability, you just need near-HA, say 99.9% uptime. Use Continuous Replication. I assume you've installed Xen Orchestra to administer these hypervisors, correct?

If not, see my GitHub.

I was thinking the same thing. The only thing is that there is no failover mechanism in Xen Orchestra for this, right?

So if one host fails, you have to start the replicated VMs on the other host manually. Is that correct?

You're asking "how do we ensure the system is running?", and the challenge is that even with HA, the host doesn't know whether the guest is actually providing services; it only knows that the guest is powered on (or attempting to start).

Whether the services this guest provides are actually running is something of a fuzzy question.

You can have the guest start automatically on the other host if its host goes offline by enabling HA for the pool and setting the VM's restart priority to best effort.

But this is again fuzzy, because to the host, "running" means "I see it has power, so it must be running."

To your users, "running" means "I can access the services this system is hosting." Your mileage may vary...
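To make the host-versus-user distinction concrete: hypervisor HA only sees the VM's power state, while users care whether the service answers. Below is a rough Python sketch of a user-side check that simply tries a TCP connection to each service. The addresses (from the 192.0.2.0/24 documentation range), ports, and function name are hypothetical, and a successful connection only shows that something is listening on the port, not that the application behind it is fully healthy.

```python
# Sketch of a service-level check: "can a client actually reach the service?"
# This differs from what hypervisor HA sees (only the VM's power state).
# Host addresses and ports below are hypothetical placeholders.
import socket


def service_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


if __name__ == "__main__":
    # Hypothetical checks, e.g. DNS on the AD server and HTTP on the intranet VM.
    checks = {
        "AD/DNS (TCP 53)": ("192.0.2.10", 53),
        "Intranet (HTTP 80)": ("192.0.2.20", 80),
    }
    for name, (host, port) in checks.items():
        status = "reachable" if service_reachable(host, port) else "NOT reachable"
        print(f"{name}: {status}")
```

A VM can be "running" to the host while this check reports its service as not reachable, which is exactly the fuzzy case described above.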

DustinB3403

Even just configuring the two servers in a pool and enabling HA would likely provide more than enough uptime, and configuring CR is likely overkill.

By default, the servers would be in the same pool anyway (if you follow the recommended settings).

Granted, it's not shared storage, so the VM would have to be moved to the other host, but even then it might be good enough.

There are too many unknowns to answer this definitively, but the two options I would look at are CR or just a standard pool with HA turned on for the VMs.

1337

@dustinb3403 said in Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment:

There are too many unknowns to answer this definitively, but the two options I would look at are CR or just a standard pool with HA turned on for the VMs.

But you need shared storage for a standard pool with HA. So either you have a SPOF (single point of failure), in which case HA doesn't really make sense, or you're back to finding some kind of hyperconverged solution.

Or do you mean to restore the VM from backup on the second host?

DustinB3403

@pete-s said in Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment:

@dustinb3403 said in Ha-lizard on the XCP-NG 8.2 in 2021. Progress of my deployment:

There are too many unknowns to answer this definitively, but the two options I would look at are CR or just a standard pool with HA turned on for the VMs.

But you need shared storage for a standard pool with HA. So either you have a SPOF (single point of failure), in which case HA doesn't really make sense, or you're back to finding some kind of hyperconverged solution.

Or do you mean to restore the VM from backup on the second host?

With many HA systems, you still have a single point of failure: the SAN...
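The SPOF argument can be put in numbers with the usual series/parallel availability model. The Python sketch below assumes independent failures and uses made-up availability figures (0.99 per host, 0.999 for the SAN); the function names are illustrative only.

```python
# Rough availability model for the SPOF argument, assuming independent failures.
# The component availabilities used below are hypothetical placeholders.


def parallel(*availabilities: float) -> float:
    """Redundant components: the group is down only if all of them are down."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


def series(*availabilities: float) -> float:
    """Components that are all required: every one of them must be up."""
    up = 1.0
    for a in availabilities:
        up *= a
    return up


if __name__ == "__main__":
    host = 0.99   # hypothetical availability of one host
    san = 0.999   # hypothetical availability of a single shared SAN
    hosts_only = parallel(host, host)
    with_san = series(hosts_only, san)
    print(f"two hosts in parallel: {hosts_only:.4%}")
    print(f"two hosts + one SAN:   {with_san:.4%}")
```

In this toy model the two redundant hosts alone reach 99.99%, but the single SAN pulls the whole system down to about 99.89%, below the SAN's own availability: a component in series caps everything that depends on it.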

fabiorpn

@dbeato Hi.
My VMs are on an intranet through a bond of network cards 0 and 1. However, the management interface (network card 4) is also available as a network connection on the VMs. When I need to update something on a VM without dealing with the proxy, I just switch the network on interface 4 over to the external internet, so there is no downtime for the VMs. However, I still can't do the same on XCP-ng itself while keeping the VMs connected: the servers need a reboot to recognize the new network, even though I have tried several procedures beforehand, such as stopping all HA and replication services. I haven't been too concerned with this, because we already have alternatives in case we need to update XCP-ng. I was also unable to get the proxy working directly in Dom0, even though it is an ordinary CentOS-based Linux system. Does anyone know how to configure this permanently?

I don't know if I answered your question.

dbeato

@fabiorpn Basically, you answered whether you do maintenance on your hosts, which you do. However, updating and rebooting a host will take its VMs offline if they are not on HA. Assuming you use HA and move the VMs to the other host before the reboot, you are good on that end. However, my question was geared more toward security and application updates on the servers themselves (be they Linux or Windows). When you install updates and reboot a VM, it is inaccessible no matter how great your HA is. That is why having a downtime/maintenance window is good, even if you want to achieve HA.