Why HALizard and XenServer Failed so heavily
-
What if you had just pulled out the cables so that it looked like a power failure? Wouldn't that have fixed things, except for the corrupted VMs? Might have protected against that too, but that's just random.
-
Dustin, how did you discover the failure?
Did things just stop working? I.e., did your VMs go read-only or just stop responding?
-
Stories like this are why I'll happily continue to run my hypervisor on spinning rust in RAID 1.
Yeah, sure, it could happen to an HDD too, but that's why we have RAID 1. If he'd been able to come up with a way to RAID 1 two USB drives together, it might not have been an issue for him either.
(Can you not use MDRAID on two USB sticks?)
-
@Dashrender said in Why HALizard and XenServer Failed so heavily:
Dustin, how did you discover the failure?
Did things just stop working? I.e., did your VMs go read-only or just stop responding?
If they went read only, that would cause us to look at DRBD. That's the kind of thing that a DRBD failure would look like.
-
@dafyre said in Why HALizard and XenServer Failed so heavily:
(Can you not use MDRAID on two USB sticks?)
Yup
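For anyone wondering what that looks like, here is a minimal sketch of mirroring two USB sticks with mdadm. The device names /dev/sdb1 and /dev/sdc1 and the array name /dev/md0 are assumptions for illustration, not anything from Dustin's hosts. The script only prints the commands unless you set RUN=1, so nothing gets overwritten by accident. Whether the XenServer installer will boot cleanly from an md array is a separate question this doesn't answer.

```shell
#!/usr/bin/env bash
# Sketch: build a RAID 1 mirror from two USB sticks with mdadm.
# Device names are placeholders (assumptions); check yours with lsblk first.
set -eu

# Print each command instead of running it unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

plan_usb_mirror() {
  local first="$1" second="$2" array="${3:-/dev/md0}"
  # Create the two-member mirror (this destroys existing data on both members).
  run mdadm --create "$array" --level=1 --raid-devices=2 "$first" "$second"
  # Show the array state and the initial sync progress.
  run mdadm --detail "$array"
  run cat /proc/mdstat
}

plan_usb_mirror /dev/sdb1 /dev/sdc1
```

Run it as-is to review the commands, then again with RUN=1 as root once the device names are right.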
-
@scottalanmiller said in Why HALizard and XenServer Failed so heavily:
What if you had just pulled out the cables so that it looked like a power failure? Wouldn't that have fixed things, except for the corrupted VMs? Might have protected against that too, but that's just random.
We didn't know at the time that the boot drive was dead on host1. So we weren't certain of the status of the cluster.
Just that both systems had XAPI hung (VMs were running fine until we touched the cluster).
-
@DustinB3403 said in Why HALizard and XenServer Failed so heavily:
@scottalanmiller said in Why HALizard and XenServer Failed so heavily:
What if you had just pulled out the cables so that it looked like a power failure? Wouldn't that have fixed things, except for the corrupted VMs? Might have protected against that too, but that's just random.
We didn't know at the time that the boot drive was dead on host1. So we weren't certain of the status of the cluster.
Just that both systems had XAPI hung (VMs were running fine until we touched the cluster).
That's one of the risks of clusters: so much complexity. With a single host, you could have dropped to Xen directly and shut things down.
-
@scottalanmiller Which is why we're going with the standalone servers using XO's continuous replication.
-
Wow, that's crazy. Glad that it recovered.
-
@Reid-Cooper said in Why HALizard and XenServer Failed so heavily:
Wow, that's crazy. Glad that you had a recovery solution planned out, great job!
I FTFY.
-
@DustinB3403 said in Why HALizard and XenServer Failed so heavily:
@scottalanmiller Which is why we're going with the standalone servers using XO's continuous replication.
Will you still use USB sticks for boot?
-
@FATeknollogee yes, have to.
We just have to make sure we keep our backup drive current. We've moved away from the cluster approach and are using single servers with CR.
-
@DustinB3403 "Why" do you have to?
-
@FATeknollogee said in Why HALizard and XenServer Failed so heavily:
@DustinB3403 "Why" do you have to?
Well, at the moment I have to see if I can create two partitions on the same array with the equipment I have.
As of last night I couldn't find a way to do it.
I have to use LVM to create the partitions needed. I'm not yet sure how I'll be able to do that.
-
@DustinB3403 said in Why HALizard and XenServer Failed so heavily:
@FATeknollogee said in Why HALizard and XenServer Failed so heavily:
@DustinB3403 "Why" do you have to?
Well, at the moment I have to see if I can create two partitions on the same array with the equipment I have.
As of last night I couldn't find a way to do it.
I have to use LVM to create the partitions needed. I'm not yet sure how I'll be able to do that.
For at least the fifth time... we don't make partitions here, we make volumes. LVM makes volumes. I've corrected you every time you've used the word partitions. Partitions and volumes are not the same thing.
-
@DustinB3403 said in Why HALizard and XenServer Failed so heavily:
I have to use LVM to create the volumes needed. I'm not yet sure how I'll be able to do that.
Resize what is there. Then lvcreate what you need.
-
He doesn't have to, but in the emergency situation he was in yesterday, it was the fastest way to get himself back online.
I'm pretty sure we could get him running, this time on just the single HDD that's presented by the RAID controller.
-
@scottalanmiller ffs, volumes
Running on minimal sleep, get off my back lol...
I need two logical volumes, one to boot from and one to hold the data, and couldn't figure out how to do it last night.
-
@DustinB3403 said in Why HALizard and XenServer Failed so heavily:
@scottalanmiller ffs, volumes
Running on minimal sleep, get off my back lol...
I need two logical volumes, one to boot from and one to hold the data, and couldn't figure out how to do it last night.
If it only creates one when you install, then you resize it to free up space, use lvcreate to make the new volume, and then add that volume as an SR.
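Roughly, those three steps could look like the sketch below. Everything named here is hypothetical: the volume group vg0, the existing volume root, the new volume vmdata, the 40G size, and the SR label. The host UUID is a placeholder you would get from `xe host-list`. I'm also assuming the existing volume can actually be shrunk, which depends on the filesystem and usually means doing it offline with a good backup. It prints the commands by default rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: shrink an existing logical volume, create a second one in the
# freed space, and register it as a local SR. All names are assumptions.
set -eu

# Print each command instead of running it unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

plan_second_volume() {
  local vg="$1" existing_lv="$2" keep_size="$3" new_lv="$4" host_uuid="$5"
  # Shrink the existing volume and its filesystem together.
  # Not every filesystem can shrink, and most need to be unmounted first.
  run lvresize --resizefs -L "$keep_size" "$vg/$existing_lv"
  # Use all remaining free extents in the volume group for the new volume.
  run lvcreate -l 100%FREE -n "$new_lv" "$vg"
  # Hand the new volume to XenServer as a local LVM storage repository.
  run xe sr-create host-uuid="$host_uuid" name-label=local-vm-storage \
    type=lvm content-type=user shared=false \
    device-config:device="/dev/$vg/$new_lv"
}

plan_second_volume vg0 root 40G vmdata HOST_UUID
```

Check the current layout with `vgs` and `lvs` before changing anything, and swap in the real names.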
-
@DustinB3403 said in Why HALizard and XenServer Failed so heavily:
@scottalanmiller ffs, volumes
Running on minimal sleep, get off my back lol...
I need two logical volumes, one to boot from and one to hold the data, and couldn't figure out how to do it last night.
The volumes (is that the same as partitions?) were already there. XS created sda1, sda2 and sda3. sda3 was the majority of the disk. The issue was that XS wasn't automounting it as an SR and we don't know why not. My install on a 500 GB drive mounted automatically; his 14TB drive didn't, but it was clearly seen.
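If anyone wants to dig into why it wasn't picked up, these are the read-only checks I'd start with. This is just a sketch: /dev/sda and /dev/sda3 are my assumptions for the disk and the large third partition, and the script only prints the commands so you can run them one at a time in dom0. None of them change anything on disk.

```shell
#!/usr/bin/env bash
# Sketch: list read-only checks for a partition that did not show up as an SR.
# The disk and partition paths passed in are assumptions, not confirmed values.
set -eu

sr_checks() {
  local disk="$1" part="$2"
  # Each printed line is a command to run by hand, in order:
  # partition layout, LVM physical volume status, known SRs, PBD attach state.
  cat <<EOF
lsblk $disk
pvs $part
xe sr-list type=lvm params=uuid,name-label,physical-size
xe pbd-list params=uuid,sr-uuid,currently-attached
EOF
}

sr_checks /dev/sda /dev/sda3
```

If the SR exists but its PBD shows as not attached, `xe pbd-plug uuid=<pbd-uuid>` is the usual next step. If no SR exists at all, it would have to be created with `xe sr-create`.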