ESXi recovery woes

Dashrender

How is the storage different? Not that I disagree, just looking for more information.

scottalanmiller

How is the storage different? Not that I disagree, just looking for more information.

That we don't know. What we know is that snapping will cause corruption with a database. So the corruption is expected and universal. What we don't know is why the snaps are loading one way in one version and another in another. I can only imagine that the block driver was changed between the two and something additionally is being affected.

Carnival Boy

I'm not sure what you mean by snapping? Do you mean VSS?

The issue happens if I take a backup on a 5.5 host and try and restore it on a 5.5 host. If I do that it will fail. But I can restore that same 5.5 host backup to a 5.1 host and it works fine. So the source of the backup doesn't seem to be an issue as much as the destination.

scottalanmiller

@Carnival-Boy said in ESXi recovery woes:

I'm not sure what you mean by snapping?

Slang for "taking a snapshot." That's the process that is introducing the initial corruption, I assume. The corruption should come from a block-based snapshot of the running database files.

scottalanmiller

It was the phrase "online backup of the VM" that I took to be a description of a snapshot-based backup, like Veeam would do.

Carnival Boy

I assume so. I have used both Veeam and Unitrends.

Dashrender

@Carnival-Boy said in ESXi recovery woes:

I'm not sure what you mean by snapping? Do you mean VSS?

The issue happens if I take a backup on a 5.5 host and try and restore it on a 5.5 host. If I do that it will fail. But I can restore that same 5.5 host backup to a 5.1 host and it works fine. So the source of the backup doesn't seem to be an issue as much as the destination.

Oh, then I stand corrected: you are taking backups (using snaps) on the 5.5 as well as the 5.1. So you have two of these Hypertrieve servers? One on 5.1 and another on 5.5?

Dashrender

So here's a question - does Hypertrieve have its own backup process for an online DB? Some things do. Before you kick off the backup of the VM, you kick off the backup process on the Hypertrieve DB, then the VM backup happens. Then when you restore, the Hypertrieve stuff will do its own restore (you might have to do it manually). This is all in the name of preventing corruption.
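As a rough illustration of the ordering described above (application-level backup first, VM backup second), here is a minimal Python sketch. The two callables are hypothetical placeholders, not real Hypertrieve, Veeam or Unitrends commands; I don't know what tooling Hypertrieve actually ships.

```python
from typing import Callable, List


def run_consistent_backup(
    app_backup: Callable[[], bool],
    vm_backup: Callable[[], bool],
) -> List[str]:
    """Run the application-level backup first, then the VM backup.

    Both arguments are hypothetical callables that return True on success.
    The VM backup is skipped if the application backup fails, so the image
    is never taken without a known-good DB dump inside it.
    """
    steps: List[str] = []

    if not app_backup():
        steps.append("app backup failed; VM backup skipped")
        return steps
    steps.append("app backup ok")

    if vm_backup():
        steps.append("vm backup ok")
    else:
        steps.append("vm backup failed")
    return steps


if __name__ == "__main__":
    # Stand-ins for the real jobs; replace with whatever actually triggers them.
    print(run_consistent_backup(lambda: True, lambda: True))
```

The point of the ordering is only that the VM image ends up containing a clean application-level dump, so a restore can fall back to it if the live database files turn out to be unusable.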

Carnival Boy

I shut down the VM on the 5.1 host, migrated it to the 5.5 host, powered it on, took another backup, then restored it back to both the 5.1 and the 5.5 host. I've been pretty busy!

Dashrender

@Carnival-Boy said in ESXi recovery woes:

I shut down the VM on the 5.1 host, migrated it to the 5.5 host, powered it on, took another backup, then restored it back to both the 5.1 and the 5.5 host. I've been pretty busy!

So you can restore a snap taken from a 5.5 on a 5.1, but not back to the original 5.5 it came from... hmmm..

Carnival Boy

I'm still working on this

Even if I shut down the VM, back it up in a powered off state, restore to a 5.5 host, and power it on, the Hypertrieve service starts and opens the database, which I can successfully browse, then after about ten seconds it crashes and I can no longer browse the database.

Since this is a restore of a powered off VM, it can't be a snapshotting issue.

I have had a reply from the vendor, who writes:
"So, it's clear for us what happened, the virtualization abstraction generates a conflict when you instance a new VM just copying, the Disk Hash is not the same, and crashes the EDM sometimes, I don't recommend a Server Copy, always the backup procedure."

I don't really understand this. Anyone?

By "backup procedure" I think they are talking about taking a Hypertrieve backup via the Hypertrieve software and restoring the database that way after migrating.

Which I'm hoping to try next, but, to compound the issue, Unitrends (which I hate, by the way) has stopped working for me, so I can no longer restore the VM! It's just one thing after another with this - I can feel my life slowly slipping away!

DustinB3403

"So, it's clear for us what happened, the virtualization abstraction generates a conflict when you instance a new VM just copying, the Disk Hash is not the same, and crashes the EDM sometimes, I don't recommend a Server Copy, always the backup procedure."

This means that when you are importing the VM into the other host, it has a new Disk ID which is causing the issue, as the snapshot process creates a custom disk ID.

What they are recommending is that you do a full backup and import that, which should resolve the issue.

Is there no built-in way with the ESXi version to create a full backup? (I'm thinking of XO at this point so don't mind me if I'm completely wrong)

Dashrender

@Carnival-Boy said in ESXi recovery woes:

I have had a reply from the vendor, who writes:

"So, it's clear for us what happened, the virtualization abstraction generates a conflict when you instance a new VM just copying, the Disk Hash is not the same, and crashes the EDM sometimes, I don't recommend a Server Copy, always the backup procedure."

So does ESXi 5.1 somehow maintain the Disk Hash, and VMware changed this practice in 5.5? Something for you to investigate.

@DustinB3403 said in ESXi recovery woes:

This means that when you are importing the VM into the other host, it has a new Disk ID which is causing the issue, as the snapshot process creates a custom disk ID.

Eh? Actually, the OP proved it has nothing to do with the snapshots by taking a backup while the VM was shut down.

This is a restore-to-a-new-VM problem. It's a problem because the vendor has the system checking the Disk ID, presumably for copy protection reasons, yet the check is easily thwarted by using the backup and restore procedure of the DB/application software itself. This of course means that restoring a system potentially takes much longer, because not only do you have to restore the VM, but then you have to restore the DB inside the VM. That assumes it is even possible, because I suppose you might have to reinstall the application before restoring the DB so that the application recognizes the new Disk Hash.
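To illustrate the kind of check the vendor seems to be describing, here is a minimal sketch, assuming the application stores a hash of its disk identifiers at install time and compares it again at startup. How Hypertrieve actually computes its "Disk Hash" is not known; the identifiers and function names below are made up for the example.

```python
import hashlib
from typing import Iterable


def disk_fingerprint(disk_ids: Iterable[str]) -> str:
    """Hash a set of disk identifiers into a single fingerprint.

    Sorting makes the result independent of enumeration order.
    """
    joined = "|".join(sorted(disk_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def fingerprint_matches(stored: str, current_disk_ids: Iterable[str]) -> bool:
    """Return True if the disks seen now match the stored fingerprint."""
    return stored == disk_fingerprint(current_disk_ids)


if __name__ == "__main__":
    # Hypothetical identifiers: one recorded at install, one seen after a restore.
    stored = disk_fingerprint(["disk-id-original"])
    print(fingerprint_matches(stored, ["disk-id-original"]))  # same disk
    print(fingerprint_matches(stored, ["disk-id-restored"]))  # new disk ID
```

If the restore assigns the virtual disk a new identifier, a check like this fails even though the data is byte-for-byte identical, which would fit the service starting and then crashing a few seconds later.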

Carnival Boy

I'm guessing that there is more of a hardware change when migrating from an ESXi 5.1 host to a 5.5 host than when migrating from 5.1 to 5.1. When I first boot into the restored VM on 5.5 I get "Microsoft Windows: You must restart your computer to apply these changes", which I don't get with 5.1 - I assume that's Windows adjusting to the new hardware?

DustinB3403

@Carnival-Boy likely it is, Windows is saying "hey I'm on new hardware"

Which means the difference is in the virtual hardware and drivers that ESXi 5.1 and 5.5 present, as the VM never sees the physical hardware.

JaredBusch

I have had to reactivate Windows when upgrading VMware versions like that. I cannot tell you if it was every time or not, but it was more than once.

Dashrender

Why would you say it's a driver issue? I suppose it could be, but the vendor already told you the DISK HASH is probably different, so you now know your problem.

If backing up a 5.5 and restoring back onto a 5.5 still fails, then the only option you have (currently) is to do what the vendor said: back up the DB separately and then restore in the prescribed fashion.

Carnival Boy

@Dashrender said in ESXi recovery woes:

If backing up a 5.5 and restoring back onto a 5.5 still fails, then the only option you have (currently) is to do what the vendor said: back up the DB separately and then restore in the prescribed fashion.

I think you're right.