XenServer 6.5 - Clean Up Storage Repository
-
Well...
I know XenServer supports growing SRs, but does it support shrinking SRs?
If so, I can grow the SR to 4 TB, perform the online coalesce, then shrink the SR back to 2 TB.
Thoughts?
-
You can also move storage to another SR to get rid of these.
Or export/import the VM. You can also try:
xe sr-scan uuid=<SR UUID>
which is supposed to force a coalesce.
-
@anthonyh No shrinking of SRs.
Are there other VMs on the SR you could move?
-
@momurda said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh No shrinking of SRs.
Are there other VMs on the SR you could move?
Nope, the SR is dedicated to Zimbra. I suppose I could create a new 2 TB SR for Zimbra and move the disks over, then delete the old SR when it's done. Though moving a 1 TB VHD of this importance makes me nervous, haha.
-
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
From what I'm reading here, I either need to grow the SR or do an offline coalesce: https://support.citrix.com/article/CTX201296
SR_BACKEND_FAILURE_44 insufficient space:
The process of taking snapshots requires additional overhead on your SR, so you need sufficient room to perform the operation. For a running VM with a single snapshot to be coalesced, you need twice the space in the case of LVM SRs (active VDI + single snapshot VDI). If the SR is short on space, you get this error.
Either do an offline coalesce or increase the SR size to accommodate online coalescing.
I could grow the SR, but I don't want to throw disk space at it for the sake of more disk space. Perhaps I'll need to plan some downtime to do an offline coalesce.
Offline coalesce is the better option.
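For reference, here's a rough sketch of what an offline coalesce could look like from the CLI, using placeholder UUIDs and assuming the coalesce-leaf plugin is present on your 6.5 hosts (check /etc/xapi.d/plugins/ first):
# Shut the VM down so only the base copy and the leaf are left to merge
xe vm-shutdown uuid=<VM UUID>
# Ask the host to leaf-coalesce the VM's disks (only if the plugin exists on your build)
xe host-call-plugin host-uuid=<host UUID> plugin=coalesce-leaf fn=leaf-coalesce args:vm_uuid=<VM UUID>
# Otherwise, a plain SR scan while the VM is halted lets the garbage collector retry with far less overhead
xe sr-scan uuid=<SR UUID>
xe vm-start uuid=<VM UUID>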
-
The coalesce job in my experience usually doesn't take too long, but it's purely based on your host's SR performance.
And while you can kick it off manually once offline, you would still have no status.
Do you have XO? I believe it'll show you unhealthy SRs.
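In the meantime, one way to at least watch for coalesce activity from dom0 (a sketch; /var/log/SMlog is where the storage manager logs its garbage-collection work):
# Follow the storage manager log and watch for coalesce messages
tail -f /var/log/SMlog | grep -i coalesce
# Check whether the garbage collector (cleanup.py) is currently running
ps axf | grep cleanup.py | grep -v grep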
-
@dustinb3403 said in XenServer 6.5 - Clean Up Storage Repository:
The coalesce job in my experience usually doesn't take too long, but it's purely based on your host's SR performance.
And while you can kick it off manually once offline, you would still have no status.
Do you have XO? I believe it'll show you unhealthy SRs.
I do not have XenOrchestra. I should set that up...
-
Last night I set up a new SR for Zimbra and live migrated the OS disk over to the new SR without any issues. Took ~5 minutes for the 20 GB VHD.
I'd like to do the same for the 1 TB VHD, but it makes me nervous...
If the process was to bomb mid-progress, what would happen? Is it easy to recover from?
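For what it's worth, the CLI equivalent of that live move is roughly the following (a sketch with placeholder UUIDs; xe vdi-pool-migrate is the per-VDI storage live migration call):
# Find the VDIs attached to the VM
xe vbd-list vm-uuid=<VM UUID> params=vdi-uuid,device
# Live-migrate a single VDI to the new SR while the VM keeps running
xe vdi-pool-migrate uuid=<VDI UUID> sr-uuid=<destination SR UUID>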
-
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
@dustinb3403 said in XenServer 6.5 - Clean Up Storage Repository:
The coalesce job in my experience usually doesn't take too long, but it's purely based on your host's SR performance.
And while you can kick it off manually once offline, you would still have no status.
Do you have XO? I believe it'll show you unhealthy SRs.
I do not have XenOrchestra. I should set that up...
Setting it up literally takes less time than updating a guest does.
https://mangolassi.it/topic/12809/xen-orchestra-community-edition-installing-with-yarn
-
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
Last night I set up a new SR for Zimbra and live migrated the OS disk over to the new SR without any issues. Took ~5 minutes for the 20 GB VHD.
I'd like to do the same for the 1 TB VHD, but it makes me nervous...
If the process was to bomb mid-progress, what would happen? Is it easy to recover from?
You would have a backup, but why go through this process? Why not let the system run a coalesce? Is this a production system or your lab?
-
@anthonyh I have had a few disk migrations fail over the last 2 years. Most of the time you just end up with a broken VDI on the destination and the source is fine.
But I have had to restart the guest after a disk migration failure before.
@DustinB3403 It is his Zimbra mail server.
-
@momurda said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh I have had a few disk migrations fail over the last 2 years. Most of the time you just end up with a broken VDI on the destination and the source is fine.
But I have had to restart the guest after a disk migration failure before.
@DustinB3403 It is his Zimbra mail server.
So even more reason to let the coalesce process complete.
Disable any backup jobs for a bit (snapshots cause this issue).
https://techblog.jeppson.org/2015/02/reclaim-lost-space-xenserver-6-5/
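A quick way to see how full the SR is and which snapshot VDIs are still sitting on it (a sketch with placeholder UUIDs):
# How much of the SR is actually consumed
xe sr-list uuid=<SR UUID> params=name-label,physical-size,physical-utilisation
# Snapshot VDIs still on that SR (candidates to delete before the GC can coalesce)
xe vdi-list sr-uuid=<SR UUID> is-a-snapshot=true params=uuid,name-label,virtual-size
# Snapshot objects still attached to the VM itself
xe snapshot-list snapshot-of=<VM UUID> params=uuid,name-label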
-
@momurda said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh I have had a few disk migrations fail over the last 2 years. Most of the time you just end up with a broken VDI on the destination and the source is fine.
But I have had to restart the guest after a disk migration failure before.
I'm OK with that. I will have to think about this...
-
@dustinb3403 said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
Last night I set up a new SR for Zimbra and live migrated the OS disk over to the new SR without any issues. Took ~5 minutes for the 20 GB VHD.
I'd like to do the same for the 1 TB VHD, but it makes me nervous...
If the process was to bomb mid-progress, what would happen? Is it easy to recover from?
You would have a backup, but why go through this process? Why not let the system run a coalesce? Is this a production system or your lab?
Production system. The problem is that there isn't enough space on the SR to perform a coalesce. I'm trying to avoid bringing Zimbra down if at all possible. Might not be possible, but I can at least try.
-
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
@momurda said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh I have had a few disk migrations fail over the last 2 years. Most of the time you just end up with a broken VDI on the destination and the source is fine.
But I have had to restart the guest after a disk migration failure before.
I'm OK with that. I will have to think about this...
This can mean unplanned downtime, which for a mail system can be costly. Granted, the downtime is usually nominal, but it's worth considering letting the system clean this up at the nearest possible planned downtime.
-
@dustinb3403 said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
@momurda said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh I have had a few disk migrations fail over the last 2 years. Most of the time you just end up with a broken VDI on the destination and the source is fine.
But I have had to restart the guest after a disk migration failure before.
I'm OK with that. I will have to think about this...
This can mean unplanned downtime, which for a mail system can be costly. Granted, the downtime is usually nominal, but it's worth considering letting the system clean this up at the nearest possible planned downtime.
Yes, it's a gamble for sure.
-
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
@dustinb3403 said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
Last night I set up a new SR for Zimbra and live migrated the OS disk over to the new SR without any issues. Took ~5 minutes for the 20 GB VHD.
I'd like to do the same for the 1 TB VHD, but it makes me nervous...
If the process was to bomb mid-progress, what would happen? Is it easy to recover from?
You would have a backup, but why go through this process? Why not let the system run a coalesce? Is this a production system or your lab?
Production system. The problem is that there isn't enough space on the SR to perform a coalesce. I'm trying to avoid bringing Zimbra down if at all possible. Might not be possible, but I can at least try.
The coalesce process is likely already running, attempting to clean up an old snapshot. Performing a manual SR scan should clean up this issue.
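A sketch of that manual scan, looking the SR up by name first:
# Find the SR's UUID from its name label
xe sr-list name-label="<SR name>" params=uuid
# Trigger a rescan, which kicks off the garbage collector / coalesce logic
xe sr-scan uuid=<SR UUID>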
-
@dustinb3403 said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
@dustinb3403 said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
Last night I set up a new SR for Zimbra and live migrated the OS disk over to the new SR without any issues. Took ~5 minutes for the 20 GB VHD.
I'd like to do the same for the 1 TB VHD, but it makes me nervous...
If the process was to bomb mid-progress, what would happen? Is it easy to recover from?
You would have a backup, but why go through this process? Why not let the system run a coalesce? Is this a production system or your lab?
Production system. The problem is that there isn't enough space on the SR to perform a coalesce. I'm trying to avoid bringing Zimbra down if at all possible. Might not be possible, but I can at least try.
The coalesce process is likely already running, attempting to clean up an old snapshot. Performing a manual SR scan should clean up this issue.
Already done this several times. No dice.
-
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
@dustinb3403 said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
@dustinb3403 said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
Last night I set up a new SR for Zimbra and live migrated the OS disk over to the new SR without any issues. Took ~5 minutes for the 20 GB VHD.
I'd like to do the same for the 1 TB VHD, but it makes me nervous...
If the process was to bomb mid-progress, what would happen? Is it easy to recover from?
You would have a backup, but why go through this process? Why not let the system run a coalesce? Is this a production system or your lab?
Production system. The problem is that there isn't enough space on the SR to perform a coalesce. I'm trying to avoid bringing Zimbra down if at all possible. Might not be possible, but I can at least try.
The coalesce process is likely already running, attempting to clean up an old snapshot. Performing a manual SR scan should clean up this issue.
Already done this several times. No dice.
I should probably just re-read the thread, but what type of backup are you running with this guest? You aren't using XO, so I'm curious what is causing this issue.
-
@dustinb3403 said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
@dustinb3403 said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
@dustinb3403 said in XenServer 6.5 - Clean Up Storage Repository:
@anthonyh said in XenServer 6.5 - Clean Up Storage Repository:
Last night I set up a new SR for Zimbra and live migrated the OS disk over to the new SR without any issues. Took ~5 minutes for the 20 GB VHD.
I'd like to do the same for the 1 TB VHD, but it makes me nervous...
If the process was to bomb mid-progress, what would happen? Is it easy to recover from?
You would have a backup, but why go through this process? Why not let the system run a coalesce? Is this a production system or your lab?
Production system. The problem is that there isn't enough space on the SR to perform a coalesce. I'm trying to avoid bringing Zimbra down if at all possible. Might not be possible, but I can at least try.
The coalesce process is likely already running, attempting to clean up an old snapshot. Performing a manual SR scan should clean up this issue.
Already done this several times. No dice.
I should probably just re-read the thread, but what type of backup are you running with this guest? You aren't using XO, so I'm curious what is causing this issue.
I am using Alike's "enhanced backup" model, which uses snapshot-based backups. I also take snapshots of the VM (all VMs, really) whenever I do any sort of maintenance, so I can't really point the blame at Alike. I don't know how long the orphaned snapshots have been around. The interesting thing is that, except for Saturday night's run (when I got the alert from the pool that the coalesce failed due to insufficient disk space), backups have been successful. Backups aren't performed on Sundays.
I don't have the orphaned snapshots issue with any other VM that I'm aware of. SR usage everywhere else looks to be what is expected.
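If you want to see what's actually left in that VDI's chain, one option on an LVM SR (heavily hedged; the vhd-util flags can vary by release, so check its help output on your host first) is:
# Scan the VHDs in the SR's volume group; on LVM SRs the VG is named VG_XenStorage-<SR UUID>
vhd-util scan -f -m "VHD-*" -l VG_XenStorage-<SR UUID> -p
# VDIs and snapshots that xapi knows about on that SR, for comparison
xe vdi-list sr-uuid=<SR UUID> params=uuid,name-label,is-a-snapshot,managed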