Deduplication on CSV storage
-
@Jimmy9008 said in Deduplication on CSV storage:
Hi folks,
We have a three node Windows Failover Cluster, with several CSVs, each provided by Starwind using local storage.
One interesting idea has been brought up internally and i'd like to know if it sounds sensible or possible.
Is it possible to enable deduplication of CSV storage within the cluster? My initial thoughts are that as VM files are open in vmms.exe, the deduplication process will not be able to work on those open files. Is that wrong though? I would expect iso files to dedup a they are offline/not in use, but have a feeling live running VM VHD(X) files on the host will be unable to run through the process...
I could try this, but don't want to risk causing issues to the storage. So, thought best to put the idea out first.
Best,
JimNo, as others pointed out. If you want dedupe, do it inside of the VMs on the data volumes there for that kind of stuff, not on the CSV.
-
@Obsolesce said in Deduplication on CSV storage:
No, as others pointed out. If you want dedupe, do it inside of the VMs on the data volumes there for that kind of stuff, not on the CSV.
Why not on the CSV?
-
@scottalanmiller said in Deduplication on CSV storage:
@Obsolesce said in Deduplication on CSV storage:
No, as others pointed out. If you want dedupe, do it inside of the VMs on the data volumes there for that kind of stuff, not on the CSV.
Why not on the CSV?
That does it across the board then for everything in it. If that's what he wants, go for it. Just know what you need to do to do it properly. It also depends on a few variables.
Just look over the docs well and know it first.
-
@Obsolesce said in Deduplication on CSV storage:
@Jimmy9008 said in Deduplication on CSV storage:
Hi folks,
We have a three node Windows Failover Cluster, with several CSVs, each provided by Starwind using local storage.
One interesting idea has been brought up internally and i'd like to know if it sounds sensible or possible.
Is it possible to enable deduplication of CSV storage within the cluster? My initial thoughts are that as VM files are open in vmms.exe, the deduplication process will not be able to work on those open files. Is that wrong though? I would expect iso files to dedup a they are offline/not in use, but have a feeling live running VM VHD(X) files on the host will be unable to run through the process...
I could try this, but don't want to risk causing issues to the storage. So, thought best to put the idea out first.
Best,
JimNo, as others pointed out. If you want dedupe, do it inside of the VMs on the data volumes there for that kind of stuff, not on the CSV.
Such as? I mean, I thought it shouldn't be done. A few people have said no in this thread, some have said yes...
-
-
@Obsolesce said in Deduplication on CSV storage:
@scottalanmiller said in Deduplication on CSV storage:
@Obsolesce said in Deduplication on CSV storage:
No, as others pointed out. If you want dedupe, do it inside of the VMs on the data volumes there for that kind of stuff, not on the CSV.
Why not on the CSV?
That does it across the board then for everything in it. If that's what he wants, go for it. Just know what you need to do to do it properly. It also depends on a few variables.
Just look over the docs well and know it first.
Sorry, was supposed to quote that. I mean, such as what variables? I've looked at various resources and can't see definitive information about doing this on a csv with VM files and drives...
Not sure if it's sensible. Won't it have an overhead too?
-
@Jimmy9008 said in Deduplication on CSV storage:
Not sure if it's sensible. Won't it have an overhead too?
Yes, but normal tuning makes it only use that overhead when the system is idle. So unless you have some really specific workload that will cause problems that we don't know about, the system docs are pretty clear that the overhead won't impact you.
-
@Jimmy9008 said in Deduplication on CSV storage:
I've looked at various resources and can't see definitive information about doing this on a csv with VM files and drives...
That's because you are looking for "negative" documentation. It's like "can I store my video games on X hard drive." It's a hard drive, of course you can store a video game on it. You can't look for "Civilization VI on WD 4TB Red Drive" because it's not a proper question. The drive is SATA and supports any file system, Civ 6 will go on any file system, ergo, it works. You can't document every use case imaginable like that.
That's what is going on here. The dedupe is block level. Ergo, talking about the file types on it isn't relevant. Hence you are feeling like you aren't getting answers. but knowing that it is block level answers all of that automatically.
-
For example SQL Servers or Exchange cannot have deduplication on them so maybe you don’t have that. I have attempted to enable deduplication on Hyperv Servers and it really break things but I would open a ticket with Starwinds Support and they should be able to advise you on that. Their support is great.
-
@dbeato said in Deduplication on CSV storage:
For example SQL Servers or Exchange cannot have deduplication on them so maybe you don’t have that.
I'm pretty sure that they can. Maybe not inside the VM, but certainly from outside of it. How would they know or be affected? They can't be. If dedupe works, it works for every workload. If it doesn't, it never does. It's an all of nothing system, the workload on top of it can't determine if you can use it or not. It would only be affected by performance.
-
@dbeato said in Deduplication on CSV storage:
For example SQL Servers or Exchange cannot have deduplication on them
Microsoft lists SQL Server as their prime example of "might be good for it, but you need to evaluate your use case" ...
https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/install-enable
-
@scottalanmiller said in Deduplication on CSV storage:
@dbeato said in Deduplication on CSV storage:
For example SQL Servers or Exchange cannot have deduplication on them
Microsoft lists SQL Server as their prime example of "might be good for it, but you need to evaluate your use case" ...
https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/install-enable
Well, I would say this is the part with Hyper-V that is so ambiguous
-
@dbeato said in Deduplication on CSV storage:
@scottalanmiller said in Deduplication on CSV storage:
@dbeato said in Deduplication on CSV storage:
For example SQL Servers or Exchange cannot have deduplication on them
Microsoft lists SQL Server as their prime example of "might be good for it, but you need to evaluate your use case" ...
https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/install-enable
Well, I would say this is the part with Hyper-V that is so ambiguous
It's not, it can't be. VDI has massive overlap and can use more aggressive deduplication, but that's all. Dedupe by definition is either always safe, or never safe. There cannot be an inbetween. Not when it runs below the filesystem.
Hyper-V support and safety is never in question. Only Hyper-V for VDI has a specialized tuning option.