Windows Failover Clustering... what are your views and why?

Dashrender

Is the local physical storage all part of the pool for the CSV? If so, could you be spiking a single server's storage with a VM on that host, which could then cause performance delay for the whole CSV?

Jimmy9008

@Dashrender said in Windows Failover Clustering... what are your views and why?:

Is the local physical storage all part of the pool for the CSV? If so, could you be spiking a single server's storage with a VM on that host, which could then cause performance delay for the whole CSV?

The storage currently looks like this. Each server has 21 TB of SSD as a $V. There are 5 CSVs/vSAN images on $V, each of 3 TB. That uses 15 TB of $V storage to provide the vSAN to the WFC. This leaves ~6 TB usable on each host (18 TB total) as non-CSV, non-HA local storage. The vSAN could of course be expanded into this, rather than left as is, but I don't think all of it should be CSV.
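For anyone following the numbers, here is a minimal Python sketch of that capacity split. The figures come straight from the post; the function and variable names are made up for illustration only.

```python
# Capacity figures as stated in the post; names are illustrative assumptions.
HOSTS = 3
LOCAL_SSD_TB = 21   # SSD capacity per host
CSV_IMAGES = 5      # CSV/vSAN images per host
CSV_IMAGE_TB = 3    # size of each image


def capacity_split(hosts=HOSTS, local_tb=LOCAL_SSD_TB,
                   images=CSV_IMAGES, image_tb=CSV_IMAGE_TB):
    """Return how each host's SSD divides between vSAN and local storage."""
    vsan_per_host = images * image_tb
    local_per_host = local_tb - vsan_per_host
    return {
        "vsan_per_host_tb": vsan_per_host,
        "local_per_host_tb": local_per_host,
        "local_total_tb": local_per_host * hosts,
    }


if __name__ == "__main__":
    print(capacity_split())
```

Changing `images` or `image_tb` shows how much non-HA local space is given up if the vSAN is expanded.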

Dashrender

Oh I see your points for sure, I was only asking if a non CSV based VM could cause a performance bottleneck on a single host - the CSV itself could get stalled out, causing a problem for all CSV based VMs. Now you have another issue to consider when troubleshooting CSV based issues.

Now - perhaps you have so many IOPs that this isn't a real issue, it was only a thought.

Jimmy9008

@Dashrender said in Windows Failover Clustering... what are your views and why?:

Oh I see your points for sure, I was only asking if a non CSV based VM could cause a performance bottleneck on a single host - the CSV itself could get stalled out, causing a problem for all CSV based VMs. Now you have another issue to consider when troubleshooting CSV based issues.

Now - perhaps you have so many IOPs that this isn't a real issue, it was only a thought.

Oh I see. As another item on the list of why not to add everything to the cluster storage when the VM doesn't need to be HA? Yeah, I'll add that to my list.

So, do you agree option 2 is the way to go? Only add to CSV where needed...

Obsolesce

How much data changes every day? Do you have 100 GB of changes per day? 1 TB?

Jimmy9008

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

How much data changes every day? Do you have 100 GB of changes per day? 1 TB?

Last time I ran Live Optics (a week or two ago), we were at around 6 TB of changes per day.

Dashrender

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

@Dashrender said in Windows Failover Clustering... what are your views and why?:

Oh I see your points for sure, I was only asking if a non CSV based VM could cause a performance bottleneck on a single host - the CSV itself could get stalled out, causing a problem for all CSV based VMs. Now you have another issue to consider when troubleshooting CSV based issues.

Now - perhaps you have so many IOPs that this isn't a real issue, it was only a thought.

Oh I see. As another item on the list of why not to add everything to the cluster storage when the VM doesn't need to be HA? Yeah, I'll add that to my list.

So, do you agree option 2 is the way to go? Only add to CSV where needed...

I'm not agreeing or disagreeing; I don't know enough to have an opinion. But I think my question would lean more toward option 1, because then the whole system would be affected equally by the mentioned problem, instead of just a single node. With a single node, the outcome could range from a failover and that node being kicked from the cluster, all the way to crashing the whole cluster (but damn, I would hope not).

Obsolesce

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

How much data changes every day? Do you have 100 GB of changes per day? 1 TB?

Last time I ran Live Optics (a week or two ago), we were at around 6 TB of changes per day.

What are your drives warrantied at? What's the DWPD or whatever? The idea is they only need to last 5 years at X DWPD, or whatever period they are rated for anyway.

Jimmy9008

@Dashrender said in Windows Failover Clustering... what are your views and why?:

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

@Dashrender said in Windows Failover Clustering... what are your views and why?:

Oh I see your points for sure, I was only asking if a non CSV based VM could cause a performance bottleneck on a single host - the CSV itself could get stalled out, causing a problem for all CSV based VMs. Now you have another issue to consider when troubleshooting CSV based issues.

Now - perhaps you have so many IOPs that this isn't a real issue, it was only a thought.

Oh I see. As another item on the list of why not to add everything to the cluster storage when the VM doesn't need to be HA? Yeah, I'll add that to my list.

So, do you agree option 2 is the way to go? Only add to CSV where needed...

I'm not agreeing or disagreeing; I don't know enough to have an opinion. But I think my question would lean more toward option 1, because then the whole system would be affected equally by the mentioned problem, instead of just a single node. With a single node, the outcome could range from a failover and that node being kicked from the cluster, all the way to crashing the whole cluster (but damn, I would hope not).

If the single non-CSV VM could cause a bottleneck on a host, couldn't you argue that the same VM would also cause a bottleneck on the CSV? If anything, since CSVFS comes with additional overhead from being a clustered file system, it's more likely that having the VM on the CSV would create a performance issue, no?

Jimmy9008

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

How much data changes every day? Do you have 100 GB of changes per day? 1 TB?

Last time I ran Live Optics (a week or two ago), we were at around 6 TB of changes per day.

What are your drives warrantied at? What's the DWPD or whatever? The idea is they only need to last 5 years at X DWPD, or whatever period they are rated for anyway.

1 DWPD. I'm not so worried about the writes. I would just like to avoid additional writes where they're not really needed.

Jimmy9008

Another thought. CSV data is replicated to all three hosts. So, 100 GB is actually 300 GB. 1 TB is actually 3 TB. Why would it make sense to add VMs (applications) for which the company can sustain long downtime to an area where they take up 3x the space, on expensive SSDs? Why not put that application you don't care about on one host, where it takes up one lot of space, leaving the other space for things the company does care about...
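To put a number on that, here is a small Python sketch of the space argument. It assumes a plain three-way replica as described above and ignores any thin provisioning, deduplication, or compression; the function names are hypothetical.

```python
REPLICAS = 3  # one copy per host, as described in the post


def footprint_gb(vm_size_gb, on_csv=True, replicas=REPLICAS):
    """Raw SSD consumed across the cluster by a VM of the given size."""
    return vm_size_gb * replicas if on_csv else vm_size_gb


def saved_by_local_gb(vm_size_gb, replicas=REPLICAS):
    """Raw SSD freed by keeping a non-HA VM on one host instead of the CSV."""
    return footprint_gb(vm_size_gb, True, replicas) - footprint_gb(vm_size_gb, False)


if __name__ == "__main__":
    for size in (100, 1000):
        print(size, footprint_gb(size), saved_by_local_gb(size))
```

Every gigabyte of non-HA data kept local frees two gigabytes of raw SSD elsewhere in the cluster under this model.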

Obsolesce

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

So, 100 GB is actually 300 GB.

Not for the drives themselves. I'm assuming some kind of hardware/software raid, so that 100GB gets split among the drives the data goes to according to RAID level.

Blocks that are accessed more frequently (read data) don't really count as much, as I'm sure there is caching in multiple places.

Obsolesce

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

How much data changes every day? Do you have 100 GB of changes per day? 1 TB?

Last time I ran Live Optics (a week or two ago), we were at around 6 TB of changes per day.

What are your drives warrantied at? What's the DWPD or whatever? The idea is they only need to last 5 years at X DWPD, or whatever period they are rated for anyway.

1 DWPD. I'm not so worried about the writes. I would just like to avoid additional writes where they're not really needed.

What drive model are they? How many drives per server? What RAID level is being used? Is it HW/SW RAID? If HW, which card?

Jimmy9008

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

So, 100 GB is actually 300 GB.

Not for the drives themselves. I'm assuming some kind of hardware/software raid, so that 100GB gets split among the drives the data goes to according to RAID level.

Blocks that are accessed more frequently (read data) don't really count as much, as I'm sure there is caching in multiple places.

If I have a VM on the CSV using 100 GB, the whole point of having the vSAN is that every byte exists on the vSAN partners to avoid any downtime on failure. So, it really is copied in full three times.

Obsolesce

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

So, 100 GB is actually 300 GB.

Not for the drives themselves. I'm assuming some kind of hardware/software raid, so that 100GB gets split among the drives the data goes to according to RAID level.

Blocks that are accessed more frequently (read data) don't really count as much, as I'm sure there is caching in multiple places.

If I have a VM on the CSV using 100 GB, the whole point of having the vSAN is that every byte exists on the vSAN partners to avoid any downtime on failure. So, it really is copied in full three times.

Yeah, I know that. But what you showed concern about before was the wear on drives. If you pick out one random drive, it's not getting 3x the data writes.

Jimmy9008

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

How much data changes every day? Do you have 100 GB of changes per day? 1 TB?

Last time I ran Live Optics (a week or two ago), we were at around 6 TB of changes per day.

What are your drives warrantied at? What's the DWPD or whatever? The idea is they only need to last 5 years at X DWPD, or whatever period they are rated for anyway.

1 DWPD. I'm not so worried about the writes. I would just like to avoid additional writes where they're not really needed.

What drive model are they? How many drives per server? What RAID level is being used? Is it HW/SW RAID? If HW, which card?

Controller: PERC H740P with 8 GB cache (hardware RAID).
Drives: MTFDDAK1T9TDN, 14 per server, RAID 6.
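With those specs, the wear question can be roughed out in a few lines of Python. This is only a back-of-the-envelope sketch under loud assumptions: the drive capacity of 1.92 TB is inferred from the "1T9" in the part number and not confirmed in the thread; the full 6 TB/day is assumed to land on every host (the worst case, where all changes sit on the three-way replicated vSAN); writes are treated as full-stripe RAID 6 writes; and controller cache and the SSDs' internal write amplification are ignored. The function name is hypothetical.

```python
def per_drive_dwpd(host_writes_tb_per_day, drives=14, parity_drives=2,
                   drive_tb=1.92):
    """Estimate drive writes per day for one drive in a RAID 6 set.

    Assumes full-stripe writes, so parity adds drives / data_drives
    overhead and the result is spread evenly over all drives.
    drive_tb=1.92 is an assumption inferred from the part number.
    """
    data_drives = drives - parity_drives
    raw_tb = host_writes_tb_per_day * drives / data_drives  # data plus parity
    per_drive_tb = raw_tb / drives                          # even spread
    return per_drive_tb / drive_tb


if __name__ == "__main__":
    estimate = per_drive_dwpd(6.0)
    print(f"Estimated per-drive DWPD: {estimate:.2f} (rated: 1)")
```

Under these assumptions the estimate lands at roughly 0.26 DWPD, below the 1 DWPD rating. Small random writes on RAID 6 carry a heavier parity penalty than the full-stripe case modelled here, so the real figure could be noticeably higher.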

Jimmy9008

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

@Obsolesce said in Windows Failover Clustering... what are your views and why?:

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

So, 100 GB is actually 300 GB.

Not for the drives themselves. I'm assuming some kind of hardware/software raid, so that 100GB gets split among the drives the data goes to according to RAID level.

Blocks that are accessed more frequently (read data) don't really count as much, as I'm sure there is caching in multiple places.

If I have a VM on the CSV using 100 GB, the whole point of having the vSAN is that every byte exists on the vSAN partners to avoid any downtime on failure. So, it really is copied in full three times.

Yeah, I know that. But what you showed concern about before was the wear on drives. If you pick out one random drive, it's not getting 3x the data use.

Yes, correct. I misunderstood what you were saying. Either way, I am sure we are fine on writes. I'm not worried about them. I do however think it's silly to use more writes than needed, for data that doesn't need HA, just because we can. By nature, keeping data that can tolerate long downtime off of the vSAN causes less wear on the disks... Why would we want to add wear where it's not needed?

Obsolesce

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

I do however think it's silly to use more writes than needed, for data that doesn't need HA, just because we can.

I haven't yet expressed which route to take; I was first trying to understand the data aspect.

scottalanmiller

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

We have a Windows Failover Cluster here using Starwind vSAN over three hosts, all local SSD storage.

I'm sure that this was covered somewhere, but are you saying that this is hyperconverged and that there are three total nodes and that Starwind VSAN is what is clustering them? So not three compute nodes hooked to a remote VSAN, but just standard three node HC?

scottalanmiller

@Jimmy9008 said in Windows Failover Clustering... what are your views and why?:

Some applications naturally have HA with how they work, so as long as one VM is on Hyper-V on each of the 3 hosts, the application stays up even without the VM being in cluster/CSV storage. So, why take up CSV space?

This is the biggest reason to avoid putting everything on it, IMHO.