Hyper-V Failover Cluster FAILURE(S)
-
Do the switches report any overloading? A huge backup at the L3 routing points could do this, but would be super uncommon. But SAN traffic should never be routed, ever, and this would be a reason for that.
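One way to answer "do the switches report any overloading?" beyond the error logs is to sample each port's octet counter twice and compute average utilization. This is a minimal sketch. It assumes you can read 64-bit counters (such as SNMP `ifHCInOctets`) and know the port speed. The function name and numbers are illustrative only:

```python
# Sketch: average link utilization from two readings of a 64-bit
# interface octet counter (assumed to be something like SNMP ifHCInOctets).

COUNTER64_MODULUS = 2**64


def link_utilization(first_octets: int, second_octets: int,
                     interval_s: float, link_bps: float) -> float:
    """Return average utilization (0.0 to 1.0) between two counter samples."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # Modular subtraction tolerates one counter wrap between the samples.
    delta_octets = (second_octets - first_octets) % COUNTER64_MODULUS
    return (delta_octets * 8) / (interval_s * link_bps)


if __name__ == "__main__":
    # Illustrative numbers: 37.5 GB in 60 s on a 10 Gb/s port.
    util = link_utilization(0, 37_500_000_000, 60, 10_000_000_000)
    print(f"{util:.0%}")  # 50%
```

A long sampling interval averages away short bursts. A port that looks fine over 60 seconds can still be saturating for milliseconds at a time.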
-
@scottalanmiller said in Hyper-V Failover Cluster FAILURE(S):
Do the switches report any overloading? A huge backup at the L3 routing points could do this, but would be super uncommon. But SAN traffic should never be routed, ever, and this would be a reason for that.
Switches are not reporting any errors either. We pulled all those logs and found no network errors or hardware failure.
-
Is there a single switch that handles all SAN (storage) traffic, failover traffic, and network traffic for these hosts?
What kind of connection does the switch have to the router?
-
@dashrender said in Hyper-V Failover Cluster FAILURE(S):
Is there a single switch that handles all SAN (storage) traffic, failover traffic, and network traffic for these hosts?
What kind of connection does the switch have to the router?
Due to a huge web of issues, I couldn't tell you exactly how everything is connected, but I do know the servers and SANs are on their own switches and are then handed off to the LAN.
-
@scottalanmiller how critical (or, if not critical, how recommended) is having SAN traffic on its own switches, not shared with anything else?
-
@dashrender said in Hyper-V Failover Cluster FAILURE(S):
@scottalanmiller how critical (or, if not critical, how recommended) is having SAN traffic on its own switches, not shared with anything else?
Pretty critical. There is a reason that no one would ever try it any other way. SANs need low latency, and you don't want storage traffic waiting on a busy backplane. Also, a switch that is good for a SAN is not good for a LAN, and vice versa, so it wouldn't be cost-effective to mix them anyway. Hence, it never comes up. No upsides, loads of downsides.
-
@kyle said in Hyper-V Failover Cluster FAILURE(S):
@dashrender said in Hyper-V Failover Cluster FAILURE(S):
Is there a single switch that handles all SAN (storage) traffic, failover traffic, and network traffic for these hosts?
What kind of connection does the switch have to the router?
Due to a huge web of issues, I couldn't tell you exactly how everything is connected, but I do know the servers and SANs are on their own switches and are then handed off to the LAN.
Handed off to the LAN? SAN traffic never goes on a LAN, ever. If it does, it means you have no SAN, just block storage traffic on the LAN. The whole point of a SAN is that it is completely isolated and doesn't intermingle with the LAN.
-
@dashrender said in Hyper-V Failover Cluster FAILURE(S):
@scottalanmiller how critical (or, if not critical, how recommended) is having SAN traffic on its own switches, not shared with anything else?
I believe they did it to connect the 2 SANs and 1 Datto, which are connected via iSCSI over 10G links, and then Hyper-V hands traffic off to the LAN for everything else.
-
@kyle said in Hyper-V Failover Cluster FAILURE(S):
@dashrender said in Hyper-V Failover Cluster FAILURE(S):
@scottalanmiller how critical (or, if not critical, how recommended) is having SAN traffic on its own switches, not shared with anything else?
I believe they did it to connect the 2 SANs and 1 Datto, which are connected via iSCSI over 10G links, and then Hyper-V hands traffic off to the LAN for everything else.
Those two SANs and the Datto should be on their own switches that have no connections to the rest of the network.
I've read about people bitching that they have to have another control station to manage this mini network, but it's the cost of using SAN.
-
@dashrender said in Hyper-V Failover Cluster FAILURE(S):
@kyle said in Hyper-V Failover Cluster FAILURE(S):
@dashrender said in Hyper-V Failover Cluster FAILURE(S):
@scottalanmiller how critical (or, if not critical, how recommended) is having SAN traffic on its own switches, not shared with anything else?
I believe they did it to connect the 2 SANs and 1 Datto, which are connected via iSCSI over 10G links, and then Hyper-V hands traffic off to the LAN for everything else.
Those two SANs and the Datto should be on their own switches that have no connections to the rest of the network.
I've read about people bitching that they have to have another control station to manage this mini network, but it's the cost of using SAN.
172.20 is the servers and SAN; 172.30 is the internal network.
-
@scottalanmiller said in Hyper-V Failover Cluster FAILURE(S):
@dashrender said in Hyper-V Failover Cluster FAILURE(S):
@scottalanmiller how critical (or, if not critical, how recommended) is having SAN traffic on its own switches, not shared with anything else?
Pretty critical. There is a reason that no one would ever try it any other way. SANs need low latency, and you don't want storage traffic waiting on a busy backplane. Also, a switch that is good for a SAN is not good for a LAN, and vice versa, so it wouldn't be cost-effective to mix them anyway. Hence, it never comes up. No upsides, loads of downsides.
I assumed this to be the case, but haven't dug into it.
-
@kyle said in Hyper-V Failover Cluster FAILURE(S):
@dashrender said in Hyper-V Failover Cluster FAILURE(S):
@kyle said in Hyper-V Failover Cluster FAILURE(S):
@dashrender said in Hyper-V Failover Cluster FAILURE(S):
@scottalanmiller how critical (or, if not critical, how recommended) is having SAN traffic on its own switches, not shared with anything else?
I believe they did it to connect the 2 SANs and 1 Datto, which are connected via iSCSI over 10G links, and then Hyper-V hands traffic off to the LAN for everything else.
Those two SANs and the Datto should be on their own switches that have no connections to the rest of the network.
I've read about people bitching that they have to have another control station to manage this mini network, but it's the cost of using SAN.
172.20 is the servers and SAN; 172.30 is the internal network.
That alone doesn't tell us much. What's important to know is whether any switches carry traffic for both 172.20.x.x and 172.30.x.x. If so, that's one of the first things to change.
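Here is a rough way to answer that question. It assumes you can list which host addresses sit behind each switch. The inventory below is hypothetical. The /16 masks are also a guess, since the thread only gives the 172.20 and 172.30 prefixes:

```python
import ipaddress

# Assumed masks: the thread only names "172.20" and "172.30", so /16 is a guess.
SAN_NET = ipaddress.ip_network("172.20.0.0/16")
LAN_NET = ipaddress.ip_network("172.30.0.0/16")


def mixed_switches(seen: dict[str, list[str]]) -> list[str]:
    """Return the switches that carry addresses from both the SAN and LAN ranges."""
    flagged = []
    for switch, addrs in seen.items():
        ips = [ipaddress.ip_address(a) for a in addrs]
        has_san = any(ip in SAN_NET for ip in ips)
        has_lan = any(ip in LAN_NET for ip in ips)
        if has_san and has_lan:
            flagged.append(switch)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical inventory: which host addresses sit behind each switch.
    inventory = {
        "san-sw1": ["172.20.1.10", "172.20.1.11"],
        "core-sw": ["172.30.0.5", "172.20.1.10"],
        "access-sw": ["172.30.4.20"],
    }
    print(mixed_switches(inventory))  # ['core-sw']
```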
-
If the only thing that changed was the IP addressing of (your nodes?), then it may be a DNS-related issue. Check all of your cluster-related DNS records and IP addresses.
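Here is a quick sketch for that check. It assumes you know which address each cluster-related name should resolve to. The names and addresses below are placeholders, not taken from this cluster. Also note that `gethostbyname_ex` only returns IPv4 results:

```python
import socket
from typing import Callable


def resolve_ipv4(name: str) -> list[str]:
    """All IPv4 addresses DNS returns for `name` (index 2 of gethostbyname_ex)."""
    return socket.gethostbyname_ex(name)[2]


def dns_mismatches(expected: dict[str, str],
                   resolve: Callable[[str], list[str]] = resolve_ipv4,
                   ) -> dict[str, list[str]]:
    """Map each name whose expected IP is missing to what it actually resolved to."""
    problems: dict[str, list[str]] = {}
    for name, want in expected.items():
        try:
            got = resolve(name)
        except OSError:  # socket.gaierror and socket.herror subclass OSError
            got = []
        if want not in got:
            problems[name] = got
    return problems


if __name__ == "__main__":
    # Placeholder names and addresses; substitute the real node and cluster names.
    expected = {
        "hv-node1.example.local": "172.30.0.11",
        "hv-cluster.example.local": "172.30.0.10",
    }
    for name, got in dns_mismatches(expected).items():
        print(f"{name}: expected {expected[name]}, DNS returned {got or 'nothing'}")
```

The resolver is passed in as a parameter so the comparison logic can be exercised without touching real DNS.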
-
@tim_g said in Hyper-V Failover Cluster FAILURE(S):
If the only thing that changed was the IP addressing of (your nodes?), then it may be a DNS-related issue. Check all of your cluster-related DNS records and IP addresses.
eh? You would expect DNS to be in use on the SAN network?
-
@dashrender the 172.20 addresses are accessible from the 172.30 block.
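If the SAN is supposed to be isolated, that reachability is the thing to confirm. Below is a minimal sketch to run from a machine on the 172.30 side. It assumes the SAN portals listen on the default iSCSI TCP port 3260, and the portal addresses are hypothetical. It uses a TCP connect test because ICMP may be filtered even where other traffic gets through:

```python
import socket

ISCSI_PORT = 3260  # default iSCSI target port; assumed, not confirmed for these SANs


def tcp_reachable(host: str, port: int = ISCSI_PORT, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # Hypothetical SAN portal addresses; run this from a 172.30 (LAN) machine.
    for portal in ["172.20.1.10", "172.20.1.11"]:
        state = "REACHABLE from LAN" if tcp_reachable(portal) else "not reachable"
        print(portal, state)
```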
-
Why is this thread being wiped out?
-
@kyle said in Hyper-V Failover Cluster FAILURE(S):
@dashrender the 172.20 addresses are accessible from the 172.30 block.
Why is this necessary?
What network is the Datto on?
-
@dashrender said in Hyper-V Failover Cluster FAILURE(S):
@tim_g said in Hyper-V Failover Cluster FAILURE(S):
If the only thing that changed was the IP addressing of (your nodes?), then it may be a DNS-related issue. Check all of your cluster-related DNS records and IP addresses.
eh? You would expect DNS to be in use on the SAN network?
Obviously not. I re-read and missed the bit about the error being specific to the SAN.
This most likely means there's a communication interruption between one or more nodes and the CSV, which is on the SAN.
Are there any more related errors or is there any additional error info to go by?
-
@tim_g said in Hyper-V Failover Cluster FAILURE(S):
@dashrender said in Hyper-V Failover Cluster FAILURE(S):
@tim_g said in Hyper-V Failover Cluster FAILURE(S):
If the only thing that changed was the IP addressing of (your nodes?), then it may be a DNS-related issue. Check all of your cluster-related DNS records and IP addresses.
eh? You would expect DNS to be in use on the SAN network?
Obviously not. I re-read and missed the bit about the error being specific to the SAN.
This most likely means there's a communication interruption between one or more nodes and the CSV, which is on the SAN.
Are there any more related errors or is there any additional error info to go by?
I'll post the logs when I get back to the house. It's a typical Event ID 5142, plus a few others saying the VMs can't contact the CSV.
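While waiting on the logs, you can check whether the 5142s cluster on one particular node by counting them per machine. This sketch assumes the events were exported to CSV with at least `Id` and `MachineName` columns (for example with `Get-WinEvent ... | Export-Csv`). The sample rows are made up:

```python
import csv
import io
from collections import Counter


def tally_events(csv_text: str, event_id: str = "5142") -> Counter:
    """Count rows with the given event Id, grouped by MachineName (assumed columns)."""
    # Windows PowerShell's Export-Csv writes a "#TYPE ..." line first
    # unless -NoTypeInformation is used; skip it if present.
    if csv_text.startswith("#TYPE"):
        csv_text = csv_text.partition("\n")[2]
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        row.get("MachineName", "?")
        for row in reader
        if (row.get("Id") or "").strip() == event_id
    )


if __name__ == "__main__":
    # Made-up rows in the assumed column layout.
    sample = "Id,MachineName\n5142,HV1\n5142,HV2\n5142,HV1\n1069,HV1\n"
    print(tally_events(sample).most_common())  # [('HV1', 2), ('HV2', 1)]
```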