XenServer Issues with SMB over PV NIC
-
Posting on behalf of @jshiers
I am still having some issues with XenServer, and I don't think Citrix pointed me in the right direction. I finally nailed down that I had an issue with the PV NIC when transferring data via SMB shares. I was able to show that it was a Server 2008R2-related issue, as my W7 VMs were just fine, so the host, physical NIC, and switch were all performing well with the W7 VMs but not with the 2008R2 VMs. Citrix had me disable NIC offloading. I did that Friday and SMB data rates shot through the roof. Cool, right? Well, today one of my software packages that uses SMB and SQL calls from the client, a 2008R2 RDS server, would not perform at any usable rate. I worked with the software support folks and wound up turning NIC offloading back on on the application/SQL server and the RDS server, and bang, the software works OK on the SQL calls, but I am back to slow (64 KB) data rates via SMB. So now I have no clue what to do to try to fix this mess.
Host I/O is good at 250-300 MBps, and all VMs are on local storage. Using XenServer 6.5SP1010. Any ideas?
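For anyone following along, the offload toggle can be scripted from inside the guest. Here is a minimal sketch in Python wrapping netsh; the thread doesn't say exactly which settings Citrix had changed, so the chimney/RSS/task-offload knobs below are assumptions about the usual 2008R2 suspects:

```python
import subprocess

# Minimal sketch: show and toggle the global offload settings most often
# implicated on 2008R2 guests (TCP Chimney, RSS, IPv4 task offload). Which
# settings Citrix actually changed isn't stated above, so this list is an
# assumption. Run from an elevated prompt inside the VM, then retest SMB.

def show_offload_state():
    """Print the current global TCP settings (includes Chimney Offload State)."""
    out = subprocess.run(["netsh", "int", "tcp", "show", "global"],
                         capture_output=True, text=True)
    print(out.stdout)

def set_offload(enabled: bool):
    """Enable or disable the global offload settings listed above."""
    state = "enabled" if enabled else "disabled"
    for cmd in (["netsh", "int", "tcp", "set", "global", "chimney=" + state],
                ["netsh", "int", "tcp", "set", "global", "rss=" + state],
                ["netsh", "int", "ip", "set", "global", "taskoffload=" + state]):
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    show_offload_state()
    # set_offload(False)  # disable offloading, reboot, then retest SMB rates
```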
-
As an obvious question (and I'm sure I know the answer): does he actually have the xentools.iso drivers installed, and not just the baseline drivers that come with the VM?
Does the NIC actually show the driver type?
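One quick way to answer that from inside the guest is to list the adapters and see how the driver identifies itself. A small sketch; the name strings it looks for are assumptions, since the PV adapter's display name varies by tools version:

```python
import subprocess

# Sketch: list the guest's network adapters to see whether the PV driver is
# actually in use. The strings matched below ("PV" vs. "Realtek") are
# assumptions -- the PV adapter usually reports something like
# "Citrix PV Network Adapter" or "XenServer PV Network Device".
out = subprocess.run(["wmic", "nic", "get", "Name,ServiceName"],
                     capture_output=True, text=True).stdout

for line in out.splitlines():
    if not line.strip():
        continue
    note = ""
    if "PV" in line:
        note = "   <-- looks like the PV driver"
    elif "Realtek" in line:
        note = "   <-- emulated NIC"
    print(line.rstrip() + note)
```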
-
Are you sure that this is a XenServer issue and not a Windows one? It could be either; I'm not sure from the description that it has necessarily been narrowed down to the platform drivers.
-
It's an issue specific to the VM. As jshiers stated, he doesn't have the same issue on other VMs using the same pNIC.
So without a specific answer it's difficult to say.
Does everything else on this VM perform at 100 Mbps?
-
@DustinB3403 said:
It's an issue specific to the VM. As jshiers stated, he doesn't have the same issue on other VMs using the same pNIC.
Which points away from the PV driver rather than towards it, I think.
-
Yes, the XenTools are up to date with the latest and greatest after all the host updates were applied. Based on my testing, I am thinking that my issue is specific to Server 2008R2 and its interaction with the PV NIC, but I was hoping to get some direction on how to figure that out specifically. The VM I had previously been having issues with was taking 6-11 hours to copy 1.5 GB from my IBM mainframe SMB share to itself and then process that into a SQL database. After turning off NIC offloading, that VM has run that process in less than 10 minutes. The software package I am having trouble with now also uses SMB and SQL, but it sends data to the clients via SQL. Watching Resource Monitor, you can see SMB crank up as part of the System process and get north of 500 meg, but SQL never gets above 50 Kbps. On the same VMs, turn offloading back on and SQL cranks up north of 100 meg, but SMB never gets over 70 Kbps.
I am nearing the point where I am about ready to run AD Prep, move the forest up to a 2012R2 functional level, and then run an in-place upgrade on one or more of these servers to see if that helps at all.
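For what it's worth, a little copy timer makes those before/after comparisons easier to line up, since it gives a plain Mbps figure for the SMB leg by itself. A minimal sketch; both paths are placeholders, not the actual shares from this thread:

```python
import os
import shutil
import time

# Sketch: time one SMB copy and report the effective rate, so the
# offload-on vs. offload-off runs can be compared with the same yardstick.
# SRC and DST are placeholders.
SRC = r"\\mainframe\share\testfile.bin"   # hypothetical SMB source
DST = r"C:\temp\testfile.bin"             # local destination

size = os.path.getsize(SRC)
start = time.time()
shutil.copyfile(SRC, DST)
elapsed = time.time() - start

print(f"{size / 1_048_576:.1f} MiB in {elapsed:.1f} s "
      f"= {size * 8 / elapsed / 1e6:.1f} Mbps")
```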
-
Did you have these issues prior to updating the host(s)?
-
Yes, with the VM that would not copy at a reasonable rate. That is why I moved to 6.5 SP1 from 6.2 base. It seemed like everything was OK when we first installed, but over time (I am guessing with Windows updates) it just kept getting slower and slower. I traced through everything I could think of from an I/O perspective: moved the hosts from RAID 6 to RAID 10, upped the vCPU counts on the VMs, and upped the RAM. When none of that worked, I purchased Citrix support, and they came up with the NIC offloading suggestion.
-
Have you tried a clean install of a 2008R2 server to see if these issues persist?
You mentioned that you upgraded from 6.2 to 6.5 SP1 because of these issues, but these issues could have been caused by the 6.2 PV drivers.
-
@jshiers said:
Yes, with the VM that would not copy at a reasonable rate. That is why I moved to 6.5 SP1 from 6.2 base. It seemed like everything was OK when we first installed, but over time (I am guessing with Windows updates) it just kept getting slower and slower. I traced through everything I could think of from an I/O perspective: moved the hosts from RAID 6 to RAID 10, upped the vCPU counts on the VMs, and upped the RAM. When none of that worked, I purchased Citrix support, and they came up with the NIC offloading suggestion.
If it is tied to a Windows update, that means it is less likely to be driver-related.
-
What happens if you switch that 2008R2 VM from a PV NIC to a non-PV NIC? Can you do that in XenServer?
-
I can't prove the performance issue is tied to Windows updates. What I can prove is how the VMs are performing today and relate that back to the fact that 6-12 months ago I didn't seem to have these issues with the same software packages. Performance went down gradually; we didn't really notice issues until it "suddenly" occurred to us that things were not running well. Overnight I spun up a 2012R2 VM to do some testing with. I am also wondering if I need to flash the BIOS on the underlying hardware: if the software layer was updated and is expecting the NIC to behave a specific way, but the BIOS isn't doing that, then maybe my issue is there?
-
@jshiers the BIOS would affect every system you have, so I would update it (just to be current), but don't expect a performance change, as your other VMs aren't having any issues.
With your 2012R2 VM, monitor its performance and see if there is a difference. It's not apples to apples, as this issue is occurring with your 2008R2 server.
Do you have a VM backup of the 2008R2 server from several months back that you can light up to see if the issue persists?
-
@dafyre You'd have to install dedicated hardware drivers in the VM rather than the xentools.iso drivers,
and I'm not certain what would happen with that in this case.
Additionally, I don't think it would address the issue; the VM changed, and performance degraded.
Speaking of which, how much free space does the vHDD have on it? I've seen Windows servers come to a crawl when the drive is full. Worth a look.
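If it helps, that's a quick thing to check from inside the guest; a tiny sketch, assuming Python 3 is available on the VM and C: is the system volume:

```python
import shutil

# Sketch: report free space on the system volume; Windows guests can slow to
# a crawl when the vHDD is nearly full. Adjust the drive letter as needed.
total, used, free = shutil.disk_usage("C:\\")
gib = 1024 ** 3
print(f"C: {free / gib:.1f} GiB free of {total / gib:.1f} GiB "
      f"({free / total:.0%} free)")
```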
-
I have made sure all the VMs have at least 20 gig of free vHDD space to eliminate any issue with drive space. We also used a Fluke MicroScanner to check out our physical cables and found a couple of suspect cables, which we replaced as well.
Here are some test results. An SMB file copy from a 2008R2 VM with NIC offloading turned on to the new 2012R2 box with no PV tools (so it's using the base Windows Realtek driver) got 24 Mbps. Not great, but much better than the 64-72 Kbps I get between two 2008R2 boxes with offloading turned on. The same test from a 2008R2 VM with NIC offloading turned off gets the same 24 Mbps. All of these VMs are on the same host.
OK, I installed the PV tools on the 2012R2 server. Copying from a 2008R2 VM with NIC offloading ON, the 350 meg file copied in a flash; Resource Monitor showed 2 Gbps. Copying from the VM with NIC offloading OFF, it copied at 225 Mbps.
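If the end state turns out to be "offload off for the SMB-heavy VMs but on inside the guests for SQL," XenServer also exposes per-VIF ethtool overrides from the host side, which avoids flipping the guest-wide netsh settings. A hedged sketch, assuming Python 3 somewhere the xe CLI is available; the VM name is a placeholder, and the VIFs need to be replugged (or the VM rebooted) before the change takes effect:

```python
import subprocess

# Sketch: disable TX checksum offload on one VM's virtual interfaces from the
# XenServer host side, rather than changing the guest's global netsh settings.
# "MyProblemVM" is a placeholder; replug the VIFs or reboot the VM afterwards.
VM_NAME = "MyProblemVM"

vif_uuids = subprocess.run(
    ["xe", "vif-list", "vm-name-label=" + VM_NAME, "params=uuid", "--minimal"],
    capture_output=True, text=True).stdout.strip()

for uuid in filter(None, vif_uuids.split(",")):
    subprocess.run(["xe", "vif-param-set", "uuid=" + uuid,
                    "other-config:ethtool-tx=off"], check=True)
    print("set ethtool-tx=off on VIF " + uuid)
```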
Maybe I should just start doing in-place upgrades on my servers!!!!
-
Upgrading to the newest release of the Server OS never really hurts, so long as your systems are capable of running on that OS.
I'm glad you were able to remove the troublesome patch cables, and your testing with 2012R2 seems to have resolved the issue.
Are you able to perform an in-place upgrade in a test environment so you're not affecting your production systems?
-
@jshiers said:
I can't prove the performance issue is tied to Windows updates. What I can prove is how the VMs are performing today and relate that back to the fact that 6-12 months ago I didn't seem to have these issues with the same software packages. Performance went down gradually; we didn't really notice issues until it "suddenly" occurred to us that things were not running well. Overnight I spun up a 2012R2 VM to do some testing with. I am also wondering if I need to flash the BIOS on the underlying hardware: if the software layer was updated and is expecting the NIC to behave a specific way, but the BIOS isn't doing that, then maybe my issue is there?
It's unlikely that you need to flash the BIOS. That doesn't mean it's a bad idea, but it's unlikely to help.
The fact that performance has gone down slowly also means that drivers are unlikely to be involved.
-
@jshiers said:
Maybe I should just start doing in-place upgrades on my servers!!!!
That's an option for sure. Snapshot them, upgrade, test, and see.