Vendor Mistake - VMware Infrastructure Decisions
-
I have a VMware-based cluster of two ready-nodes purchased from Starwind https://www.starwindsoftware.com/starwind-hyperconverged-appliance half a year ago so I will try to share my experience on that matter. These are completely DELL-based and the pricing is very fair compared to what DELL OEM-partners want for the same configurations.
As already mentioned above, in this particular scenario StarWind runs inside a VM on each host. The underlying storage is presented over a standard datastore. Alternatively, you can pass the whole RAID controller through to the StarWind VM if your ESX boots from USB/SD/SATADOM/whatever, which is a common and good practice nowadays. Using hardware RAID makes the overall performance of a single server much faster than you can achieve with the software RAIN provided by either VMware vSAN or MSFT S2D (I’ve done some benchmarking on that matter).
ESX hosts are connected over iSCSI to both StarWind VMs simultaneously. These VMs mirror the internal storage and present it back to ESX as a single MPIO-capable iSCSI device. Since the round-robin path policy is used, there is no storage failover when one StarWind VM is gracefully restarted for patching or when a whole physical host suddenly dies. In the case of a single-host power outage, only the migration of production VMs takes place and storage remains active, which I find quite awesome.
Another thing I enjoy about StarWind is that it uses RDMA-capable networks (I have Mellanox ConnectX-3) for synchronization, which leaves a lot of CPU resources for primary tasks instead of serving storage requests.
Right now I am waiting for the Linux-based StarWind VSA implementation, which is said to arrive soon.
-
@scottalanmiller Starwind on vSphere requires a VM (Linux or Windows). vSAN is the only fully in kernel SDS option on vSphere. (ScaleIO has a VIB for the client side, but not the target side code which still runs in Linux).
-
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@John-Nicholson said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller If you're doing a 3-node vSAN for a low-cost deployment, you should go single socket and get more cores per proc. That leaves you room to scale later and cuts the vSAN cost in half.
They are likely stuck here with whatever was already bought. But good info for a greenfield deployment. Or if they manage to return these for three R730 for example.
I'm not entirely certain we'll be stuck with what we bought. My boss and I were on a conference call with folks from Dell yesterday afternoon. They were talking about different options in SAN devices that would meet our requirements (whether it was Compellent, EMC, etc.), but the biggest issue was that these options were so expensive. Again, not one of them mentioned the potential for a VSAN deployment, so we brought it up (using either VMware VSAN or Starwind). The Dell team has to go back and redesign a quote for gear that would better support a VSAN deployment. In their words, they would likely have to return the servers and the PowerVault we have right now (not sure about the other gear - PowerConnect switches, TrippLite devices, APC PDUs, AppAssure appliance, and ip KVM switch).
I'll be curious to see what comes back when they re-quote.
Why do they have to design a quote? You just tell them what you want, they give you a price. Other than "looking up the price", what are they doing?
Verifying the HCL (for vSphere, and vSAN for the storage devices). If they are 13th-gen servers, though, they should be adaptable; it's just a matter of getting a supported HBA (hint: you want the HBA 330) and supported drives. The other thing I'll comment on in general (not related to Dell or vSAN) is to avoid Intel NICs and go Broadcom. LSO/TSO seems not to be stable on large frames (this can be mitigated by disabling offload at the cost of a few percent of CPU if you need to). After years of hating Broadcom NICs this feels weird, and Intel SHOULD be fixing it at some point this quarter, but after 2 years of putting up with this on large frames I'm not that hopeful.
-
@KOOLER said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
Before I started here a couple of months ago, my boss purchased a couple of Dell R630s and a PowerVault MD3820i (20 drive bays) to be our new infrastructure at HQ. We have dual 10Gb PowerConnect switches and two UPS devices, each connected to a different circuit. The plan is to rebuild the infrastructure on vSphere Standard (licenses already purchased) and have a similar setup in a datacenter somewhere (replicate the SANs, etc.). We're using AppAssure for backups (again, already purchased).
The PowerVault has 16 SAS drives that are 1.8 TB 7200 RPM SED drives and 4 SAS drives that are 400 GB SSD for caching. Well, we made disk groups and virtual disks using the SEDs (letting the SAN manage the keys), but it turns out we cannot use the SSDs they sent us for caching. In fact, they don't have SED SSDs for this model SAN.
At the time the sale was made, Dell assured my boss everything would work as he requested (being able to use the SSDs for caching with the 7200 RPM SED drives). Now that we know this isn't going to be the case, we have some options.
First, they recommended we trade in the PowerVault for a Compellent and EqualLogic. The boss did not want that because, as he put it, you are forced to do RAID 6 on those devices and cannot go with RAID 10 in your disk groups. As another option, Dell recommended we put the SSDs in our two hosts and use Infinio so we can do caching with the drives we have. In this case we would make Dell pay for the Infinio licenses and possibly more RAM since they made the mistake.
But I'm wondering if perhaps there is another option. Each server has 6 drive bays. So we have 20 drives total. Couldn't we have Dell take the SAN back, give us another R630, and pay for licenses of VMware vSAN for all 3 hosts? Each server has four 10 Gb NICs and two 1 Gb NICs. That might require we get additional NICs. But in this case, I'm not sure drive encryption is an option or if we can utilize the SEDs at all.
I've not double-checked the vSAN HCL or anything for the gear in our servers, as this is just me spitballing. Is there some other option we have not considered? We're looking to get the 14 TB or so of usable space that RAID 10 will provide, but the self-encrypting drives were deemed a necessity by the boss. And without some type of caching, we will not hit our IOPS requirements.
Any advice is much appreciated.
Keep R630s, refund PowerVault, refund AppAss. Get VMware VSAN and Veeam (accordingly).
I've got a non-trivial number of R630s in my lab running vSAN. You'll ideally want the HBA 330 (you can settle for the PERC H730 if you already have it), but otherwise the server works fine. The only limits compared to the R730/R730XD are fewer drive bays and no GPU support.
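For anyone checking the "14 TB or so" figure in the quoted post, here is a rough sketch of the arithmetic. It assumes all 16 of the 1.8 TB drives sit in RAID 10 with no hot spares, uses decimal TB, and ignores formatting overhead; the function name is just for illustration.

```python
def raid10_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 10 set.

    Every drive is mirrored, so usable space is half of the raw capacity.
    """
    if drive_count < 4 or drive_count % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least four")
    return drive_count * drive_tb / 2


if __name__ == "__main__":
    # 16 x 1.8 TB SAS drives, as described in the quoted post.
    # Assumption: all 16 are in RAID 10, with none held back as hot spares.
    print(f"{raid10_usable_tb(16, 1.8):.1f} TB usable")
```

That comes out to about 14.4 TB, which lines up with the number being asked for.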
-
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
I'm also assuming you are turning RAID off on each host so Starwind can provide RAIN for you (thus creating the storage pool).
No, you leave RAID on on the hosts and Starwind provides Network RAID. There is no RAIN here.
So you'd leave RAID on and then make a small local VMFS datastore for the Starwind VM to run on so that Starwind can use the rest of the unformatted storage on the host for its network RAID?
You just follow the Starwind install guide. But yes, that is what is going on.
After reading each of these, I finally understand how it works:
http://www.vladan.fr/starwind-virtual-san-product-review/
http://www.vladan.fr/starwind-virtual-san-deployment-methods-in-vmware-vsphere-environment/
https://www.starwindsoftware.com/technical_papers/HA-Storage-for-a-vSphere.pdf
So, in a nutshell, you do use RAID on the host as you normally would and even provision VMware datastores as you normally would. It's the VMDKs you present to the Starwind VM that get used as your virtual iSCSI target. And you can add in the cache size of your choice from the SSD datastores on your ESXi host.
So if I'm patching servers like I should, I'd have to patch the VMs running Starwind as well. Oh man would I hate to install a patch from MS that bombs my storage. I guess theoretically that isn't too different from installing some firmware on a physical SAN that has certain bugs in it. If one Starwind VM gets rebooted, you still have your replication partner presenting storage to the hosts and are ok.
Right. And Hyper-V alone has very tiny, solid patches. Nothing like patching the OS.
Hyper-V with a console is just as big as Windows Server from a patching perspective, and even Core installs see patches at a regular (i.e., quite often monthly) frequency. The install requirements for the ~150 MB VMkernel are tiny versus the 10 GB+ for Hyper-V Core installs. ESXi regularly goes ~6 months without needing a patch. Most of the patch surface is in upper-stack things.
-
@NetworkNerd said:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@John-Nicholson said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller If you're doing a 3-node vSAN for a low-cost deployment, you should go single socket and get more cores per proc. That leaves you room to scale later and cuts the vSAN cost in half.
They are likely stuck here with whatever was already bought. But good info for a greenfield deployment. Or if they manage to return these for three R730 for example.
I'm not entirely certain we'll be stuck with what we bought. My boss and I were on a conference call with folks from Dell yesterday afternoon. They were talking about different options in SAN devices that would meet our requirements (whether it was Compellent, EMC, etc.), but the biggest issue was that these options were so expensive. Again, not one of them mentioned the potential for a VSAN deployment, so we brought it up (using either VMware VSAN or Starwind). The Dell team has to go back and redesign a quote for gear that would better support a VSAN deployment. In their words, they would likely have to return the servers and the PowerVault we have right now (not sure about the other gear - PowerConnect switches, TrippLite devices, APC PDUs, AppAssure appliance, and ip KVM switch).
I'll be curious to see what comes back when they re-quote.
Honestly, it may just be that the inside team isn't familiar with it yet (they just re-assigned who has to know what products, and people are flying all over the place training people). Worst case, call the VMware inside SDS desk (they are in Austin, right across the parking lot from Spiceworks HQ). Those guys have been piecing together vSAN quotes and have heads dedicated to working with your Dell team and making sure stuff is good.
Now off to pack for ANZ for 2 weeks to do some of the mentioned training....
-
@John-Nicholson said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
I'm also assuming you are turning RAID off on each host so Starwind can provide RAIN for you (thus creating the storage pool).
No, you leave RAID on on the hosts and Starwind provides Network RAID. There is no RAIN here.
So you'd leave RAID on and then make a small local VMFS datastore for the Starwind VM to run on so that Starwind can use the rest of the unformatted storage on the host for its network RAID?
You just follow the Starwind install guide. But yes, that is what is going on.
After reading each of these, I finally understand how it works:
http://www.vladan.fr/starwind-virtual-san-product-review/
http://www.vladan.fr/starwind-virtual-san-deployment-methods-in-vmware-vsphere-environment/
https://www.starwindsoftware.com/technical_papers/HA-Storage-for-a-vSphere.pdf
So, in a nutshell, you do use RAID on the host as you normally would and even provision VMware datastores as you normally would. It's the VMDKs you present to the Starwind VM that get used as your virtual iSCSI target. And you can add in the cache size of your choice from the SSD datastores on your ESXi host.
So if I'm patching servers like I should, I'd have to patch the VMs running Starwind as well. Oh man would I hate to install a patch from MS that bombs my storage. I guess theoretically that isn't too different from installing some firmware on a physical SAN that has certain bugs in it. If one Starwind VM gets rebooted, you still have your replication partner presenting storage to the hosts and are ok.
Right. And Hyper-V alone has very tiny, solid patches. Nothing like patching the OS.
Hyper-V with a console is just as big as Windows Server from a patching perspective, and even Core installs see patches at a regular (i.e., quite often monthly) frequency. The install requirements for the ~150 MB VMkernel are tiny versus the 10 GB+ for Hyper-V Core installs. ESXi regularly goes ~6 months without needing a patch. Most of the patch surface is in upper-stack things.
Any recommendation of Hyper-V obviously means without a console.
-
@John-Nicholson said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@John-Nicholson said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller If you're doing a 3-node vSAN for a low-cost deployment, you should go single socket and get more cores per proc. That leaves you room to scale later and cuts the vSAN cost in half.
They are likely stuck here with whatever was already bought. But good info for a greenfield deployment. Or if they manage to return these for three R730 for example.
I'm not entirely certain we'll be stuck with what we bought. My boss and I were on a conference call with folks from Dell yesterday afternoon. They were talking about different options in SAN devices that would meet our requirements (whether it was Compellent, EMC, etc.), but the biggest issue was that these options were so expensive. Again, not one of them mentioned the potential for a VSAN deployment, so we brought it up (using either VMware VSAN or Starwind). The Dell team has to go back and redesign a quote for gear that would better support a VSAN deployment. In their words, they would likely have to return the servers and the PowerVault we have right now (not sure about the other gear - PowerConnect switches, TrippLite devices, APC PDUs, AppAssure appliance, and ip KVM switch).
I'll be curious to see what comes back when they re-quote.
Honestly, it may just be that the inside team isn't familiar with it yet (they just re-assigned who has to know what products, and people are flying all over the place training people). Worst case, call the VMware inside SDS desk (they are in Austin, right across the parking lot from Spiceworks HQ). Those guys have been piecing together vSAN quotes and have heads dedicated to working with your Dell team and making sure stuff is good.
Now off to pack for ANZ for 2 weeks to do some of the mentioned training....
You think Dell engineers don't know about VMware? That seems... terrifying
-
@Net-Runner said in Vendor Mistake - VMware Infrastructure Decisions:
I have a VMware-based cluster of two ready-nodes purchased from Starwind https://www.starwindsoftware.com/starwind-hyperconverged-appliance half a year ago so I will try to share my experience on that matter. These are completely DELL-based and the pricing is very fair compared to what DELL OEM-partners want for the same configurations.
As already mentioned above, in this particular scenario StarWind runs inside a VM on each host. The underlying storage is presented over a standard datastore. Alternatively, you can pass the whole RAID controller through to the StarWind VM if your ESX boots from USB/SD/SATADOM/whatever, which is a common and good practice nowadays. Using hardware RAID makes the overall performance of a single server much faster than you can achieve with the software RAIN provided by either VMware vSAN or MSFT S2D (I’ve done some benchmarking on that matter).
ESX hosts are connected over iSCSI to both StarWind VMs simultaneously. These VMs mirror the internal storage and present it back to ESX as a single MPIO-capable iSCSI device. Since the round-robin path policy is used, there is no storage failover when one StarWind VM is gracefully restarted for patching or when a whole physical host suddenly dies. In the case of a single-host power outage, only the migration of production VMs takes place and storage remains active, which I find quite awesome.
Another thing I enjoy about StarWind is that it uses RDMA-capable networks (I have Mellanox ConnectX-3) for synchronization, which leaves a lot of CPU resources for primary tasks instead of serving storage requests.
Right now I am waiting for the Linux-based StarWind VSA implementation, which is said to arrive soon.
What license of vSphere came with that, and what version of vSphere are you running on the ready nodes?
-
Here's an update for folks following this thread. I was told Dell found a 1.6 TB SED SSD certified with another Dell storage appliance which uses the same firmware and controllers as our PowerVault MD3820i. They think it may work with our configuration and have shipped us one to test. If that does not work correctly, we will continue to look at VSAN options (VMware or Starwind).
-
Here's the latest:
We had been doing some testing with Infinio while waiting for the SSD from Dell using diskspd inside a VM that had VMDKs on multiple datastores that were LUNs on the SAN. Their read caching works very well if you need something for that purpose.
Dell sent us the 1.6 TB SED SSD mentioned above (not officially a supported configuration), but it actually made the SAN overall slower using our benchmarking tools and would only apply the cache to one of the controllers for whatever reason. Dell understood and was willing to help us pursue additional options to make it right.
During the process of Dell trying to get us one of those drives that they thought would work in our SAN, I had mentioned we should look at VMware VSAN. But they were quoting it using exact ready node configurations (dual CPU sockets per node), which would have put us over our vSphere licensed limit for this location (4 sockets) in addition to having to purchase VSAN Standard licenses. I suggested single socket and 4 hosts. There are SED options that will work with VSAN, but it really limits you in terms of choices.
As far as the end solution goes, it looks like we'll get bumped to Enterprise Plus in our vSphere licensing to take advantage of VM Encryption as well as getting VSAN Standard for each host for a hybrid config. That way we can use larger spinning disks in the hosts and let the software handle the encryption. We will have to have an external KMS which will also be provided as part of the solution.
The only thing to answer now is whether VxRail does the trick or we go with some kind of modified ready node / build-your-own host for VSAN. The SAN we have now, the 2 R630 hosts, and two of the 10 Gb PowerConnect switches will go back to Dell in exchange.
Starwind was a consideration, but it did not seem as easy to manage and maintain as VSAN for a 4-node configuration to get the storage capacity needed.
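To make the socket math above concrete, here is a quick sketch. The 4-socket limit is from this post; the host counts for the dual-socket quotes (3 and 4) are my assumption for illustration, since 3 hosts is the usual minimum for a standard vSAN cluster, and the function names are made up.

```python
LICENSED_SOCKETS_HQ = 4  # vSphere socket licenses for this location, per the post


def sockets_needed(hosts: int, sockets_per_host: int) -> int:
    """Total CPU sockets that need a vSphere (and vSAN) license."""
    return hosts * sockets_per_host


def fits_license(hosts: int, sockets_per_host: int,
                 licensed: int = LICENSED_SOCKETS_HQ) -> bool:
    """True if the cluster stays within the licensed socket count."""
    return sockets_needed(hosts, sockets_per_host) <= licensed


if __name__ == "__main__":
    # (hosts, sockets per host); the dual-socket host counts are assumptions.
    for hosts, sockets in [(3, 2), (4, 2), (4, 1)]:
        total = sockets_needed(hosts, sockets)
        verdict = "fits" if fits_license(hosts, sockets) else "over the limit"
        print(f"{hosts} hosts x {sockets} socket(s) = {total} sockets: {verdict}")
```

Any dual-socket cluster of 3 or more hosts blows past 4 sockets, while 4 single-socket hosts land exactly on it, which is why I pushed for single socket.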
-
After many conversations between my boss and his Dell team, here's what we're getting (as of early next week):
- Upgrade to vSphere Enterprise Plus and vSAN Standard for 6 sockets
- Four Dell R730s with single socket Xeon procs and 10 drives each (8 10K SAS 1.2 TB non-SED HDDs, 2 SSDs) for two vSAN disk groups per host, running ESXi on mirrored SD cards
- Hytrust for KMS (to be used with VM Encryption) with support paid for by Dell
Here's what we are returning:
- Dell PowerVault MD3820i, all SAS SEDs in it, all SSDs that were originally for caching
- Two PowerEdge R630 servers with dual socket Xeon processors and no internal drives
- Two PowerConnect N4032 switches that were slated for connectivity to the SAN only
We will be keeping 2 of the PowerConnects we originally ordered to stack and use for the VSAN cluster here at HQ.
We originally had vSphere Standard and vCenter Standard for 6 sockets (4 sockets for here at HQ and 2 sockets for the DR site). Those 6 sockets will still be split as 4 at HQ and 2 at the DR site, making a 4-node vSAN cluster at HQ and a 2-node cluster with a witness at the DR site. We're keeping the AppAssure appliance as well.
So with the vSAN 6.6 release just this week, it means we will be on the bleeding edge once everything is configured. The setup would probably make a great series of blogs assuming I have the time to write them.
Thanks to everyone here for the help and advice. I'm excited to play with the new toys!
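For a rough idea of what the HQ configuration listed above gives us in capacity, here is a back-of-the-envelope sketch. It assumes the default mirroring policy with one failure to tolerate (two copies of every object), counts only the HDDs since the SSDs in a hybrid disk group are cache, and ignores witness components, slack space, and on-disk format overhead, so real usable space will be lower. The function names are made up for illustration.

```python
def vsan_raw_tb(hosts: int, hdds_per_host: int, hdd_tb: float) -> float:
    """Raw capacity-tier space; cache SSDs in a hybrid config do not count."""
    return hosts * hdds_per_host * hdd_tb


def vsan_mirrored_tb(raw_tb: float, failures_to_tolerate: int = 1) -> float:
    """Space left after RAID-1 mirroring, which keeps FTT + 1 copies of the data.

    Assumption: every VM uses the same mirroring policy.
    """
    return raw_tb / (failures_to_tolerate + 1)


if __name__ == "__main__":
    # 4 hosts, 8 x 1.2 TB 10K SAS HDDs each, per the list above.
    raw = vsan_raw_tb(hosts=4, hdds_per_host=8, hdd_tb=1.2)
    mirrored = vsan_mirrored_tb(raw, failures_to_tolerate=1)
    print(f"raw: {raw:.1f} TB, after mirroring (FTT=1): {mirrored:.1f} TB")
```

Under those assumptions it works out to roughly 38.4 TB raw and 19.2 TB after mirroring, which still leaves headroom over the 14 TB or so we were originally after.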
-
Cool. They seem to have really come through.