Vendor Mistake - VMware Infrastructure Decisions
-
It's great that they are being so open and flexible. I mean they should be, but so often people are not.
-
@Tim_G said in Vendor Mistake - VMware Infrastructure Decisions:
Honestly, I would return EVERYTHING.
Then I would sit down and design it the right way, using a few R730xd servers, with appropriate specs to accommodate your needs. With that and Starwind vSAN, you can get your HA.
Do you actually need HA? Does the company feel spending the money for real HA is a business requirement and makes financial sense?
If I were commenting on this post, I would be asking the same thing. In this case, management agrees HA is a business requirement (not buying a feature that is touted as HA but actually implementing HA for our workloads).
-
I want to make sure I understand how Starwind works in a 2-node VMware configuration. Here's the overview on their site - https://www.starwindsoftware.com/starwind-virtual-san-vmware. We would get 2 hosts on the vSphere HCL with enough internal spinning disks to get the capacity we need (roughly 10-12 TB), some SSDs for caching, and plenty of RAM. If I read that page correctly, in the VMware world, Starwind runs on a VM on each of the hosts and mirrors the storage between hosts for you (I assume presenting just one giant LUN to your ESXi hosts), and the hosts are connected to one another directly through a dedicated NIC for the mirroring and heartbeating.
I'm also assuming you are turning RAID off on each host so Starwind can provide RAIN for you (thus creating the storage pool). But if Starwind has to run on a VM on your hosts, wouldn't that mean you'd have to have some storage on your hosts that is set up as a datastore already so that Starwind's VM can actually run on it (i.e. two disks in a RAID 1 presented as a local datastore to each host on which you'd create the VMs for Starwind)?
If you look at page 8 of this comparison guide (https://www.starwindsoftware.com/whitepapers/free-vs-paid.pdf), the deployment scenarios say you can run this VM-less inside the hypervisor.
I saw some articles about having the compute and storage separated (https://www.starwindsoftware.com/technical_papers/StarWind_Virtual_SAN_Compute_and_Storage_Separated_2-Node_Cluster_iSCSI_VMware_vSphere.pdf), but in this case you would have 2 ESXi hosts and then two other hosts that ran Windows and Starwind to act as your VSAN pool.
-
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
I'm also assuming you are turning RAID off on each host so Starwind can provide RAIN for you (thus creating the storage pool).
No, you leave RAID on on the hosts and Starwind provides Network RAID. There is no RAIN here.
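To put rough numbers on "local RAID plus Network RAID", here is a minimal sketch. It assumes a two-node synchronous mirror where each host holds a full copy of the data, and it assumes RAID 10 on each host (50% efficiency), which is only an example; nobody in the thread has specified the local RAID level. The function name and parameters are made up for illustration.

```python
def raw_capacity_needed(usable_tb, nodes=2, local_raid_efficiency=0.5):
    """Return (raw TB per host, raw TB across the cluster).

    Assumes every node keeps a full mirrored copy of the data (network RAID 1),
    and that local RAID on each host leaves `local_raid_efficiency` of the raw
    capacity usable (0.5 models RAID 10; this value is an assumption).
    """
    per_node_raw = usable_tb / local_raid_efficiency
    return per_node_raw, per_node_raw * nodes


# The 10-12 TB target mentioned earlier in the thread.
for usable in (10, 12):
    per_node, total = raw_capacity_needed(usable)
    print(f"{usable} TB usable -> {per_node:.0f} TB raw per host, {total:.0f} TB raw total")
```

The point is just that mirroring across hosts stacks on top of whatever the local RAID already costs you, so raw disk ends up at several times the usable figure.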
-
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
I saw some articles about having the compute and storage separated (https://www.starwindsoftware.com/technical_papers/StarWind_Virtual_SAN_Compute_and_Storage_Separated_2-Node_Cluster_iSCSI_VMware_vSphere.pdf), but in this case you would have 2 ESXi hosts and then two other hosts that ran Windows and Starwind to act as your VSAN pool.
You can do that, but you would not do it at this scale. You need to be closer to a dozen physical hosts to consider that.
-
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
I'm also assuming you are turning RAID off on each host so Starwind can provide RAIN for you (thus creating the storage pool).
No, you leave RAID on on the hosts and Starwind provides Network RAID. There is no RAIN here.
So you'd leave RAID on and then make a small local VMFS datastore for the Starwind VM to run on so that Starwind can use the rest of the unformatted storage on the host for its network RAID?
-
@John-Nicholson said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller If you're doing a 3 node vSAN for a low cost deployment you should go single socket and get more cores per proc. Leaves you room to scale later and cuts the vSAN cost in half.
Also that cost study on vSAN is funky. The costs don't make sense to me based on quotes I've seen (I suspect no one actually was trying to get a discounted quote, and put 5 years of support or something on it). It also uses SATA drives (not certified for vSAN) for capacity instead of NL-SAS drives, and looks to be using a non-certified cache tier drive.
I remember you mentioning that once before about going single socket. That's a good point for consideration in all of this.
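A quick sketch of why single socket halves the bill, assuming vSAN is licensed per CPU socket as the quoted comment implies. The price below is a made-up placeholder, not a real quote, and the function names are hypothetical.

```python
def vsan_license_count(nodes, sockets_per_node):
    """Number of per-socket licenses needed for the cluster."""
    return nodes * sockets_per_node


def vsan_license_cost(nodes, sockets_per_node, price_per_socket):
    """Total license cost under a per-socket licensing assumption."""
    return vsan_license_count(nodes, sockets_per_node) * price_per_socket


PRICE_PER_SOCKET = 1000.0  # hypothetical placeholder, not an actual vSAN price

dual = vsan_license_cost(3, 2, PRICE_PER_SOCKET)
single = vsan_license_cost(3, 1, PRICE_PER_SOCKET)
print(f"3 nodes, dual socket:   {vsan_license_count(3, 2)} licenses, {dual:.0f}")
print(f"3 nodes, single socket: {vsan_license_count(3, 1)} licenses, {single:.0f}")
print(f"single/dual cost ratio: {single / dual:.2f}")
```

The ratio holds whatever the real per-socket price is, which is the part that matters for the design decision.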
-
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
I'm also assuming you are turning RAID off on each host so Starwind can provide RAIN for you (thus creating the storage pool).
No, you leave RAID on on the hosts and Starwind provides Network RAID. There is no RAIN here.
So you'd leave RAID on and then make a small local VMFS datastore for the Starwind VM to run on so that Starwind can use the rest of the unformatted storage on the host for its network RAID?
You just follow the Starwind install guide. But yes, that is what is going on.
-
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
If you look at page 8 of this comparison guide (https://www.starwindsoftware.com/whitepapers/free-vs-paid.pdf), the deployment scenarios say you can run this VM-less inside the hypervisor.
They've always had that for Hyper-V, have they added it for VMware?
-
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
Before I started here a couple of months ago, my boss purchased a couple of Dell R630s and a PowerVault MD3820i (20 drive bays) to be our new infrastructure at HQ. We have dual 10Gb PowerConnect switches and two UPS devices, each connected to a different circuit. The plan is to rebuild the infrastructure on vSphere Standard (licenses already purchased) and have a similar setup in a datacenter somewhere (replicate the SANs, etc.). We're using AppAssure for backups (again, already purchased).
The PowerVault has 16 SAS drives that are 1.8 TB 7200 RPM SED drives and 4 SAS drives that are 400 GB SSD for caching. Well, we made disk groups and virtual disks using the SEDs (letting the SAN manage the keys), but it turns out we cannot use the SSDs they sent us for caching. In fact, they don't have SED SSDs for this model SAN.
At the time the sale was made, Dell assured my boss everything would work as he requested (being able to use the SSDs for caching with the 7200 RPM SED drives). Now that we know this isn't going to be the case, we have some options.
First, they recommended we trade in the PowerVault for a Compellent and EqualLogic. The boss did not want that because he was saying you are forced to do RAID 6 on those devices and cannot go with RAID 10 in your disk groups. As another option, Dell recommended we put the SSDs in our two hosts and use Infinio so we can do caching with the drives we have. In this case we would make Dell pay for the Infinio licenses and possibly more RAM since they made the mistake.
But I'm wondering if perhaps there is another option. Each server has 6 drive bays. So we have 20 drives total. Couldn't we have Dell take the SAN back, give us another R630, and pay for licenses of VMware vSAN for all 3 hosts? Each server has four 10 Gb NICs and two 1 Gb NICs. That might require we get additional NICs. But in this case, I'm not sure drive encryption is an option or if we can utilize the SEDs at all.
I've not double-checked the vSAN HCL or anything for the gear in our servers as this is just me spitballing. Is there some other option we have not considered? We're looking to get the 14 TB or so of usable space that RAID 10 will provide, but the self-encrypting drives were deemed a necessity by the boss. And without some type of caching, we will not hit our IOPS requirements.
Any advice is much appreciated.
Keep R630s, refund PowerVault, refund AppAss. Get VMware VSAN and Veeam (accordingly).
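For reference, the "14 TB or so" figure in the quoted post lines up with the sixteen 1.8 TB drives in RAID 10. A minimal sketch of that arithmetic, ignoring hot spares, formatting overhead, and TB vs TiB differences (the function name is hypothetical):

```python
def raid10_usable_tb(drive_count, drive_tb):
    """Usable capacity of a RAID 10 set: half the drives hold mirror copies."""
    if drive_count < 2 or drive_count % 2:
        raise ValueError("RAID 10 needs an even number of drives (at least 2)")
    return (drive_count // 2) * drive_tb


# Sixteen 1.8 TB SED drives, as described in the quoted post.
print(f"{raid10_usable_tb(16, 1.8):.1f} TB usable")
```

Any replacement design has to land at or above that number after its own redundancy overhead.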
-
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
If you look at page 8 of this comparison guide (https://www.starwindsoftware.com/whitepapers/free-vs-paid.pdf), the deployment scenarios say you can run this VM-less inside the hypervisor.
They've always had that for Hyper-V, have they added it for VMware?
I can't seem to find any evidence of it in their documentation from doing a bit of searching (other than the comparison PDF I linked to above).
-
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
I'm also assuming you are turning RAID off on each host so Starwind can provide RAIN for you (thus creating the storage pool).
No, you leave RAID on on the hosts and Starwind provides Network RAID. There is no RAIN here.
So you'd leave RAID on and then make a small local VMFS datastore for the Starwind VM to run on so that Starwind can use the rest of the unformatted storage on the host for its network RAID?
You just follow the Starwind install guide. But yes, that is what is going on.
After reading each of these, I finally understand how it works:
http://www.vladan.fr/starwind-virtual-san-product-review/
http://www.vladan.fr/starwind-virtual-san-deployment-methods-in-vmware-vsphere-environment/
https://www.starwindsoftware.com/technical_papers/HA-Storage-for-a-vSphere.pdf
So, in a nutshell, you do use RAID on the host as you normally would and even provision VMware datastores as you normally would. It's the VMDKs you present to the Starwind VM that get used as your virtual iSCSI target. And you can add in the cache size of your choice from the SSD datastores on your ESXi host.
So if I'm patching servers like I should, I'd have to patch the VMs running Starwind as well. Oh man would I hate to install a patch from MS that bombs my storage. I guess theoretically that isn't too different from installing some firmware on a physical SAN that has certain bugs in it. If one Starwind VM gets rebooted, you still have your replication partner presenting storage to the hosts and are ok.
-
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
I'm also assuming you are turning RAID off on each host so Starwind can provide RAIN for you (thus creating the storage pool).
No, you leave RAID on on the hosts and Starwind provides Network RAID. There is no RAIN here.
So you'd leave RAID on and then make a small local VMFS datastore for the Starwind VM to run on so that Starwind can use the rest of the unformatted storage on the host for its network RAID?
You just follow the Starwind install guide. But yes, that is what is going on.
After reading each of these, I finally understand how it works:
http://www.vladan.fr/starwind-virtual-san-product-review/
http://www.vladan.fr/starwind-virtual-san-deployment-methods-in-vmware-vsphere-environment/
https://www.starwindsoftware.com/technical_papers/HA-Storage-for-a-vSphere.pdf
So, in a nutshell, you do use RAID on the host as you normally would and even provision VMware datastores as you normally would. It's the VMDKs you present to the Starwind VM that get used as your virtual iSCSI target. And you can add in the cache size of your choice from the SSD datastores on your ESXi host.
So if I'm patching servers like I should, I'd have to patch the VMs running Starwind as well. Oh man would I hate to install a patch from MS that bombs my storage. I guess theoretically that isn't too different from installing some firmware on a physical SAN that has certain bugs in it. If one Starwind VM gets rebooted, you still have your replication partner presenting storage to the hosts and are ok.
Right. And Hyper-V alone has very tiny, solid patches. Nothing like patching the OS.
-
I wonder if ditching VMware in the StarWind case is worth it?
-
I have a VMware-based cluster of two ready-nodes purchased from Starwind https://www.starwindsoftware.com/starwind-hyperconverged-appliance half a year ago so I will try to share my experience on that matter. These are completely DELL-based and the pricing is very fair compared to what DELL OEM-partners want for the same configurations.
As already mentioned above, in this particular scenario, StarWind runs inside a VM on each host. The underlying storage is presented over a standard datastore. Alternatively, you can pass the whole RAID controller through to the StarWind VM if your ESX boots from USB/SD/SATADOM/whatever, which is a common and good practice nowadays. Using hardware RAID makes a single server much faster than what you can achieve with the software RAIN provided by either VMware vSAN or MSFT S2D (I’ve done some benchmarking on that).
ESX hosts are connected over iSCSI to both StarWind VMs simultaneously. These VMs mirror the internal storage and present it back to ESX as a single MPIO-capable iSCSI device. Since the round robin policy is used, there is no storage failover when one StarWind VM is gracefully restarted for patching or the whole physical host suddenly dies. In the case of a single host power outage, only the migration of production VMs takes place while storage remains active, which I find quite awesome.
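To illustrate the idea, here is a toy model of round-robin path selection over two mirrored targets. It is only a sketch of the concept, not how ESXi's path selection actually works internally, and the class, method, and path names are all made up for the example.

```python
from itertools import cycle


class RoundRobinMultipath:
    """Toy model: rotate I/O across paths, skipping any path marked down."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.alive = set(self.paths)
        self._cursor = cycle(self.paths)

    def mark_down(self, path):
        # A restarted StarWind VM or a dead host just removes its path.
        self.alive.discard(path)

    def mark_up(self, path):
        if path in self.paths:
            self.alive.add(path)

    def next_path(self):
        if not self.alive:
            raise RuntimeError("all paths down")
        # Keep rotating until a live path comes up; terminates because
        # at least one path is alive.
        for path in self._cursor:
            if path in self.alive:
                return path


mp = RoundRobinMultipath(["starwind-vm-a", "starwind-vm-b"])
print([mp.next_path() for _ in range(4)])  # alternates between both paths
mp.mark_down("starwind-vm-a")              # e.g. VM rebooted for patching
print([mp.next_path() for _ in range(3)])  # I/O continues on the surviving path
```

Because both paths are already active and both targets hold a full mirror, losing one just shrinks the rotation instead of triggering a separate failover step.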
Another thing that I do enjoy in StarWind is that it uses RDMA-capable networks (I have Mellanox ConnectX-3) for synchronization, which leaves a lot of CPU resources for primary tasks instead of serving storage requests.
Right now I am waiting for the Linux-based StarWind VSA implementation, which is said to be arriving soon.
-
@scottalanmiller Starwind on vSphere requires a VM (Linux or Windows). vSAN is the only fully in kernel SDS option on vSphere. (ScaleIO has a VIB for the client side, but not the target side code which still runs in Linux).
-
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@John-Nicholson said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller If you're doing a 3 node vSAN for a low cost deployment you should go single socket and get more cores per proc. Leaves you room to scale later and cuts the vSAN cost in half.
They are likely stuck here with whatever was already bought. But good info for a greenfield deployment. Or if they manage to return these for three R730 for example.
I'm not entirely certain we'll be stuck with what we bought. My boss and I were on a conference call with folks from Dell yesterday afternoon. They were talking about different options in SAN devices that would meet our requirements (whether it was Compellent, EMC, etc.), but the biggest issue was that these options were so expensive. Again, not one of them mentioned the potential for a VSAN deployment, so we brought it up (using either VMware VSAN or Starwind). The Dell team has to go back and redesign a quote for gear that would better support a VSAN deployment. In their words, they would likely have to return the servers and the PowerVault we have right now (not sure about the other gear - PowerConnect switches, TrippLite devices, APC PDUs, AppAssure appliance, and IP KVM switch).
I'll be curious to see what comes back when they re-quote.
Why do they have to design a quote? You just tell them what you want, they give you a price. Other than "looking up the price", what are they doing?
Verifying the HCL (for vSphere, and vSAN for the storage devices). If they are 13Gen servers though they should be adaptable, it's just a matter of getting a supported HBA (Hint, you want the HBA 330) and getting supported drives. Other thing I'll comment in general (Not related to Dell or vSAN) is avoid Intel NICs and go Broadcom. LSO/TSO seems to not be stable on large frames (This can be mitigated by disabling offload at the cost of a few % of CPU if you need). After years of hating Broadcom NICs this feels weird and Intel SHOULD be fixing it at some point this quarter, but after 2 years of putting up with this on large frames I'm not that hopeful.
-
@KOOLER said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
Before I started here a couple of months ago, my boss purchased a couple of Dell R630s and a PowerVault MD3820i (20 drive bays) to be our new infrastructure at HQ. We have dual 10Gb PowerConnect switches and two UPS devices, each connected to a different circuit. The plan is to rebuild the infrastructure on vSphere Standard (licenses already purchased) and have a similar setup in a datacenter somewhere (replicate the SANs, etc.). We're using AppAssure for backups (again, already purchased).
The PowerVault has 16 SAS drives that are 1.8 TB 7200 RPM SED drives and 4 SAS drives that are 400 GB SSD for caching. Well, we made disk groups and virtual disks using the SEDs (letting the SAN manage the keys), but it turns out we cannot use the SSDs they sent us for caching. In fact, they don't have SED SSDs for this model SAN.
At the time the sale was made, Dell assured my boss everything would work as he requested (being able to use the SSDs for caching with the 7200 RPM SED drives). Now that we know this isn't going to be the case, we have some options.
First, they recommended we trade in the PowerVault for a Compellent and EqualLogic. The boss did not want that because he was saying you are forced to do RAID 6 on those devices and cannot go with RAID 10 in your disk groups. As another option, Dell recommended we put the SSDs in our two hosts and use Infinio so we can do caching with the drives we have. In this case we would make Dell pay for the Infinio licenses and possibly more RAM since they made the mistake.
But I'm wondering if perhaps there is another option. Each server has 6 drive bays. So we have 20 drives total. Couldn't we have Dell take the SAN back, give us another R630, and pay for licenses of VMware vSAN for all 3 hosts? Each server has four 10 Gb NICs and two 1 Gb NICs. That might require we get additional NICs. But in this case, I'm not sure drive encryption is an option or if we can utilize the SEDs at all.
I've not double-checked the vSAN HCL or anything for the gear in our servers as this is just me spitballing. Is there some other option we have not considered? We're looking to get the 14 TB or so of usable space that RAID 10 will provide, but the self-encrypting drives were deemed a necessity by the boss. And without some type of caching, we will not hit our IOPS requirements.
Any advice is much appreciated.
Keep R630s, refund PowerVault, refund AppAss. Get VMware VSAN and Veeam (accordingly).
I've got a non-trivial number of R630s in my lab running vSAN. You'll want the HBA 330 ideally (you can settle for the PERC H730 if you already have it) but otherwise the server works fine. The only limits compared to the R730/R730XD are fewer drive bays and no GPU support.
-
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@NetworkNerd said in Vendor Mistake - VMware Infrastructure Decisions:
I'm also assuming you are turning RAID off on each host so Starwind can provide RAIN for you (thus creating the storage pool).
No, you leave RAID on on the hosts and Starwind provides Network RAID. There is no RAIN here.
So you'd leave RAID on and then make a small local VMFS datastore for the Starwind VM to run on so that Starwind can use the rest of the unformatted storage on the host for its network RAID?
You just follow the Starwind install guide. But yes, that is what is going on.
After reading each of these, I finally understand how it works:
http://www.vladan.fr/starwind-virtual-san-product-review/
http://www.vladan.fr/starwind-virtual-san-deployment-methods-in-vmware-vsphere-environment/
https://www.starwindsoftware.com/technical_papers/HA-Storage-for-a-vSphere.pdf
So, in a nutshell, you do use RAID on the host as you normally would and even provision VMware datastores as you normally would. It's the VMDKs you present to the Starwind VM that get used as your virtual iSCSI target. And you can add in the cache size of your choice from the SSD datastores on your ESXi host.
So if I'm patching servers like I should, I'd have to patch the VMs running Starwind as well. Oh man would I hate to install a patch from MS that bombs my storage. I guess theoretically that isn't too different from installing some firmware on a physical SAN that has certain bugs in it. If one Starwind VM gets rebooted, you still have your replication partner presenting storage to the hosts and are ok.
Right. And Hyper-V alone has very tiny, solid patches. Nothing like patching the OS.
Hyper-V with a console is just as big as Windows Server from a patching perspective, and even Core installs see patches with regular (i.e. quite often monthly) frequency. The install requirements for the ~150 MB VMkernel are tiny vs the 10 GB+ for Hyper-V Core installs. ESXi regularly goes ~6 months without needing a patch. Most of the patch surface is in upper stack things.
-
@NetworkNerd said:
@scottalanmiller said in Vendor Mistake - VMware Infrastructure Decisions:
@John-Nicholson said in Vendor Mistake - VMware Infrastructure Decisions:
@scottalanmiller If you're doing a 3 node vSAN for a low cost deployment you should go single socket and get more cores per proc. Leaves you room to scale later and cuts the vSAN cost in half.
They are likely stuck here with whatever was already bought. But good info for a greenfield deployment. Or if they manage to return these for three R730 for example.
I'm not entirely certain we'll be stuck with what we bought. My boss and I were on a conference call with folks from Dell yesterday afternoon. They were talking about different options in SAN devices that would meet our requirements (whether it was Compellent, EMC, etc.), but the biggest issue was that these options were so expensive. Again, not one of them mentioned the potential for a VSAN deployment, so we brought it up (using either VMware VSAN or Starwind). The Dell team has to go back and redesign a quote for gear that would better support a VSAN deployment. In their words, they would likely have to return the servers and the PowerVault we have right now (not sure about the other gear - PowerConnect switches, TrippLite devices, APC PDUs, AppAssure appliance, and IP KVM switch).
I'll be curious to see what comes back when they re-quote.
Honestly, it may just be that the inside team isn't familiar with it yet (they just re-assigned who has to know what products, and people are flying all over the place training people). Worst case, call the VMware inside SDS desk (they are in Austin, right across the parking lot from Spiceworks HQ). Those guys have been piecing together vSAN quotes and have heads dedicated to working with your Dell team and making sure stuff is good.
Now off to pack for ANZ for 2 weeks to do some of the mentioned training....