@scottalanmiller said in The argument for official support vs third party support:
I have to agree, if the premise is that people keep choosing software that doesn't work under many conditions and the vendor doesn't fix when a bug is found... sounds like software to generally avoid.
This isn't unique to hypervisors; it's an issue on switches, routers, and other devices that run proprietary binaries. At the MSP we had tons of bugs that we got fixed by Brocade, HPE and Cisco (I'm convinced the RSTP standards committee meetings required everyone to drink a bottle of vodka before they got started).
One trend I've noticed is that a lot of bugs are now ecosystem bugs. Issues with APIs called by backup vendors (who may be doing stupid things that they need to fix). Hardware driver and firmware bugs (Intel basically ignored massive problems with multicast retransmits and LSO/TSO on their 540 NIC for the better part of 4 years). HPE's switch to Adaptec controllers on Gen 10 came with some... interesting stability issues. Storage bugs have become more pronounced as people push the limits of SATA SSDs (SATA tunneling protocol was a bad idea in general), and technologies that may have been stable at one time (SAS buffering) just scaled poorly. It helps when your OS vendor has an HCL program (Red Hat and VMware do this; Microsoft's "5 pack of Heineken and running a script" requirement tends to be a bit weaker) that mandates the hardware vendor's engineers will work with them to get things fixed. Now you could argue this is the hardware vendor's problem (it is), but many times it's the OS/hypervisor vendor who ends up being the one who forces the issue or puts resources into the RCA. Given the massive monoculture for NICs, HBAs, and drive controllers, this is becoming more "fun". In some ways I'm hopeful things will tame down a bit (the death of SATA and SCSI will help), but in other ways I'm a bit worried (RoCE could be a mess).
My experience came from working with hundreds of customers, some of whom push the extremes of things, but also SMBs who just lost the hardware/driver/firmware lottery, or discovered (especially with XenApp) that they were the first people to try to use a feature (vGPU offload used to be fun!).
While it's true most vendors will eventually fix critical bugs, if you're not a paying customer they still might take a few weeks. A few weeks of crashes generally isn't cool for a company, and it's a great way for an MSP to get fired.
The other interesting side of vendor support is the new class of proactive support (things like InfoSight by Nimble, and Skyline and Humbug at VMware), where telemetry systems written into the software phone home performance, log and config data, and allow machine learning systems on the vendor side to "predict" issues across multiple customers. MSPs can't aggregate 500K customers to identify corner cases like this. Cloud providers can, but it's interesting to see traditional infrastructure companies adopt the same model of correlation and continuous improvement.
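To make the correlation idea concrete, here's a rough sketch of the simplest version of it: group phoned-home host reports by driver/firmware combination and flag the combinations that crash noticeably more often than the fleet as a whole. This is purely illustrative and not how any of the vendors above actually do it. The report fields (`driver`, `firmware`, `crashed`), the function name, and the thresholds are all assumptions made up for the example.

```python
from collections import defaultdict


def flag_risky_configs(reports, min_hosts=3, ratio=1.5):
    """Flag (driver, firmware) combos that crash well above the fleet average.

    `reports` is an iterable of dicts with hypothetical keys
    "driver", "firmware" and "crashed" (bool), one per host.
    A combo is flagged when it has at least `min_hosts` reports and its
    crash rate is at least `ratio` times the fleet-wide crash rate.
    """
    totals = defaultdict(int)
    crashes = defaultdict(int)
    for report in reports:
        key = (report["driver"], report["firmware"])
        totals[key] += 1
        if report["crashed"]:
            crashes[key] += 1

    fleet_total = sum(totals.values())
    fleet_crashes = sum(crashes.values())
    if fleet_total == 0 or fleet_crashes == 0:
        return []  # nothing reported, or nothing crashed: nothing to flag

    fleet_rate = fleet_crashes / fleet_total
    flagged = []
    for key, count in totals.items():
        rate = crashes[key] / count
        # Ignore tiny samples; one unlucky host is not a pattern.
        if count >= min_hosts and rate >= ratio * fleet_rate:
            flagged.append((key, rate))

    # Worst offenders first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

The `min_hosts` cutoff is the whole point of the scale argument: with a handful of customers almost every combination falls below it, while a vendor seeing hundreds of thousands of hosts gets enough samples per combination for the rates to mean something.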
If you're going to complain about the support practices of commercial companies, I'd argue the one that bothers me the most is vendors that hide their KB systems and admin guides from people who are not paying. WHAT do they have to hide?!