VMWare and Hyper-V with the traditional Servers+Switches+SAN architecture – widely adopted by enterprise and the large mid-market – works. It works relatively well, but it is complex (many moving parts, usually from different vendors), necessitates multiple layers of management (server, switch, SAN, hypervisor), and requires the use of storage protocols to be functional at all. Historically speaking, this has led to either the requirement of many people from several different IT disciplines to effectively virtualize and manage a VMWare/Hyper-V based environment effectively, or to smaller companies taking a pass on virtualization as the soft and hard costs associated with it put HA virtualization out of reach.
With the advent of Hyperconvergence in the modern datacenter, HCI vendors had a limited set of options when it came to the shared storage part of the equation. Lacking access to the VMKernel and NTOS kernel, they could only either virtualize the entire SAN and run instances of it as a VM on each node in the HCI architecture (horribly inefficient), or move to hypervisors that aren’t from VMWare or Microsoft. The first choice is what most took, even though it has a very high cost in terms of resource efficiency and IO path complexity as well as nearly doubling the hardware requirements of the architecture to run it. They did this for the sole reason that this was the only way to continue providing their solutions based on the legacy vendors and their lock out and lack of access. Likewise, they found this approach (known as VSA or Virtual SAN Appliance) to be easier than tackling the truly difficult job of building an entire architecture from the ground up, clean sheet style.
The VSA approach – virtualize the SAN and its controllers – also known as pulling the SAN into the servers. The VSA or Virtual San Appliance approach was developed to move the SAN up into the host servers through the use of a virtual machine on each box. This did in fact simplify things like implementation and management by eliminating the separate physical SAN (but not its resource requirements, storage protocols, or overhead – in all actuality, it reduplicates those bits of overhead on every node, turning one SAN into 3 or 4 or more). However, it didn’t do much to simplify the data path. In fact, quite the opposite. It complicated the path to disk by turning the IO path from:
application->RAM->disk
into :
application->RAM->hypervisor->RAM->SAN controller VM->RAM-> hypervisor->RAM->write-cache SSD->erasure code(SW R5/6)->disk->network to next node->RAM->hypervisor->RAM->SAN controller VM->RAM->hypervisor->RAM->write-cache SSD->erasure code(SW R5/6)->disk.
This approach uses so much resource that one could run an entire SMB to MidMarket datacenter on just the CPU and RAM being allocated to these VSA’s
This “stack dependent” approach did, in fact, speed up the time-to-market equation for the HCI vendors that implement it, but due to the extra hardware requirements, extra burden of the IO path, and use of SSD/flash primarily as a caching mechanism for the now tortured IO path in use, this approach still brought a solution in at a price point and complexity level out of reach of the modern SMB.
HCI done the right way – HES
The right way to do an HCI architecture is to take the exact opposite path than all of the VSA based vendors. From a design perspective, the goal of eliminating the dedicated servers, storage protocol overhead, resources consumed, and associated gear is met by moving the hypervisor directly into the OS of a clustered platform that runs storage directly in userspace adjacent to the kernel (known as HES or in-kernel). This leverages direct I/O, thereby simplifying the architecture dramatically while regaining the efficiency originally promised by virtualization.
This approach turns the IO path back into :
application -> RAM -> disk -> backplane -> disk
This complete stack owner approach, in addition to regaining the efficiency promised by HCI, allows for features and functionalities (that historically had to be provided by third parties in the legacy and VSA approaches) to be built directly into the platform, allowing for true single vendor solutions to be implemented and radically simplifying the SMB/SME data center at all levels – lower cost of acquisition, lower total TCO. This makes HCI affordable and approachable to the SMB and Mid-Market. This eliminates the extra hardware requirements, the overhead of SAN, and the overhead of storage protocols and re-serialization of IO. This returns efficiency to the datacenter.
When the IO Path is compared side by side, the differences in the overhead and the efficiency become obvious, and the penalties and pain caused by legacy vendor lock-in start to really stand out, with VSA based approaches (in a basic 3 node implementation) using as much as 24 vCores and up to 300GB RAM (depending on the vendor) just to power the VSA’s and boot themselves vs HES using a fraction of a core per node and 6GB RAM total. Efficiency matters.
Original post: http://blog.scalecomputing.com/the-vsa-is-the-ugly-result-of-legacy-vendor-lock-out/