The VSA is the Ugly Result of Legacy Vendor Lock-Out
-
VMware and Hyper-V on the traditional servers+switches+SAN architecture – widely adopted by the enterprise and the large mid-market – work, and work relatively well. But the approach is complex (many moving parts, usually from different vendors), necessitates multiple layers of management (server, switch, SAN, hypervisor), and requires storage protocols just to function at all. Historically, this has meant either staffing people from several different IT disciplines to virtualize and manage a VMware/Hyper-V environment effectively, or smaller companies taking a pass on virtualization entirely because the soft and hard costs involved put HA virtualization out of reach.
With the advent of hyperconvergence in the modern datacenter, HCI vendors had a limited set of options for the shared-storage part of the equation. Lacking access to the VMkernel and the NT kernel, they could either virtualize the entire SAN and run an instance of it as a VM on each node of the HCI architecture (horribly inefficient), or move to hypervisors that aren't from VMware or Microsoft. Most took the first choice, even though it carries a very high cost in resource efficiency and IO path complexity and nearly doubles the hardware required to run the architecture. They did this for the sole reason that it was the only way to keep building their solutions on the legacy vendors, given those vendors' lock-out and lack of access. They also found this approach (known as VSA, or Virtual SAN Appliance) easier than tackling the truly difficult job of building an entire architecture from the ground up, clean-sheet style.
The VSA approach – virtualizing the SAN and its controllers – is also known as pulling the SAN into the servers. It was developed to move the SAN up into the host servers by running a virtual machine on each box. This did in fact simplify implementation and management by eliminating the separate physical SAN (but not its resource requirements, storage protocols, or overhead – in reality, it duplicates that overhead on every node, turning one SAN into three or four or more). It did not, however, simplify the data path. Quite the opposite: it complicated the path to disk by turning the IO path from:
application->RAM->disk
into:
application->RAM->hypervisor->RAM->SAN controller VM->RAM->hypervisor->RAM->write-cache SSD->erasure code(SW R5/6)->disk->network to next node->RAM->hypervisor->RAM->SAN controller VM->RAM->hypervisor->RAM->write-cache SSD->erasure code(SW R5/6)->disk.
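To make that concrete, here is a minimal sketch in Python that simply transcribes the VSA write path above into a list and counts the hops a single write traverses; nothing vendor-specific is assumed beyond the stages already named in the path.

```python
# Stage names copied verbatim from the VSA write path described above;
# each "->" in the prose is one hop between adjacent stages.
VSA_WRITE_PATH = [
    "application", "RAM", "hypervisor", "RAM", "SAN controller VM", "RAM",
    "hypervisor", "RAM", "write-cache SSD", "erasure code (SW R5/6)", "disk",
    "network to next node", "RAM", "hypervisor", "RAM", "SAN controller VM",
    "RAM", "hypervisor", "RAM", "write-cache SSD", "erasure code (SW R5/6)",
    "disk",
]

print(f"VSA write path: {len(VSA_WRITE_PATH)} stages, "
      f"{len(VSA_WRITE_PATH) - 1} hops per write")   # 22 stages, 21 hops
```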
This approach consumes so many resources that one could run an entire SMB-to-mid-market datacenter on just the CPU and RAM allocated to these VSAs.
This “stack-dependent” approach did, in fact, speed up time to market for the HCI vendors that implemented it, but the extra hardware requirements, the extra burden on the IO path, and the use of SSD/flash primarily as a cache for that now-tortured IO path still left the resulting solutions at a price point and complexity level out of reach of the modern SMB.
HCI done the right way – HES
The right way to do an HCI architecture is to take the exact opposite path from all of the VSA-based vendors. From a design perspective, the goal of eliminating the dedicated servers, storage protocol overhead, resources consumed, and associated gear is met by moving the hypervisor directly into the OS of a clustered platform that runs storage directly in userspace adjacent to the kernel (known as HES, or in-kernel). This leverages direct I/O, simplifying the architecture dramatically while regaining the efficiency originally promised by virtualization.
This approach turns the IO path back into:
application -> RAM -> disk -> backplane -> disk
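The same hop-counting sketch applied to the HES path shows the difference at a glance (the 21-hop figure is taken from the VSA path enumerated earlier):

```python
# HES write path as described above, compared against the VSA path.
HES_WRITE_PATH = ["application", "RAM", "disk", "backplane", "disk"]
VSA_HOPS = 21  # counted from the VSA write path shown earlier

hes_hops = len(HES_WRITE_PATH) - 1
print(f"HES: {hes_hops} hops per write vs ~{VSA_HOPS} for the VSA approach")
```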
This complete-stack-owner approach, in addition to regaining the efficiency promised by HCI, allows features and functionality that historically had to be provided by third parties in the legacy and VSA approaches to be built directly into the platform, enabling true single-vendor solutions and radically simplifying the SMB/SME datacenter at every level – lower cost of acquisition and lower TCO. It makes HCI affordable and approachable for the SMB and mid-market, eliminates the extra hardware requirements, the SAN overhead, the storage protocol overhead, and the re-serialization of IO, and returns efficiency to the datacenter.
When the IO paths are compared side by side, the differences in overhead and efficiency become obvious, and the penalties and pain caused by legacy vendor lock-in really stand out: VSA-based approaches (in a basic 3-node implementation) use as much as 24 vCores and up to 300 GB of RAM (depending on the vendor) just to power the VSAs and boot themselves, versus HES using a fraction of a core per node and 6 GB of RAM total. Efficiency matters.
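As a rough back-of-the-envelope check on those figures, the sketch below just multiplies out the numbers quoted above for a basic 3-node cluster; the per-node HES core usage is only described as "a fraction of a core," so 0.5 of a core is an assumed placeholder.

```python
# Figures quoted above for a basic 3-node cluster; the VSA numbers are the
# "as much as" / "up to" values, so treat this as a worst-case illustration.
vsa_vcores, vsa_ram_gb = 24, 300        # aggregate VSA overhead across 3 nodes
hes_vcores, hes_ram_gb = 3 * 0.5, 6     # 0.5 core/node is an assumed placeholder

print(f"RAM overhead ratio: {vsa_ram_gb / hes_ram_gb:.0f}x")   # ~50x
print(f"RAM reclaimed:      {vsa_ram_gb - hes_ram_gb} GB")      # 294 GB
print(f"vCores reclaimed:   {vsa_vcores - hes_vcores:.1f}")     # 22.5
```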
Original post: http://blog.scalecomputing.com/the-vsa-is-the-ugly-result-of-legacy-vendor-lock-out/
-
Alright, I have to ask. Is StarWind able to get hardware-level drive access like this in Hyper-V? @KOOLER (sorry, forgetting the others around here with StarWind.)
-
Is this really the case? I'm sceptical that a VMware, Hyper-V, or even a XenServer-based system would have that huge a difference in performance requirements compared with a Scale system.
"24 vCores and up to 300GB RAM (depending on the vendor) just to power the VSA’s and boot themselves vs HES using a fraction of a core per node and 6GB RAM total. Efficiency matters."
Is this genuine or is it a flippant example? If it's genuine...shut up and take my money.
-
@Breffni-Potter said in The VSA is the Ugly Result of Legacy Vendor Lock-Out:
Is this really the case? I'm sceptical that a VMware, Hyper-V, or even a XenServer-based system would have that huge a difference in performance requirements compared with a Scale system.
"24 vCores and up to 300GB RAM (depending on the vendor) just to power the VSA’s and boot themselves vs HES using a fraction of a core per node and 6GB RAM total. Efficiency matters."
Is this genuine or is it a flippant example? If it's genuine...shut up and take my money.
From Starwind's LSFS FAQ
"How much RAM do I need for LSFS device to function properly?
4.6 MB of RAM per 1 GB of LSFS device with disabled deduplication,
7.6 MB of RAM per 1 GB of LSFS device with enabled deduplication."
So, yeah, it could easily eat up that much RAM: ~7.6 GB of RAM per TB of storage.
I didn't spot the CPU recommendation, but I know it's beefy.
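To put those FAQ figures into concrete terms, here is a quick sketch based only on the numbers quoted above (my own arithmetic, not StarWind sizing guidance):

```python
# RAM sizing per the LSFS FAQ figures quoted above.
MB_PER_GB_NO_DEDUPE = 4.6
MB_PER_GB_DEDUPE = 7.6

def lsfs_ram_gb(device_size_tb: float, dedupe: bool) -> float:
    """Approximate RAM (GB) needed for an LSFS device of the given size."""
    mb_per_gb = MB_PER_GB_DEDUPE if dedupe else MB_PER_GB_NO_DEDUPE
    return device_size_tb * 1024 * mb_per_gb / 1024   # device GB * MB/GB -> GB

print(lsfs_ram_gb(1, dedupe=True))    # ~7.6 GB of RAM per TB with dedupe
print(lsfs_ram_gb(10, dedupe=True))   # ~76 GB of RAM for a 10 TB device
print(lsfs_ram_gb(10, dedupe=False))  # ~46 GB of RAM without dedupe
```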
-
@Breffni-Potter It is absolutely genuine. For example, SimpliVity requires that a minimum of 48 GB of RAM per node be reserved for their VSA on an entry-level node, with the higher-end nodes giving up 100 GB of RAM per node to the VSA. On some of their older gear, the number was around 150 GB per node. With Nutanix, the number with all features turned off starts at 16 GB per node, but jumps to 32 GB or somewhat more per node as features are turned on. It's the same story with all the other VSA-based vendors. Basically, a VSA is not free: it is a virtualized SAN, and they run an instance of it on every node in their architectures, with the associated resource consumption. The VSA didn't eliminate the SAN, it virtualized it and then replicated it over and over. And that is just the RAM side of things. Then there is the CPU core usage associated with each VSA - cores and RAM going to run the VSAs instead of the actual workloads. In HC3, we not only eliminated the SAN, we did so without using a VSA at all, so those "reserved" resources go directly into actually running VMs, all the while streamlining the IO path so that there is a dramatic reduction in the number of hops it takes to do things like change a period to a comma.
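Multiplying those per-node reservations out over a basic 3-node cluster (using only the figures quoted above, not official vendor sizing) gives a sense of the aggregate cost:

```python
# Per-node VSA RAM reservations as quoted above, totaled for a 3-node cluster.
per_node_ram_gb = {
    "SimpliVity (entry level)": 48,
    "SimpliVity (higher-end nodes)": 100,
    "Nutanix (features off)": 16,
    "Nutanix (features on)": 32,
}
NODES = 3
for config, ram in per_node_ram_gb.items():
    print(f"{config}: {ram * NODES} GB of RAM reserved across {NODES} nodes")
```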
-
@Aconboy said
In HC3, we not only eliminated the SAN, we did so without using a VSA at all, so those "reserved" resources go directly into actually running VMs, all the while streamlining the IO path so that there is a dramatic reduction in the number of hops it takes to do things like change a period to a comma.
I want one now
-
@Breffni-Potter said in The VSA is the Ugly Result of Legacy Vendor Lock-Out:
@Aconboy said
In HC3, we not only eliminated the SAN, we did so without using a VSA at all, so those "reserved" resources go directly into actually running VMs, all the while streamlining the IO path so that there is a dramatic reduction in the number of hops it takes to do things like change a period to a comma.
I want one NOW
ftfy
-
Just curious: How exactly does your product differ from StarWind Virtual SAN in this context?
-
@thwr said in The VSA is the Ugly Result of Legacy Vendor Lock-Out:
Just curious: How exactly does your product differ from StarWind Virtual SAN in this context?
The really, really high-level technical difference is that the VSA / VSAN approach is a layer on top of the hypervisor that has to run as a guest workload. The Scale system puts the storage layer at the same spot that a normal filesystem/LVM would be. It is part of the hypervisor natively and acts just like a filesystem or DRBD. It isn't that it has zero overhead, but it has extremely little as it's just part of the hypervisor itself.
Starwind will vary heavily from ESXi to Hyper-V as it requires a full VM on one and not on the other.
-
@scottalanmiller said in The VSA is the Ugly Result of Legacy Vendor Lock-Out:
@thwr said in The VSA is the Ugly Result of Legacy Vendor Lock-Out:
Just curious: How exactly does your product differ from StarWind Virtual SAN in this context?
The really, really high-level technical difference is that the VSA / VSAN approach is a layer on top of the hypervisor that has to run as a guest workload. The Scale system puts the storage layer at the same spot that a normal filesystem/LVM would be. It is part of the hypervisor natively and acts just like a filesystem or DRBD. It isn't that it has zero overhead, but it has extremely little as it's just part of the hypervisor itself.
Starwind will vary heavily from ESXi to Hyper-V as it requires a full VM on one and not on the other.
Ah ok, thx
-
@thwr @breffni-potter @travisdh1 - we have just released our 1150 platform, which brings all features and functionalities with both flash and spinning disk in at a price point under $30k USD for a complete 3-node cluster.
-
@Aconboy said in The VSA is the Ugly Result of Legacy Vendor Lock-Out:
@thwr @breffni-potter @travisdh1 - we have just released our 1150 platform, which brings all features and functionalities with both flash and spinning disk in at a price point under $30k USD for a complete 3-node cluster.
Trust me, if we needed more than a single server I'd have a cluster!
-
@Aconboy said in The VSA is the Ugly Result of Legacy Vendor Lock-Out:
@thwr @breffni-potter @travisdh1 - we have just released our 1150 platform, which brings all features and functionalities with both flash and spinning disk in at a price point under $30k USD for a complete 3-node cluster.
Could you give us some numbers? Like what's included in the 30k? Storage capacity, CPU cores, NICs, upgrade paths...
-
@thwr sure thing
The 1150 ships with a baseline of 8 Broadwell cores per node with the E5-2620 v4, upgradable to the E5-2640 v4 with 10 cores per node. It ships with 64 GB RAM per node, upgradable to 256 GB. It ships with either a 480 GB, 960 GB, or 1.92 TB eMLC SSD per node and three 1, 2, or 4 TB NL-SAS drives per node. Each node has quad gigabit or quad 10-gig NICs. All features and functionalities are included (HA, DR, multi-site replication, up to 5982 snapshots per VM, and auto-tiering with HEAT staging and destaging and automatic prioritization of workload IO, to name a few). All 1150 nodes can be joined with all other Scale node families and generations, both forward and back, so upgrade paths are not artificially limited.
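For a rough sense of scale, the sketch below adds up the raw, pre-redundancy capacity of a 3-node cluster from the per-node options above; it pairs the smallest SSD with the smallest NL-SAS option (and so on) purely for illustration, and usable capacity after the cluster's redundancy is applied will be lower.

```python
# Raw capacity per 3-node cluster from the configurations listed above.
NODES = 3
ssd_options_gb = [480, 960, 1920]   # one eMLC SSD per node
hdd_options_tb = [1, 2, 4]          # three NL-SAS drives per node

for ssd_gb, hdd_tb in zip(ssd_options_gb, hdd_options_tb):
    raw_flash_tb = ssd_gb * NODES / 1000
    raw_hdd_tb = hdd_tb * 3 * NODES
    print(f"{ssd_gb} GB SSD + 3x {hdd_tb} TB NL-SAS per node: "
          f"~{raw_flash_tb:.2f} TB raw flash, {raw_hdd_tb} TB raw NL-SAS")
```
-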
@Aconboy said in The VSA is the Ugly Result of Legacy Vendor Lock-Out:
@thwr sure thing
The 1150 ships with a baseline of 8 Broadwell cores per node with the E5-2620 v4, upgradable to the E5-2640 v4 with 10 cores per node. It ships with 64 GB RAM per node, upgradable to 256 GB. It ships with either a 480 GB, 960 GB, or 1.92 TB eMLC SSD per node and three 1, 2, or 4 TB NL-SAS drives per node. Each node has quad gigabit or quad 10-gig NICs. All features and functionalities are included (HA, DR, multi-site replication, up to 5982 snapshots per VM, and auto-tiering with HEAT staging and destaging and automatic prioritization of workload IO, to name a few). All 1150 nodes can be joined with all other Scale node families and generations, both forward and back, so upgrade paths are not artificially limited.
Sounds good. What about deduplication / compression? For example, I've got a small 3-node Hyper-V cluster right now with roughly 15 TB of (more or less hot) storage. 70% of the VMs are 2008 R2 (will be upgraded to 2016 next year); the rest is Linux and BSD.
-
No dedupe or compression on the Scale storage. But you can always do that at a higher layer with the OS or whatever if you need it. That works in most cases.
-
@scottalanmiller said in The VSA is the Ugly Result of Legacy Vendor Lock-Out:
No dedupe or compression on the Scale storage. But you can always do that at a higher layer with the OS or whatever if you need it. That works in most cases.
Right! Windows Server has dedupe in its recent releases, so a VM with WS2012R2 will do the trick.
-
@travisdh1 said in The VSA is the Ugly Result of Legacy Vendor Lock-Out:
Alright, I have to ask. Is StarWind able to get hardware-level drive access like this in Hyper-V? @KOOLER (sorry, forgetting the others around here with StarWind.)
On Hyper-V we'll run a mix of kernel-mode drivers and user-land services, and we'll get direct access to hardware.
On VMware we'll use the hypervisor and will eventually "talk" to a VMDK as the data container.
-
@travisdh1 said in The VSA is the Ugly Result of Legacy Vendor Lock-Out:
@Breffni-Potter said in The VSA is the Ugly Result of Legacy Vendor Lock-Out:
Is this really the case? I'm sceptical that a VMware, Hyper-V, or even a XenServer-based system would have that huge a difference in performance requirements compared with a Scale system.
"24 vCores and up to 300GB RAM (depending on the vendor) just to power the VSA’s and boot themselves vs HES using a fraction of a core per node and 6GB RAM total. Efficiency matters."
Is this genuine or is it a flippant example? If it's genuine...shut up and take my money.
From Starwind's LSFS FAQ
"How much RAM do I need for LSFS device to function properly?
4.6 MB of RAM per 1 GB of LSFS device with disabled deduplication,
7.6 MB of RAM per 1 GB of LSFS device with enabled deduplication."
So, yeah, it could easily eat up that much RAM: ~7.6 GB of RAM per TB of storage.
I didn't spot the CPU recommendation, but I know it's beefy.
You don't always use LSFS with StarWind, and if you use LSFS you don't always enable dedupe.
Also, we're offloading hash tables to NVMe flash now, so the upcoming update will have ZERO overhead for dedupe.
Supported combinations are:
- flash for capacity and RAM for hash tables => FAAAAAAAAAST !!
- spinning disk for capacity and NVMe flash for hash tables => somewhat slower, but that's because of the spinning disk, of course
-
Thanks @KOOLER.