Adding a New Hyperconverged Cluster to Your Existing Network
-
We talk about investing in hyperconvergence rather frequently, but in doing so we often ignore the fact that nearly all businesses have existing technical debt, often in the form of physical servers, virtual systems, and even full clusters that can range from the small and basic to the rather sizeable. In some lucky cases we may have the entire existing infrastructure ready for a refresh, and moving to hyperconvergence is as simple as buying a new hyperconverged cluster on the next refresh cycle and retiring everything that already exists. But, in the real world, this is exceedingly rare.
The reality is that nearly all of us, when moving to any new infrastructure like this, are going to have to either find a way to utilize much or all of what already exists in our environment or retire good equipment before it has reached a reasonable end of life. This is something that we deal with in IT every day, and there is really nothing particularly new here.
There is a tempting sales point of hyperconvergence: that we can collapse our entire business down to a single highly available, all-inclusive cluster and no longer need the sprawl of different types of equipment that we have had in the past. For many, this is even the goal. But how often is it realistic to do this all at once when we first acquire a cluster?
For many, the best way to use hyperconvergence is as part of a larger picture, at least initially. One of the great selling points of hyperconverged infrastructure, though, is that (generally) we can grow organically and easily over time so that we don't have to over-invest at the initial time of purchase. This often makes it very easy to ease a new hyperconverged stack into our existing collection of systems with minimal impact, technologically or financially.
There is no reason that a new hyperconverged cluster cannot run side by side with the pre-existing infrastructure. This is how all servers, VMs, and clusters ran naturally before moving to hyperconvergence; nothing changes that ability now. Of course we want to reduce those moving parts and we will likely move towards that, possibly quickly, but there is nothing pressuring us to do so. If we have an existing virtualization cluster, for example VMware ESXi 6 with three nodes and a shared SAN, we can continue to use that cluster for as long as we want. After our new HC cluster has been introduced and burned in, we are free to move as many workloads from the old cluster to the new whenever we want. New workloads would, of course, be spun up on the new HC cluster and, over time, the old cluster would no longer be needed once all workloads had been moved. This can be a monumental lift and shift happening all at once or it could be done gradually over time in a phased approach. It is not uncommon to tackle one workload per week or even per month. Slow, planned, controlled migrations are often the safest approach.
Because of the easy growth options available in most hyperconverged systems, purchasing only enough capacity for initial workloads and buying more capacity over time as workloads migrate can help lower the up-front investment, allowing the platform to prove itself and squeezing more use out of the older platforms before fully retiring or re-purposing them.
Of course, there is always the option to keep the old infrastructure indefinitely. This can be an excellent choice if the old infrastructure is purpose-designed for a specific workload, such as VDI, while the HC cluster is designed for more general purpose needs (or vice versa, of course.) Keeping each cluster tuned for its specific use case is absolutely viable, but it does require knowledge of and support for two different clusters of different types, which creates IT overhead; even so, it remains a very realistic option.
Even more useful is to maintain an older cluster, once production workloads have fully migrated, as a place for testing, development, and staging. Older clusters coming out of recent production usage often have many years of useful life left in them, but would be expensive or risky to maintain at production-level reliability.
Older clusters can also be used as backup targets, on-premises or off-premises disaster recovery platforms, and more. There are many ways for the majority of businesses to effectively reuse and re-purpose older gear when moving to hyperconvergence so that everything that came before does not have to be completely scrapped, while keeping an eye towards an effective, long-term strategy to reduce technical debt and move towards a unified, forward-looking infrastructure.
Thinking creatively and looking at the opportunities arising from the addition of a new central platform within our overall infrastructure can allow us to invest in the future, reduce technical debt, and leverage what we have already invested in for maximum efficiency.
-
Disclaimer: I work for VMware SABU, which makes software for HCI.
There is no reason that a new hyperconverged cluster cannot run side by side with the pre-existing infrastructure.
The challenge is that a lot of HCI vendors/platforms don't "play nice" with existing storage assets. By this I mean they can't mount external storage arrays or re-use existing assets that still have support and depreciation left in them. This lockout can be support/technical (I don't think Scale Computing supports using external storage with their product), it can be hardware-limited (you can't get FC cards for a SimpliVity/Nutanix box), or it can be administrative. As an example of the last, every HCI product other than UCP-HC and VxRail has a management system that is outright hostile to, or simply doesn't support, extending its ease of management to external products. Beyond the example of SPBM allowing management of HCI storage (as well as extending its support to external arrays with VVols), it should be noted that HCI systems tend to make their own storage easier while actively trying to make it harder to add external assets.

Why does this matter? Expansion costs and support fees. HCI vendors are increasingly adopting the legacy storage vendor model of discounting hugely up front, then charging significantly higher support renewals (and removing discounts once you get past your starter pack). One sign of this is going to add node 4 or 5 and discovering that it costs as much as nodes 1, 2 and 3 did. Another sign is a support renewal of 6-12K for an otherwise ordinary server. While I'm not opposed to an increasing shift to opex for IT (it's necessary, especially to keep companies from foolishly running gear into the ground), doing it in a way that you don't notice until the year 2 or year 4 support renewals is something everyone should watch out for and be aware of. Price != cost, and while HCI does reduce a LOT of hidden opex costs, you need to be aware of the real total cost of ownership of what you're buying. A great read on this topic is HDS's "34 costs of storage" white paper.
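To put rough numbers on the "Price != cost" point, here is a minimal sketch of a five-year comparison. Every figure is a made-up placeholder, not any vendor's actual pricing; the shape of the curve is the point, not the dollar amounts.

```python
# Rough 5-year TCO sketch: heavy up-front discount with escalating support
# renewals versus a flatter pricing model. All figures are hypothetical.

def total_cost(acquisition, annual_support, years=5, escalation=0.0):
    """Acquisition price plus support renewals, with optional yearly escalation."""
    cost = acquisition
    support = annual_support
    for _ in range(years):
        cost += support
        support *= (1 + escalation)
    return cost

# "Starter pack" style: discounted hardware, then support that climbs 10% a year.
discounted = total_cost(acquisition=60_000, annual_support=30_000, escalation=0.10)

# Flatter model: higher list price, stable support renewals.
flat = total_cost(acquisition=90_000, annual_support=18_000)

print(f"Discounted up front, 5-year TCO: ${discounted:,.0f}")
print(f"Flat pricing,        5-year TCO: ${flat:,.0f}")
```

The discounted option looks cheaper on day one and ends up costing more over the life of the cluster, which is exactly the pattern to watch for at renewal time.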
-
Because of the easy growth options available in most hyperconverged systems, purchasing only enough capacity for initial workloads and buying more capacity over time as workloads migrate can help lower the up-front investment, allowing the platform to prove itself and squeezing more use out of the older platforms before fully retiring or re-purposing them.
HOW you can grow HCI is important, as it carries a lot of trickle-down costs. If I can only expand by adding more nodes (and not by growing capacity inside an existing node), I may be forced to buy an entire server/node just to add 4 TB of capacity. If your HCI system can start with half the drive bays populated and grow by simply adding drives, you can grow without incurring secondary costs. This is an advantage of HCI systems that are fundamentally true software offerings (a rough comparison of the two growth paths is sketched after the list below).
Secondary costs per socket include...
- Backup software, often licensed per socket.
- Microsoft licensing (now per core, for added fun!)
- Monitoring software
- Port costs and port licensing on networking (thankfully switching is getting cheaper)
- Power/cooling costs. A disk shelf, or adding disks to existing servers, carries a much lower power/cooling bill. I'm not saying external arrays are great, but if I'm adding CPUs and memory that I DON'T need, that is far worse for power consumption, and it's why XIV never caught on.
- Support costs that are magically higher than for an otherwise ordinary server.
- HCI software (it may be "free," but if the flash drive costs more than Dell's 50 cents per GB, then you're just buying software licensing baked into hardware, which is the old storage model all over again).
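To make the trickle-down point concrete, here is a rough sketch comparing the two ways of adding roughly 4 TB of capacity. Every price below is an illustrative assumption; your hardware and licensing terms will differ.

```python
# Illustrative comparison: add a whole 2-socket node versus add drives to
# half-populated bays. All prices are placeholder assumptions, not quotes.

def node_expansion_cost():
    node_hw = 20_000            # a full node you didn't otherwise need
    backup_per_socket = 1_500   # backup software licensed per socket
    windows_per_core = 200      # Microsoft licensing, now per core
    cores_per_socket = 16
    monitoring_per_node = 500
    switch_ports = 4 * 300      # extra NIC and switch ports
    yearly_power = 1_200        # CPUs and RAM you didn't need still draw power
    return (node_hw
            + 2 * backup_per_socket
            + 2 * cores_per_socket * windows_per_core
            + monitoring_per_node
            + switch_ports
            + yearly_power)

def drive_expansion_cost():
    drives = 4 * 1_500          # four ~1 TB flash drives into empty bays
    yearly_power = 150          # drives alone draw comparatively little power
    return drives + yearly_power

print(f"Add a node to get ~4 TB:  ${node_expansion_cost():,}")
print(f"Add drives to get ~4 TB:  ${drive_expansion_cost():,}")
```

The per-socket and per-core line items are what turn a few thousand dollars of capacity into a tens-of-thousands-of-dollars expansion.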
-
Of course, there is always the option to keep the old infrastructure indefinitely. This can be an excellent choice if the old infrastructure is purpose-designed for a specific workload, such as VDI, while the HC cluster is designed for more general purpose needs (or vice versa, of course.) Keeping each cluster tuned for its specific use case is absolutely viable, but it does require knowledge of and support for two different clusters of different types, which creates IT overhead; even so, it remains a very realistic option.
I'd argue that dropping the skill sets needed to maintain legacy clusters is one of the main appeals of HCI. (I personally want to FORGET about HBA queue depth management and FC zoning.) The problem with running old gear indefinitely is that the hardware support costs, environmental costs, lack of support for new hypervisor versions, and other issues eventually catch up with you before you realize it.
-
Older clusters can also be used as backup targets, on-premises or off-premises disaster recovery platforms, and more. There are many ways for the majority of businesses to effectively reuse and re-purpose older gear when moving to hyperconvergence so that everything that came before does not have to be completely scrapped, while keeping an eye towards an effective, long-term strategy to reduce technical debt and move towards a unified, forward-looking infrastructure.
I hate this, because it leads to undersized, half-dead, out-of-support DR environments. It also means you can't upgrade production to a hypervisor version or VM hardware version that the ancient DR-site gear can't support. I get not having like-for-like (N+0 instead of N+1), but you need SOMETHING that can handle the load, and I've seen many a DR plan fail because the blade system at DR was 8 years old and dead, or the ancient EVA array couldn't handle the boot storm.
-
This can be a monumental lift and shift happening all at once or it could be done gradually over time in a phased approach. It is not uncommon to tackle one workload per week or even per month. Slow, planned, controlled migrations are often the safest approach.
I'd argue this shouldn't take long for a few reasons...
-
Stick with your existing hypervisor/platform. If done within the trial window, this typically means a net reduction in licensing costs, as hypervisor licensing is perpetual.
-
You should use your vendor's burn-in tools. VMware has Proactive Tests, which let you automate the creation of 10 IOBlazer instances per host with different workloads to stress the environment, as well as automated netperf network testing between hosts, so you can "break" weak points before going live. HCIBench is another great free tool for shaking out a bad driver or a flaky drive. Testing with 1-2 VMs a month isn't actually putting real load on the system, and if the problem is flaky SSD firmware, you still might not detect it until you move your heavy LOB app that does 90% of your IO.
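If you can't get to the vendor tooling for some reason, even a crude scripted burn-in beats poking at the cluster with one or two idle VMs. Here is a minimal sketch that drives fio from inside a test VM to sustain a mixed random workload; the mount point, file size, and runtime are placeholder assumptions, and it assumes fio is installed in the guest.

```python
# Minimal burn-in sketch: sustain a 70/30 random read/write workload with fio
# inside a test VM. Paths, sizes, and durations are placeholder assumptions.
import json
import subprocess

def run_burn_in(test_file="/mnt/burnin/testfile", runtime_s=4 * 3600):
    cmd = [
        "fio",
        "--name=burnin",
        f"--filename={test_file}",
        "--rw=randrw", "--rwmixread=70",    # mixed random workload
        "--bs=4k", "--iodepth=32", "--numjobs=4",
        "--direct=1",                        # bypass the page cache
        "--size=20G",
        "--time_based", f"--runtime={runtime_s}",
        "--group_reporting",
        "--output-format=json",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    stats = json.loads(result.stdout)["jobs"][0]
    # Watch for latency outliers and throughput dips over hours, not just the
    # averages; flaky SSD firmware tends to show up under sustained load.
    print(f"read IOPS: {stats['read']['iops']:,.0f}  "
          f"write IOPS: {stats['write']['iops']:,.0f}")

if __name__ == "__main__":
    run_burn_in()
```

Run a copy of this in several VMs per host at once to get closer to the kind of aggregate load your LOB app will eventually generate.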
-
Every month a new asset sits there unused is wasted hardware and support depreciation. I remember when storage migrations had to be done at the app level, and it was not uncommon to take 6-9 months to migrate data into an array and another 6-9 months to migrate it out. This led to the creation of storage virtualization, as having an asset spend a quarter of its life not being fully leveraged was just silly.
Migrations should be easy and non-disruptive if you use your hypervisor's tools.
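As an illustration of how little ceremony the hypervisor-native path needs, here is a minimal pyVmomi sketch of a Storage vMotion of one VM onto the new cluster's datastore. The hostname, credentials, VM name, and datastore name are all hypothetical, and in practice most people would just use the vSphere client or PowerCLI; this is only a sketch of the idea.

```python
# Minimal Storage vMotion sketch using pyVmomi. All names and credentials are
# placeholders; error handling is omitted for brevity.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

def find_obj(content, vimtype, name):
    """Return the first managed object of the given type with a matching name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.Destroy()

ctx = ssl._create_unverified_context()           # lab only; validate certs in production
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)
try:
    content = si.RetrieveContent()
    vm = find_obj(content, vim.VirtualMachine, "legacy-app-01")
    target_ds = find_obj(content, vim.Datastore, "new-hci-datastore")

    spec = vim.vm.RelocateSpec()
    spec.datastore = target_ds                   # move storage only; the VM keeps running
    WaitForTask(vm.RelocateVM_Task(spec=spec))   # blocks until the migration completes
    print(f"{vm.name} now lives on {target_ds.name}")
finally:
    Disconnect(si)
```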
-
-
@scottalanmiller said in Adding a New Hyperconverged Cluster to Your Existing Network:
This can be a monumental lift and shift happening all at once or it could be done gradually over time in a phased approach. It is not uncommon to tackle one workload per week or even per month.
There are some tools that allow servers to be live migrated across hypervisors and clusters while the systems keep working, so that downtime is not so much of an issue.
That's not to say that migrations should be started during the peak usage time for your LOB app, but ideally a workload can be moved during slower usage periods without any adverse effect on performance or any downtime required.
Slow, planned, controlled migrations are often the safest approach.
Regardless of the tools used, this is still a good bet. But every SMB is going to have its own definition of "slow, planned, and controlled".
-
@John-Nicholson said in Adding a New Hyperconverged Cluster to Your Existing Network:
Disclaimer: I work for VMware SABU, which makes software for HCI.
There is no reason that a new hyperconverged cluster cannot run side by side with the pre-existing infrastructure.
The challenge is that a lot of HCI vendors/platforms don't "play nice" with existing storage assets. By this I mean they can't mount external storage arrays or re-use existing assets that still have support and depreciation left in them. This lockout can be support/technical (I don't think Scale Computing supports using external storage with their product), it can be hardware-limited (you can't get FC cards for a SimpliVity/Nutanix box), or it can be administrative. As an example of the last, every HCI product other than UCP-HC and VxRail has a management system that is outright hostile to, or simply doesn't support, extending its ease of management to external products. Beyond the example of SPBM allowing management of HCI storage (as well as extending its support to external arrays with VVols), it should be noted that HCI systems tend to make their own storage easier while actively trying to make it harder to add external assets.

Why does this matter? Expansion costs and support fees. HCI vendors are increasingly adopting the legacy storage vendor model of discounting hugely up front, then charging significantly higher support renewals (and removing discounts once you get past your starter pack). One sign of this is going to add node 4 or 5 and discovering that it costs as much as nodes 1, 2 and 3 did. Another sign is a support renewal of 6-12K for an otherwise ordinary server. While I'm not opposed to an increasing shift to opex for IT (it's necessary, especially to keep companies from foolishly running gear into the ground), doing it in a way that you don't notice until the year 2 or year 4 support renewals is something everyone should watch out for and be aware of. Price != cost, and while HCI does reduce a LOT of hidden opex costs, you need to be aware of the real total cost of ownership of what you're buying. A great read on this topic is HDS's "34 costs of storage" white paper.
https://mangolassi.it/topic/11530/new-hyperconvergence-old-storage
-
@John-Nicholson said in Adding a New Hyperconverged Cluster to Your Existing Network:
Because of the easy growth options available in most hyperconverged systems, purchasing only enough capacity for initial workloads and buying more capacity over time as workloads migrate can help lower the up-front investment, allowing the platform to prove itself and squeezing more use out of the older platforms before fully retiring or re-purposing them.
HOW you can grow HCI is important, as it carries a lot of trickle-down costs. If I can only expand by adding more nodes (and not by growing capacity inside an existing node), I may be forced to buy an entire server/node just to add 4 TB of capacity. If your HCI system can start with half the drive bays populated and grow by simply adding drives, you can grow without incurring secondary costs. This is an advantage of HCI systems that are fundamentally true software offerings.
Secondary costs per socket include...
- Backup software, often licensed per socket.
- Microsoft licensing (now per core, for added fun!)
- Monitoring software
- Port costs and port licensing on networking (thankfully switching is getting cheaper)
- Power/cooling costs. A disk shelf, or adding disks to existing servers, carries a much lower power/cooling bill. I'm not saying external arrays are great, but if I'm adding CPUs and memory that I DON'T need, that is far worse for power consumption, and it's why XIV never caught on.
- Support costs that are magically higher than for an otherwise ordinary server.
- HCI software (it may be "free," but if the flash drive costs more than Dell's 50 cents per GB, then you're just buying software licensing baked into hardware, which is the old storage model all over again).
Maybe I didn't explain it well, but this would be growing up to the originally intended size, so none of those things are "additional" costs or considerations; they would all have been in the initial scope. The point here was that purchasing some of them, many of them even, could be held off until necessary rather than forcing you to buy hardware, software, and licensing totally up front before it was all ready to be used. The total amount doesn't change; the time value of money is just leveraged for maximum effect.
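A small sketch of that time-value argument, with made-up prices and a made-up discount rate: the nominal spend is the same, but deferring part of it until the workloads actually migrate lowers the present-value cost.

```python
# Present-value comparison: buy all capacity up front versus defer part of the
# purchase until workloads migrate. Prices and discount rate are made up.

def present_value(cash_flows, rate=0.06):
    """Discount a list of (year, amount) cash flows back to today."""
    return sum(amount / (1 + rate) ** year for year, amount in cash_flows)

# Option A: buy the full 6-node cluster today.
all_up_front = present_value([(0, 6 * 25_000)])

# Option B: buy 4 nodes now, 2 more in year 2 when the last workloads move.
phased = present_value([(0, 4 * 25_000), (2, 2 * 25_000)])

print(f"All up front (present value): ${all_up_front:,.0f}")
print(f"Phased buy   (present value): ${phased:,.0f}")
```

Add to that the extra years of use squeezed out of the old cluster in the meantime, and the deferred purchase is doing double duty.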
-
@John-Nicholson said in Adding a New Hyperconverged Cluster to Your Existing Network:
Of course, there is always the option to keep the old infrastructure indefinitely. This can be an excellent choice if the old infrastructure is purpose-designed for a specific workload, such as VDI, while the HC cluster is designed for more general purpose needs (or vice versa, of course.) Keeping each cluster tuned for its specific use case is absolutely viable, but it does require knowledge of and support for two different clusters of different types, which creates IT overhead; even so, it remains a very realistic option.
I'd argue that dropping the skill sets needed to maintain legacy clusters is one of the main appeals of HCI. (I personally want to FORGET about HBA queue depth management and FC zoning.) The problem with running old gear indefinitely is that the hardware support costs, environmental costs, lack of support for new hypervisor versions, and other issues eventually catch up with you before you realize it.
Yes, it's technical debt for sure, and in most cases you want to head towards phasing it out. But whatever skill set you had for it, you will still have it at the time that you begin replacing it, and you can slowly reduce the dependency and the related skills together.