The SMB Two-Server Dilemma: What to Do
- 
So often in discussing systems we talk about what not to do, but we forget to talk about what to do instead. That is probably the more important topic: knowing what not to do isn't very useful without knowing what actually works. So let's tackle that.
In the SMB market we often face the situation where we feel that we need to move past having a single server, or a single container, for our business functions. Deciding when this happens, or how to determine that more than a single server is needed, is a great topic for another thread. Assume that we know two servers are needed, but that we only need the capacity of a single server (that is, we want two servers primarily to protect against failure, not because we have outgrown the working capacity of a single box). In that case we have a few standard approaches that work for essentially all cases.
Approach 1: Two Stand-Alone Servers
This might seem like a silly idea, but it is generally far more practical than people give it credit for. Individual servers are very simple to set up and maintain, easy to understand, have no unnecessary dependencies or complexities, are very fast and efficient, and are far more reliable than people generally assume. Quality servers that are well treated and well maintained can approach five nines of uptime, which is as high as, or even higher than, nearly any NAS or SAN in the same category.
With two servers there is no automatic failover if one node fails. In most cases, however, restoring from backup is quick and easy. If the restore time of critical workloads is acceptable (and in many cases it would be almost unnoticed), this approach is very cost effective: you simply rely on the rapid restore capabilities of the platform to get systems back up and running on the remaining host in a timely manner.
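To put "five nines" in concrete terms, here is a small sketch that converts an availability figure into expected downtime per year. It assumes a 365-day year, and the function name is hypothetical.

```python
# Convert an availability figure into expected downtime per year.
# Assumes a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Return expected downtime in minutes per year for an availability
    expressed as a fraction, e.g. 0.99999 for "five nines"."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, a in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
        print(f"{label}: {downtime_minutes_per_year(a):.2f} minutes/year")
```

Five nines works out to roughly 5.26 minutes of downtime a year, while three nines allows about 525.6 minutes (roughly 8.8 hours).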
Not all workloads would normally need to be restored during an interim disaster recovery process, which makes this procedure more efficient than it might first seem. With two servers we start by splitting the workload between them (in most cases), dividing our failure domain so that if one server fails, only half of our workloads go down immediately. This alone is a small benefit, and not a real one at all if the services are tightly coupled, so that has to be considered. But with split workloads, at most 50% of systems need to be restored to the temporary location from backup, and in nearly all cases it would be even less than that. Sometimes only one or two critical, non-redundant workloads would need manual restoration while awaiting the repair or replacement of the failed host. With a good service contract, the vendor can handle that repair the same day in most cases, making even this relatively trivial. This approach is the farthest from HA (high availability), but it is considerably more resilient than a single server while being the simplest, easiest and lowest-cost option to implement.
Approach 2: Two-Server HA Clustering
Having spent the money on two servers, most companies will want to go the extra mile and implement a full high availability cluster with them. This generally makes sense in today's market, where the cost of HA solutions has fallen dramatically and may even be zero. In this approach the storage of the two servers is clustered together into a single storage pool. This generally requires buying more total storage, and potentially more storage performance to offset the overhead of storage replication. That, however, is necessary in any HA solution, as storage replication underpins every high availability design. So even when growing past this stage, these requirements remain, even when moving to external storage.
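As a rough sketch of why replicated local storage raises the amount of disk purchased: with two-way replication, every usable terabyte is stored on both nodes, and each node usually also protects its own disks with local RAID. The local RAID efficiency (for example 0.5 for RAID 10) is an illustrative assumption, and the function is hypothetical.

```python
def raw_capacity_needed(usable_tb: float, replicas: int = 2,
                        local_raid_efficiency: float = 1.0) -> float:
    """Estimate total raw disk (TB) to buy across the cluster.

    usable_tb: capacity the workloads actually need.
    replicas: number of nodes holding a full copy (2 for a two-node mirror).
    local_raid_efficiency: fraction of each node's raw disk that is usable
        after local RAID, e.g. 0.5 for RAID 10 (an illustrative assumption).
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0.0 < local_raid_efficiency <= 1.0:
        raise ValueError("local_raid_efficiency must be in (0, 1]")
    # Each node must hold the full dataset, inflated by its local RAID overhead.
    per_node_raw = usable_tb / local_raid_efficiency
    return per_node_raw * replicas


if __name__ == "__main__":
    # 4 TB of workloads, mirrored across two nodes, RAID 10 inside each node.
    print(raw_capacity_needed(4.0, replicas=2, local_raid_efficiency=0.5))  # 16.0
```

The replica multiplier is the same whether the copies live on local disks or on a pair of replicated external arrays.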
A SAN or NAS of the same class needs the same storage replication to approach the same level of reliability at that layer, so this is never an extra cost of this approach; it is simply an inherent cost of all HA. Many storage technologies can do this, such as DRBD, HAST, StarWind, Gluster, Ceph, OpenIO and more. Many of these are free, especially in a two-node situation. Our choice of storage replication technology will depend on our choice of platform. Some example solutions:
- XenServer with DRBD. DRBD is fully baked into the platform itself, completely free, and used in many other scenarios such as HA Linux servers and NAS devices. It's a very standard, battle-tested component. It runs in XenServer's Dom0 and is included, not an add-on. This approach is 100% free from top to bottom.
- Hyper-V with Hyper-V Replica. This approach uses nothing but Microsoft's native Hyper-V capabilities and does rapid, automated, asynchronous replication from one node to the other. It is not as robust as DRBD or StarWind, but it is included and simple.
- Hyper-V with StarWind. This is the Hyper-V equivalent of XenServer with DRBD. StarWind is a third-party component, but it is enterprise class and totally free for two nodes used this way.
- KVM with DRBD. Same as with XenServer: totally free and included in the base product, with no third-party products needed.
- ESXi with StarWind. To do this, ESXi requires at minimum the Essentials Plus license, but StarWind will do a two-node system for free, so there is no additional cost for the replicated local storage.
Those solutions cover essentially all common use cases, and all realistic real-world scenarios, for a two-node HA cluster. Local storage is the only viable approach for two servers (and often for many more), and nothing more should be considered until the node count grows; even then, vertical growth will often trump scaling out, for the same reasons of simplicity.
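One practical reason asynchronous replication (such as Hyper-V Replica) is less robust than synchronous mirroring is the window of writes that have not yet reached the other node when the primary fails. The sketch below is a simplified model, not any product's actual behaviour: it assumes each replication cycle completes instantly at fixed multiples of the interval. The function and parameter names are hypothetical.

```python
import math
from typing import List


def lost_writes(write_times: List[float], interval: float, failure_time: float) -> List[float]:
    """Return the writes that never reached the replica before a failure.

    Simplified model (an assumption, not any product's real scheduler):
    replication cycles complete instantly at t = interval, 2*interval, ...
    and each cycle ships every write made at or before it. interval == 0
    models fully synchronous mirroring, where every acknowledged write is
    already on both nodes.
    """
    if interval < 0:
        raise ValueError("interval must be non-negative")
    if interval == 0:
        return []
    # Time of the last replication cycle that finished before the failure.
    last_sync = math.floor(failure_time / interval) * interval
    return [t for t in write_times if last_sync < t <= failure_time]


if __name__ == "__main__":
    writes = [1, 4, 6, 9]  # seconds at which a VM wrote data (made-up example)
    print(lost_writes(writes, interval=5, failure_time=9.5))  # async: [6, 9]
    print(lost_writes(writes, interval=0, failure_time=9.5))  # sync: []
```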
- 
These scenarios work so reliably because they lower the total cost of the systems while keeping complexity to only what is necessary to accomplish the goals, without adding unnecessary fragility such as external storage (another point of dependency) or switching (again, more things to depend on) when it isn't needed. It keeps things simple and highly available. Both of these approaches dramatically improve reliability compared to a single server working on its own. Many other approaches do not do this; they increase risk and complexity and then mitigate the increased risk that they introduced, rather than mitigating the risks inherent to the original problem. This two-server approach with replicated local storage maintains a single failure domain and mitigates the risks within it. It wins the risk game by reducing vertical risk (failure domains or layers) while also reducing horizontal risk (mitigating the risk within the existing failure domain).
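The risk argument above can be made concrete with standard reliability arithmetic: components that must all work are in series (their availabilities multiply), while redundant components are in parallel (the system fails only if all of them fail). The figures below are made-up examples, and the parallel formula assumes independent failures and perfect failover, which real clusters only approximate.

```python
from functools import reduce


def series(*availabilities: float) -> float:
    """Availability when every component must be up (e.g. host + switch + SAN)."""
    return reduce(lambda acc, a: acc * a, availabilities, 1.0)


def parallel(*availabilities: float) -> float:
    """Availability when any one redundant component is enough.

    Assumes failures are independent and failover is instantaneous.
    """
    all_down = reduce(lambda acc, a: acc * (1.0 - a), availabilities, 1.0)
    return 1.0 - all_down


if __name__ == "__main__":
    host, switch, san = 0.999, 0.999, 0.999  # hypothetical example figures
    # Two hosts that both depend on one switch and one external SAN.
    external = series(parallel(host, host), switch, san)
    # Two hosts mirroring their own local storage, no shared dependency.
    local = parallel(host, host)
    print(f"shared SAN design: {external:.6f}")
    print(f"replicated local:  {local:.6f}")
```

Every extra device placed in series can only lower the total, which is the "increasing risk and then mitigating it" pattern the post warns about.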
- 
Scaling up from this approach simply requires a storage layer that will continue to grow horizontally when needed, or moving to one that will. Vendors and projects like StarWind, HP VSA, VMware vSAN, Scale, Gluster, Ceph and OpenIO all tackle that scaling problem with this approach.
- 
@scottalanmiller said: Scaling up from this approach simply requires a storage layer that will continue to grow horizontally when needed, or moving to one that will. Vendors like StarWind, HP VSA, VMware vSAN, Scale, Gluster, Ceph, OpenIO all tackle that scaling problem with this approach.
A bit late, but... RDMA and NVMe are real game changers. An RDMA-bridged NVMe device has better overall performance, with slightly (irrelevantly) higher latency. This means we'll soon see a new generation of software-defined storage, because neither "wide striping" nor "data locality" makes sense anymore.
- 
Not sure if this belongs here, but has anyone used Zerto? Any indication of the price points? Also, are there any similar products or solutions that are free of cost?
- 
 - XenServer with DRBD. DRBD is fully baked into the platform itself, and completely free and used in many other scenarios such as HA Linux servers and NAS devices. It's a very standard and battle tested component. It runs on XenServer's Dom0 and is included, not an add on. This approach is 100% free top to bottom.
DRBD is NOT included with XS; you have to add it manually from an external repo such as ELRepo, or build it from source.
- 
@Francesco-Provino said: DRBD is NOT included with XS; you have to add it manually from an external repo such as ELRepo, or build it from source.
Maybe the management tools, but DRBD has been part of the stock kernel for some time now, including the one in XenServer's Dom0.
- 
@Veet said: Not sure if this belongs here, but has anyone used Zerto? Any indication of the price points? Also, are there any similar products or solutions that are free of cost?
Xen Orchestra does that for free, although it's of limited value, since DRBD does this better, also for free. You would only use it in specific situations where you don't want full replication.
- 
@scottalanmiller Of course it's part of the kernel, but it's useless without the management tools.
- 
@Francesco-Provino said: @scottalanmiller Of course it's part of the kernel, but it's useless without the management tools.
Yes, but adding a management tool is not the same as adding the functionality. You aren't getting something from elsewhere and shoehorning it in; you are just deciding which interface you want to use to manage the built-in functionality.
- 
@scottalanmiller said: Xen Orchestra does that for free, although it's of limited value, since DRBD does this better, also for free. You would only use it in specific situations where you don't want full replication.
From what I've read about Zerto, it can do cross-hypervisor replication. I'm not sure whether DRBD does that.
- 
A lot of people don't want third-party tools from an unknown source. That the DRBD feature is built right into the kernel, just waiting to be exposed through an interface, is a big deal: it makes users feel much better than getting the actual functionality from a different company would.
- 
In scenarios such as these, what would be the recommended backup approach: DAS, NAS, a backup appliance, a lower-end server, removable disk storage, or tape (I've intentionally left out cloud)?
- 
@Veet said: From what I've read about Zerto, it can do cross-hypervisor replication. I'm not sure whether DRBD does that.
DRBD doesn't care what you use; it just makes identical storage in two locations. It's a pure storage solution, not a "replication" solution. This is network RAID.
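To illustrate the "network RAID" idea, here is a toy, in-memory model of a two-node synchronous mirror: a write is acknowledged only after it has been stored on every online node, much like RAID 1 across a network link, and reads keep working if either node is lost. This is a hypothetical teaching sketch, not DRBD's API or implementation; the class and method names are invented. Because it deals only in numbered blocks, it does not care which hypervisor or filesystem sits on top.

```python
from typing import Dict, Optional


class Node:
    """One server's local block storage (toy model: a dict of block -> bytes)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}
        self.online = True


class MirroredDevice:
    """Toy two-node synchronous mirror ("network RAID 1").

    A write returns only after every online node has stored the block.
    A real implementation would also track and resync blocks written while
    a node was offline; this toy does not.
    """

    def __init__(self, primary: Node, secondary: Node) -> None:
        self.nodes = [primary, secondary]

    def write(self, block: int, data: bytes) -> None:
        targets = [n for n in self.nodes if n.online]
        if not targets:
            raise OSError("no online nodes to write to")
        for node in targets:
            node.blocks[block] = data  # store on each node before acknowledging

    def read(self, block: int) -> Optional[bytes]:
        # Serve the read from the first node that is still up.
        for node in self.nodes:
            if node.online:
                return node.blocks.get(block)
        raise OSError("no online nodes to read from")


if __name__ == "__main__":
    a, b = Node("host-a"), Node("host-b")
    disk = MirroredDevice(a, b)
    disk.write(0, b"vm disk data")
    a.online = False       # simulate losing the first server
    print(disk.read(0))    # b'vm disk data', served from host-b
```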
- 
@whizzard said: In scenarios such as these, what would be the recommended backup approach: DAS, NAS, backup appliance, lower-end server, removable disk storage, tapes (intentionally left out cloud)?
That's a lot more flexible. DAS, removable disk and similar options are not very easy to use, and backups + hard to use = no backups. NAS, file server, backup appliance and tape are the good solutions, and which one fits depends heavily on your needs. Appliance vs. software + storage is basically a question of "what product do you want". NAS vs. file server is really the same thing with a different look and feel. Tape is the odd man out here, but it has a lot of good use cases: its ability to stay cold, last a really long time and be totally disconnected from the original storage is really nice.
- 
@whizzard said: In scenarios such as these, what would be the recommended backup approach: DAS, NAS, backup appliance, lower-end server, removable disk storage, tapes (intentionally left out cloud)?
It should be a separate (physically!) entity, unrelated to your production cluster. A cheap NAS is OK.
- 
@KOOLER said: It should be a separate (physically!) entity, unrelated to your production cluster. A cheap NAS is OK.
For the average scenario (and I really do mean just average), I recommend Synology or ReadyNAS: easy, supported, cost effective, desktop or rackmount options, well known, good brands, nice features.

