Food for thought:  Fixing an over-engineered environment
- 
 @scottalanmiller said in Food for thought: Fixing an over-engineered environment: @eddiejennings said in Food for thought: Fixing an over-engineered environment: @dashrender said in Food for thought: Fixing an over-engineered environment: @eddiejennings said in Food for thought: Fixing an over-engineered environment: @jaredbusch said in Food for thought: Fixing an over-engineered environment: I never use IPMI. I've been underwhelmed with it. If you're curious, this is the motherboard that's on all of these servers: What doesn't it give you that you want? I might have to re-evaluate it. I've only used the IPMI View java app to use the virtual KVM console. I'm looking its web portal now, and looks pretty good. I would like a way to see RAID health status and configuration, but perhaps that's not a reasonable want. It is not, since RAID is not part of the hardware that the IPMI sees. That's what I figured. 
- 
 @eddiejennings said in Food for thought: Fixing an over-engineered environment: @dashrender said in Food for thought: Fixing an over-engineered environment: @eddiejennings said in Food for thought: Fixing an over-engineered environment: @jaredbusch said in Food for thought: Fixing an over-engineered environment: I never use IPMI. I've been underwhelmed with it. If you're curious, this is the motherboard that's on all of these servers: What doesn't it give you that you want? I might have to re-evaluate it. I've only used the IPMI View java app to use the virtual KVM console. I'm looking its web portal now, and looks pretty good. I would like a way to see RAID health status and configuration, but perhaps that's not a reasonable want. Aww - yeah I have no idea if that's a reasonable want or not. I've always just used a vendor supplied app inside Windows to see the status of the RAID controller. Of course with virtualization, I haven't dug into how that works, connecting directly to the hardware, I assume via some something in the hypervisor, etc. 
- 
 On Dell servers, the iDRAC does show the RAID controller status, as long as you use their PERC cards, but that is designed into the ecosystem. But I do not use iDRAC as a goto. I use the Dell OMSA installed into Hyper-V Server as my daily driver tool. 
- 
 I'm finally getting to the point where I'm planning how all of this will work. This is the end goal for the server hardware. Hyper-V Host 1 (former physical Server 2 - SQL Server) - Will be running the new production VMs
- Six S3500 SSDs configured in RAID 5. Two SSDs will be taken from former Server 1, and two SSDs taken from former Server 3.
 Hyper-V Host 2 (former physical Server 1 - IIS server) - Will be running most likely Veeam VM and storing backups
- Four Seagate Enterprise 4 TB HDDs in RAID 10. -- currently reviewing storage needs for backups, so this could change
 Hyper-V Host 3 (former physical Server 3 - the Yosemite backup server, Redis, and host for PostFix VM) - Purpose to be determined
 *Four S3700 SSDs configured in Raid 5. These SSDs will be taken from former Server 2.
 Since I'll be swapping hard drives between servers, there's going to be downtime, so I'm thinking through how I can reduce how much downtime there will be. The below plan isn't set in stone, but rather just ideas. I would start by copying the data used by the IIS virtual folders to an external device. Once that initial copy is done, I would take the production systems offline. I would take a backup of SQL server and copy it to the external device, as well as copy whatever files are new and have changed with the IIS virtual folders (I love robocopy.) Next, I would do all of the disk swapping from above, install and patch Hyper-V on each of the systems, and configure networking. Then I would create the production VMs, configure the servers and patch, copy the data from the external storage, and restore the SQL server backup. There are probably better ways of doing this. Articulating the above helps my thought process. I had a text document with another plan, which I subsequently deleted, for as I was writing and thinking, I realized how flawed it was. 
- 
 Another idea would be scheduling downtime to turn physical Server 1 into a Hyper-V host (Hyper-V Host 2), and, at the same time, removing the SSDs from it. I would start by copying the data used by the IIS virtual folders to an external device. Once that initial copy is done, I would take IIS offline, and then copy new and changed files. Then I'd do the aforementioned hardware swap, install and patch hyper-V, install, patch, and configure a new IIS VM, and restore the data. At this point, I'd have a single hyper-V host (Hyper-V host 2) with a single VM for IIS. The other two physical servers would remain unchanged. With the new IIS VM running and production still functioning, I'd then create a VMs on this Hyper-V host for the other produciton servers. These would be configured and patched (with different networking config as to not intefere with the current running physical servers). Next, I would take the production systems offline. Backup the data from the reamining non-hyper-v host servers (really SQL Server is the only thing that needs to be backedup). On Hyper-V host 2, configure proper networking for the production VMs and restore the data backed up from the physical servers. Bring all VMs online. At this point, all production VMs are running on Hyper-V host 2. Next I would complete the hardware swapping between the remaining physical servers: moving SSDs between physical servers 2 and 3, and installing the removed SSDs from physical server 1 into physical server 2. The final step would be exporting the production VMs from Hyper-V host 2 to Hyper-V host 1 (the one with the SSD storage). I believe this exporting of the VMs can be done without having to take production offline. The problem is that the production SQL Server would be running on HDDs and SSDs for a little bit, while I would be doing the hardware swap on the remaining physical servers. The question would be if the temporary loss of performance be worth having less downtime, as I could have production running while I swap hardware on the other machines. The advantage to this plan is that it seems like I'd have less over-all downtime than the original idea of take everything down, rebuild, then put everything back on line. 
- 
 @eddiejennings said in Food for thought: Fixing an over-engineered environment: Another idea would be scheduling downtime to turn physical Server 1 into a Hyper-V host (Hyper-V Host 2), and, at the same time, removing the SSDs from it. I would start by copying the data used by the IIS virtual folders to an external device. Once that initial copy is done, I would take IIS offline, and then copy new and changed files. Then I'd do the aforementioned hardware swap, install and patch hyper-V, install, patch, and configure a new IIS VM, and restore the data. At this point, I'd have a single hyper-V host (Hyper-V host 2) with a single VM for IIS. The other two physical servers would remain unchanged. With the new IIS VM running and production still functioning, I'd then create a VMs on this Hyper-V host for the other produciton servers. These would be configured and patched (with different networking config as to not intefere with the current running physical servers). Next, I would take the production systems offline. Backup the data from the reamining non-hyper-v host servers (really SQL Server is the only thing that needs to be backedup). On Hyper-V host 2, configure proper networking for the production VMs and restore the data backed up from the physical servers. Bring all VMs online. At this point, all production VMs are running on Hyper-V host 2. Next I would complete the hardware swapping between the remaining physical servers: moving SSDs between physical servers 2 and 3, and installing the removed SSDs from physical server 1 into physical server 2. The final step would be exporting the production VMs from Hyper-V host 2 to Hyper-V host 1 (the one with the SSD storage). I believe this exporting of the VMs can be done without having to take production offline. The problem is that the production SQL Server would be running on HDDs and SSDs for a little bit, while I would be doing the hardware swap on the remaining physical servers. The question would be if the temporary loss of performance be worth having less downtime, as I could have production running while I swap hardware on the other machines. The advantage to this plan is that it seems like I'd have less over-all downtime than the original idea of take everything down, rebuild, then put everything back on line. What's your IOPs usage on that SQL DB at the time of day that you plan to make the change? If it is low enough, you might not even notice a performance difference. 
- 
 @dashrender said in Food for thought: Fixing an over-engineered environment: What's your IOPs usage on that SQL DB at the time of day that you plan to make the change? If it is low enough, you might not even notice a performance difference. Good question. 
- 
 @eddiejennings said in Food for thought: Fixing an over-engineered environment: @dashrender said in Food for thought: Fixing an over-engineered environment: What's your IOPs usage on that SQL DB at the time of day that you plan to make the change? If it is low enough, you might not even notice a performance difference. Good question. Here is the last hour for the drive that holds the actual database files. Drives are all SSD.  The spike in writes after 2:30 was 31 / second. The spike in reads just after 3:00 was 3.5 / second. 
- 
 @eddiejennings said in Food for thought: Fixing an over-engineered environment: @eddiejennings said in Food for thought: Fixing an over-engineered environment: @dashrender said in Food for thought: Fixing an over-engineered environment: What's your IOPs usage on that SQL DB at the time of day that you plan to make the change? If it is low enough, you might not even notice a performance difference. Good question. Here is the last hour for the drive that holds the actual database files. Drives are all SSD.  The spike in writes after 2:30 was 31 / second. The spike in reads just after 3:00 was 3.5 / second. Well, looks like you would be fine on a single SATA drive for that DB, SSDs are sleeping. Toss in a typical caching RAID controller and all the drives are more or less sleeping. 
- 
 @dashrender You mean the 4 SSDs in Raid 10 on which the D drive lives was overkill? (hence this thread) :face_with_stuck-out_tongue_closed_eyes: 
- 
 @eddiejennings said in Food for thought: Fixing an over-engineered environment: @dashrender You mean the 4 SSDs in Raid 10 on which the D drive lives was overkill? (hence this thread) :face_with_stuck-out_tongue_closed_eyes: LOL, Yep! 
- 
 IMHO, dispersing the storage between the hosts and configuring data replication between them would be the best option. Creating several tiers for the migration of your VMs is always nice to have - downtime strikes unexpectedly. 



