Warning! Wall of text ahead. I've tried to include all the details I'm weighing so I can get the best feedback. Please help me think through anything I haven't considered, or suggest additional ideas. Thanks!
Problem statement: Our operations (production) virtual environment is running as an inverted pyramid of doom. Storage array performance is poor due to oversubscription. I'm looking for advice on how to correct this.
Background: This is a continuation of my other thread (http://mangolassi.it/topic/6236/zfs-based-storage-for-medium-vmware-workload) where I was looking for feedback on our delivery "development" environment storage. Our operations network consists of the following hardware and dependency chain. (See diagram at bottom for overview)
VMHost hardware:
- VMH-OPS1/2: VMware vSphere 5.5 Essentials Plus
- HP ProLiant DL360p G8
- 2x Intel Xeon E5-2667
- 128GB memory
- 2x 146GB SAS disks (VMware OS install)
- 8x 2.5" drive bays total
- 1x P420i controller with 1GB RAM and BBU
- Internal SD card slot available (not used)
- 1x dual-port 10Gb NIC
- 1x quad-port 1Gb NIC
- 1x dual-port 1Gb NIC
Storage hardware:
- SAN-ARR0 is an HP MSA P2000 with dual 1Gb iSCSI controllers (4 ports per controller)
- Each controller port pair (A/B) shares the same VLAN; 4 VLANs for this storage network
Primary network to VMHosts:
- Serviced by two 10Gb switches (HP 5820X-24XG-SFP+)
- Each server has a single dual-port 10Gb card (SPOF: single NIC)
Storage network to VMHosts:
- Serviced by two 1Gb switches (HP V1910-24G)
- Servers have 1x 4-port GbE NIC and 1x 2-port GbE NIC
- Two links from each switch go to one port on each card
- Each link is a separate VLAN
- MPIO round robin
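To put the 1Gb storage network next to the 10Gb primary network, here is a rough line-rate sketch. It assumes four active 1Gb paths per host (two from each V1910), eight 1Gb ports on the P2000 (two controllers x four ports), and MPIO round robin spreading I/O evenly; it ignores TCP/IP and iSCSI overhead and the array's own limits, so real throughput will be lower. The function name is just for illustration.

```python
# Theoretical line-rate ceilings only (no protocol overhead, perfect
# MPIO round-robin balancing). Path counts are assumptions based on the
# cabling described above.
GBIT = 1_000_000_000  # bits per second in 1 Gb/s


def line_rate_mb_per_s(paths: int, gbit_per_path: float = 1.0) -> float:
    """Aggregate raw line rate in decimal megabytes per second."""
    bits_per_second = paths * gbit_per_path * GBIT
    return bits_per_second / 8 / 1_000_000


per_host = line_rate_mb_per_s(paths=4)   # assumed 4 iSCSI paths per host
per_array = line_rate_mb_per_s(paths=8)  # 2 controllers x 4 ports on the P2000
primary = line_rate_mb_per_s(paths=1, gbit_per_path=10.0)  # one 10Gb port

print(f"iSCSI per host:   {per_host:.0f} MB/s")
print(f"P2000 all ports:  {per_array:.0f} MB/s")
print(f"single 10Gb port: {primary:.0f} MB/s")
```

Even at perfect line rate, a host's entire iSCSI bundle carries less than a single 10Gb port, and both hosts share the array's ports.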
Services currently hosted on the operations platform:
- Active Directory/DNS (2008 R2) - 2 servers (one on each host)
- DHCP - 1 server
- Exchange 2010 Standard - 1 server
- SharePoint 2010 Foundation - 1 server
- Windows file server (2008 R2), 1.5TB data - 1 server
- SQL Server 2008 R2 (SharePoint, VMware) - 1 server
- A dozen or so other low-I/O VMs for business applications, mostly CentOS
I acknowledge this setup should not have been deployed. Inexperience, coupled with an outside vendor pushing this solution, is what drove this implementation.
Opportunity: The business has decided to move to Office 365 (E3 plan) next year. This lets us move Exchange and SharePoint off the on-premises infrastructure and shrink our storage needs. Given the recent discussions around SSDs and the likely return of RAID5, I set out to examine how to remove risks and dependencies in the chain.
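The "chain" problem can be sketched with a simple availability model. The figures below are hypothetical placeholders, not measurements, and failures are assumed to be independent. Redundant tiers (two hosts with HA, two storage switches) combine in parallel, but every VM still runs through the single P2000, and a serial chain can never be more available than its weakest non-redundant link.

```python
from math import prod

# Hypothetical availability figures for illustration only; substitute real
# vendor or historical numbers. Failures are assumed to be independent.
HOST = 0.99      # one VMware host
SWITCH = 0.999   # one V1910 storage switch
ARRAY = 0.995    # the single P2000 chassis


def parallel(a: float, copies: int) -> float:
    """The tier stays up if any one of `copies` redundant units is up."""
    return 1 - (1 - a) ** copies


def serial(*tiers: float) -> float:
    """Every tier in the chain must be up, so availabilities multiply."""
    return prod(tiers)


# Current chain for a VM: 2 hosts (HA) -> 2 storage switches -> 1 array.
current = serial(parallel(HOST, 2), parallel(SWITCH, 2), ARRAY)
print(f"current chain availability: {current:.5f}")
print(f"array alone:                {ARRAY:.5f}")
```

Whatever numbers go in, the result stays at or below the array's own availability, which is the inverted pyramid in one line of arithmetic.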
Q1 and Q2 Goals:
- Migrate all operations Windows servers not being eliminated by Office 365 to Server 2012 R2 (possibly 2016, though I don't think it will be ready in time)
- Migrate business to Office 365 (120 users)
- Eliminate P2000-hosted storage from the operations environment
Plan:
- Reinstall VMware on the embedded SD card to free up the two SAS bays
- Add a second 10Gb card to each server (removes the single-NIC SPOF)
- Install 8x SSDs in RAID5 in each server (currently looking at SDSSDXPS-480G-G25 and MZ7GE480HMHP-00003)
- Migrate data hosted on the P2000 to local storage
- If the business determines the file server requires redundancy, set up a second file server with DFSR
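A quick capacity check for the proposed per-host array, as a sketch. It assumes all eight bays hold 480GB SSDs once ESXi moves to the SD card, uses vendor (decimal) gigabytes, and ignores controller and VMFS formatting overhead. RAID10 is included only as a point of comparison against RAID5.

```python
DRIVES = 8             # all 8 bays, once ESXi moves to the SD card
DRIVE_GB = 480         # vendor (decimal) GB for the SSDs under consideration
FILE_SERVER_GB = 1500  # the 1.5TB of file server data


def raid5_usable_gb(drives: int, drive_gb: float) -> float:
    """RAID5 spends one drive's worth of capacity on distributed parity."""
    if drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (drives - 1) * drive_gb


def raid10_usable_gb(drives: int, drive_gb: float) -> float:
    """RAID10 mirrors pairs of drives, so half the raw capacity is usable."""
    if drives < 4 or drives % 2:
        raise ValueError("RAID10 needs an even number of drives, at least 4")
    return drives // 2 * drive_gb


def gb_to_tib(gb: float) -> float:
    """Convert vendor gigabytes (10**9 bytes) to TiB (2**40 bytes)."""
    return gb * 10**9 / 2**40


r5 = raid5_usable_gb(DRIVES, DRIVE_GB)
r10 = raid10_usable_gb(DRIVES, DRIVE_GB)
print(f"RAID5:  {r5:.0f} GB ({gb_to_tib(r5):.2f} TiB)")
print(f"RAID10: {r10:.0f} GB ({gb_to_tib(r10):.2f} TiB)")
print(f"RAID5 headroom after the file server: {r5 - FILE_SERVER_GB:.0f} GB")
```

If DFSR is used, the 1.5TB of file data lands on both hosts, so the headroom figure applies per host.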