ZFS-Based Storage for Medium VMware Workload
-
Another question: what is the purpose of the clusters? Currently you have an inverted pyramid of doom, not the best design as you know. But this implies that there are no needs around high availability. In fact, it means that you are currently below "standard availability", which should mean that dropping the clusters and just going to stand-alone servers would itself be an improvement. What is the reason for having clusters at all, given that reliability hasn't been a factor thus far?
-
@scottalanmiller said:
Before I dive into it, what is the need around ZFS? It sounds like you are leading with the solution, rather than the goal, which will not lead us in the direction of a best answer. We should step back and think at the goal level and determine what it is that we want to accomplish. Maybe ZFS will be the answer, but what if it isn't? Leading with the answer and looking for the question isn't the best way to design a solution.
In a sense I am, but only because, outside of the MSA and Windows-based storage, this is what I am most familiar with. Seeing as, if we don't go with a vendor-supported solution, this would require the least effort to support. That doesn't make it the right answer, just the one I am most comfortable putting my name next to.
-
@donaldlandru said:
- Operations and Development must be on separate storage devices
Mostly makes sense. This heavily suggests that the local storage options will be best then, as you lose the only real potential leverage for having external storage, which was the tiny bit of cost savings that might have arisen from having five servers share one storage unit. Without that, it is really hard to come up with a way to justify external storage. It was essentially impossible even with five.
-
@donaldlandru said:
- Storage systems must be built of business class hardware (no RED drives -- although I would allow this in a future Veeam backup storage target)
What's the reason for this? Red drives are just as reliable, or nearly so, as any other drive type in certain scenarios. I'm not saying that Red is going to be right or make any sense, but as a requirement this doesn't match the concept of a business goal. This is another "solution looking for a problem." Red drives are perfectly viable for the most enterprise of applications, when they fit the bill.
Even for a SAM-SD, which by definition is all about being enterprise storage, WD Red drives are perfectly acceptable. The idea that consumer drives are risky is purely one tied to the use of already more risky parity arrays. The same factors that would make you classify WD Red as "non-business class" also qualify RAID 6 in the same way. So it would rule out both or neither, depending on the application of this rule, but not just one or the other.
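To put a rough number on why the risk tracks the array design rather than the sticker on the drive, here is a back-of-envelope sketch. The URE rate and rebuild sizes are illustrative assumptions, not figures from any quote in this thread:

```python
import math

# Assumed figures for illustration only.
ure_rate_bits = 1e14      # one unrecoverable read error per 1e14 bits read (a common consumer-drive spec)
parity_rebuild_tb = 12    # data that must be re-read to rebuild a failed drive in a parity array
mirror_rebuild_tb = 1     # a mirror rebuild only re-reads the surviving partner drive

def p_ure(read_tb):
    """Probability of hitting at least one URE while reading read_tb terabytes."""
    bits = read_tb * 1e12 * 8
    return 1 - math.exp(-bits / ure_rate_bits)

print(f"~{p_ure(parity_rebuild_tb):.0%} chance of a URE during a {parity_rebuild_tb}TB parity rebuild")
print(f"~{p_ure(mirror_rebuild_tb):.0%} chance during a {mirror_rebuild_tb}TB mirror rebuild")
```

The exposure comes from how much data the rebuild has to read back, which is a property of the array design, not of whether the label says Red or enterprise.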
-
@scottalanmiller said:
@donaldlandru said:
We have a 2 node cluster for our operations (Exchange, AD, SharePoint Foundation, and other miscellaneous applications) and a 3 node cluster for development machines.
So a two node cluster and a three node cluster. This seems straightforward... no external storage at all. The rule of thumb for external storage is that it should not be considered until you are above four nodes in a single cluster, and even then not normally until much larger. What is the purpose of having external storage at all?
This setup was implemented when I first started four years ago; we used a third-party consultant and they designed this as the solution for the operations cluster. There were initial plans to do something different for the development cluster, but due to the cost of the SAN (which may or may not have been needed) it was then value-engineered by the people leading the project, with little regard to my input, as I was the new guy.
My initial plan was to build a four-node cluster with shared storage, without the ops/dev silos. The ops (2-node) cluster is licensed with VMware Essentials Plus and the dev cluster is licensed with VMware Essentials. I do rely on vMotion and DRS in the ops cluster for better utilizing resources and doing maintenance.
vMotion is of little use to me in the dev cluster as these machines (RAM: 288GB, 64GB, 16GB) don't have enough resources to host everything should a node drop, so it is mainly licensed for the backup API access.
-
@donaldlandru said:
- Must be expandable to accommodate future growth
Expandability often costs a ton today and delivers very little value "tomorrow." Is this truly an important business goal? It is very often cheaper to do the right thing for today and the immediate future and evaluate again in one, two or five years - whenever factors have changed and you are in a position to make a new decision. Planning for expansion introduces unnecessary risk to the project.
-
@donaldlandru said:
vMotion is of little use to me in the dev cluster as these machines (RAM: 288GB, 64GB, 16GB) don't have enough resources to host everything should a node drop, so it is mainly licensed for the backup API access.
This tells us two things:
- VMware is the wrong platform for you almost certainly. You are paying a premium to get less than you would get for free elsewhere.
- There is no reason for a cluster or external storage as even the most minimal features of it are being skipped.
-
By dropping VMware vSphere Essentials you are looking at roughly $1200 in savings right away. Both Hyper-V and XenServer will do what you need absolutely free.
-
That $1200 number was based off of Essentials. Just saw that you have Essentials Plus. What is that for? Eliminating that will save you many thousands of dollars! This just went from a "little win" to a major one!
-
@donaldlandru said:
I do rely on vmotion and drs in the ops cluster for better utilizing resources and doing maintenance.
Better to be fast and cheap than to be slow, expensive and have to balance. Easier to throw "speed" at the problem than to do live balancing if that is all that you are getting out of it.
Maintenance should be trivial; what planned outages are you avoiding that warrant the heavier risk of unplanned ones?
-
@donaldlandru said:
Requirements for development storage
- 9+ TiB of usable storage
- Support a minimum of 1100 random IOPS (what our current system is peaking at)
If split between five nodes, that's a minimal number. My eight-year-old desktop has 100,000 IOPS! This is less than 250 IOPS per machine; you can often hit that with a small RAID 1 pair in each box! And 10TB is just 2TB per box. This isn't a big problem to tackle when you break it down. Actually pretty moderate needs.
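Just to make that arithmetic explicit, a quick sketch using the numbers quoted above; the five-way split assumes the load is spread across all of the existing servers:

```python
# Figures from the requirements above.
total_iops = 1100   # peak random IOPS the current system hits
usable_tib = 9      # usable capacity requirement
nodes = 5           # the existing servers

print(f"~{total_iops / nodes:.0f} random IOPS per node")  # ~220, under the 250 mentioned
print(f"~{usable_tib / nodes:.1f} TiB usable per node")   # ~1.8, roughly the 2TB per box
```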
-
@scottalanmiller said:
That $1200 number was based off of Essentials. Just saw that you have Essentials Plus. What is that for? Eliminating that will save you many thousands of dollars! This just went from a "little win" to a major one!
Essentials Plus is to allow us to use vMotion on the operations cluster. While it would likely be cheaper in the long run to acquire MS Server Datacenter licensing and build redundant services, this was the approved solution to move VMs back and forth for node maintenance / upgrades.
The ops layout is:
- 2x AD DC (one hosts the DHCP server)
- 1x SQL server for SharePoint
- 1x SharePoint Foundation
- 1x Exchange server
- 1x File server (hosts a bunch of other services because of no additional server licenses)
- handful of other CentOS servers for monitoring, help desk, internal web server
The ops cluster could likely be decommissioned and what few remaining services could be co-located on the dev environment, if I could only convince the owners to go with Office 365.
-
@donaldlandru said:
#1, a.k.a. the safe option
HP StoreVirtual 4530 with 12 TB (7.2k) spindles in RAID 6 -- this is our vendor recommendation. This is an HP Renew quote with 3 years of 5x9 support, next-day on-site, for ~$15,000
http://www8.hp.com/us/en/products/disk-storage/product-detail.html?oid=6255484
Other than being able to blame a vendor for losing data or uptime rather than being on the hook yourself, what makes this safe? Looking at it architecturally, I would call it reckless to the business, as it is an inverted pyramid of doom. The unit is nothing but a normal server on which everything rests. How do you handle it failing? How do you do maintenance if you can't bring it down? And it is just RAID 6, which is fine, but no aspect of this makes it very safe.
Having a vendor to blame is nice, but the vendor is only responsible for the product, not the system's architectural design. Outages caused by this would still be your throat to choke, not HP's. It's not that it is a bad unit; I just don't see how it could be used appropriately in this kind of a setup.
-
@donaldlandru said:
The biggest concerns I have exist in both platforms (drives fail, controllers fail, data goes bad, etc) and have to be mitigated either way. That is what we have backups for -- in my opinion the HP gets me the following things:
This is where you really have to look carefully. You have this big risk (and cost) that you know this does not mitigate. But having local drives with stand-alone servers would partially mitigate it, and local drives with replication would mitigate it better than nearly any possible approach. So you appear to have options that are faster, cheaper and potentially easier that also solve the biggest problem.
-
@donaldlandru said:
24 spindle 900GB (7.2k SAS) in 12 mirrored vdevs
That's RAID 01, you never want that. You want 12 mirrors in a stripe for RAID 10.
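As a point of reference, here is a minimal sketch of what that recommended layout, twelve mirrors striped together, would look like expressed as a single pool of mirror vdevs. The pool name and da0..da23 device names are hypothetical placeholders, not anything from the quote:

```python
# Hypothetical device names, purely to illustrate the layout.
devices = [f"da{i}" for i in range(24)]

# Pair the 24 spindles into 12 two-way mirrors. ZFS stripes writes across all
# top-level vdevs, so a pool built only of mirror vdevs gives the
# stripe-of-mirrors (RAID 10 style) layout being recommended here.
mirror_specs = [f"mirror {a} {b}" for a, b in zip(devices[0::2], devices[1::2])]
print("zpool create tank " + " ".join(mirror_specs))
```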
-
Ok... your feedback is actually showing something I have been afraid of: I have severe tunnel vision in servicing the current solution.
Doing a quick inventory as to why I am trying to do that:
- We have the investment in this. Like another recent thread here discussed, once an SMB gets heavily invested one way it is hard to switch. To be honest, I am not sure how I could convince them to switch at this point. This actually seems like an opportunity for a great learning experience.
- Training of supporting resources -- I have a counterpart in our off-shore office who is just getting up to speed on how VMware works -- to me this will be even harder to change
- I have been using VMware for 4 years at the office and at home, so I am comfortable with it. This reason should also make the list of reasons why I should change it.
One limiting factor I see right now is that our current chassis are 1U with 2-4 drive bays, which would hamper a local storage deployment.
Edit -- Stepping back and thinking, the lack of drive bays is not a valid limiting factor, as I could easily add SAS and do DAS storage on these nodes.
-
@scottalanmiller said:
@donaldlandru said:
24 spindle 900GB (7.2k SAS) in 12 mirrored vdevs
That's RAID 01, you never want that. You want 12 mirrors in a stripe for RAID 10.
This was modeled after the way TrueNAS (commercial version of FreeNAS) quoted us.
-
@donaldlandru said:
@scottalanmiller said:
@donaldlandru said:
24 spindle 900GB (7.2k SAS) in 12 mirrored vdevs
That's RAID 01, you never want that. You want 12 mirrors in a stripe for RAID 10.
This was modeled after the way TrueNAS (commercial version of FreeNAS) quoted us.
The exact people I warn people against.
http://www.smbitjournal.com/2015/07/the-jurassic-park-effect/
The FreeNAS community should be avoided completely. The worst storage advice and misunderstandings of storage basics I've ever seen. FreeNAS, by its nature, collects storage misunderstandings and creates a community of the worst storage advice possible.
-
The FreeNAS community tends to do things like promote software RAID when it doesn't make sense and attempts to dupe people by using carefully crafted marketing phrases like "in order for FreeNAS to monitor the disks", leaving out critical advice like "that isn't something you want FreeNAS to be doing."
-
@donaldlandru said:
- We have the investment in this. Like another recent thread here discussed, once an SMB gets heavily invested one way it is hard to switch. To be honest, I am not sure how I could convince them to switch at this point. This actually seems like an opportunity for a great learning experience.
You have what investment into it now? Once you replace the storage that you have today, aren't you effectively starting over? Really this is about stopping you from wasting a new investment rather than protecting a current one. Everything that you proposed is, I believe, a greater "reinvestment" than what I am proposing. So, if I'm understanding the concern here correctly, your HP and/or ZFS approach is actually the one that this concern would rule out, correct? Since it requires a much larger new investment.