Understanding Server 2012 R2 Clustering
-
@scottalanmiller said:
IT does not change that rapidly, though. Good training twenty years ago would nearly completely prepare you for IT today. That's faster change than civil engineering has, arches are arches, roads are roads. The Romans had some info that even today we lack. But it doesn't change at a pace that causes real problems. Nearly everything important that I learned was in the 1990s. There is an important element of keeping up to date, but the fundamentals don't change, just the products, prices and some nuances. It is rare that something new comes along that really changes things.
Like SSDs and how they've turned the previous conventional wisdom on RAID 5 on its head.
-
@Dashrender said:
@scottalanmiller said:
IT does not change that rapidly, though. Good training twenty years ago would nearly completely prepare you for IT today. That's faster change than civil engineering has, arches are arches, roads are roads. The Romans had some info that even today we lack. But it doesn't change at a pace that causes real problems. Nearly everything important that I learned was in the 1990s. There is an important element of keeping up to date, but the fundamentals don't change, just the products, prices and some nuances. It is rare that something new comes along that really changes things.
Like SSDs and how they've turned the previous conventional wisdom on RAID 5 on its head.
That's not a change, though. It's an application of the same knowledge that was known in 1998. The fundamentals have not changed at all. It is the condensed, quick rules that have to be updated based on market prices, sizes, supply changes, failure rates, etc. But understanding RAID basics (including URE rates, which were the only bit rarely discussed in the 90s) is all that was ever needed. If RAID was understood, SSDs haven't changed anything; it's just more of the same foundational data being applied the same way.
The attempt to learn IT by rote, as a set of rules, makes it seem to change rapidly and require constant updating. But learning the foundations provides rules and concepts that are essentially timeless and provide the foundation from which the rote rules are derived.
Back when Microsoft published their big RAID guidance that was such a landmark, they didn't say "use RAID 5", they said "here is why RAID 5 works now, because of these factors" and expected everyone to understand the factors involved. Because of that, the "2009 change" for hard drives was not a change, but a continuation of the previous knowledge, and likewise the move back to RAID 5 on SSDs is, also, a continuation of the same guidance. They are not disruptions, just all applications of the same foundational rules.
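To make that concrete, the underlying math is simple enough to sketch in a few lines. Here is a minimal Python illustration of the URE-during-rebuild calculation; the drive sizes, array width and URE specs below are assumptions for illustration, not figures from this thread:

```python
import math

def rebuild_success_probability(drive_tb, surviving_drives, ure_per_bits):
    """Chance a degraded RAID 5 array rebuilds without hitting a URE.

    During a rebuild, every bit on every surviving drive must be read.
    Uses the standard exponential approximation of (1 - 1/ure)^bits.
    """
    bits_to_read = drive_tb * 1e12 * 8 * surviving_drives
    return math.exp(-bits_to_read / ure_per_bits)

# Same formula, different market conditions (all values assumed):
scenarios = [
    ("1998, 9 GB disks", 0.009, 1e14),  # tiny drives: URE risk negligible
    ("2009, 1 TB disks", 1.0,   1e14),  # big drives, same URE spec: risky
    ("2015, 1 TB SSDs",  1.0,   1e17),  # SSD URE specs are far better
]
for label, tb, ure in scenarios:
    p = rebuild_success_probability(tb, surviving_drives=3, ure_per_bits=ure)
    print(f"{label}: {p:.1%} chance of a clean RAID 5 rebuild")
```

With these assumed numbers, the 9 GB era rebuilds cleanly about 99.8% of the time, the 2009-era array only about 79%, and the SSD array is back above 99.9%. The "2009 change" and the SSD "change" both fall straight out of one unchanged formula.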
-
Ah, OK, that makes sense.
-
Likewise, triple parity (aka RAID 7 or RAID 5.3), which Sun introduced with ZFS around 2005, was a new product but really did not change anything. The rules that described RAID 5 and how it extended to RAID 6 continued on to RAID 7. The fundamental formulas needed a new entry for RAID 7, but it was all projected in the 90s; we just didn't have an implementation yet. And still, ten years later, we only have the one. Had we talked triple parity in 1995, we could have projected the reliability, capacity and speed impacts accurately; we just would have had to wait to see it in action. We have RAID 8 or RAID 5.4 projected in the same way now. It will come someday, we suspect, and will operate almost exactly along the performance and reliability curve that matches the others in the RAID 5 family.
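As a sketch of that projection (assuming independent drive failures during the rebuild window, with a made-up per-drive failure probability purely for illustration), each added parity level just extends the same survival formula by one term:

```python
from math import comb

def survives_rebuild(n_drives, parity, p_drive_fail=0.01):
    """Probability an array survives one drive failure plus the rebuild.

    The array dies only if more than (parity - 1) additional drives fail
    before the rebuild completes. p_drive_fail is an assumed per-drive
    failure probability for the rebuild window.
    """
    survivors = n_drives - 1
    return sum(
        comb(survivors, k) * p_drive_fail**k * (1 - p_drive_fail)**(survivors - k)
        for k in range(parity)  # tolerate up to parity - 1 extra losses
    )

for parity, name in [(1, "RAID 5"), (2, "RAID 6"),
                     (3, "RAID 7 / 5.3"), (4, "RAID 8 / 5.4")]:
    print(f"{name}: survival {survives_rebuild(12, parity):.5f}, "
          f"capacity {12 - parity}/12 drives")
```

The RAID 8 reliability and capacity numbers drop out of the same formula today, implementation or not, exactly as the RAID 7 numbers did in the 90s.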
-
A great example is the UNIX interface. The API for UNIX hasn't changed in 30+ years. The Bash shell hasn't changed in 20+ years. Other than little updates, sitting down at a Linux, Solaris, AIX, HP-UX or BSD system today is nearly indistinguishable from doing so in 1995. I started on UNIX in 1994 and, other than switching my connection command from telnet to SSH (something that is important to security but doesn't change how we work at all), basically nothing has changed. I use all the same commands, tools, editors, etc. as I did all those years ago. Things are faster and more stable now. But the basics are really similar.
-
@Dashrender said:
So because IT changes so rapidly that any testing isn't really useful, not to mention that, as you've said, who would pay for it and then simply release the information for free... Where are newbies and SMB IT personnel supposed to glean all this knowledge to make correctly informed decisions?
For me, it often means you don't make any decision. For example, when I was looking at a SAN, even though I didn't understand the technology and the risks that well (being a newbie), there was enough fear and doubt planted in my mind that I couldn't sign off on buying one because I didn't have 100% confidence that it was the right decision. So I saw doing nothing (and carrying on the way we've always worked) as a better decision than doing something (spending $50k on new technology). And I'm relieved. Especially now that I find out that SQL Server and Exchange are really bad fits for VMware HA, something I had no idea about at the time due to my lack of knowledge and experience.
I moved to virtualisation only when it was completely proven technology and blatantly obvious that it was best practice. So as an SMB, we're not on the bleeding edge of technology. I leave that to the bigger companies, and their experiences then trickle down to me and I implement technology that is new to me, but has been in the blue-chip world for several years.
Having said that, I will take some risks and implement some new tech, but only if the cost is relatively low. So that if I crash and burn I can just scrap it without raising any eyebrows. Life would be boring otherwise. But generally, it's all about choosing mature and proven solutions rather than the latest new thing. Let the big boys with deep pockets take the risks and iron out the bugs.
Another example is that I moved to on-premises Exchange 2010 and Office 2010 licences a few years ago because I felt that Office 365 was too new and I lacked confidence. It is only now that it is completely proven that I feel confident to move to it. I have no regrets about that decision either.
-
@scottalanmiller said:
It is not the business of any of these entities except for internal IT to care or get involved in a statement of this type. VMware and Microsoft are vendors. They produce tools. They support those tools. But it is up to IT to implement and use those tools well and in the right way for their business.
Manufacturers absolutely care about how the tools they've developed are actually used. White papers are not just for marketing purposes.
-
@Carnival-Boy said:
@scottalanmiller said:
It is not the business of any of these entities except for internal IT to care or get involved in a statement of this type. VMware and Microsoft are vendors. They produce tools. They support those tools. But it is up to IT to implement and use those tools well and in the right way for their business.
Manufacturers absolutely care about how the tools they've developed are actually used. White papers are not just for marketing purposes.
I would certainly like to agree with this, but @scottalanmiller's comments do have merit as well.
-
@Carnival-Boy said:
@scottalanmiller said:
It is not the business of any of these entities except for internal IT to care or get involved in a statement of this type. VMware and Microsoft are vendors. They produce tools. They support those tools. But it is up to IT to implement and use those tools well and in the right way for their business.
Manufacturers absolutely care about how the tools they've developed are actually used. White papers are not just for marketing purposes.
I believe that this is true only insofar as they hope their tools are used well, as long as it does not impact profits. I just don't believe you have an unemotional stance on this. You work for a manufacturer and take this very personally, I think.
Maybe you work in a unique industry where manufacturers have no financial interests guiding them, exist only to serve their customers, and produce papers to educate, not to make money. But I find that unlikely. White papers might be there to improve success rates that lead to better marketing. But not just for the sake of educating everyone.
And whether your company is like this or not, it doesn't reflect on normal companies and certainly not on IT companies. Believing that all business people are out to do good in the world at their own expense is just not how the world works, and in the US it isn't even legal (public companies are required to work for profits.)
-
@Carnival-Boy said:
@scottalanmiller said:
It is not the business of any of these entities except for internal IT to care or get involved in a statement of this type. VMware and Microsoft are vendors. They produce tools. They support those tools. But it is up to IT to implement and use those tools well and in the right way for their business.
Manufacturers absolutely care about how the tools they've developed are actually used. White papers are not just for marketing purposes.
You should look on SW at thread after thread of people complaining about worthless, pure marketing with no value whatsoever. Far worse than what I'm saying here. Many do put decent material in marketing form. But there are many discussions about how the term "white paper" is literally just a term for "marketing brochure."
-
@scottalanmiller said:
I believe that this is true only insofar as they hope their tools are used well, as long as it does not impact profits. I just don't believe you have an unemotional stance on this. You work for a manufacturer and take this very personally, I think.
No, it's never personal. It's probably more from my Uni studies of Microeconomics and theories of firm behavior.
Maybe you work in a unique industry where manufacturers have no financial interests guiding them, exist only to serve their customers, and produce papers to educate, not to make money.
Again, you're treating the world as binary. You've gone from my statement that "firms care about how their tools are used" to "firms only care about how their tools are used." That's not what I said.
-
Anyway, back to my original question and why you shouldn't install Exchange or SQL on a SAN. If you don't use application level clustering and you use local storage then there is risk of data loss as a result of hardware failure. That risk is the same as the risk of data loss with HA and a SAN, right? Unless you would always restore from backup after a crash, which I wouldn't. I would allow SQL and Exchange to recover from the crash and hope no data loss has occurred. In this situation, your recovery time with HA is significantly quicker than with local storage. Which is the point of HA, isn't it?
What am I missing?
-
@Carnival-Boy said:
Anyway, back to my original question and why you shouldn't install Exchange or SQL on a SAN. If you don't use application level clustering and you use local storage then there is risk of data loss as a result of hardware failure. That risk is the same as the risk of data loss with HA and a SAN, right?
That's correct. In both cases you lack protection against crash cases.
Although it should be pointed out that using HA could actually automate a disaster and keep things running even though there is data loss. Possibly unknown data loss. So you would have to decide for your organization whether silently losing data is better or worse than extra downtime while people decide what to do.
In finance, you go down rather than lose financial data silently. When an outage like this happens, you stay down until humans decide it is okay to continue. In another environment, you have to weigh the benefits and risks.
-
@Carnival-Boy said:
What am I missing?
That the cost of a SAN and HA is extremely high. If they were free, they might make a lot of sense. But when you have to buy a SAN, introduce more points of failure (an inverted pyramid is more likely to experience a failure than a stand alone server) and then have to buy HA licensing on top, you are lowering your overall availability. So you are not getting the "HA point", since you are moving away from HA.
That this entire setup misses the "point" is one thing. That it doubles the corruption points is an additional thing.
HA "is" about uptime. But if data loss is acceptable, you can do all kinds of things to get HA that are cheaper and easier. Most people assume the point of HA is to keep things running while not losing data.
Whether data loss is acceptable or not, adding the SAN and an HA product for databases is an inverted pyramid and is not a path towards high availability but a path away from it.
That it confuses people into thinking it is a DAG replacement is another layer of issues above and beyond the core ones.
-
@Carnival-Boy said:
I would allow SQL and Exchange to recover from the crash and hope no data loss has occurred. In this situation, your recovery time with HA is significantly quicker than with local storage. Which is the point of HA, isn't it?
But you are effectively inducing a crash. You have at least three points of potential failure (each server and the SAN), and each one is just as likely to fail as the next. In the real world where people are building at this scale, SANs are actually more likely to fail than servers. You can spend big bucks to fix this problem, but why spend lots of money to not be as good as DAG? Just setting money on fire, never a good business decision.
So we aren't doing HA, we are below SA (standard availability.) Less available than a normal server. That there are two points of data corruption (if the SAN fails OR if the host fails) instead of just one is an additional risk beyond the normal problems of an inverted pyramid.
So if the point of HA is uptime, then this architecture does not make sense. If the point is to be financially wise, this is the worst option. If the point is high availability without data loss, it just gets worse and worse.
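For a back-of-the-envelope version of that argument, assume (purely for illustration) that every box, each host and the SAN alike, has the same 99.9% availability:

```python
def inverted_pyramid_availability(a_host, a_san, hosts=2):
    """Availability of a cluster of hosts that all depend on one shared SAN.

    The system is up only when the SAN is up AND at least one host is up;
    the SAN is a chained dependency underneath everything.
    """
    all_hosts_down = (1 - a_host) ** hosts
    return a_san * (1 - all_hosts_down)

a = 0.999  # assumed availability of one server, or a SAN of the same class
print(f"Stand alone server: {a:.6f}")
print(f"2 hosts + one SAN:  {inverted_pyramid_availability(a, a):.6f}")
```

With equal-class gear the pyramid can never rise above the availability of the SAN it sits on, so it lands at or below a single stand alone server while costing far more.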
-
Another thing to consider: what else are you protecting with the SAN and HA? If you are only protecting ONE SQL server and ONE Exchange server, the cost of purchasing an additional SQL license and an additional Exchange license so you can run software HA instead is much lower than the SAN will be. Granted, you'll need twice the amount of disk, but that is also likely much less expensive.
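As a very rough sketch of that comparison (every figure below is a hypothetical placeholder; substitute real quotes before drawing any conclusions):

```python
# All prices are hypothetical placeholders, not real quotes.
san_route = {
    "entry-level SAN": 25_000,
    "HA hypervisor licensing": 10_000,
    "shared-storage networking": 5_000,
}
software_ha_route = {
    "second SQL Server license": 8_000,
    "second Exchange server license": 1_000,
    "extra local disk (2x capacity)": 4_000,
}

for name, parts in (("SAN + hypervisor HA", san_route),
                    ("application-level HA + local disk", software_ha_route)):
    print(f"{name}: ${sum(parts.values()):,}")
```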
This whole situation is definitely what Scott was talking about earlier in 'doing IT.' While I'm annoyed at my own ignorance and lack of thought process, reading Scott's posts has definitely added to my understanding and helped me to change my thinking process. Not saying I always see it his way, but I try to be more encompassing in my thought process - but even my current phone project is showing that I'm still lacking.
-
Just in case it wasn't obvious what the "suggested" options were, because people often say to me "Okay, so the SAN is wrong, but what is right?"
In this case there are two generally good options, depending on business needs.
If HA is really needed, if downtime and data loss are bad and it is worth a lot of money to ensure that they do not happen, then you should trust your vendor and do things their way. While I'm down on white papers and whatnot, a good vendor is going to provide a way for them to make money and for you to get what you want. Microsoft is excellent at this. They make a product specifically designed to make Exchange as reliable as possible, and they provide it themselves. They will support you if you do other things, but they only provide one solution here that is their own... DAG. If HA, fault tolerance and data protection are important, DAG is the answer.
For the vast majority of SMBs (and even many enterprises) DAG is too much money to protect too little. That's fine and assumed. The suggestion is not that almost everyone should be using DAG, only that spending more than DAG to get less is not the answer. For most people, the answer is as simple as "just use a stand alone server." A stand alone server is a fraction of the cost of the SAN approach both in hardware and in software (it does not require HA licensing for VMware.) And it has 67% fewer points of failure (one instead of three.) And it has no chained dependencies. And it is much simpler to manage, making human failure less likely. And it has fewer points of potential data corruption (one instead of two.) These things add up: you save money, get better performance, and are more reliable in both uptime and data protection. It's a pretty big win unless you compare it to DAG, which costs more but has better uptime and data protection.
Only a DAG cluster and a stand alone server have clear "winning" use cases.
-
@Dashrender said:
While I'm annoyed at my own ignorance and lack of thought process, reading Scott's posts has definitely added to my understanding and helped me to change my thinking process. Not saying I always see it his way, but I try to be more encompassing in my thought process - but even my current phone project is showing that I'm still lacking.
I'm lacking too, but I'm not down about it. I just like getting free advice from Scott. Companies pay thousands for the kind of advice that we get here for free.
I only have around 100 users, so neither SAN/HA nor DAG/clustering was ever a real consideration for me. But that user count could be going up to 200 soon through mergers and acquisitions (or a new job), so it's something I'm trying to understand a bit more about.
It seems to me that we should avoid treating SAN/HA as an alternative to DAG/application clustering, because really they're different solutions to different problems. SAN/HA addresses minimising downtime whilst DAG/clustering also addresses data loss. So you need to identify your problem and then select your solution, rather than picking your solution first.
-
I definitely agree with Scott and have learned a lot from him through threads here and on SW!
I am a firm believer in the right tools for the right job. The SAN vs local storage (DAG, replication, etc.) decision will always have to be made. And I am coming to agree more and more with @scottalanmiller that Local Storage + Replication + Good Backups will almost always be the better answer for an SMB.
My take-away from my first and only SAN deployment (as smoothly as it went, it could have been much, much, much worse) is that you must clearly define what you are looking for (to yourself, your team, and your vendor of choice). You must also do your own research about the products the vendor recommends. We were recommended a number of solutions that were FT only and not true HA. Once your vendor has made you a quote for "Product X", check and make sure the product's company defines things the same way that you do. If you think "Product Y" may be a better fit for what you are wanting, then ask your vendor about it.
Never be afraid to second-guess yourself, your vendor, or your peers in a respectful way.
-
@Carnival-Boy said:
It seems to me that we should avoid treating SAN/HA as an alternative to DAG/application clustering, because really they're different solutions to different problems. SAN/HA addresses minimising downtime whilst DAG/clustering also addresses data loss. So you need to identify your problem and then select your solution, rather than picking your solution first.
Let me add, at that scale: HA addresses minimising downtime, but SAN does not. A SAN actually exacerbates downtime, all other things being equal. Many people, especially sales people, conflate SAN with "buying super expensive equipment", which is not the same thing. You can buy $60K SANs that are crazy reliable. This is very true. But for even better uptime than two hosts connected to a $60K SAN, you could just buy a $60K server that has fewer points of failure but matches or beats the expensive SAN in all the ways that it is reliable. That's part of the sales trick: making SANs look more reliable than servers by comparing entry-level servers to high-end SANs. But at the same level, they are equally reliable (or slightly weighted to the advantage of servers due to massively larger volumes), so anything that is done to make a SAN reliable can be done to make a single server more reliable.
So while HA is all about uptime and a SAN may or may not be a part of that strategy, the real value of a SAN is in scalability. SANs are more flexibly scalable than any other solution. Scalable here means in the number of supported physical hosts. So if you need to scale to very large host counts, SAN is the obvious choice. That's SAN's one strong suit - and it is a big one. But when you don't need that one thing, SAN lacks its major "pro" and just comes with the "cons."