Understanding Server 2012 R2 Clustering
-
@scottalanmiller said:
... it doesn't change at a pace that causes real problems....
Hmmmmmm maybe for some definitions of "real problems"
Also, for fun:
-
It's amazing just how much 1995 was like today. Yeah, it was all old and slow. But so much of today was already there: the Windows 95 interface, the Windows NT platform, Linux, UNIX, USB, HTML, PHP, Google (in its early phases), Amazon, SSL, Java, JavaScript, Perl, VoIP, the Pentium Pro (which later evolved into the Intel Core), IE, Opera, wikis, Ruby, Yahoo, eBay, MSNBC, etc.
-
@scottalanmiller said:
IT does not change that rapidly, though. Good training twenty years ago would nearly completely prepare you for IT today. That's faster change than civil engineering has, arches are arches, roads are roads. The Romans had some info that even today we lack. But it doesn't change at a pace that causes real problems. Nearly everything important that I learned was in the 1990s. There is an important element of keeping up to date, but the fundamentals don't change, just the products, prices and some nuances. It is rare that something new comes along that really changes things.
Like SSDs and how they've turned RAID 5 on its head from previously conventional wisdom.
-
@Dashrender said:
@scottalanmiller said:
IT does not change that rapidly, though. Good training twenty years ago would nearly completely prepare you for IT today. That's faster change than civil engineering has, arches are arches, roads are roads. The Romans had some info that even today we lack. But it doesn't change at a pace that causes real problems. Nearly everything important that I learned was in the 1990s. There is an important element of keeping up to date, but the fundamentals don't change, just the products, prices and some nuances. It is rare that something new comes along that really changes things.
Like SSDs and how they've turned RAID 5 on its head from previously conventional wisdom.
That's not a change, though. It's an application of the same knowledge that was known in 1998. The fundamentals have not changed at all. It is the condensed, quick rules that have to be updated based on market prices, sizes, supply changes, failure rates, etc. But understanding RAID basics (including URE rates, which was the one bit rarely discussed in the 90s) is all that was ever needed. If RAID was understood, SSDs haven't changed anything; it's just more of the same foundational data being applied the same way.
The attempt to learn IT by rote, as a set of rules, makes it seem to change rapidly and require constant updating. But learning the foundations provides rules and concepts that are essentially timeless and provide the foundation from which the rote rules are derived.
Back when Microsoft published their big RAID guidance that was such a landmark, they didn't say "use RAID 5", they said "here is why RAID 5 works now, because of these factors" and expected everyone to understand the factors involved. Because of that, the "2009 change" for hard drives was not a change, but a continuation of the previous knowledge, and likewise the move back to RAID 5 on SSDs is, also, a continuation of the same guidance. They are not disruptions, just all applications of the same foundational rules.
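As a rough illustration of those factors (my own sketch, not from the original guidance), the RAID 5 question comes down to one calculation: given the array's usable size and the drives' URE rate, what is the chance of hitting an unrecoverable read error during a rebuild? A minimal sketch with hypothetical drive figures:

```python
import math

# Sketch only: probability of at least one URE while re-reading all surviving
# disks during a RAID 5 rebuild. Assumes independent, uniformly distributed
# read errors, which is a simplification of real drive behaviour.

def rebuild_failure_probability(disk_tb, data_disks, ure_rate_bits):
    bits_to_read = disk_tb * 1e12 * 8 * data_disks
    # 1 - (1 - 1/ure_rate)^bits, computed stably for very small per-bit rates
    return -math.expm1(bits_to_read * math.log1p(-1.0 / ure_rate_bits))

# Hypothetical inputs: a 6-disk RAID 5 of 4 TB drives (5 data disks to re-read)
print(rebuild_failure_probability(4, 5, 1e14))  # consumer SATA class URE rate: ~0.8
print(rebuild_failure_probability(4, 5, 1e17))  # SSD class URE rate: ~0.002
```

The formula never changed; only the inputs (drive sizes and URE rates) did, which is why the same reasoning first condemned RAID 5 on large spinning disks and later rehabilitated it on SSDs.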
-
Aww, OK, that makes sense.
-
Likewise, triple parity (aka RAID 7 or RAID 5.3), while new around 2005 when Sun introduced it with ZFS, really did not change anything even though it was a new product. The rules that provided information around RAID 5, and how it extended to RAID 6, continued on to RAID 7. The fundamental formulas needed a new entry for RAID 7, but it was all stuff that was projected in the 90s; we just didn't have an implementation yet. And still, ten years later, we only have the one. Had we talked triple parity in 1995, we could have projected the reliability, capacity and speed impacts accurately, we just would have had to wait to see it in action. We have RAID 8 (or RAID 5.4) projected in the same way now. It will come someday, we suspect, and will operate almost exactly along a performance and reliability curve that matches the others in the RAID 5 family.
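To show how little the family formulas change from level to level, here is a minimal sketch (my own illustration, not from the thread) of the capacity, fault tolerance and classic write-penalty pattern for single through quad parity; the "RAID 8" row is the projected level, not a shipping product:

```python
# N total disks, P parity disks: the whole single/double/triple/quad parity
# family follows one pattern, so RAID 7 (and a projected RAID 8) are just
# new rows in the same table.

def parity_raid(total_disks, parity_disks):
    return {
        "usable_fraction": (total_disks - parity_disks) / total_disks,
        "disk_failures_survived": parity_disks,
        "write_penalty_ios": 2 * (parity_disks + 1),  # classic read-modify-write estimate
    }

for name, parity in [("RAID 5", 1), ("RAID 6", 2), ("RAID 7", 3), ("RAID 8 (projected)", 4)]:
    print(name, parity_raid(8, parity))
```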
-
A great example is the UNIX interface. The API for UNIX hasn't changed in 30+ years. The Bash shell hasn't changed in 20+. Other than little updates, sitting down at a Linux, Solaris, AIX, HP-UX or BSD system today is nearly indistinguishable from doing so in 1995. I started on UNIX in 1994 and, other than switching my connection command from telnet to SSH (something that is important to security but doesn't change how we work at all), basically nothing has changed. I use all the same commands, tools, editors, etc. as I did all those years ago. Things are faster and more stable now. But the basics are really similar.
-
@Dashrender said:
So given that IT changes so rapidly that any testing isn't really useful, not to mention that, as you've said, who would pay for it and then simply release the information for free... where are newbies and SMB IT personnel supposed to glean all this knowledge to make correctly informed decisions?
For me, it often means you don't make any decision. For example, when I was looking at a SAN, even though I didn't understand the technology and the risks that well (being a newbie), there was enough fear and doubt planted in my mind that I couldn't sign off on buying one because I didn't have 100% confidence that it was the right decision. So I saw doing nothing (and carrying on the way we've always worked) as a better decision than doing something (spending $50k on new technology). And I'm relieved. Especially now that I find out that SQL Server and Exchange are really bad fits for VMware HA, something I had no idea about at the time due to my lack of knowledge and experience.
I moved to virtualisation only when it was completely proven technology and blatantly obvious that it was best practice. So as an SMB, we're not on the bleeding edge of technology. I leave that to the bigger companies, and their experiences then trickle down to me and I implement technology that is new to me, but has been in the blue-chip world for several years.
Having said that, I will take some risks and implement some new tech, but only if the cost is relatively low. So that if I crash and burn I can just scrap it without raising any eyebrows. Life would be boring otherwise. But generally, it's all about choosing mature and proven solutions rather than the latest new thing. Let the big boys with deep pockets take the risks and iron out the bugs.
Another example is that I moved to on-premise Exchange 2010 and Office 2010 licences a few years ago because I felt that Office 365 was too new and I lacked confidence. It is only now that it is completely proven that I feel confident to move to it. I have no regrets about that decision either.
-
@scottalanmiller said:
It is not the business of any of these entities except for internal IT to care or get involved in a statement of this type. VMware and Microsoft are vendors. They produce tools. They support those tools. But it is up to IT to implement and use those tools well and in the right way for their business.
Manufacturers absolutely care about how the tools they've developed are actually used. White papers are not just for marketing purposes.
-
@Carnival-Boy said:
@scottalanmiller said:
It is not the business of any of these entities except for internal IT to care or get involved in a statement of this type. VMware and Microsoft are vendors. They produce tools. They support those tools. But it is up to IT to implement and use those tools well and in the right way for their business.
Manufacturers absolutely care about how the tools they've developed are actually used. White papers are not just for marketing purposes.
I would certainly like to agree with this, but @scottalanmiller's comments do have merit as well.
-
@Carnival-Boy said:
@scottalanmiller said:
It is not the business of any of these entities except for internal IT to care or get involved in a statement of this type. VMware and Microsoft are vendors. They produce tools. They support those tools. But it is up to IT to implement and use those tools well and in the right way for their business.
Manufacturers absolutely care about how the tools they've developed are actually used. White papers are not just for marketing purposes.
I believe that this is true only insofar as they hope the tools are used well, so long as it does not impact profits. I just don't believe you have an unemotional stance on this. You work for a manufacturer and take this very personally, I think.
Maybe you work in a unique industry where manufacturers have no financial interests guiding them and they only exist to service their customers, and papers are produced to educate, not to make money. But I find that unlikely. White papers might be there to improve success rates that lead to better marketing. But not just for the sake of educating everyone.
And whether your company is like this or not, it doesn't reflect on normal companies and certainly not on IT companies. This is not how the world works. Believing that all business people are out to do good in the world at their own expense is just not how things are, and in the US it isn't even legal (public companies are required to work for profits).
-
@Carnival-Boy said:
@scottalanmiller said:
It is not the business of any of these entities except for internal IT to care or get involved in a statement of this type. VMware and Microsoft are vendors. They produce tools. They support those tools. But it is up to IT to implement and use those tools well and in the right way for their business.
Manufacturers absolutely care about how the tools they've developed are actually used. White papers are not just for marketing purposes.
You should look on SW at thread after thread of people complaining about worthless, pure-marketing white papers with no value whatsoever. Far worse than what I'm saying here. Many vendors do put decent material in the marketing form. But there are many discussions about how the term "white paper" is literally just a term for "marketing brochure."
-
@scottalanmiller said:
I believe that this is true only insofar as they hope the tools are used well, so long as it does not impact profits. I just don't believe you have an unemotional stance on this. You work for a manufacturer and take this very personally, I think.
No, it's never personal. It's probably more from my Uni studies of Microeconomics and theories of firm behavior.
Maybe you work in a unique industry where manufacturers have no financial interests guiding them and they only exist to service their customers, and papers are produced to educate, not to make money.
Again, you're treating the world as binary. You've gone from my statement that "firms care about how their tools are used" to "firms only care about how their tools are used." That's not what I said.
-
Anyway, back to my original question and why you shouldn't install Exchange or SQL on a SAN. If you don't use application level clustering and you use local storage then there is risk of data loss as a result of hardware failure. That risk is the same as the risk of data loss with HA and a SAN, right? Unless you would always restore from backup after a crash, which I wouldn't. I would allow SQL and Exchange to recover from the crash and hope no data loss has occurred. In this situation, your recovery time with HA is significantly quicker than with local storage. Which is the point of HA, isn't it?
What am I missing?
-
@Carnival-Boy said:
Anyway, back to my original question and why you shouldn't install Exchange or SQL on a SAN. If you don't use application level clustering and you use local storage then there is risk of data loss as a result of hardware failure. That risk is the same as the risk of data loss with HA and a SAN, right?
That's correct. In both cases you lack protection against crash cases.
Although it should be pointed out that using HA could actually automate a disaster and keep things running even though there is data loss, possibly unknown data loss. So you would have to decide for your organization whether silently losing data is better or worse than extra downtime while people decide what to do.
In finance, you go down rather than lose financial data silently. When an outage like this happens, you stay down until humans decide it is okay to continue. In another environment, you have to weigh the benefits and risks.
-
@Carnival-Boy said:
What am I missing?
That the cost of a SAN and HA is extremely high. If they were free, they might make a lot of sense. But when you have to buy a SAN, introduce more points of failure altogether (an inverted pyramid is more likely to experience a failure than a stand alone server) and then have to buy HA, you are lowering your overall availability. So you are not getting the "HA point," since you are moving away from HA.
That this entire setup doesn't achieve the "point" is one thing. That it doubles the corruption points is an additional thing.
HA "is" about uptime. But if data loss is acceptable, you can do all kinds of things to get HA that are cheaper and easier. Most people assume the point of HA is to keep things running while not losing data.
Whether data loss is acceptable or not, adding the SAN and HA product for databases is an inverted pyramid and is not a path towards high availability but a path away from it.
That it confuses people into thinking it is a DAG replacement is another layer of issues above and beyond the core ones.
-
@Carnival-Boy said:
I would allow SQL and Exchange to recover from the crash and hope no data loss has occurred. In this situation, your recovery time with HA is significantly quicker than with local storage. Which is the point of HA, isn't it?
But you are effectively inducing a crash. You have three points of potential failure at least (each server and the SAN), and each one is just as likely to fail as the next. In the real world where people are building at this scale, SANs are actually more likely to fail than servers. You can spend big bucks to fix that problem, but why spend lots of money to still not be as good as DAG? Just setting money on fire, never a good business decision.
So we aren't doing HA, we are below SA (standard availability) - less available than a normal server. That there are two points of data corruption (if the SAN fails OR if the host fails) instead of just one is an additional risk beyond the normal problems of an inverted pyramid.
So if the point of HA is uptime, then this architecture does not make sense. If the point is to be financially wise, this is the worst option. If the point is high availability without data loss, it just gets worse and worse.
-
Another thing to consider: what else are you protecting with the SAN and HA? If you are only protecting ONE SQL server and ONE Exchange server, the cost of purchasing an additional SQL license and an additional Exchange license so you can run software HA instead is much less than the SAN will be. Granted, you'll need twice the amount of disk, but that is also likely much less expensive.
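As a back-of-the-envelope way to frame that comparison (every figure below is a placeholder, not a quote), it is just two sums to put side by side:

```python
# Placeholder numbers purely for illustration -- substitute real quotes.
san_route = {
    "entry SAN": 25_000,
    "hypervisor HA licensing": 10_000,
}
software_ha_route = {
    "additional SQL Server license": 7_000,
    "additional Exchange license": 1_000,
    "second set of local disks": 3_000,
}

print("SAN + HA route   :", sum(san_route.values()))
print("Software HA route:", sum(software_ha_route.values()))
```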
This whole situation is definitely what Scott was talking about earlier with "doing IT." While I'm annoyed at my own ignorance and lack of thought process, reading Scott's posts has definitely added to my understanding and helped me change my thinking process. Not saying I always see it his way, but I try to be more encompassing in my thought process - but even my current phone project is showing that I'm still lacking.
-
Just in case it wasn't obvious what the "suggested" options were, because people often say to me "Okay, so the SAN is wrong, but what is right?"
In this case there are two generally good options, depending on business needs.
If HA is really needed, if downtime and data loss are bad and it is worth a lot of money to ensure that they do not happen, then you should trust your vendor and do things their way. While I'm down on white papers and whatnot, a good vendor is going to provide a way for them to make money and for you to get what you want. Microsoft is excellent at this. They make, and provide themselves, a product specifically designed to make Exchange as reliable as possible. They will support you if you do other things, but they only provide one solution here that is their own... DAG. If HA, fault tolerance and data protection are important, DAG is the answer.
For the vast majority of SMBs (and even many enterprises) DAG is too much money to protect too little. That's fine and assumed. The suggestion is not that almost anyone should be using DAG, only that spending more than DAG to get less is not the answer. For most people, the answer is as simple as "just use a stand alone server." A stand alone server is a fraction of the cost of the SAN approach both in hardware and in software (it does not require HA licensing for VMware). And it has 67% fewer points of failure (one instead of three). And it has no chained dependencies. And it is much simpler to manage, making human failure less likely. And it has fewer points of potential data corruption (one instead of two). These things add up: save money, get better performance, and be more reliable in both uptime and data protection. It's a pretty big win unless you compare it to DAG, which costs more but has better uptime and data protection.
Only a DAG cluster and a stand alone server have clear "winning" use cases.
-
@Dashrender said:
While I'm annoyed at my own ignorance and lack of thought process, reading Scott's posts has definitely added to my understanding and helped me change my thinking process. Not saying I always see it his way, but I try to be more encompassing in my thought process - but even my current phone project is showing that I'm still lacking.
I'm lacking too, but I'm not down about it. I just like getting free advice from Scott. Companies pay thousands for the kind of advice that we get here for free.
I only have around 100 users, so neither SAN/HA nor DAG/clustering was ever a real consideration for me. But that user count could be going up to 200 soon through mergers and acquisitions (or a new job), so it's something I'm trying to understand a bit more about.
It seems to me that we should avoid treating SAN/HA as an alternative to DAG/application clustering as really they're different solutions to different problems. SAN/HA addresses minimising downtime whilst DAG/clustering also addresses data loss. So you need to identify your problem and then select your solution, rather than picking your solution first.