What Are You Doing Right Now
-
@coliver said:
@scottalanmiller said:
I need a place to collect all of these horror stories of the inverted pyramid with a SAN killing people. The number of times that firmware takes out the magic "dual controller" SAN is unbelievable...
http://community.spiceworks.com/topic/1028886-dell-md3200i-firmware-issue
Haha... when we deployed our VNXe-3300 dual-controller wonder-box (we were told it would do everything up to but not including slicing and toasting bread) at my last job, we were testing it out and the first thing I did was update it to the newest firmware... the firmware was applied correctly to one controller but not the other, and without both the entire unit refused to work. Thankfully we weren't in production or that would have been a fun 24 hours... To EMC's credit we received the replacement controller at 6AM the next morning.
6am was how many hours later for a "never fail" system that failed on the most basic task?
-
@dafyre said:
In my mind, a single storage device, no matter how many controllers, power supplies, or motherboards it has in the same chassis, is still a NAS.
If somebody tries to sell me a SAN, it will be at least 2 devices mirrored using the network.
No, that is completely the wrong use of NAS and SAN. Neither means anything related to this at all. SAN means block storage, NAS means file storage. Neither implies anything beyond that. A USB external SATA drive is a SAN. And there are NAS that are huge clusters. SAN never implies "good" or "better than", there is nothing like that in the name.
A SAN can be less than a NAS, because a NAS requires a computer and a SAN only requires a NIC. That is the only difference in range: a SAN can go lower than a NAS can. But if you feel that the term SAN implies some kind of reliability, you have been set up to be badly misled.
-
@scottalanmiller said:
@coliver said:
@scottalanmiller said:
I need a place to collect all of these horror stories of the inverted pyramid with a SAN killing people. The number of times that firmware takes out the magic "dual controller" SAN is unbelievable...
http://community.spiceworks.com/topic/1028886-dell-md3200i-firmware-issue
Haha... when we deployed our VNXe-3300 dual-controller wonder-box (we were told it would do everything up to but not including slicing and toasting bread) at my last job, we were testing it out and the first thing I did was update it to the newest firmware... the firmware was applied correctly to one controller but not the other, and without both the entire unit refused to work. Thankfully we weren't in production or that would have been a fun 24 hours... To EMC's credit we received the replacement controller at 6AM the next morning.
6am was how many hours later for a "never fail" system that failed on the most basic task?
~18 hours. It took them ~3 hours to determine the controller was bricked (even though I told them it was when we started the support call). My manager at the time didn't really understand what was going on... so I caught all kinds of hell for breaking it.
-
@coliver said:
@scottalanmiller said:
@coliver said:
@scottalanmiller said:
I need a place to collect all of these horror stories of the inverted pyramid with a SAN killing people. The number of times that firmware takes out the magic "dual controller" SAN is unbelievable...
http://community.spiceworks.com/topic/1028886-dell-md3200i-firmware-issue
Haha... when we deployed our VNXe-3300 dual-controller wonder-box (we were told it would do everything up to but not including slicing and toasting bread) at my last job, we were testing it out and the first thing I did was update it to the newest firmware... the firmware was applied correctly to one controller but not the other, and without both the entire unit refused to work. Thankfully we weren't in production or that would have been a fun 24 hours... To EMC's credit we received the replacement controller at 6AM the next morning.
6am was how many hours later for a "never fail" system that failed on the most basic task?
~18 hours. It took them ~3 hours to determine the controller was bricked (even though I told them it was when we started the support call). My manager at the time didn't really understand what was going on... so I caught all kinds of hell for breaking it.
Eighteen hours to replace a failed SAN? That's crazy. HP servers are four to six hours for parts replacement SLA!! Same with Dell. This is why I fear SANs. Is there any SAN vendor that treats their SAN as even a fraction as critical as a server vendor treats their servers?
If this is the kind of care that goes into support, how much care goes into the engineering?
-
Move this conversation to a Thread! Good information there.
-
Getting you a replacement at 6am wasn't really to their credit, it was really to their shame. HP or Dell would have been embarrassed by that kind of SLA.
-
@Minion-Queen said:
Move this conversation to a Thread! Good information there.
Wish there was a good thread splitting option
-
@scottalanmiller I realize and understand that my definitions of NAS and SAN do not match the "official" definitions.
That is why if I am speaking with a vendor about a NAS or a SAN, I describe for them specifically what I want (i.e. block storage vs file storage, live replication, etc.).
But I have found that if I do not specify things according to my definitions, I wind up with a single unit SAN that is a single point of failure.
-
@dafyre said:
But I have found that if I do not specify things according to my definitions, I wind up with a single unit SAN that is a single point of failure.
You should never be in a position of specifying things in that way at all. You should be working with model numbers and specs.
Also, every vendor I know sells SAN that isn't like you describe, so not sure how using it in that way protects you. If you go to Exablox, their NAS is always a highly reliable cluster. If you go to EMC like @coliver did or to Dell, they will always sell you single SANs without a cluster (or nearly always.) What vendor is using the terms in the way that you describe?
-
@scottalanmiller said:
Also, every vendor I know sells SAN that isn't like you describe, so not sure how using it in that way protects you. If you go to Exablox, their NAS is always a highly reliable cluster. If you go to EMC like @coliver did or to Dell, they will always sell you single SANs without a cluster (or nearly always.) What vendor is using the terms in the way that you describe?
In my initial research about SANs (before I met you, lol)... Our Sales rep (perhaps this was the problem, eh?) was recommending a single SAN device. My team and I quickly dismissed that idea as we were wanting to have our data in two locations on our campus. We told him this and he came back with several products. We decided to go with a 2-device HP / LeftHand SAN (storage cluster) since we were primarily an HP shop at the time.
Edit: Pardon my initial response to your question... My brain is not working this morning.
To answer your question: When I first did my research, the vendors we talked to offered a single fault tolerant storage device as the default configuration and they called it a SAN. Which yeah, that matches the book definition, but still leaves us with the IPOD.
I am thinking in my next storage project that requires HA storage that I should ask for a storage cluster instead of a SAN...
-
Pluralsight, and making some How To's for new DSL modems being replaced at some of our locations (yes, unplug and plug in) and some How To's on our new VoIP phones (currently being piloted at one location)
-
@dafyre said:
I am thinking in my next storage project that requires HA storage that I should ask for a storage cluster instead of a SAN...
You should not ask for anything really, but instead you tell the vendor what product you want and how you want it configured. Telling them a cluster helps, but it still leaves you with them interpreting what you say - which should never be the case.
Imagine going to a vendor and saying "I need a server." They will sell you one, of course, but things like the OS, configuration, support, applications, etc. are all unique and only you really know what you need. The server you get likely won't meet your needs in any significant way. Not because they don't want to meet your needs, just because there is little way for them to know enough about your needs.
A cluster is better than a single point of failure, all other things being equal. But what if that cluster can't failover? What if the cluster has some fragility that makes it less safe than the SPOF?
What you need, it sounds like, is HA, however that is achieved. The lesson here, I think, is thinking (and speaking) at a goal level. Specifying a SAN or a cluster or whatever is a technical underpinning that is probably good for you to understand, but it is not what you care about. You don't care how HA is achieved, you care that the result is HA. If we stick at the goal level, you can hold all the parties accountable to the goal and you can ensure that you are on the same page. Specify a SAN and you might mean one thing, they might assume another and the words might mean something different altogether.
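To make the cluster vs. SPOF point concrete, here is a rough sketch of the availability arithmetic. Every number in it is made up for illustration (the 0.99 figures and the `failover_success` parameter are assumptions, not measurements of any real product), and it treats failures as independent, which real hardware sharing firmware often is not.

```python
from math import prod


def series(*availabilities):
    """Every component must be up, so the availabilities multiply."""
    return prod(availabilities)


def redundant_pair(node, failover_success=1.0):
    """Two identical nodes; the second one only helps if failover works."""
    return node + (1 - node) * node * failover_success


# Hypothetical availabilities, for illustration only.
server = 0.99
storage = 0.99

# Inverted pyramid: redundant servers sitting on one storage box.
ipod = series(redundant_pair(server), storage)

# A two-node storage cluster, with working and with broken failover.
cluster_ok = redundant_pair(storage, failover_success=1.0)
cluster_broken = redundant_pair(storage, failover_success=0.0)

print("inverted pyramid:", ipod)
print("cluster, failover works:", cluster_ok)
print("cluster, failover broken:", cluster_broken)
```

Under these assumptions the inverted pyramid comes out slightly worse than the single storage box it sits on, and a cluster whose failover never works is no better than one node. Which is why the goal (HA) is the thing to specify, not the word "cluster".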
-
So the structure I would use here is....
You make specifications to the architect at the goal level.
Architect designs something to meet the goal.
Architect tells vendor the parts to deliver.
Vendor verifies and recommends changes based on unknown factors.
-
@scottalanmiller said:
So the structure I would use here is....
You make specifications to the architect at the goal level.
Architect designs something to meet the goal.
Architect tells vendor the parts to deliver.
Vendor verifies and recommends changes based on unknown factors.
I definitely agree with your last two posts. For us, our architect happened to be a member of our vendor's storage team. For us, we didn't know what all was out there, and they had the resources to tell us what products were out there that met the goal we specified.
I really like the comment about being goal oriented. That probably would not have changed our decision much, but would have definitely shortened our research times.
-
@dafyre said:
I definitely agree with your last two posts. For us, our architect happened to be a member of our vendor's storage team. For us, we didn't know what all was out there, and they had the resources to tell us what products were out there that met the goal we specified.
Vendors can't be architects. Their team is exclusively sales people, always. If you are paid via sales, you are a sales person. No vendor has architectural resources for you. They have architectural resources for them.
-
@scottalanmiller I can see your point, after all their team is designed to make the most money for their company and all of that. Maybe we were just lucky and things worked out good. 8-)
It was a good learning experience for all of us involved on our side of the team. And fortunately for us, we had an excellent experience with the product and support, so we always considered that a win.
Edit: Maybe not the cheapest win. But still a win.
-
@dafyre said:
@scottalanmiller I can see your point, after all their team is designed to make the most money for their company and all of that. Maybe we were just lucky and things worked out good. 8-)
How have you determined that they worked out well? This is one of the questions that I always ask teams doing a post mortem like this. What were the criteria for success? Was the process fast and efficient, and did it result in a cost effective solution that met business needs and was the best one to do so? In many cases what we find is that people define "cost effective" as "within budget", which are very different things and it covers up overspending, and define "meets business needs" in a way that ignores the key criterion of "best option".
Are you reasonably confident that there wasn't a solution at half the cost that was faster and safer?
-
@dafyre said:
Edit: Maybe not the cheapest win. But still a win.
From a business perspective (goal level), how is that a win? Sounds like a loss, right? Isn't the goal to get the best solution for the cost?
-
That depends, of course, on whether the cost difference is 1% or 90%. Spending lots of time researching a better solution to shave off a few dollars is probably a bad idea. But I often see overspending on these projects in the 400% or higher range. Enormous numbers, not little ones.
What is surprising is how often the cheap solution (maybe 20% the assumed cost) is also the more reliable one.
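Those two figures line up, by the way. A quick sketch of the arithmetic, using made-up prices (the 100 and 20 below are placeholders, not from any real quote): if the cheap option costs 20% of the assumed price, then paying the assumed price is a 400% overspend relative to the cheap option.

```python
def overspend_percent(paid, alternative):
    """How much more was paid than the alternative, as a percent of the alternative."""
    return (paid - alternative) / alternative * 100


assumed_cost = 100  # hypothetical price of the solution people assume they need
cheap_cost = 20     # hypothetical cheap option at 20% of the assumed cost

print(overspend_percent(assumed_cost, cheap_cost))  # 400.0
```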
-
@scottalanmiller said:
Are you reasonably confident that there wasn't a solution at half the cost that was faster and safer?
Fast, Cheap, Reliable... Pick two.... We wound up with relatively fast and reliable. It wasn't the cheapest solution, but it was built and maintained by a company we trusted. It also, however, wasn't the most expensive solution either.
@scottalanmiller said:
From a business perspective (goal level), how is that a win? Sounds like a loss, right? Isn't the goal to get the best solution for the cost?
At the time, and for what we were looking for, that was one of the solutions that offered most of the features we were looking for (including HA)... I would change your final question to "Isn't the goal to get the best solution for the value?" ... to which (like cost), the obvious answer is yes. But as (I think) you have mentioned elsewhere, cost does not necessarily equal value.
Some of the other storage vendors we looked at did not offer some of the features that we requested at the same price point as the LeftHand. So it is a win in terms of value. Never once did I have to do a complete restore from backups because the storage cluster completely died. (And we did suffer from entire node failures, but the rest of the campus never noticed as the storage cluster's HA performed as expected).
@scottalanmiller said:
That depends, of course, on whether the cost difference is 1% or 90%.
This is true. We weren't trying to penny pinch to save $500 or $1000... The price differences in products we looked at were between $5,000 and $10,000.... so between 15 and 30 percent difference between the most expensive and the solution that we followed through with.
<maybe we should have done a separate thread, lol.>
There's a longer story that goes with why we went with a SAN... Maybe I'll make a new topic and copy our last few responses from this thread... 8-)