ZFS-Based Storage for Medium VMware Workload
-
@donaldlandru said:
All in all I think the operations is pretty well protected, minus the three risks listed above. It is two nodes that can absorb either node failing, it is on redundant 10gig top of rack switches and redundant 1gig switches. Also, backups are done and tested as well with Veeam. Am I missing something here?
Yes, you are missing that it is not protected at all. Even if the server node layer was so well protected that it has no risk at all (which can never actually happen) the MSA represents more risk than a normal server. In no way is this "protected" let alone "well protected." You are running your operations on an expensive, yet "less than standard" level of reliability. The risks that you mention add up to "more than the risks of a normal server."
So calling this "well protected" is misleading. It's "unnecessarily risky" but "above the needs of the business." So you are at risk without reason, overly complex, and have spent too much money for what you got from it. The only things that we have learned for sure from it are these two:
- The company will happily spend far in excess of its needs - which is good as long as there is someone making sure that they only buy what they need, not what they can.
- The company has no need for even "standard reliability" let alone anything higher. So any spending on something more than straight servers with zero failover would be wasteful.
-
@scottalanmiller said:
@donaldlandru said:
Real life I am not sure if it works, on paper it does. It is a false sense of security but the MSA does have active/active controllers built in (10GB iSCSI), redundant power supplies, and of course the disks are in a RAID. The risks that are not mitigated by the single chassis are:
Not active/active. It has codependent controllers that fail together. It's the opposite of what people expect when they say "redundant". It's the "two straw houses next door in a fire" scenario. Having two houses is redundant, but if they are both made of straw and there is a fire, the redundant house will provide zero protection while making a fire that much more likely to happen or to spread. Active/Active controllers from HP start in the 3PAR line, not the MSAs.
All that other redundant stuff is a red herring. EVERY enterprise server has all of that redundancy but without the cripplingly dangerous dual controllers, making any normal server MORE reliable than the MSA, not less. If anyone talks to you about the "redundant" parts in an MSA, you are getting a sales pitch from someone trying very hard to trick you, unless they point out that every server has those things so this is "just another server".
I disagree with this; the controllers fail independently of each other. We have experienced a controller failure in the MSA and, while performance was degraded, we experienced zero downtime. HP sent a technician out with a replacement controller, hot-swapped it, and checked the configuration: a 10-minute process, again with zero downtime.
Now, if we are worried about the neighbor's house being on fire, sure, we have that issue, as everything (operations) is housed in a single data center. We accept the risk that our operations is highly available (might incur downtime during failover) but is not fault-tolerant (services are not running in an active/active format).
Software bugs, on the other hand, have bitten us in the past, which makes us very cautious when we do upgrades: scheduled maintenance windows, extra backups on hand, etc.
-
The biggest challenge with local storage is getting your extra capacity where you need it. This is something that matters in development: I can't have an overworked compute node with a boatload of extra storage and an underworked one with no storage. This is what shared storage (at the dev level) solves for us. And keep in mind, those storage location needs can change at any time.
-
@scottalanmiller said:
- The company has no need for even "standard reliability" let alone anything higher.
This is the second time in this thread you've said something to this effect. Why do you believe this? Simply because of the choices they made before?
That is a flawed way to look at the company's need for "standard reliability" or even HA. @donaldlandru provided the desired end results in his first and follow-up posts. Just because the company implemented something that didn't deliver those stated goals before doesn't mean the goals themselves are wrong. It means they were probably sold a bill of goods that couldn't deliver those goals and got lucky throughout its current use.
-
@donaldlandru said:
I disagree with this; the controllers fail independently of each other. We have experienced a controller failure in the MSA and, while performance was degraded, we experienced zero downtime. HP sent a technician out with a replacement controller, hot-swapped it, and checked the configuration: a 10-minute process, again with zero downtime.
No, they CAN fail independently of one another. That is not the same thing. Under certain types of hardware failure they are redundant; under the most common forms of firmware failure, they are not. This makes them work great in demos, as you can reliably yank out a controller and it keeps working, but search SW and you'll see the MSAs dying with both controllers going out at once, one killing the other as it goes. They are tightly coupled; you aren't in the range of independent controllers here.
They also die far more frequently than standard RAID controllers. A normal RAID controller is expected to have a multi-decade average life. It is one of the most reliable components in your servers (I've seen this across 80,000 servers over a decade of monitoring). That one controller dying in your environment puts your failure rate at hundreds of times higher than I've measured in servers. It's purely anecdotal on your end, but something to consider. How many server controllers have died in the same time period, even though you have many more servers?
-
@donaldlandru said:
The biggest challenge with local storage is getting your extra capacity where you need it. This is something that matters in development: I can't have an overworked compute node with a boatload of extra storage and an underworked one with no storage. This is what shared storage (at the dev level) solves for us. And keep in mind, those storage location needs can change at any time.
This is why you would build your new Dev servers to be clones... So you build them both with 384GB of RAM (how in the world did you get 288? lol.) and 4 x 6TB drives in RAID 10 (gives you 12TB usable in each server)... Then set up XenServer + HA-Lizard (or at least DRBD) and it effectively turns that storage into shared storage.
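For anyone checking the math, here is a minimal sketch of the RAID 10 capacity figure quoted above. The function names are made up for illustration, and it assumes plain two-way mirrors with no filesystem or hypervisor overhead, so real usable space will be somewhat lower.

```python
def raid10_usable_tb(drive_count: int, drive_size_tb: float) -> float:
    """Usable capacity of a RAID 10 array built from two-way mirrors."""
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    # Every drive has exactly one mirror partner, so half the raw space is usable.
    return drive_count * drive_size_tb / 2


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to tebibytes (2**40 bytes)."""
    return terabytes * 10**12 / 2**40


usable_tb = raid10_usable_tb(drive_count=4, drive_size_tb=6)
print(f"{usable_tb:.0f} TB usable, about {tb_to_tib(usable_tb):.2f} TiB")
# prints "12 TB usable, about 10.91 TiB"
```

The TB to TiB conversion matters when a requirement is stated in TiB while drives are sold in decimal TB.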
-
@Dashrender said:
@scottalanmiller said:
- The company has no need for even "standard reliability" let alone anything higher.
This is the second time in this thread you've said something to this effect. Why do you believe this? Simply because of the choices they made before?
Because they are happy with the current reliability. If what they have today is "good enough" then better is, by logical extension, not just "good enough" but better. If you need six eggs to feed your family "enough", then seven eggs is "better".
-
Because of the volatility of your dev environment, I wonder if using a SAM-SD for central storage would be best. What happens if the entire storage array is down? Can you live for a day or two without it on the dev environment? What are you planning for backups on it? What is your RTO and RPO?
Your operations systems - I like the two node sync'ed approach, if you even really need that, but you already have the two servers.
-
@scottalanmiller said:
@donaldlandru said:
I disagree with this; the controllers fail independently of each other. We have experienced a controller failure in the MSA and, while performance was degraded, we experienced zero downtime. HP sent a technician out with a replacement controller, hot-swapped it, and checked the configuration: a 10-minute process, again with zero downtime.
No, they CAN fail independently of one another. That is not the same thing. Under certain types of hardware failure they are redundant; under the most common forms of firmware failure, they are not. This makes them work great in demos, as you can reliably yank out a controller and it keeps working, but search SW and you'll see the MSAs dying with both controllers going out at once, one killing the other as it goes. They are tightly coupled; you aren't in the range of independent controllers here.
They also die far more frequently than standard RAID controllers. A normal RAID controller is expected to have a multi-decade average life. It is one of the most reliable components in your servers (I've seen this across 80,000 servers over a decade of monitoring). That one controller dying in your environment puts your failure rate at hundreds of times higher than I've measured in servers. It's purely anecdotal on your end, but something to consider. How many server controllers have died in the same time period, even though you have many more servers?
Yes, they can and have failed independently of each other, outside of a demo environment (as I just outlined above). Firmware update risks are everywhere, on shared and local storage both, so choosing one or the other doesn't mitigate that risk.
Out of the hardware in our datacenter, I have had the one MSA controller fail, a P420 in the HP DL360p G8, and a PERC in the Dell 2950, all inside the four years I have been here. To me this shows no better level of reliability for one than the other. Both of the controller failures in the blades caused downtime to the organization; the failure in the MSA did not.
-
@donaldlandru said:
Yes, they can and have failed independently of each other, outside of a demo environment (as I just outlined above). Firmware update risks are everywhere, on shared and local storage both, so choosing one or the other doesn't mitigate that risk.
Active/Active doesn't have the firmware risk. That's a HUGE deal. MSAs fail, both controllers together, all of the time, at a rate we've observed to be far higher than equivalent servers fail on their own. It's just how it is. They can work, but they fail together too often to match the reliability of a normal server.
-
@donaldlandru said:
Out of the hardware in our datacenter, I have had the one MSA controller fail, a P420 in the HP DL360p G8, and a PERC in the Dell 2950, all inside the four years I have been here. To me this shows no better level of reliability for one than the other. Both of the controller failures in the blades caused downtime to the organization; the failure in the MSA did not.
Those are crazy high failure rates for all of those. PERCs I have not measured in large quantity, but SmartArrays I have, by the thousands, and the failure rates are minuscule: a fraction of the failure rates of memory sticks, for example.
-
@Dashrender said:
Because of the volatility of your dev environment, I wonder if using a SAM-SD for central storage would be best. What happens if the entire storage array is down? Can you live for a day or two without it on the dev environment? What are you planning for backups on it? What is your RTO and RPO?
His proposed ZFS-based storage option is a SAM-SD, just in case anyone missed that.
-
@Dashrender said:
Because of the volatility of your dev environment, I wonder if using a SAM-SD for central storage would be best. What happens if the entire storage array is down? Can you live for a day or two without it on the dev environment? What are you planning for backups on it? What is your RTO and RPO?
Your operations systems - I like the two node sync'ed approach, if you even really need that, but you already have the two servers.
That is pretty much where this all started: do I need to fork out the money to HP, or is the other way good enough?
In operations the RTO/RPO is 24 hours. We carry our HP care pack on the MSA. Everything is backed up by Veeam several times throughout the day and replicated offsite. We have physical access to the offsite location in case of datacenter failure for faster recovery.
Up to six months ago there was no backup of the development environments, as the thought was that they could be rebuilt from scratch. That lasted until I outlined the effort it would take to bring everything back: roughly 6 months.
Now the RPO is one week with an RTO of 72 hours.
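To make those targets concrete, here is a rough sketch of how a backup plan could be checked against them. The class and function names are hypothetical, and the backup intervals and restore times in the two example calls are invented for illustration; only the RPO/RTO values come from this post.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rpo: timedelta  # recovery point objective: maximum tolerable data loss
    rto: timedelta  # recovery time objective: maximum tolerable outage


# Targets as stated in this post.
OPERATIONS = RecoveryObjectives(rpo=timedelta(hours=24), rto=timedelta(hours=24))
DEVELOPMENT = RecoveryObjectives(rpo=timedelta(weeks=1), rto=timedelta(hours=72))


def meets_objectives(
    objectives: RecoveryObjectives,
    backup_interval: timedelta,
    estimated_restore_time: timedelta,
) -> bool:
    """Return True if a backup plan satisfies both the RPO and the RTO."""
    # Worst-case data loss is one full interval between backups.
    return (
        backup_interval <= objectives.rpo
        and estimated_restore_time <= objectives.rto
    )


# Hypothetical plans, for illustration only.
print(meets_objectives(OPERATIONS, timedelta(hours=4), timedelta(hours=8)))   # True
print(meets_objectives(DEVELOPMENT, timedelta(days=1), timedelta(hours=96)))  # False
```

The second call fails only on restore time: a nightly backup is well inside a one-week RPO, but a 96-hour restore misses the 72-hour RTO.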
-
@scottalanmiller said:
@Dashrender said:
Because of the volatility of your dev environment, I wonder if using a SAM-SD for central storage would be best. What happens if the entire storage array is down? Can you live for a day or two without it on the dev environment? What are you planning for backups on it? What is your RTO and RPO?
His proposed ZFS-based storage option is a SAM-SD, just in case anyone missed that.
You're right it is, but for the dev environment it might be all that he needs with a good backup solution. He's currently hamstrung by his old servers - two of which are slated to be replaced in the next year or so.
Perhaps he should do nothing until it's time to replace those boxes.
-
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
Because of the volatility of your dev environment, I wonder if using a SAM-SD for central storage would be best. What happens if the entire storage array is down? Can you live for a day or two without it on the dev environment? What are you planning for backups on it? What is your RTO and RPO?
His proposed ZFS-based storage option is a SAM-SD, just in case anyone missed that.
You're right it is, but for the dev environment it might be all that he needs with a good backup solution. He's currently hamstrung by his old servers - two of which are slated to be replaced in the next year or so.
Perhaps he should do nothing until it's time to replace those boxes.
I can't do nothing; I do not have enough storage to host a new client that starts soon. I have to do something there. I am not opposed to overall architecture changes in a refresh cycle, but in the meantime -- I have a budget and need disk.
-
That all supports that HA is total overkill. HA is for when ten minutes is too long. Not for when "we can be down for an hour or two in a disaster."
-
@donaldlandru said:
Here is what the business cares about in the solution: a reliable solution that provides the necessary resources for the development environments to operate effectively (read: we do not do performance testing in-house, as by its very nature it is very much a "your mileage may vary" situation depending on your deployment).
In addition to the business requirements, I have added my own requirements that my boss agrees with and blesses.
- Operations and Development must be on separate storage devices
- Storage systems must be built of business class hardware (no RED drives -- although I would allow this in a future Veeam backup storage target)
- Must be expandable to accommodate future growth
Requirements for development storage
- 9+ TiB of usable storage
- Support a minimum of 1100 random IOPS (what our current system is peaking at)
- Disks must be in some kind of array (ZFS, RAID, mdadm, etc.)
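The development storage requirements above can be sketched as a simple checklist. Everything here other than the three thresholds is an assumption made for illustration: the `StorageCandidate` fields, the candidate names, and especially the IOPS figures, which would need to come from your own measurement or testing.

```python
from dataclasses import dataclass

TIB = 2**40  # bytes in one tebibyte

# Thresholds taken from the requirements list.
MIN_USABLE_TIB = 9
MIN_RANDOM_IOPS = 1100


@dataclass(frozen=True)
class StorageCandidate:
    name: str
    usable_bytes: int
    random_iops: int      # measured figure; supply your own
    disks_in_array: bool  # ZFS, RAID, mdadm, etc.


def unmet_requirements(candidate: StorageCandidate) -> list[str]:
    """Return a description of every requirement the candidate misses."""
    problems = []
    usable_tib = candidate.usable_bytes / TIB
    if usable_tib < MIN_USABLE_TIB:
        problems.append(f"only {usable_tib:.2f} TiB usable, need {MIN_USABLE_TIB}+")
    if candidate.random_iops < MIN_RANDOM_IOPS:
        problems.append(f"only {candidate.random_iops} random IOPS, need {MIN_RANDOM_IOPS}")
    if not candidate.disks_in_array:
        problems.append("disks are not in an array")
    return problems


# Hypothetical candidates, for illustration only.
big = StorageCandidate("12 TB array", 12 * 10**12, 1500, True)
small = StorageCandidate("8 TB array", 8 * 10**12, 900, True)
print(unmet_requirements(big))    # []
print(unmet_requirements(small))  # two unmet requirements
```

An empty list means the candidate clears all three requirements; anything else names what to fix before buying.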
Back to the original requirements list. HA and FT are not listed as needed for the development environment. This conversation went sideways when we started digging into the operations side (where there should be HA) and I have a weak point, the storage.
-
@donaldlandru said:
Back to the original requirements list. HA and FT are not listed as needed for the development environment. This conversation went sideways when we started digging into the operations side (where there should be HA) and I have a weak point, the storage.
Okay, so we are looking exclusively at the non-production side?
But production completely lacks HA today. It should be a different thread, but your "actions" say you don't need HA in production even if you feel that you do. Either what you have today isn't good enough and has to be replaced there, or HA isn't needed since you've happily been without it for so long. This can't be overlooked: you are stuck with either falling short of a need or not being clear on the needs for production.
-
For dev, why do anything except replace the nodes with a single node that can handle the load? Cheap, simple, easy.
-
The cost of external storage for the compute nodes is a huge percentage of the cost of just replacing the whole thing, right? If you could spend $14K on an MSA for them, you should be able to spend around $16K, I'm guessing, to get a single node with more CPU and more RAM than you have between the two nodes currently while getting a storage system that is bigger and likely orders of magnitude faster.