ZFS-Based Storage for Medium VMware Workload

dafyre

The biggest challenge with local storage is getting your extra capacity where you need it. This is something that matters in development: I can't have an overworked compute node with a boatload of extra storage and an underworked one with no storage. This is what shared storage (at the dev level) solves for us. And keep in mind, those storage location needs can change at any time.

This is why you would build your new dev servers to be clones... So you build them both with 384GB of RAM (how in the world did you get 288? lol.) and 4 x 6TB drives in RAID 10 (gives you 12TB usable in each server)... Then set up XenServer + HA-Lizard (or at least DRBD) and it effectively turns that storage into shared storage.
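As a quick sanity check on the "4 x 6TB in RAID 10 gives 12TB usable" figure, here is a minimal Python sketch. The function names are made up for illustration, and it ignores filesystem and hypervisor overhead, so treat the result as an upper bound rather than what the datastore will actually report.

```python
def raid10_usable_tb(drive_count: int, drive_size_tb: float) -> float:
    """Usable capacity of RAID 10: half of raw, because every drive is mirrored."""
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return drive_count * drive_size_tb / 2


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


usable_tb = raid10_usable_tb(4, 6.0)
print(f"{usable_tb:.1f} TB usable, about {tb_to_tib(usable_tb):.2f} TiB")
```

The TB-to-TiB conversion matters here because drives are sold in decimal TB, so the binary figure comes out noticeably smaller than the label.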

scottalanmiller

@Dashrender said:

@scottalanmiller said:

The company has no need for even "standard reliability" let alone anything higher.

This is the second time in this thread you've said something to this effect. Why do you believe this? Simply because of the choices they made before?

Because they are happy with the current reliability. If what they have today is "good enough" then better is, by logical extension, not just "good enough" but better. If you need six eggs to feed your family "enough", then seven eggs is "better".

Dashrender

Because of the volatility of your dev environment, I wonder if using a SAM-SD for central storage would be best. What happens if the entire storage array is down? Can you live for a day or two without it on the dev environment? What are you planning for backups on it? What is your RTO and RPO?

Your operations systems - I like the two node sync'ed approach, if you even really need that, but you already have the two servers.

donaldlandru

@scottalanmiller said:

@donaldlandru said:

I disagree with this; the controllers fail independently of each other. We have experienced a controller failure in the MSA, and while performance was degraded, zero downtime was experienced. HP sent a technician out with a replacement controller, hot-swapped it, and checked the configuration: a 10-minute process, again with zero downtime.

No, they can fail independently of one another. That is not the same thing. Under certain types of hardware failure they are redundant; under the most common forms of firmware failure, they are not. This makes them work great in demos, as you can reliably yank out a controller and it keeps working, but search SW and you'll see the MSAs dying with both controllers going out at once, with one killing the other as it goes. They are tightly coupled; you aren't in the range of independent controllers here.

They also die far more frequently than standard RAID controllers. A normal RAID controller is expected to have a multi-decade average life. It is one of the most reliable components in your servers (I've seen this over 80,000 servers over a decade of monitoring.) That one controller dying in your environment puts your failure rate at hundreds of times higher than I've measured in servers. It's purely anecdotal on your end, but something to consider. How many server controllers have died in the same time period even though you have many more servers?

Yes they can and have failed independently of each other, outside of a demo environment (as I just outlined above). Firmware update risks are everywhere, shared and local storage both so one way or the other doesn't mitigate that risk.

Out of the hardware in our datacenter I have had the one MSA controller fail, a P420 in the HP DL360p G8 and a PERC in the Dell 2950, all inside the four years I have been here. To me this shows no better level of reliability than the other. Both of the controller failures in the blades caused downtime to the organization, the failure in the MSA did not.

scottalanmiller

@donaldlandru said:

Yes they can and have failed independently of each other, outside of a demo environment (as I just outlined above). Firmware update risks are everywhere, shared and local storage both so one way or the other doesn't mitigate that risk.

Active/Active doesn't have the firmware risk. That's a HUGE deal. MSAs fail, both controllers together, all of the time. At a rate we've observed far higher than servers fail on their own (equivalent servers.) It's just how it is. They can work, but they fail together too often to match the reliability of a normal server.

scottalanmiller

@donaldlandru said:

Out of the hardware in our datacenter I have had the one MSA controller fail, a P420 in the HP DL360p G8 and a PERC in the Dell 2950, all inside the four years I have been here. To me this shows no better level of reliability than the other. Both of the controller failures in the blades caused downtime to the organization, the failure in the MSA did not.

Those are crazy high failure rates for all of those. PERCs I have not measured in large quantity, but SmartArrays I have, by the thousands, and the failure rates are minuscule, a fraction of the failure rates of memory sticks, for example.

scottalanmiller

@Dashrender said:

Because of the volatility of your dev environment, I wonder if using a SAM-SD for central storage would be best. What happens if the entire storage array is down? Can you live for a day or two without it on the dev environment? What are you planning for backups on it? What is your RTO and RPO?

His proposed ZFS-based storage option is a SAM-SD, just in case anyone missed that.

donaldlandru

@Dashrender said:

Because of the volatility of your dev environment, I wonder if using a SAM-SD for central storage would be best. What happens if the entire storage array is down? Can you live for a day or two without it on the dev environment? What are you planning for backups on it? What is your RTO and RPO?

Your operations systems - I like the two node sync'ed approach, if you even really need that, but you already have the two servers.

That is pretty much where this all started: do I need to fork out the money to HP, or is the other way good enough?

In operations the RTO/RPO is 24 hours. We carry our HP care pack on the MSA. Everything is backed up by Veeam several times throughout the day and replicated offsite. We have physical access to the offsite location in case of datacenter failure for faster recovery.

For the development environments, up to six months ago there was no backup, as the thought was that everything could be rebuilt from scratch. This was until I outlined the effort it would take to bring everything back -- roughly 6 months.

Now the RPO is one week with an RTO of 72 hours.
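To make those targets concrete, here is a rough Python sketch that checks a backup schedule against the stated RPO/RTO numbers. The targets come from this post; the backup intervals and restore times in the example calls are hypothetical placeholders, not measured figures, and the class and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTarget:
    name: str
    rpo_hours: float  # maximum tolerable data loss, in hours
    rto_hours: float  # maximum tolerable time to restore service, in hours


# Targets as stated in the thread.
OPERATIONS = RecoveryTarget("operations", rpo_hours=24, rto_hours=24)
DEVELOPMENT = RecoveryTarget("development", rpo_hours=7 * 24, rto_hours=72)


def meets_target(target: RecoveryTarget, backup_interval_hours: float,
                 restore_hours: float) -> bool:
    """Worst-case data loss is the gap between backups; check both figures."""
    return (backup_interval_hours <= target.rpo_hours
            and restore_hours <= target.rto_hours)


# Hypothetical: a daily dev backup and a 48-hour restore.
print(meets_target(DEVELOPMENT, backup_interval_hours=24, restore_hours=48))
# Hypothetical: the same 48-hour restore measured against operations.
print(meets_target(OPERATIONS, backup_interval_hours=6, restore_hours=48))
```

The point of the comparison is that a restore time comfortably inside the dev RTO can still blow the operations RTO, which is why the two environments end up with different storage answers.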

Dashrender

@scottalanmiller said:

@Dashrender said:

Because of the volatility of your dev environment, I wonder if using a SAM-SD for central storage would be best. What happens if the entire storage array is down? Can you live for a day or two without it on the dev environment? What are you planning for backups on it? What is your RTO and RPO?

His proposed ZFS-based storage option is a SAM-SD, just in case anyone missed that.

You're right it is, but for the dev environment it might be all that he needs with a good backup solution. He's currently hamstrung by his old servers - two of which are slated to be replaced in the next year or so.

Perhaps he should do nothing until it's time to replace those boxes.

donaldlandru

@Dashrender said:

@scottalanmiller said:

@Dashrender said:

Because of the volatility of your dev environment, I wonder if using a SAM-SD for central storage would be best. What happens if the entire storage array is down? Can you live for a day or two without it on the dev environment? What are you planning for backups on it? What is your RTO and RPO?

His proposed ZFS-based storage option is a SAM-SD, just in case anyone missed that.

You're right it is, but for the dev environment it might be all that he needs with a good backup solution. He's currently hamstrung by his old servers - two of which are slated to be replaced in the next year or so.

Perhaps he should do nothing until it's time to replace those boxes.

I can't do nothing; I do not have enough storage to host a new client that starts soon. I have to do something there. I am not opposed to overall architecture changes in a refresh cycle, but in the meantime -- I have a budget and need disk.

scottalanmiller

That all supports that HA is total overkill. HA is for when ten minutes is too long. Not for when "we can be down for an hour or two in a disaster."

donaldlandru

@donaldlandru said:

Here is what the business cares about in the solution: a reliable solution that provides the necessary resources for the development environments to operate effectively (read: we do not do performance testing in-house as, by its very nature, it is very much a "your mileage may vary" matter depending on your deployment situation).

In addition to the business requirements, I have added my own requirements that my boss agrees with and blesses.

Operations and Development must be on separate storage devices

Storage systems must be built of business class hardware (no RED drives -- although I would allow this in a future Veeam backup storage target)

Must be expandable to accommodate future growth

Requirements for development storage

9+ TiB of usable storage

Support a minimum of 1100 random IOPS (what our current system is peaking at)

Disks must be in some kind of array (ZFS, RAID, mdadm, etc.)

Back to the original requirements list. HA and FT are not listed as needed for the development environment. This conversation went sideways when we started digging into the operations side (where there should be HA) and I have a weak point, the storage.
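For anyone wanting to test a candidate array against the two hard numbers in that list (9+ TiB usable, 1100 random IOPS), here is a rough Python sketch. It assumes RAID 10 and the usual rule of thumb that each front-end write costs two back-end operations. The per-drive IOPS and read/write mix in the example are assumptions picked for illustration, not measurements from this environment, and it ignores any cache or SSD tier.

```python
TIB = 2**40

REQUIRED_USABLE_TIB = 9      # from the requirements list
REQUIRED_RANDOM_IOPS = 1100  # from the requirements list


def raid10_usable_tib(drives: int, drive_tb: float) -> float:
    """RAID 10 mirrors every drive, so usable space is half of raw (decimal TB in)."""
    return drives * drive_tb * 10**12 / 2 / TIB


def raid10_random_iops(drives: int, iops_per_drive: float,
                       read_fraction: float) -> float:
    """Front-end IOPS when each write costs two back-end operations."""
    write_fraction = 1.0 - read_fraction
    return drives * iops_per_drive / (read_fraction + 2 * write_fraction)


def meets_requirements(drives, drive_tb, iops_per_drive, read_fraction):
    """Return (capacity_ok, iops_ok) for a candidate RAID 10 array."""
    capacity_ok = raid10_usable_tib(drives, drive_tb) >= REQUIRED_USABLE_TIB
    iops_ok = (raid10_random_iops(drives, iops_per_drive, read_fraction)
               >= REQUIRED_RANDOM_IOPS)
    return capacity_ok, iops_ok


# Hypothetical candidate: 8 x 4 TB drives, 150 IOPS each, 70% reads.
print(meets_requirements(8, 4.0, 150, 0.7))
```

With those assumed inputs the array clears the capacity bar but not the IOPS bar, which is the usual pattern with a small number of large spinning drives: capacity is easy, random IOPS is where spindle count (or cache) decides it.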

scottalanmiller

@donaldlandru said:

Back to the original requirements list. HA and FT are not listed as needed for the development environment. This conversation went sideways when we started digging into the operations side (where there should be HA) and I have a weak point, the storage.

Okay, so we are looking exclusively at the non-production side?

But production completely lacks HA today. It should be a different thread, but your "actions" say you don't need HA in production even if you feel that you do. Either what you have today isn't good enough and has to be replaced there, or HA isn't needed since you've happily been without it for so long. This can't be overlooked - you are stuck with either falling short of a need or not being clear on the needs for production.

scottalanmiller

For dev, why do anything except replace the nodes with a single node that can handle the load? Cheap, simple, easy.

scottalanmiller

The cost of external storage for the compute nodes is a huge percentage of the cost of just replacing the whole thing, right? If you could spend $14K on an MSA for them, you should be able to spend around $16K, I'm guessing, to get a single node with more CPU and more RAM than you have between the two nodes currently while getting a storage system that is bigger and likely orders of magnitude faster.

donaldlandru

@scottalanmiller said:

@donaldlandru said:

Back to the original requirements list. HA and FT are not listed as needed for the development environment. This conversation went sideways when we started digging into the operations side (where there should be HA) and I have a weak point, the storage.

Okay, so we are looking exclusively at the non-production side?

But production completely lacks HA today. It should be a different thread, but your "actions" say you don't need HA in production even if you feel that you do. Either what you have today isn't good enough and has to be replaced there, or HA isn't needed since you've happily been without it for so long. This can't be overlooked - you are stuck with either falling short of a need or not being clear on the needs for production.

Ahh -- there is the detail I missed. Just re-read my post and that doesn't make this clear. Yes, the discussion was supposed to pertain to the non-production side. My apologies.

I agree we do lack true HA on the production side, as there is a single weak link (one storage array). The solution here depends on our move to Office 365, as that would take most of the operations load off of the network and change the requirements completely.

We have quasi-HA with the current solution, but now, based on new enlightenment, I would agree it is not fully HA.

dafyre

Curiosity got the better of me, so I went to xByte to see... You can build a nice SAM-SD based on a Dell R720 from xByte for around $10k ... But that included 256GB of RAM and 8 x 1.2 TB SAS drives (they don't have any larger drives listed on their web site)... and a 3 Year Warranty... (I have a PDF (https://beta.wellston.biz/xByte SAM-SD.pdf) of how I configured it if everybody wants to see)...

scottalanmiller

@donaldlandru said:

Ahh -- there is the detail I missed. Just re-read my post and that doesn't make this clear. Yes, the discussion was supposed to pertain to the non-production side. My apologies.

LOL, a rather sizeable detail. I think we've been focused almost entirely on the operations cluster in our discussion and/or putting the two together to assess needs as a whole - which is worth considering: is there actually a good reason that they are independent to this level?

scottalanmiller

@dafyre said:

Curiosity got the better of me, so I went to xByte to see... You can build a nice SAM-SD based on a Dell R720 from xByte for around $10k ... But that included 256GB of RAM and 8 x 1.2 TB SAS drives (they don't have any larger drives listed on their web site)... and a 3 Year Warranty... (I have a PDF (https://beta.wellston.biz/xByte SAM-SD.pdf) of how I configured it if everybody wants to see)...

Yup, using xByte and the PowerEdge R720xd (did you do the 720 or the 720xd?) you can get quite a monster of a server. We have a reference PowerEdge R720xd at the NTG Labs for this. Only 128GB of RAM, though. With the 720xd you can do 12x LFF drives plus two SSDs in CacheCade. Sure, you are going to spend a little more for that than what you quoted, but not tons more, and that is a 50% leap in drive count (12 bays versus 8) and an insane leap in potential IOPS with the CacheCade included.

donaldlandru

@scottalanmiller said:

The cost of external storage for the compute nodes is a huge percentage of the cost of just replacing the whole thing, right? If you could spend $14K on an MSA for them, you should be able to spend around $16K, I'm guessing, to get a single node with more CPU and more RAM than you have between the two nodes currently while getting a storage system that is bigger and likely orders of magnitude faster.

HP DL360p Gen 8 with 2 Intel E5-2640 and 384GB RAM cost us roughly $13k each -- this is without local drives. On our current large compute node I am only 20% utilized on CPU and 50% utilized on RAM (at peak). I am, however, out of storage, which I can add for as cheap as $5k with RED drives or $10k with Seagate SAS drives.

The $13k does not include VMware licensing, which is obviously much debated if I even need it; however, since I am decommissioning 4 CPUs when we upgrade, I still have available licenses.