    ZFS Based Storage for Medium VMWare Workload

    Tags: zfs, storage, virtualization, filesystems, raid
    156 Posts 9 Posters 86.8k Views
    • donaldlandruD
      donaldlandru @scottalanmiller

      @scottalanmiller said:

      @Dashrender said:

      @scottalanmiller said:

      By dropping VMware vSphere Essentials you are looking at a roughly $1200 savings right away. Both HyperV and XenServer will do what you need absolutely free.

      Did the price of Essentials double? I thought it was $600 for three nodes for Essentials? and something like $5000 for Essentials Plus.

      Those are the rough numbers. He has five nodes so that means either buying all licenses twice (so $1200 and $10,000) or being disqualified from Essentials pricing altogether and needing to move to Standard licensing options.

      This is very much the case here.

      To get this all into a single cluster (and hopefully using something like VSAN) would require us to upgrade to Standard or higher. We would be able to use acceleration kits to get us there, but that is no small investment.

      • dafyreD
        dafyre @donaldlandru

        @donaldlandru said:

        The politics are likely to be harder to play as we just renewed our SnS for both Essentials and Essentials plus in January for three years.
        <snip>
        Another important piece of information with the local storage is that everything is based on 2.5" disks, and all but two servers have only two bays each. Getting any real kind of local storage without going external direct attached (non-shared) is going to be a challenge.

        He brings up a good point about the 2 bays and 2.5" drives... Do they even make 4 / 6 TB drives in 2.5" form yet?

        If not, would it be worth getting an external DAS shelf for each of the servers?

        • DashrenderD
          Dashrender @scottalanmiller

          @scottalanmiller said:

          @Dashrender said:

          @scottalanmiller said:

          By dropping VMware vSphere Essentials you are looking at a roughly $1200 savings right away. Both HyperV and XenServer will do what you need absolutely free.

          Did the price of Essentials double? I thought it was $600 for three nodes for Essentials? and something like $5000 for Essentials Plus.

          Those are the rough numbers. He has five nodes so that means either buying all licenses twice (so $1200 and $10,000) or being disqualified from Essentials pricing altogether and needing to move to Standard licensing options.

          Aww, gotcha - you were doubling them up.... His current spend was $5600.
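The licensing arithmetic in this exchange can be sketched in a few lines of Python. This is a rough illustration using only the approximate prices quoted in the thread ($600 for Essentials, about $5000 for Essentials Plus, three nodes per kit). The function and constant names are made up for the example, and reading the $5600 current spend as one kit of each type is an assumption.

```python
import math

# Rough figures quoted in the thread, not official pricing.
NODES_PER_KIT = 3
ESSENTIALS_PRICE = 600
ESSENTIALS_PLUS_PRICE = 5000


def kits_needed(nodes: int) -> int:
    """Number of three-node kits required to cover `nodes` hosts."""
    return math.ceil(nodes / NODES_PER_KIT)


def license_cost(nodes: int, kit_price: int) -> int:
    """Cost of covering `nodes` hosts with kits at `kit_price` each."""
    return kits_needed(nodes) * kit_price


# Five nodes need two kits of whichever edition is chosen.
doubled_essentials = license_cost(5, ESSENTIALS_PRICE)
doubled_plus = license_cost(5, ESSENTIALS_PLUS_PRICE)

# Assumed current setup: one Essentials kit plus one Essentials Plus kit.
current_spend = ESSENTIALS_PRICE + ESSENTIALS_PLUS_PRICE

print(doubled_essentials, doubled_plus, current_spend)
```

With these inputs the doubled-up figures match the $1200 and $10,000 mentioned above, and the assumed current setup matches the $5600 figure.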

          • DashrenderD
            Dashrender @donaldlandru

            @donaldlandru said:

            @scottalanmiller said:

            Currently you have an inverted pyramid of doom, not the best design as you know.

            This is true; in all the scenarios we are playing out we are left with this giant SPOF. This is why I really like the alternative shared storage solutions (ZFS, OpenIndiana, etc.), because they will allow me to build a second storage node and do replication for failover.

            The business is also screaming for reliability and 110% uptime, but falls short when it comes time to writing the check for what they want.

            Do the dev environments need to be highly available? IMO no, but the business sees that as its bread and butter, and they are aware that we still have not fulfilled this requirement.

            The question is: do they lose more money when the operations systems are down or when the dev environment is down?

            • scottalanmillerS
              scottalanmiller @donaldlandru

              @donaldlandru said:

              This is true, in all scenarios we are playing out we are left with this giant SPOF.

              It's important to recognize that it is a SPOF. But being a SPOF is not the core issue, believe it or not, just the one that causes the biggest emotional reaction. If you were to buy a super high end active/active EMC or HDS device for this (mainframe class storage, start around $50K for the smallest possible units) the fact that it was a SPOF would be heavily mitigated. The whole mainframe concept is built around making a SPOF that is unlikely to fail.

              But your issues are bigger. Here are the big issues that you are left with in both of your scenarios:

              • Single point of failure on which everything rests (the thing most likely to fail causes EVERYTHING to fail.)
              • No risk mitigation for the other layers in the dependency chain. This isn't a 3-2-1 as traditionally described but actually a (1/1/1-1) meaning ANY server failure results in unmitigated (literally) failure AND any storage failure results in total failure. You have a dramatic increase in failure risk with this design, not just a small or moderate increase like most people see (because most people are confused and heavily mitigate risk at one or two but not all three layers.) So it is very important to realize that this is at least one full order of magnitude more risky than a traditional inverted pyramid of doom.
              • The single point of failure that you have is actually a pretty fragile one. Probably more fragile than the servers themselves. So not only is the risk of failure doubled by having two completely separate places for things to fail, but the single point of failure that impacts everything is the most fragile piece of all.
              • This has the highest cost both today AND going into the future.
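
One way to see why stacking an unmitigated server layer on top of an unmitigated storage layer compounds risk is a simple availability calculation. The sketch below assumes independent failures and uses made-up availability figures (nothing in the thread gives real numbers); `series` and `redundant` are hypothetical helper names.

```python
def series(*availabilities: float) -> float:
    """Availability of a chain where every component must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def redundant(availability: float, copies: int = 2) -> float:
    """Availability of `copies` independent replicas where one is enough."""
    return 1.0 - (1.0 - availability) ** copies


# Hypothetical figures for illustration only.
SERVER = 0.99   # a single compute node
STORAGE = 0.98  # a single shared storage box, assumed more fragile

# One server depending on one shared storage device: both must be up.
unmitigated = series(SERVER, STORAGE)

# The same server using only its own local disks: one thing to fail.
local_only = SERVER

# Both layers mitigated with a second, independent copy.
mitigated = series(redundant(SERVER), redundant(STORAGE))

print(f"unmitigated chain: {unmitigated:.4f}")
print(f"local storage only: {local_only:.4f}")
print(f"both layers mitigated: {mitigated:.4f}")
```

Under these assumptions the unmitigated chain is less available than a lone server with local disks, because adding a dependency without adding redundancy can only lower the product.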

              • DashrenderD
                Dashrender @donaldlandru

                @donaldlandru said:

                @scottalanmiller said:

                @Dashrender said:

                @scottalanmiller said:

                By dropping VMware vSphere Essentials you are looking at a roughly $1200 savings right away. Both HyperV and XenServer will do what you need absolutely free.

                Did the price of Essentials double? I thought it was $600 for three nodes for Essentials? and something like $5000 for Essentials Plus.

                Those are the rough numbers. He has five nodes so that means either buying all licenses twice (so $1200 and $10,000) or being disqualified from Essentials pricing altogether and needing to move to Standard licensing options.

                This is very much the case here.

                To get this all into a single cluster (and hopefully using something like VSAN) would require us to upgrade to Standard or higher. We would be able to use acceleration kits to get us there, but that is no small investment.

                But that is completely unnecessary if you move to Xen (or is it XenServer - still confused) or Hyper-V

                • DashrenderD
                  Dashrender @dafyre

                  @dafyre said:

                  @donaldlandru said:

                  The politics are likely to be harder to play as we just renewed our SnS for both Essentials and Essentials plus in January for three years.
                  <snip>
                  Another important piece of information with the local storage is that everything is based on 2.5" disks, and all but two servers have only two bays each. Getting any real kind of local storage without going external direct attached (non-shared) is going to be a challenge.

                  He brings up a good point about the 2 bays and 2.5" drives... Do they even make 4 / 6 TB drives in 2.5" form yet?

                  If not, would it be worth getting an external DAS shelf for each of the servers?

                  It's been 15 years, but I've seen DAS shelves that can be split between two hosts. Assuming those are still made, and there are enough disk slots, that would save a small amount.

                  • scottalanmillerS
                    scottalanmiller @donaldlandru

                    @donaldlandru said:

                    To get this all into a single cluster (and hopefully using something like VSAN) would require us to upgrade to Standard or higher. We would be able to use acceleration kits to get us there, but that is no small investment.

                    Going to VSAN, Starwind, DRBD, etc. would be an "orders of magnitude leap" that is not warranted. It just can't make sense. What you have today and what you are talking about moving to are insanely "low availability." Crazy low. And no one had any worries or concerns about that, right?

                    What I am proposing is that you make the single order of magnitude leap from "acceptably low" reliability to "standard reliability which is good enough for any normal SMB" while dropping your cost dramatically. It's a massive win. Saving a fortune AND leaping far beyond your reliability needs.

                    Going to something like VSAN just can't make sense. You didn't need something like this before, why would you suddenly need to leapfrog from "super low availability" right over top of normal all the way to "they don't even need this on most of Wall St" super high availability at massively high cost that would require that you upgrade your compute nodes and licensing high cost storage replication technologies? Not only would it require that but it would require bigger or more nodes in order to handle those needs. It's a little like someone who has been riding a bicycle for years (but paying a fortune for it) finding out that they can get a Chevy Cruze for half the price, but having seen what cars are like, deciding that they should buy a Ferrari for their first car when a bicycle was fine all along.

                    • scottalanmillerS
                      scottalanmiller @Dashrender

                      @Dashrender said:

                      It's been 15 years, but I've seen DAS shelves that can be split between two hosts. Assuming those are still made, and there are enough disk slots, that would save a small amount.

                      DAS by definition can be split.

                      • scottalanmillerS
                        scottalanmiller @Dashrender

                        @Dashrender said:

                        But that is completely unnecessary if you move to Xen (or is it XenServer - still confused) or Hyper-V

                        If he moves to HyperV or XenServer he would still need proprietary replicated local storage options at his node count. But it would be free at the platform layer (saving $10K at least) and far cheaper at the storage layer (saving many thousands more.)

                        • donaldlandruD
                          donaldlandru @scottalanmiller

                          @scottalanmiller said:

                          It's important to recognize that it is a SPOF. But being a SPOF is not the core issue, believe it or not, just the one that causes the biggest emotional reaction. If you were to buy a super high end active/active EMC or HDS device for this (mainframe class storage, start around $50K for the smallest possible units) the fact that it was a SPOF would be heavily mitigated. The whole mainframe concept is built around making a SPOF that is unlikely to fail.

                          But your issues are bigger. Here are the big issues that you are left with in both of your scenarios:

                          • Single point of failure on which everything rests (the thing most likely to fail causes EVERYTHING to fail.)
                          • No risk mitigation for the other layers in the dependency chain. This isn't a 3-2-1 as traditionally described but actually a (1/1/1-1) meaning ANY server failure results in unmitigated (literally) failure AND any storage failure results in total failure. You have a dramatic increase in failure risk with this design, not just a small or moderate increase like most people see (because most people are confused and heavily mitigate risk at one or two but not all three layers.) So it is very important to realize that this is at least one full order of magnitude more risky than a traditional inverted pyramid of doom.
                          • The single point of failure that you have is actually a pretty fragile one. Probably more fragile than the servers themselves. So not only is the risk of failure doubled by having two completely separate places for things to fail, but the single point of failure that impacts everything is the most fragile piece of all.
                          • This has the highest cost both today AND going into the future.

                          Ok, if we split this into two separate topics, the only unmitigated failure point in operations is the single SAN. Two options to mitigate the risk are:

                          • Add a second SAN that replicates with the first (HP MSA easy to do, not so nice price tag)
                          • Move to local storage and create redundant servers for items that can't be down (split-scope DHCP, second Exchange server). I am not sure how to mitigate the risk of SharePoint being offline since it is the free version, plus the SQL server would be another single point of failure

                          When dealing with the Microsoft licensing to create the redundancy to obtain the reliability the business wants I think we are coming in at around the same price. Going with local storage here would reduce the complexity and if I can convince the organization to go with Office 365 we actually have a lot lower risk here and wouldn't need to create a bunch of highly available services.

                          The second topic (scope) is the development environments, and you are 100% correct: even if we have active/active SAN clusters, the failure will always be at the server level. The lack of vMotion in this "cluster" and the lack of available resources to do a failover make the compute layer the biggest problem. If we lose a compute node, those servers are offline until replaced. The business accepts that risk as long as we have a fast way of spinning down VMs and bringing up the VMs the team is working on. This is much easier with shared storage than local, in my opinion.

                          So I do have multiple problems to solve, with different sets of requirements.

                          • scottalanmillerS
                            scottalanmiller

                            If you were going to go with RLS (replicated local storage), which is completely crazy given the scenario and historically accepted risk, then the best investment would be to do the following:

                            • Replace all nodes with adequately sized nodes built on the HP DL380 G9 platform or the Dell R730xd platform. These have enough compute to replace several of your nodes in one, enough memory to handle all of your needs and more than 600% greater per node storage capacity!
                            • Move to either HyperV + Starwind or XenServer + DRBD (HA-Lizard)
                            • Make two clusters of two servers each keeping every software piece free and simple

                            • scottalanmillerS
                              scottalanmiller

                              Going the XenServer HA route, the guy who actually makes HA-Lizard is here in the community, so that is a big deal: not only do you have XS resources here, but you have the XS HA resource.

                              • scottalanmillerS
                                scottalanmiller @donaldlandru

                                @donaldlandru said:

                                Ok, if we split this into two separate topics, the only unmitigated failure point in operations is the single SAN. Two options to mitigate the risk are:

                                Not currently. You had said that your nodes do not have the tools or the overhead to absorb the load from a failed node, correct? That makes the risk of those nodes failing unmitigated as well. You only have enough nodes to handle your capacity, not enough to use them for failure mitigation.
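
The point about overhead can be expressed as a simple check: for each possible single-node failure, do the surviving nodes have enough spare capacity to take the failed node's load? The sketch below is an illustration with invented capacity and load figures (for example, GB of RAM). It compares totals only and ignores whether individual VMs actually fit on a given host.

```python
from typing import Sequence


def survives_single_node_failure(
    capacities: Sequence[float], loads: Sequence[float]
) -> bool:
    """Return True if any one node can fail and the rest can absorb its load.

    `capacities[i]` and `loads[i]` describe node i in the same unit.
    This is an aggregate check, not a VM placement algorithm.
    """
    for failed in range(len(capacities)):
        spare = sum(
            capacities[i] - loads[i]
            for i in range(len(capacities))
            if i != failed
        )
        if loads[failed] > spare:
            return False
    return True


# Hypothetical two-node cluster with headroom on each node.
ops_ok = survives_single_node_failure([100, 100], [40, 45])

# Hypothetical three-node group running close to full.
dev_ok = survives_single_node_failure([100, 100, 100], [90, 85, 95])

print(ops_ok, dev_ok)
```

A group of nodes that only has enough capacity for its normal load fails this check, which is the sense in which node failure is unmitigated there.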

                                • donaldlandruD
                                  donaldlandru

                                  My next biggest concern, as with any technology, is how do I get there from here. I have enough budget for a storage node, and we are going to run out of space within the next 60 days. I do not have, and will not receive, additional funding this year for new servers. So some form of "in-place" style of upgrade has to occur. Obviously, this is a take the server down, convert the VMs, bring it back up type of process that has an unknown LoE (level of effort).

                                  Trying to not paint a picture of a rock and a hard place, but realistically where else am I at right now?

                                  • DashrenderD
                                    Dashrender @scottalanmiller

                                    @scottalanmiller said:

                                    If you were going to go with RLS (replicated local storage), which is completely crazy given the scenario and historically accepted risk, then the best investment would be to do the following:

                                    • Replace all nodes with adequately sized nodes built on the HP DL380 G9 platform or the Dell R730xd platform. These have enough compute to replace several of your nodes in one, enough memory to handle all of your needs and more than 600% greater per node storage capacity!
                                    • Move to either HyperV + Starwind or XenServer + DRBD (HA-Lizard)
                                    • Make two clusters of two servers each keeping every software piece free and simple

                                    That would cost a lot more than his current $14,000 budget (assuming that number was a budget number).

                                    • scottalanmillerS
                                      scottalanmiller @donaldlandru

                                      @donaldlandru said:

                                      • Add a second SAN that replicates with the first (HP MSA easy to do, not so nice price tag)

                                      I've never seen someone do this successfully. That doesn't suggest that it doesn't work, but are you sure that the MSA series will do SAN mirroring with fault tolerance? I'm not confident that that is a feature (but certainly not confident that it isn't.) Double check that to be sure as I talk to MSA users daily and no one has ever led me to believe that this was even an option.

                                      I know that Dell's MD series cannot do this, only the EQL series.

                                      • donaldlandruD
                                        donaldlandru @scottalanmiller

                                        @scottalanmiller said:

                                        @donaldlandru said:

                                        Ok, if we split this into two separate topics, the only unmitigated failure point in operations is the single SAN. Two options to mitigate the risk are:

                                        Not currently. You had said that your nodes do not have the tools or the overhead to absorb the load from a failed node, correct? That makes the risk of those nodes failing unmitigated as well. You only have enough nodes to handle your capacity, not enough to use them for failure mitigation.

                                        In Operations, the two-node cluster, I said they do have the necessary resources to absorb the other node failing. It is the development "cluster that isn't a cluster" that cannot absorb.

                                        • scottalanmillerS
                                          scottalanmiller @Dashrender

                                          @Dashrender said:

                                          That would cost a lot more than his current $14,000 budget (assuming that number was a budget number).

                                          Yes, but it would cost far less than what he was proposing. My original recommendations were to lower his cost while improving reliability. Then he leapt to the Ferrari scenario, so I proposed another solution that beats that one: it keeps the Ferrari features while spending only a fraction as much money.

                                          • scottalanmillerS
                                            scottalanmiller @donaldlandru

                                            @donaldlandru said:

                                            In Operations, the two-node cluster, I said they do have the necessary resources to absorb the other node failing. It is the development "cluster that isn't a cluster" that cannot absorb.

                                            Oh okay. So mitigated where it matters, I assume, and unmitigated where it doesn't matter so much. That I was not clear about.
