    Replacing the Dead IPOD, SAN Bit the Dust

    IT Discussion
    Tags: inverted pyramid of doom, architecture, ipod, san, storage, virtualization, risk
    100 Posts 14 Posters 22.8k Views
    • J
      Jimmy9008
      last edited by

      I have put in an IPOD before at an SMB. Nothing failed before I left, at least not that I know of (it was years ago), but I now have the knowledge to build better solutions, so I would not do that again. This was 4 hosts and 1 EQL SAN. An MSP I worked for always put them in, even once they were aware of the issues. Sometimes, you cannot teach people as 'it always worked'... pfft.

      If I were the OP, I would work with the business to define if, and why, a failover cluster is needed. If not, things get so simple. Two hosts using replica to each other, plus great (tested) backups, is likely more than enough. If host A fails, start the replicas on B. If B fails, start the replicas on A. Have each host carry 50% of the load. Then back up on and off site and test both. If a cluster is needed, defo a vSAN like StarWind.
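
      A rough way to sanity check the two-host replica idea: each host normally carries about half the load, but either one has to be able to carry everything when the other dies. The sketch below only looks at RAM, and the `Host` class, field names, and sizes are made up for illustration, not taken from the OP's environment.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    ram_gb: int   # RAM available to VMs on this host (hypothetical figure)
    load_gb: int  # RAM committed to the VMs that normally run here


def survives_failover(a: Host, b: Host) -> bool:
    """Return True if either host could run both workloads on its own."""
    total_load = a.load_gb + b.load_gb
    return total_load <= a.ram_gb and total_load <= b.ram_gb


# Two equal hosts, each doing roughly 50% of the work
host_a = Host("A", ram_gb=128, load_gb=60)
host_b = Host("B", ram_gb=128, load_gb=60)
print(survives_failover(host_a, host_b))
```

      The same check would apply to CPU and storage. If it fails, the replicas will start on the surviving host but will not all fit.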

      scottalanmillerS 1 Reply Last reply Reply Quote 3
      • scottalanmillerS
        scottalanmiller @Aconboy
        last edited by

        @Aconboy said in Replacing the Dead IPOD, SAN Bit the Dust:

        @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

        25

        looking at this thread, I would say that a Scale 1150 cluster would fit the bill nicely, and even with a single node for second site dr, he would still likely be under $35k all-in

        That's what I was imagining. Might need slightly more than the baseline RAM, but even that might be enough with 2x 64GB nodes.

        dafyreD 1 Reply Last reply Reply Quote 2
        • dafyreD
          dafyre @scottalanmiller
          last edited by

          @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

          @Aconboy said in Replacing the Dead IPOD, SAN Bit the Dust:

          @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

          25

          looking at this thread, I would say that a Scale 1150 cluster would fit the bill nicely, and even with a single node for second site dr, he would still likely be under $35k all-in

          That's what I was imagining. Might need slightly more than the baseline RAM, but even that might be enough with 2x 64GB nodes.

          If they're not doing HA and all of that... why not get one beefier node rather than two smaller ones?

          scottalanmillerS 1 Reply Last reply Reply Quote 3
          • scottalanmillerS
            scottalanmiller @Jimmy9008
            last edited by

            @Jimmy9008 said in Replacing the Dead IPOD, SAN Bit the Dust:

            An MSP I worked for always put them in, even once they were aware of the issues. Sometimes, you cannot teach people as 'it always worked'... pfft.

            Well, what is good for a VAR is not what is good for the customer. An IPOD is terrible for the customer, but the best thing ever for a VAR. So a VAR, even knowing how bad an IPOD is for the customer, makes extra money selling the design, extra money supporting the design, extra money helping the customer recover from the "once in a lifetime failure that only happens to them", etc. To a VAR, the IPOD is the perfect way to make maximum profits. So a VAR, knowing the full situation, will often not just keep selling IPODs, but move to them!

            1 Reply Last reply Reply Quote 2
            • scottalanmillerS
              scottalanmiller @dafyre
              last edited by

              @dafyre said in Replacing the Dead IPOD, SAN Bit the Dust:

              @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

              @Aconboy said in Replacing the Dead IPOD, SAN Bit the Dust:

              @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

              25

              looking at this thread, I would say that a Scale 1150 cluster would fit the bill nicely, and even with a single node for second site dr, he would still likely be under $35k all-in

              That's what I was imagining. Might need slightly more than the baseline RAM, but even that might be enough with 2x 64GB nodes.

              If they're not doing HA and all of that... why not get one beefier node rather than two smaller ones?

              AKA Mainframe design.

              DashrenderD 1 Reply Last reply Reply Quote 1
              • DashrenderD
                Dashrender @scottalanmiller
                last edited by

                @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

                @dafyre said in Replacing the Dead IPOD, SAN Bit the Dust:

                @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

                @Aconboy said in Replacing the Dead IPOD, SAN Bit the Dust:

                @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

                25

                looking at this thread, I would say that a Scale 1150 cluster would fit the bill nicely, and even with a single node for second site dr, he would still likely be under $35k all-in

                That's what I was imagining. Might need slightly more than the baseline RAM, but even that might be enough with 2x 64GB nodes.

                If they're not doing HA and all of that... why not get one beefier node rather than two smaller ones?

                AKA Mainframe design.

                Is it really mainframe design? Don't a lot of mainframes have tons of internal redundancies and failover components?

                scottalanmillerS 1 Reply Last reply Reply Quote 1
                • scottalanmillerS
                  scottalanmiller @Dashrender
                  last edited by

                  @Dashrender said in Replacing the Dead IPOD, SAN Bit the Dust:

                  @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

                  @dafyre said in Replacing the Dead IPOD, SAN Bit the Dust:

                  @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

                  @Aconboy said in Replacing the Dead IPOD, SAN Bit the Dust:

                  @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

                  25

                  looking at this thread, I would say that a Scale 1150 cluster would fit the bill nicely, and even with a single node for second site dr, he would still likely be under $35k all-in

                  That's what I was imagining. Might need slightly more than the baseline RAM, but even that might be enough with 2x 64GB nodes.

                  If they're not doing HA and all of that... why not get one beefier node rather than two smaller ones?

                  AKA Mainframe design.

                  Is it really mainframe design? Don't a lot of mainframes have tons of internal redundancies and failover components?

                  A "lot" of non-mainframes do, too. Those things are not what makes something a mainframe and lacking them is not what makes something else not a mainframe.

                  This is a "Mainframe Architecture", not a mainframe, meaning it is an architecture that is "Designed around a single highly reliable component" in contrast to other designs that rely on multiple components to make up for individual fragility.
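
                  To put some arithmetic behind that contrast, here is a minimal sketch. It assumes every component fails independently and uses a made-up availability of 99.9% for each box; neither assumption comes from this thread. Components that all have to work multiply together (series), while redundant components only fail if all of them fail (parallel).

```python
from math import prod


def series(*availabilities: float) -> float:
    """Everything in the chain must be up, so availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """The group is down only when every redundant member is down."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical figures: every server, switch and SAN is 99.9% available
server = switch = san = 0.999

# "Mainframe architecture": one reliable server with local storage
single_server = server

# IPOD: three redundant hosts, two redundant switches, one SAN under it all
ipod = series(parallel(server, server, server), parallel(switch, switch), san)

print(f"single server: {single_server:.6f}")
print(f"IPOD:          {ipod:.6f}")
```

                  Under these assumptions the IPOD can never come out better than the single SAN at the bottom of the stack, however many hosts are piled on top of it.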

                  1 Reply Last reply Reply Quote 2
                  • DashrenderD
                    Dashrender
                    last edited by

                    gotcha.

                    1 Reply Last reply Reply Quote 1
                    • wrx7mW
                      wrx7m
                      last edited by

                      I have to carve out an hour and a half to watch the two SAM presentations posted earlier in this thread...

                      1 Reply Last reply Reply Quote 4
                      • StrongBadS
                        StrongBad
                        last edited by

                        Sounds like the business really wants something more robust, even if they didn't figure out how to do it the first time through, so going for something simple, but hyperconverged, seems like the obvious answer. Especially if it can come in way under the current expected budget.

                        1 Reply Last reply Reply Quote 0
                        • scottalanmillerS
                          scottalanmiller
                          last edited by

                          Just saw another thread of someone who did the same thing.... depended on a black box SAN, let support lapse, and now is in tough shape: https://community.spiceworks.com/topic/1912628-emc-vnxe3100-troublesome-storage-pool-vmware-view-vdi

                          coliverC 1 Reply Last reply Reply Quote 0
                          • coliverC
                            coliver @scottalanmiller
                            last edited by

                            @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

                            Just saw another thread of someone who did the same thing.... depended on a black box SAN, let support lapse, and now is in tough shape: https://community.spiceworks.com/topic/1912628-emc-vnxe3100-troublesome-storage-pool-vmware-view-vdi

                            And they're using the VNXe line. Couldn't get in any worse shape.

                            scottalanmillerS 1 Reply Last reply Reply Quote 1
                            • scottalanmillerS
                              scottalanmiller @coliver
                              last edited by

                              @coliver said in Replacing the Dead IPOD, SAN Bit the Dust:

                              @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

                              Just saw another thread of someone who did the same thing.... depended on a black box SAN, let support lapse, and now is in tough shape: https://community.spiceworks.com/topic/1912628-emc-vnxe3100-troublesome-storage-pool-vmware-view-vdi

                              And they're using the VNXe line. Couldn't get in any worse shape.

                              Could be an MSA. Bwahahaha.

                              DashrenderD 1 Reply Last reply Reply Quote 0
                              • DashrenderD
                                Dashrender @scottalanmiller
                                last edited by

                                @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

                                @coliver said in Replacing the Dead IPOD, SAN Bit the Dust:

                                @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

                                Just saw another thread of someone who did the same thing.... depended on a black box SAN, let support lapse, and now is in tough shape: https://community.spiceworks.com/topic/1912628-emc-vnxe3100-troublesome-storage-pool-vmware-view-vdi

                                And they're using the VNXe line. Couldn't get in any worse shape.

                                Could be an MSA. Bwahahaha.

                                Some guy in the XS threads over the backup throughput problems is saying he's buying a brand new MSA for his new XS box...

                                scottalanmillerS 1 Reply Last reply Reply Quote 0
                                • scottalanmillerS
                                  scottalanmiller @Dashrender
                                  last edited by

                                  @Dashrender said in Replacing the Dead IPOD, SAN Bit the Dust:

                                  @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

                                  @coliver said in Replacing the Dead IPOD, SAN Bit the Dust:

                                  @scottalanmiller said in Replacing the Dead IPOD, SAN Bit the Dust:

                                  Just saw another thread of someone who did the same thing.... depended on a black box SAN, let support lapse, and now is in tough shape: https://community.spiceworks.com/topic/1912628-emc-vnxe3100-troublesome-storage-pool-vmware-view-vdi

                                  And they're using the VNXe line. Couldn't get in any worse shape.

                                  Could be an MSA. Bwahahaha.

                                  Some guy in the XS threads over the backup throughput problems is saying he's buying a brand new MSA for his new XS box...

                                  Sad trombone plays.

                                  1 Reply Last reply Reply Quote 1
                                  • NerdyDadN
                                    NerdyDad @Aconboy
                                    last edited by

                                    @Aconboy @scottalanmiller Looks like I would need 2 of 1150's all decked out in order to handle the processing power for the datacenter.

                                    scottalanmillerS 1 Reply Last reply Reply Quote 0
                                    • scottalanmillerS
                                      scottalanmiller @NerdyDad
                                      last edited by

                                      @NerdyDad said in Replacing the Dead IPOD, SAN Bit the Dust:

                                      @Aconboy @scottalanmiller Looks like I would need 2 of 1150's all decked out in order to handle the processing power for the datacenter.

                                      No, you need three. Just how the clusters work. The smallest SSD hybrid cluster is three 1150s. That gives you two units to handle the load and one to provide for N+1 failover. While things are not in a failure state, you get the power of the extra CPU and only drop 33% when a full node has failed.

                                      With only two, you have to have the full CPU and RAM of all workloads in every node. With three, you need less per node. You can get a pretty extreme amount of power in a single 1150; I doubt you need to go anywhere close to decked out.

                                      You can also grow transparently in the future. Start with three today. Later you need more, just add a fourth node and you get more disk capacity, more IOPS, more RAM, more CPU. You just plug it in and let things load balance to the new node.
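
                                      The N+1 math above can be sketched in a few lines. This assumes identical nodes and an evenly balanced load, and the 256 GB workload figure is purely illustrative.

```python
def capacity_lost_on_failure(nodes: int) -> float:
    """Fraction of total cluster capacity lost when one identical node fails."""
    if nodes < 2:
        raise ValueError("need at least two nodes to survive a failure")
    return 1 / nodes


def required_per_node(total_load: float, nodes: int) -> float:
    """Capacity each node needs so the remaining nodes can carry the full load."""
    if nodes < 2:
        raise ValueError("need at least two nodes to survive a failure")
    return total_load / (nodes - 1)


# Hypothetical total workload of 256 GB of RAM
for n in (2, 3, 4):
    lost = capacity_lost_on_failure(n)
    per_node = required_per_node(256, n)
    print(f"{n} nodes: lose {lost:.0%} on failure, need {per_node:.0f} GB per node")
```

                                      With two nodes each one has to hold the whole workload. With three, each node only needs half of it, and every node added after that lowers the per-node requirement further.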

                                      1 Reply Last reply Reply Quote 2
                                      • NerdyDadN
                                        NerdyDad
                                        last edited by

                                        An update and closing to this problem. I also posted about this on SW. My RAID6 was on the verge of failing. I had 2 disks die on me, got them swapped, and the SAN was attempting to rebuild both of them at the same time while a 3rd and 4th disk were wanting to die as well. During the rewrites to the drives, the SAN would hit a bad sector and fail the rewrite, causing the SAN to go offline and taking the VM down with it. We eventually had to take the last pulled drive to a data restoration place with a spare, and they were able to get the data off of one drive and onto another overnight. That cost us about $2,400, but when you're talking about millions of dollars of orders a week, $2.4k is a drop in the bucket.

                                        I did call Dell support and they were kind enough to remote in and assess the situation, even escalating it to a Storage Engineer. The SE got in and was able to tell me what was going on. My firmware was 2nd from the latest version and both cards were working well, but I was about to lose the RAID6 array.
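
                                        For anyone wondering why a rebuild with two dead disks is so touchy: RAID6 has no parity left at that point, so a single unreadable sector on any surviving disk can stop the rebuild. The sketch below is a back-of-the-envelope estimate only. The error rate of 1 in 10^14 bits is a commonly quoted spec-sheet figure, and the disk count and size are hypothetical since they are not stated here.

```python
import math


def p_unreadable_sector(disks_read: int, disk_tb: float,
                        ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while reading every
    surviving disk end to end, assuming independent errors at a fixed rate."""
    bits_read = disks_read * disk_tb * 1e12 * 8  # TB -> bytes -> bits
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Hypothetical array: 10 surviving disks of 2 TB each
risk = p_unreadable_sector(disks_read=10, disk_tb=2.0)
print(f"chance of hitting a bad sector during the rebuild: {risk:.0%}")
```

                                        Real drives often do better than their spec sheet, so treat the number as a reason to have tested backups rather than as a prediction.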

                                        Lessons learned:

                                        1. Make sure and double check that you have backups of business critical servers. Test them, especially if you have spare hardware doing nothing. If you have the hardware and are not testing your backups, you are doing yourself a serious injustice. Please refer back to Veeam's 3-2-1 rule when it comes to backup strategizing.
                                        2. Keep an eye on your SANs and keep them happy. Replace disks when needed, keep spares on the shelf, and keep the firmware up to date.
                                        3. Management (if you are listening): Put your IT department on a 5-7 year refresh cycle. All machines are man-made. Man is fallible. Therefore, so are the machines that they make. Machines are going to fail eventually. Make sure that you have an architecture that is fault tolerant and able to be replaced on a 5-7 year cycle. Plus, keep a maintenance agreement with each of these manufacturers as long as you are on the equipment.
                                        4. Assess your design architecture. Are you currently using the Inverted Pyramid of Doom (IPOD)? If so, and management allows, get off of it. Go to Hyperconvergence. At your main data location, make sure that you have at least 3 nodes. 2 for load balancing and 1 for failover. At each of your additional sites, put in at least 2 nodes, 1 for production and 1 for a backup. Still keep to the 5-7 year refresh cycle.
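
                                        As a small illustration of the 3-2-1 rule from lesson 1 (at least 3 copies of the data, on at least 2 different kinds of media, with at least 1 copy offsite), here is a minimal check. The `BackupCopy` class and the example copies are hypothetical, not a description of any real backup product or of this environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    label: str
    media: str     # e.g. "disk", "tape", "cloud"
    offsite: bool


def meets_3_2_1(copies: list[BackupCopy]) -> bool:
    """3 copies in total, on 2 or more media types, with 1 or more offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


copies = [
    BackupCopy("production data", media="disk", offsite=False),
    BackupCopy("local backup", media="disk", offsite=False),
    BackupCopy("offsite backup", media="cloud", offsite=True),
]
print(meets_3_2_1(copies))
```

                                        Passing this check says nothing about whether the backups restore, which is why lesson 1 also says to test them.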

                                        My management has decided not to check out hyperconvergence and is sticking with the IPOD scheme for now. We are going to be reutilizing one of our EQLs for replication of data from the Compellent SAN. However, I want to note a point from one of @scottalanmiller's videos: added complexity does not increase resiliency in the network, it just gives Murphy's Law more to work with. If it can fail, it will fail.

                                        scottalanmillerS 2 Replies Last reply Reply Quote 1
                                        • NerdyDadN
                                          NerdyDad
                                          last edited by

                                          Also, the SAN in question has been retired. The 2 others in our datacenter have had their data pulled from them and have been taken offline. I'll go and pull them out of the datacenter tomorrow.

                                          1 Reply Last reply Reply Quote 1
                                          • scottalanmillerS
                                            scottalanmiller @NerdyDad
                                            last edited by scottalanmiller

                                            @NerdyDad said in Replacing the Dead IPOD, SAN Bit the Dust:

                                            3. Management (if you are listening): Put your IT department on a 5-7 year refresh cycle.

                                            That's not at all the issue here. There are three real issues, none of them related to the age of the equipment. This could have happened on day one with new gear.

                                            • Using low end gear that isn't designed for high reliability when high reliability is needed (you said that $2,400 for data recovery was a drop in the bucket, and yet they chose gear that doesn't reflect that financial reality). Your SAN is around the home line; it's not something I would use in any production scenario.
                                            • Using an appliance without support. This is way below the home line.
                                            • Using an architecture that is designed to be ultra risky without benefit. (You addressed this, just pointing it out again.)

                                            Fix any one of those three mistakes and the issue would have been avoided.

                                            1 Reply Last reply Reply Quote 0