    Single SSD PCIe vs HDD RAID Reliability

    IT Discussion
    Tags: raid, ssd, storage, reliability, winchester drive
    44 Posts 4 Posters 14.8k Views
    • scottalanmillerS
      scottalanmiller @Francesco Provino
      last edited by

      @Francesco-Provino said:

      @Reid-Cooper As of today, IOPS/€-wise, NVMe PCIe drives are actually far cheaper than SAS SSDs!

      By a combination of removing the SATA bottleneck and skipping the RAID.

      • F
        Francesco Provino @scottalanmiller
        last edited by

        @scottalanmiller So Intel's 5-year warranty has no real value in this case? I'll be happy to replace them every 3-4 years…

        • F
          Francesco Provino @scottalanmiller
          last edited by

          @scottalanmiller said:

          @Francesco-Provino said:

          @Reid-Cooper As of today, IOPS/€-wise, NVMe PCIe drives are actually far cheaper than SAS SSDs!

          By a combination of removing the SATA bottleneck and skipping the RAID.

          Exactly, I think this is definitely a win-win approach.

          • F
            Francesco Provino @scottalanmiller
            last edited by

            @scottalanmiller said:

            @Francesco-Provino said:

            We need to move to local storage, and it seems to me that this is the most convenient approach; but anyway, I was trying to fetch some information about reliability…

            Traditional enterprise boards like FusionIO have very good reliability track records. Intel is new to the game and has a good reputation in the SSD space and a bad one in "non-drive storage space." Put the two together and this would be a rather unknown scenario with them.

            I know about the legendary reliability of FusionIO, but… I don't think we really need THAT much reliability, not at this price! Replicating every VM to SAN, I can power on the VMs of a failed node in almost no time directly from the SAN.

            • scottalanmillerS
              scottalanmiller @Francesco Provino
              last edited by

              @Francesco-Provino said:

              @scottalanmiller So Intel's 5-year warranty has no real value in this case? I'll be happy to replace them every 3-4 years…

              Warranties have little value when you are talking about your data and uptime. A warranty guarantees that you have equipment for the duration, not that the things you store on that equipment continue to exist. If we are talking about a desktop on which no critical data is stored, and you have a spare desktop to use until Intel replaces the SSD, sure, the warranty has value. If we are talking about a server holding your critical data, the warranty presumably has almost no value.

              When the PCIe SSD fails you will need to order the warranty replacement. What are the replacement terms: four hours, six hours, next business day, two weeks? Do you have to return the failed one first and wait for them to test it? Remember that this is a complete storage system, not just one drive in a RAID array. When HP or Dell do a warranty replacement of a drive in an array, there is no downtime or data loss. When Intel replaces one of these drives, you are without storage for some amount of time and, once it is replaced, the data from the old SSD is gone.

              • scottalanmillerS
                scottalanmiller @Francesco Provino
                last edited by

                @Francesco-Provino said:

                Exactly, I think this is definitely a win-win approach.

                If the only goal is IOPS. What workload do you have that is that sensitive to IOPS? They exist, especially databases, but how much room is there for downtime? Typically I would expect systems using these drives to either have a RAIN storage system, so that storage is covered that way, or be part of a network-replicated system like a Hyper-V fault-tolerant cluster with StarWind replicating between the nodes. That way, if one node fails, you can run from another that has a copy of the data until the first one is repaired.

                In a standalone node I would only use these if the data is highly static or generally does not need to be backed up. That is rarely the case in systems that need extreme IOPS.

                • scottalanmillerS
                  scottalanmiller @Francesco Provino
                  last edited by

                  @Francesco-Provino said:

                  I know about the legendary reliability of FusionIO, but… I don't think we really need THAT much reliability, not at this price! Replicating every VM to SAN, I can power on the VMs of a failed node in almost no time directly from the SAN.

                  There are three ways to handle this replication:

                  • Full Synchronization replication
                  • Asynchronous replication
                  • Backup mechanisms

                  Of these you have these impacts or tradeoffs:

                  Full Sync: This is a form of network RAID 1. Every write has to wait for the SAN to confirm that it has written its copy of the data. While your read performance will be as fast as the Intel PCIe SSD can go, writes will be only as fast as the SAN can handle them. So while this is safe and allows for storage failover without data loss or downtime, the impact on writes is enormous.

                  Async: Data is only crash consistent. You can have "nearly every byte" that you had before but data can and sometimes does corrupt. It cannot be tested as corruption only happens some of the time and typically happens under load. So there is a risk that your SAN would be corrupted and useless in the event of the PCIe SSD failing.

                  Backup/Restore: Needs quiescence to be safe which inflicts a performance penalty on its own. In the event of a PCIe SSD failure you are doing a DR scenario and facing some dataloss.

                  So there are options, each with different caveats. It would depend on what needs your business has as to which would make sense for you.
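
The write-latency tradeoff between the three modes can be sketched with a toy model. This is only an illustration under stated assumptions: the `WritePath` class, the `ack_latency_us` function and the latency figures are hypothetical and do not come from any product discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class WritePath:
    """Latencies in microseconds. All figures used below are illustrative assumptions."""
    local_us: float  # time for the local PCIe SSD to commit a write
    san_us: float    # time for the SAN to commit a write and acknowledge it


def ack_latency_us(path: WritePath, mode: str) -> float:
    """Return how long the application waits before a write is acknowledged."""
    if mode == "sync":
        # Both copies must be on stable storage, so the slower side sets the pace.
        return max(path.local_us, path.san_us)
    if mode in ("async", "backup"):
        # The write is acknowledged once it lands locally; the copy trails behind.
        return path.local_us
    raise ValueError(f"unknown mode: {mode}")


path = WritePath(local_us=20.0, san_us=1000.0)  # hypothetical numbers
for mode in ("sync", "async", "backup"):
    print(f"{mode:>6}: {ack_latency_us(path, mode):.0f} us per acknowledged write")
```

The model only covers write acknowledgement. It says nothing about the consistency risks of async replication or the quiescence cost of backups, which are the other half of the tradeoff described above.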

                  • F
                    Francesco Provino @scottalanmiller
                    last edited by

                    @scottalanmiller said:

                    @Francesco-Provino said:

                    @scottalanmiller So Intel's 5-year warranty has no real value in this case? I'll be happy to replace them every 3-4 years…

                    Warranties have little value when you are talking about your data and uptime. A warranty guarantees that you have equipment for the duration, not that the things you store on that equipment continue to exist. If we are talking about a desktop on which no critical data is stored, and you have a spare desktop to use until Intel replaces the SSD, sure, the warranty has value. If we are talking about a server holding your critical data, the warranty presumably has almost no value.

                    When the PCIe SSD fails you will need to order the warranty replacement. What are the replacement terms: four hours, six hours, next business day, two weeks? Do you have to return the failed one first and wait for them to test it? Remember that this is a complete storage system, not just one drive in a RAID array. When HP or Dell do a warranty replacement of a drive in an array, there is no downtime or data loss. When Intel replaces one of these drives, you are without storage for some amount of time and, once it is replaced, the data from the old SSD is gone.

                    I know about it, but thanks to the replication I think we can live with that. We can have a few hours of downtime without losing too much money.
                    We mainly do VDI and database stuff… it's not that we require such a great IOPS count, but… what are the alternatives? Buy IBM spindles in 2015, at a higher price than the SSD? Double the price for 1/100 of the IOPS? Does it really make sense?

                    • F
                      Francesco Provino @scottalanmiller
                      last edited by

                      @scottalanmiller

                      @scottalanmiller said:

                      @Francesco-Provino said:

                      I know about the legendary reliability of FusionIO, but… I don't think we really need THAT much reliability, not at this price! Replicating every VM to SAN, I can power on the VMs of a failed node in almost no time directly from the SAN.

                      There are three ways to handle this replication:

                      • Full Synchronization replication
                      • Asynchronous replication
                      • Backup mechanisms

                      Of these you have these impacts or tradeoffs:

                      Full Sync: This is a form of network RAID 1. Every write has to wait for the SAN to confirm that it has written its copy of the data. While your read performance will be as fast as the Intel PCIe SSD can go, writes will be only as fast as the SAN can handle them. So while this is safe and allows for storage failover without data loss or downtime, the impact on writes is enormous.

                      Async: Data is only crash consistent. You can have "nearly every byte" that you had before but data can and sometimes does corrupt. It cannot be tested as corruption only happens some of the time and typically happens under load. So there is a risk that your SAN would be corrupted and useless in the event of the PCIe SSD failing.

                      Backup/Restore: Needs quiescence to be safe which inflicts a performance penalty on its own. In the event of a PCIe SSD failure you are doing a DR scenario and facing some dataloss.

                      So there are options, each with different caveats. It would depend on what needs your business has as to which would make sense for you.

                      Thanks for the clarification on replication, I really appreciate it.
                      We will do both async replication from SSD to SAN (direct attach Fibre Channel, already in place, our main storage pool as of today) and backup to a NAS unit (QNAP, big SATA drives).

                      • scottalanmillerS
                        scottalanmiller @Francesco Provino
                        last edited by

                        @Francesco-Provino said:

                        I know about it, but thanks to the replication I think we can live with that. We can have a few hours of downtime without losing too much money.

                        How quickly does Intel do replacements? Intel is not an enterprise supplier like HP, Dell or Fujitsu.

                        • F
                          Francesco Provino @scottalanmiller
                          last edited by Francesco Provino

                          @scottalanmiller said:

                          @Francesco-Provino said:

                          I know about it, but thanks to the replication I think we can live with that. We can have a few hours of downtime without losing too much money.

                          How quickly does Intel do replacements? Intel is not an enterprise supplier like HP, Dell or Fujitsu.

                          As I said, I can restart the VMs almost immediately on the SAN (they are replicated, so ready to be restarted), or restore them, either from the replication pool or from backup, to the local storage of one of the other two servers.

                          So, we can wait some days for Intel.

                          • scottalanmillerS
                            scottalanmiller @Francesco Provino
                            last edited by

                            @Francesco-Provino said:

                            We mainly do VDI and database stuff… it's not that we require such a great IOPS count, but… what are the alternatives? Buy IBM spindles in 2015, at a higher price than the SSD? Double the price for 1/100 of the IOPS? Does it really make sense?

                            That's what I would call a "leap alternative." The two are not comparable. The Intel board has more IOPS, but does that matter? I feel like that is a red herring here, definitely for VDI. Not that it is bad, just that the fact that it is 100x higher is pointless (and incorrect: my desktop SSD is quite old and only 1/4 the speed of these, so you should be able to get in the ballpark).

                            You are jumping from "third party unsupported SSD" in one case to "primary OEM fully warranted and supported" in the other. Of course one is drastically more cost effective. But all that you are showing is that full enterprise support on hard drives is costly. You are comparing apples to oranges.

                            If you want to see a reasonable alternative to a third party, unsupported PCIe SSD, you would compare against third party, unsupported SATA SSDs. In which case you would find that you could be doing RAID 10 with hundreds of thousands of IOPS for around $400 or RAID 5 for around $300. Suddenly the cost per IOPS is pretty similar.
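
For a rough feel of how a small SATA SSD array reaches figures like these, the common write-penalty rule of thumb can be sketched as below. The per-drive IOPS values and drive counts are hypothetical placeholders, not measurements of any drive mentioned in this thread, and the model ignores controller and bus limits, which can easily become the real ceiling with SSDs.

```python
# Rule-of-thumb write penalties: back-end I/Os needed per front-end write.
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def array_ceilings(drives: int, read_iops: int, write_iops: int, level: str) -> tuple[int, int]:
    """Return (read, write) IOPS ceilings for an array, ignoring controller limits."""
    penalty = WRITE_PENALTY[level]
    return drives * read_iops, drives * write_iops // penalty


# Hypothetical consumer SATA SSD: 90K read IOPS and 80K write IOPS per drive.
print("RAID 10, 4 drives:", array_ceilings(4, 90_000, 80_000, "raid10"))
print("RAID 5,  3 drives:", array_ceilings(3, 90_000, 80_000, "raid5"))
```

With these assumed inputs the read ceiling lands in the hundreds of thousands of IOPS, while the write ceiling depends heavily on the RAID level.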

                            • scottalanmillerS
                              scottalanmiller @Francesco Provino
                              last edited by

                              @Francesco-Provino said:

                              As I said, I can restart the VMs almost immediately on the SAN (they are replicated, so ready to be restarted), or restore them, either from the replication pool or from backup, to the local storage of one of the other two servers.

                              What is the manner of replication?

                              • scottalanmillerS
                                scottalanmiller @Francesco Provino
                                last edited by

                                @Francesco-Provino said:

                                We will do both async replication from SSD to SAN (direct attach Fibre Channel, already in place, our main storage pool as of today) and backup to a NAS unit (QNAP, big SATA drives).

                                So the failover to the SAN is risky in that data could be lost because it is only crash consistent and the filesystem and/or databases might be corrupted when attempting to use it.

                                What is the time of dataloss if you need to go to the QNAP to do a restore?

                                • scottalanmillerS
                                  scottalanmiller
                                  last edited by

                                  So the real question is this....

                                  What makes 400K IOPS without RAID worth $600 - $800 when 300K IOPS with RAID is just $300 for this specific use case?
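
Taking the figures in the question above at face value, the cost per 1,000 IOPS can be worked out directly. This is a minimal sketch; the function name and the option labels are illustrative, and the prices and IOPS counts are simply the round numbers quoted in this post.

```python
def dollars_per_kiops(price_usd: float, iops: int) -> float:
    """Cost in US dollars per 1,000 IOPS."""
    return price_usd / (iops / 1000)


# (price in USD, IOPS) as quoted in the post above.
options = {
    "PCIe SSD, no RAID (low price)": (600, 400_000),
    "PCIe SSD, no RAID (high price)": (800, 400_000),
    "SATA SSD RAID": (300, 300_000),
}

for name, (price, iops) in options.items():
    print(f"{name}: ${dollars_per_kiops(price, iops):.2f} per 1K IOPS")
```

On these numbers the PCIe option costs $1.50 to $2.00 per 1K IOPS against $1.00 for the RAID option, so the extra spend buys only the higher absolute ceiling.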

                                  • scottalanmillerS
                                    scottalanmiller
                                    last edited by

                                    And, it should be pointed out, that a $300 RAID 5 array here is likely safer (both in terms of continuous uptime as well as in terms of dataloss) than the PCIe SSD + the SAN replication. If it were me, and I had to choose between the RAID array and the async replication to an external SAN I'd take the SSD RAID 5 array because it is fully consistent, not just crash consistent.

                                    • F
                                      Francesco Provino @scottalanmiller
                                      last edited by

                                      @scottalanmiller said:

                                      @Francesco-Provino said:

                                      As I said, I can restart the VMs almost immediately on the SAN (they are replicated, so ready to be restarted), or restore them, either from the replication pool or from backup, to the local storage of one of the other two servers.

                                      What is the manner of replication?

                                      VMware Replication to the SAN, Veeam to the NAS.

                                      • F
                                        Francesco Provino @scottalanmiller
                                        last edited by

                                        @scottalanmiller said:

                                        @Francesco-Provino said:

                                        We will do both async replication from SSD to SAN (direct attach Fibre Channel, already in place, our main storage pool as of today) and backup to a NAS unit (QNAP, big SATA drives).

                                        So the failover to the SAN is risky in that data could be lost because it is only crash consistent and the filesystem and/or databases might be corrupted when attempting to use it.

                                        What is the time of dataloss if you need to go to the QNAP to do a restore?

                                        That's always true with async replication. QNAP is in the same building, connected with gigabit network. In my tests, I can retrieve the backup of our biggest VM in almost an hour and a half. Totally ok for us.
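
As a sanity check on restore times over a gigabit link, the arithmetic can be sketched as follows. The size of the VM is not stated in the thread, so the code only derives an upper bound from the link speed. The `efficiency` parameter and its 0.8 value are assumptions standing in for protocol and disk overhead.

```python
GIGABIT_BYTES_PER_SEC = 1_000_000_000 / 8  # 125 MB/s at line rate, no overhead


def max_gb_in(seconds: float, efficiency: float = 1.0) -> float:
    """Upper bound on data (decimal GB) moved over gigabit Ethernet in the given time."""
    return GIGABIT_BYTES_PER_SEC * efficiency * seconds / 1e9


def restore_hours(size_gb: float, efficiency: float = 1.0) -> float:
    """Best-case hours needed to pull size_gb (decimal GB) over gigabit Ethernet."""
    return size_gb * 1e9 / (GIGABIT_BYTES_PER_SEC * efficiency) / 3600


ninety_minutes = 1.5 * 3600
print(f"At line rate: at most {max_gb_in(ninety_minutes):.0f} GB in 1.5 hours")
print(f"At 80% efficiency (assumed): at most {max_gb_in(ninety_minutes, 0.8):.0f} GB")
```

In other words, a restore that takes an hour and a half over this link cannot have moved more than about 675 GB, and every additional VM restored over the same link adds to the total.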

                                        • F
                                          Francesco Provino @scottalanmiller
                                          last edited by

                                          @scottalanmiller said:

                                          So the real question is this....

                                          What makes 400K IOPS without RAID worth $600 - $800 when 300K IOPS with RAID is just $300 for this specific use case?

                                          @scottalanmiller said:

                                          And, it should be pointed out, that a $300 RAID 5 array here is likely safer (both in terms of continuous uptime as well as in terms of dataloss) than the PCIe SSD + the SAN replication. If it were me, and I had to choose between the RAID array and the async replication to an external SAN I'd take the SSD RAID 5 array because it is fully consistent, not just crash consistent.

                                          Unfortunately, that doesn't apply in my case: OEM SSDs aren't supported by the RAID cards in our servers, and VMware can't do software RAID (apart from, well, sort of, uhm, VSAN).
                                          IBM's SAS SSDs are still incredibly expensive.

                                          • scottalanmillerS
                                            scottalanmiller
                                            last edited by

                                            If you can compare Samsung drives like this one: http://www.amazon.com/Samsung-2-5-Inch-Internal-MZ-75E500B-AM/dp/B00OBRE5UE/ref=sr_1_1?ie=UTF8&qid=1447070361&sr=8-1&keywords=samsung+ssd+500GB

                                            And the details on the Intel PCIe card: http://www.thessdreview.com/our-reviews/intel-ssd-dc-p3700-nvme-ssd-enthusiasts-report/ (that's the p3700, not the p3500)

                                            It seems like the PCIe card is difficult to choose in this case. You can get more IOPS for less money and more protection from the SATA SSDs still.
