Single SSD PCIe vs HDD RAID Reliability

Francesco Provino

Hello, I'm about to move one of the infrastructures I administer from an IPOD (one SAN with 12 spindles) to local PCIe SSDs replicated to the SAN. I just want to hear some opinions about the reliability of a single PCIe SSD vs. an array of spindles…

Or, better, should the PCIe card be as reliable as a hardware RAID card? NAND wear aside, of course.

scottalanmiller

This is obviously a tough question to answer as there are so many factors involved. First there is a move from a modular system (drives and RAID controller separate) to an integrated one, then from traditional spinning-rust hard drives (aka Winchester drives) to SSDs (solid state drives), and then from traditional RAID arrays to whatever is being done under the hood on the PCIe SSD system (generally we assume erasure coding across multiple devices attached to a single card). So there is a lot going on here to consider.

scottalanmiller

One of the biggest factors here is the specific PCIe SSD card in question. The big reason for moving to the PCIe SSD model is speed, unmitigated, blinding speed by removing the SAS and SATA bottlenecks and by getting all of the components integrated together. By doing this the controller and drives can work together in a very intelligent way providing for better tuning, lower latency and way more bandwidth. It is a great design.

Top end card makers like FusionIO build cards with extreme degrees of reliability (and cost.) They are extremely reliable and used in very high end, high criticality businesses for the most extreme workloads. So there is little concern about them from a general reliability standpoint.

One would generally expect a PCIe SSD array to be more reliable than a traditional, and much lower cost, hardware RAID card.

The big advantage to the traditional RAID card, RAIDed disks and hot swap model is that while the drives wear out presumably more often they are also designed to be effectively disposable and it is trivial to replace them, on the fly, with little impact to the running workloads. Hot swapping drives is a big benefit that should not be overlooked either.

Generally PCIe SSD is considered when extreme IOPS are needed beyond what hardware RAID or even software RAID plus SATA SSDs can deliver.

Dashrender

What percentage of the wall street company's servers used Fusion IO devices?

scottalanmiller

@Dashrender said:

What percentage of the wall street company's servers used Fusion IO devices?

Percentage of companies probably approaches 100%. FusionIO is just where it is at.

Percentage of servers is probably 1-5%.

Dashrender

@scottalanmiller said:

@Dashrender said:

What percentage of the wall street company's servers used Fusion IO devices?

Percentage of servers is probably 1-5%.

I was asking to show how little of the company's infrastructure is actually using PCIe SSD systems instead of SSD or HDD systems. Basically to show that most of us probably don't need PCIe systems. That's not to say none of us do, just the majority don't.

scottalanmiller

Definitely only a very tiny percentage of workloads in the SMB need SSD, let alone PCIe class SSD solutions. That said, enterprise usage may look misleadingly low, because enterprises often use massive high-performance SAN systems for storage consolidation and get pricing, unavailable to the SMB, that justifies not running PCIe SSD in most cases. So it is entirely possible that the SMB might have an easier time justifying a system designed around PCIe SSD than a typical enterprise would. They have different scales and needs. SMBs would essentially never use things like Pure SSD systems or 96Gb/s SAN that enterprises have little issue justifying. So the balance of who might choose the ultra-fast PCIe SSD approach may be skewed in that way.

Reid Cooper

When do you start to consider PCIe SSD rather than SSD drives connected to a normal RAID controller?

Francesco Provino

@scottalanmiller Specifically, I'm talking about Intel p3500 or p3600 in any server VS IBM FC SAN DS3500 direct attached (so, DAS in truth) with 12x15k spindles. Or, considering HDD local storage, a m5110 RAID card on each server.

Francesco Provino

@Reid-Cooper As of today, IOPS/€-wise, NVMe PCIe SSDs are actually far cheaper than SAS SSDs!

The price of a set of SAS SSD from IBM (the only ones supported by the RAID controller that I have in my servers!) that match IOPS and capacity of an Intel (or other brands, of course) PCIe SSD is roughly three times…

We need to move to local storage, and it seems to me that this is the most convenient approach; but anyway, I was trying to fetch some information about reliability…

scottalanmiller

@Francesco-Provino said:

@scottalanmiller Specifically, I'm talking about Intel p3500 or p3600 in any server VS IBM FC SAN DS3500 direct attached (so, DAS in truth) with 12x15k spindles. Or, considering HDD local storage, a m5110 RAID card on each server.

Those are super cheap PCIe boards from a manufacturer with a horrific track record in this space (their SSDs are generally good but their overall mobos and storage systems are some of the worst.) I would be very wary relying on an Intel board for server usage. Intel seems to lack an "enterprise mindset" and sees the storage world as one of desktops and disposable storage.

These are new boards, only first released this year. As they appear to just be a single SSD strapped to a PCIe board and priced as such - what are your concerns around reliability?

scottalanmiller

@Francesco-Provino said:

We need to move to local storage, and it seems to me that this is the most convenient approach; but anyway, I was trying to fetch some information about reliability…

Traditional enterprise boards like FusionIO have very good reliability track records. Intel is new to the game and has a good reputation in the SSD space and a bad one in "non-drive storage space." Put the two together and this would be a rather unknown scenario with them.

scottalanmiller

@Francesco-Provino said:

@Reid-Cooper As of today, IOPS/€-wise, NVMe PCIe SSDs are actually far cheaper than SAS SSDs!

Yes, through a combination of removing the SATA bottleneck and skipping the RAID.

Francesco Provino

@scottalanmiller So Intel's 5-year warranty has no value in this case? I'll be happy to replace them every 3-4 years…

Francesco Provino

@scottalanmiller said:

@Francesco-Provino said:

@Reid-Cooper As of today, IOPS/€-wise, NVMe PCIe SSDs are actually far cheaper than SAS SSDs!

Yes, through a combination of removing the SATA bottleneck and skipping the RAID.

Exactly, I think this is definitely a win-win approach.

Francesco Provino

@scottalanmiller said:

@Francesco-Provino said:

We need to move to local storage, and it seems to me that this is the most convenient approach; but anyway, I was trying to fetch some information about reliability…

Traditional enterprise boards like FusionIO have very good reliability track records. Intel is new to the game and has a good reputation in the SSD space and a bad one in "non-drive storage space." Put the two together and this would be a rather unknown scenario with them.

I know about the legendary reliability of FusionIO, but… I don't think we really need THAT much reliability, not at this price! By replicating every VM to the SAN, I can power on the VMs of a failed node directly from the SAN in almost no time.

scottalanmiller

@Francesco-Provino said:

@scottalanmiller So Intel's 5-year warranty has no value in this case? I'll be happy to replace them every 3-4 years…

Warranties have little value when you are talking about your data and uptime. A warranty is to guarantee that you have equipment for the duration, not that the things that you store on that equipment continue to exist. If we are talking a desktop on which no critical data is stored and you have a spare desktop to use until Intel replaces the SSD, sure, the warranty has value. If we are talking about a server holding your critical data the warranty presumably has almost no value.

When the PCIe SSD fails you will need to order the warranty replacement. What are the replacement terms - four hours, six hours, next business day, two weeks? Do you have to return the failed one first and wait for them to test it? Remember this is a complete storage system, not just one drive in a RAID array. When HP or Dell do a warranty replacement of a drive there is no downtime or data loss. When Intel replaces one of these drives, you are without storage for some amount of time and, once it is replaced, the data from the old SSD is gone.
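
To put rough numbers on the replacement-terms question, here is a minimal sketch of the arithmetic. Everything numeric in it is an assumption for illustration: the 2% annual failure rate, the four-hour restore time, and treating "next business day" as 24 hours are placeholders, not figures from Intel or any other vendor.

```python
# Illustrative arithmetic only: the failure rate and restore time are
# assumptions, not figures for any product discussed in this thread.

def expected_annual_downtime_hours(annual_failure_rate: float,
                                   replacement_hours: float,
                                   restore_hours: float) -> float:
    """Expected storage downtime per year for a single, non-redundant device."""
    return annual_failure_rate * (replacement_hours + restore_hours)

ASSUMED_AFR = 0.02          # assumed 2% chance of failure per year
ASSUMED_RESTORE_HOURS = 4   # assumed time to restore data onto the new card

REPLACEMENT_TERMS_HOURS = {
    "four hours": 4,
    "six hours": 6,
    "next business day": 24,   # assumed 24 h; weekends would stretch this
    "two weeks": 14 * 24,
}

for term, hours in REPLACEMENT_TERMS_HOURS.items():
    downtime = expected_annual_downtime_hours(
        ASSUMED_AFR, hours, ASSUMED_RESTORE_HOURS)
    print(f"{term}: {downtime:.2f} expected hours of downtime per year")
```

The averages hide the real issue: when a failure does happen, the whole outage lands at once, so the replacement term itself is the number to compare against how long the business can be without that storage.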

scottalanmiller

@Francesco-Provino said:

Exactly, I think this is definitely a win-win approach.

If the only goal is IOPS. What workload do you have that is that sensitive to IOPS? They exist, especially databases, but what is the tolerance for downtime? Typically I would expect systems using these drives to have either a RAIN storage system, so that storage is covered that way, or to be part of a network replicated system like a Hyper-V fault tolerant cluster with Starwind replicating between the nodes. That way if one node fails you can run from another which has a copy of the data until the first one is repaired.

In a standalone node I would only use these if the data is highly static or generally does not need to be backed up. That is rarely the case in systems that need extreme IOPS.

scottalanmiller

@Francesco-Provino said:

I know about the legendary reliability of FusionIO, but… I don't think we really need THAT much reliability, not at this price! By replicating every VM to the SAN, I can power on the VMs of a failed node directly from the SAN in almost no time.

There are three ways to handle this replication:

Fully synchronous replication
Asynchronous replication
Backup mechanisms

Of these you have these impacts or tradeoffs:

Full Sync: This is a form of network RAID 1. You will need to wait for the SAN to respond that it has written a copy of the data. While your read performance will be as fast as the Intel PCIe SSD can go, writes will be only as fast as the SAN can acknowledge them. So while this is safe and allows for storage failover without data loss or downtime, the impact on writes is enormous.

Async: Data is only crash consistent. You can have "nearly every byte" that you had before but data can and sometimes does corrupt. It cannot be tested as corruption only happens some of the time and typically happens under load. So there is a risk that your SAN would be corrupted and useless in the event of the PCIe SSD failing.

Backup/Restore: Needs quiescence to be safe which inflicts a performance penalty on its own. In the event of a PCIe SSD failure you are doing a DR scenario and facing some dataloss.

So there are options, each with different caveats. It would depend on what needs your business has as to which would make sense for you.
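
The write-path difference between these options can be sketched in a few lines. This is a simplified model under stated assumptions: both latency figures are invented for illustration, the function and constant names are hypothetical, and the synchronous case assumes the local and SAN writes are issued in parallel.

```python
# Illustrative model only: all latency figures below are assumptions,
# not measurements of any device named in this thread.

def write_latency_us(local_us: float, san_us: float, mode: str) -> float:
    """Latency the application sees for one acknowledged write."""
    if mode == "sync":
        # Network RAID 1: the write is acknowledged only after both
        # copies are durable, so the slower side sets the pace.
        return max(local_us, san_us)
    if mode in ("async", "backup"):
        # The SAN copy is made later, off the write path.
        return local_us
    raise ValueError(f"unknown mode: {mode}")

ASSUMED_LOCAL_SSD_US = 20.0   # assumed PCIe SSD write latency
ASSUMED_SAN_US = 5000.0       # assumed spindle-backed SAN write latency

for mode in ("sync", "async", "backup"):
    latency = write_latency_us(ASSUMED_LOCAL_SSD_US, ASSUMED_SAN_US, mode)
    print(f"{mode}: {latency:.0f} us per acknowledged write")
```

The model only captures latency. It says nothing about the consistency risk of async replication or the quiescence cost of backups, which are the other half of the tradeoff.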

Francesco Provino

@scottalanmiller said:

@Francesco-Provino said:

@scottalanmiller So Intel's 5-year warranty has no value in this case? I'll be happy to replace them every 3-4 years…

Warranties have little value when you are talking about your data and uptime. A warranty is to guarantee that you have equipment for the duration, not that the things that you store on that equipment continue to exist. If we are talking a desktop on which no critical data is stored and you have a spare desktop to use until Intel replaces the SSD, sure, the warranty has value. If we are talking about a server holding your critical data the warranty presumably has almost no value.

When the PCIe SSD fails you will need to order the warranty replacement. What are the replacement terms - four hours, six hours, next business day, two weeks? Do you have to return the failed one first and wait for them to test it? Remember this is a complete storage system, not just one drive in a RAID array. When HP or Dell do a warranty replacement of a drive there is no downtime or data loss. When Intel replaces one of these drives, you are without storage for some amount of time and, once it is replaced, the data from the old SSD is gone.

I know about it, but thanks to the replication I think we can live with that. We can have a few hours of downtime without losing too much money.
We mainly do VDI and database stuff… it's not that we require such a great IOPS count, but… what are the alternatives? Buy IBM spindles in 2015, at a higher price than the SSD? Double the price for 1/100 of the IOPS? Does it really make sense?
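
As a back-of-the-envelope check of that last comparison, the "double the price for 1/100 of the IOPS" figure can be turned into an IOPS-per-euro ratio. The sketch below uses only the two rough ratios quoted above. The function name is hypothetical and no real prices or benchmark results are involved.

```python
def iops_per_euro_advantage(spindle_price_multiple: float,
                            ssd_iops_multiple: float) -> float:
    """How many times more IOPS per euro the SSD offers than the spindles.

    SSD:      price P,              IOPS I      -> I / P
    Spindles: price P * price_mult, IOPS I / n  -> I / (n * price_mult * P)
    Ratio of the two: n * price_mult
    """
    return spindle_price_multiple * ssd_iops_multiple

# Rough figures from the post: spindles cost 2x and deliver 1/100 of the IOPS.
print(iops_per_euro_advantage(spindle_price_multiple=2, ssd_iops_multiple=100))
```

On those rough figures the PCIe SSD comes out about 200 times ahead on IOPS per euro, which is a cost argument only and does not settle the reliability question.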