NVMe and RAID?
-
@scottalanmiller Yeah, I was thinking about bonding multiple 10GbE links together from the server to the switch. It's something I'd have to learn, as I don't use anything with 10GbE currently. I'm not even sure if twin or quad 10GbE could handle the speeds of NVMe.
-
@biggen said in NVMe and RAID?:
@scottalanmiller Yeah, I was thinking about bonding multiple 10GbE links together from the server to the switch. It's something I'd have to learn, as I don't use anything with 10GbE currently. I'm not even sure if twin or quad 10GbE could handle the speeds of NVMe.
Bonding has overhead, so no, it won't handle NVMe. Really, nothing will. You want 40GbE, 100GbE, or InfiniBand to start getting closer, but absolutely nothing touches NVMe's real speeds except the PCIe bus.
A relatively standard NVMe setup today is 64 Gb/s (a PCIe 4.0 x4 link), and you can get higher. And that's for a single drive. So only the most extreme networking can even touch the bandwidth, and that's before you consider the latency problems.
So the question becomes: is the extreme extra speed of NVMe valuable to you once you are going through general networking like 10GigE? Likely not, because even a few standard SSDs are going to flood that.
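To put those numbers side by side, here is a rough back-of-the-envelope sketch. It uses nominal line rates only (no Ethernet, TCP, SMB, or PCIe encoding overhead), treats the 64 Gb/s figure as one PCIe 4.0 x4 NVMe drive, and assumes about 0.55 GB/s sequential per SATA SSD; all of these are approximations, not measurements.

```python
import math

# Nominal line rates in Gb/s (no protocol or encoding overhead).
LINKS_GBPS = {
    "10GbE": 10,
    "2x10GbE bond (ideal)": 20,
    "25GbE": 25,
    "40GbE": 40,
    "100GbE": 100,
    "NVMe on PCIe 4.0 x4": 64,
}

# Assumption: sustained sequential rate of one SATA SSD, in GB/s.
SATA_SSD_GBYTES = 0.55


def gbit_to_gbyte(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


def sata_ssds_to_saturate(link_gbps: float) -> int:
    """How many SATA SSDs (at the assumed rate) it takes to fill a link."""
    return math.ceil(gbit_to_gbyte(link_gbps) / SATA_SSD_GBYTES)


if __name__ == "__main__":
    for name, gbps in LINKS_GBPS.items():
        print(f"{name:24} {gbit_to_gbyte(gbps):6.2f} GB/s  "
              f"~{sata_ssds_to_saturate(gbps)} SATA SSD(s) to saturate")
```

Even at nominal rates, one PCIe 4.0 NVMe drive (8 GB/s) outruns a 40GbE port (5 GB/s), and under the assumed per-drive rate about three SATA SSDs already fill 10GbE.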
-
What about 10GbE bonding with SAS SSDs? Would 20GbE links remove the network bottleneck?
At least going with SAS/SATA puts hardware RAID back on the table for easier admin.
-
@biggen Isn't there an option for a PERC RAID controller that does NVMe RAID?
All of our EPYC Rome setups are going into hyper-converged clusters, so there's no need for a RAID controller.
-
@biggen said in NVMe and RAID?:
What about 10GbE bonding with SAS SSDs? Would 20GbE links remove the network bottleneck?
Bonding will help with bandwidth, but it will hurt latency. When you bond, the CPU has to do some work before things are sent down the pipe, which slows things down. And you don't get 100% efficiency. A two-way bond is pretty good; you get something like 195% of the performance of a single pipe. A third link adds much less, a fourth way less, and a fifth so little that no one even discusses it. By the sixth, it's assumed that you are just getting slower, not faster.
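A related reason a bond doesn't behave like one fat pipe: Linux bonding in 802.3ad (LACP) or balance-xor mode picks a member link per flow from a hash of packet headers (the `xmit_hash_policy` setting), so a single TCP stream, such as one client copying one file, never uses more than one link. The sketch below illustrates that idea only; SHA-256 is a stand-in chosen for illustration, not the kernel's actual hash, and the addresses and ports are hypothetical.

```python
import hashlib
from collections import Counter


def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int, n_links: int) -> int:
    """Map one flow (4-tuple) to a bond member index.

    Loosely mimics a layer3+4 transmit hash policy; SHA-256 is an
    illustrative stand-in, not the hash the Linux kernel uses.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links


def link_loads(flows, n_links: int) -> list:
    """Count how many flows land on each member link."""
    counts = Counter(pick_link(*flow, n_links) for flow in flows)
    return [counts.get(i, 0) for i in range(n_links)]


if __name__ == "__main__":
    # Hypothetical: one editor copying one big file over SMB is one flow.
    one_copy = [("10.0.0.20", "10.0.0.5", 50123, 445)]
    print(link_loads(one_copy, 2))  # one member carries everything

    # Hypothetical: eight clients, eight flows; the spread depends on the hash.
    many = [(f"10.0.0.{20 + i}", "10.0.0.5", 50000 + i, 445) for i in range(8)]
    print(link_loads(many, 2))
```

Under this model, a single client transfer over a 2x10GbE bond tops out at one 10 Gb/s link, while a single 25GbE link allows up to 25 Gb/s; bonding only spreads load when many flows share the server.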
-
@PhlipElder I didn't see the option when playing with the Dell system builder online. Honestly, NVMe is so ridiculously expensive that I'm not sure it warrants much investigation at this point. Single drives are in the $3k-and-up range, depending on size.
@scottalanmiller said in NVMe and RAID?:
@biggen said in NVMe and RAID?:
What about 10GbE bonding with SAS SSDs? Would 20GbE links remove the network bottleneck?
Bonding will help with bandwidth, but it will hurt latency. When you bond, the CPU has to do some work before things are sent down the pipe, which slows things down. And you don't get 100% efficiency. A two-way bond is pretty good; you get something like 195% of the performance of a single pipe. A third link adds much less, a fourth way less, and a fifth so little that no one even discusses it. By the sixth, it's assumed that you are just getting slower, not faster.
What do you suggest, then, for removing the network bottleneck of 12Gb SAS SSDs? Bonded twin 10GbE links or a single 25GbE connection? I'm assuming the cost of switches and adapter cards goes up when moving to 25GbE.
-
@biggen Right now, the only places we're using NVMe in servers are as cache in a hybrid storage setup (cache/capacity or cache/performance/capacity) or in all-NVMe servers.
Intel's VROC (Virtual RAID on CPU) plug-in dongle enables RAID 1 in certain configurations. It's driven by the CPU. Not sure Dell supports it.
For most applications, an R740xd with a high-performance NV-Cache and SATA SSDs in RAID 6 will do, e.g. the Intel SSD DC-S4610 series (D3-S4610).
We have plenty of setups like that for virtualized SQL/database workloads as well as 4K/8K video storage.
EDIT: Forgot: in the Intel Server Systems we deploy, we install a couple of Intel NVMe drives plus the VROC dongle for Intel-only NVMe, and RAID 1 them for the host OS.
-
@scottalanmiller said in NVMe and RAID?:
@biggen said in NVMe and RAID?:
@brianlittlejohn So I guess the days of having a Jr. Admin blind swap are over then? It takes much more care and instruction to use software RAID.
That's correct. If you want that level of performance, blind swapping is kind of over. For now.
Not so much, at least with the major server vendors. Every backplane will let you flash the activity light of a failed drive. I know iDRAC and md allow you to do this (I haven't had a reason to look into HP or Supermicro, but I would be surprised if it weren't built in as well).
-
@PhlipElder said in NVMe and RAID?:
@biggen Right now, the only places we're using NVMe in servers are as cache in a hybrid storage setup (cache/capacity or cache/performance/capacity) or in all-NVMe servers.
Intel's VROC (Virtual RAID on CPU) plug-in dongle enables RAID 1 in certain configurations. It's driven by the CPU. Not sure Dell supports it.
For most applications, an R740xd with a high-performance NV-Cache and SATA SSDs in RAID 6 will do, e.g. the Intel SSD DC-S4610 series (D3-S4610).
We have plenty of setups like that for virtualized SQL/database workloads as well as 4K/8K video storage.
EDIT: Forgot: in the Intel Server Systems we deploy, we install a couple of Intel NVMe drives plus the VROC dongle for Intel-only NVMe, and RAID 1 them for the host OS.
Thanks for that info. Yeah, I'm thinking NVMe is probably overkill for video editing over a network connection, especially since he would be network-bound anyway. I was thinking either 12Gb SAS SSDs in RAID 1 (2TB+ variety) or 6Gb SATA SSDs in RAID 1. That at least gives the option to go back to hot/blind swap with the appropriate PERC.
@travisdh1 said in NVMe and RAID?:
@scottalanmiller said in NVMe and RAID?:
@biggen said in NVMe and RAID?:
@brianlittlejohn So I guess the days of having a Jr. Admin blind swap are over then? It takes much more care and instruction to use software RAID.
That's correct. If you want that level of performance, blind swapping is kind of over. For now.
Not so much, at least with the major server vendors. Every backplane will let you flash the activity light of a failed drive. I know iDRAC and md allow you to do this (I haven't had a reason to look into HP or Supermicro, but I would be surprised if it weren't built in as well).
That allows you to ID a bum drive, but you still have no way to rebuild it automatically like you would with a blind swap, right?
-
@biggen said in NVMe and RAID?:
@PhlipElder said in NVMe and RAID?:
@biggen Right now, the only places we're using NVMe in servers are as cache in a hybrid storage setup (cache/capacity or cache/performance/capacity) or in all-NVMe servers.
Intel's VROC (Virtual RAID on CPU) plug-in dongle enables RAID 1 in certain configurations. It's driven by the CPU. Not sure Dell supports it.
For most applications, an R740xd with a high-performance NV-Cache and SATA SSDs in RAID 6 will do, e.g. the Intel SSD DC-S4610 series (D3-S4610).
We have plenty of setups like that for virtualized SQL/database workloads as well as 4K/8K video storage.
EDIT: Forgot: in the Intel Server Systems we deploy, we install a couple of Intel NVMe drives plus the VROC dongle for Intel-only NVMe, and RAID 1 them for the host OS.
Thanks for that info. Yeah, I'm thinking NVMe is probably overkill for video editing over a network connection, especially since he would be network-bound anyway. I was thinking either 12Gb SAS SSDs in RAID 1 (2TB+ variety) or 6Gb SATA SSDs in RAID 1. That at least gives the option to go back to hot/blind swap with the appropriate PERC.
We deployed an Intel Server System R2224WFTZSR 2U dual-socket server with a pair of Intel Xeon Gold 6240Y processors. We set up two dual-port Intel X540-T2 10GbE network adapters and a pair of LSI SAS HBAs for external SAS cable connections. Its purpose was to host two to four virtual machines for 150 to 300 1080p cameras throughout a building.
Between 5 and 15 of those camera streams would be processed by recognition software and fire e-mail flags off to management staff for various conditions.
Storage is a pair of Intel SSDs for the host OS, a pair of Intel SSD D3-S4610 series drives in RAID 1 for the high-I/O processing, and an HGST 60-bay JBOD loaded with 12TB nearline SAS drives.
We used Storage Spaces to set up a 3-way mirror on the drives in the JBOD, yielding 33% of raw capacity as usable production storage.
Constant throughput is about 375 MB/s to 495 MB/s, depending on how many folks are moving through the building.
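For scale, the JBOD figures above work out roughly as follows. This sketch assumes all 60 bays hold 12 TB drives, uses decimal units, ignores the capacity Storage Spaces reserves for metadata and repairs, and treats the quoted throughput as pure sustained writes; the results are ballpark only.

```python
DRIVES = 60          # bays in the JBOD (assumed fully populated)
DRIVE_TB = 12        # nearline SAS drive size, decimal TB
MIRROR_COPIES = 3    # a 3-way mirror keeps three copies of every block


def usable_tb(drives: int, drive_tb: float, copies: int) -> float:
    """Usable capacity of an n-way mirror: raw capacity divided by copy count."""
    return drives * drive_tb / copies


def retention_days(capacity_tb: float, write_mb_per_s: float) -> float:
    """Days until the capacity fills at a constant write rate (decimal units)."""
    seconds = capacity_tb * 1e12 / (write_mb_per_s * 1e6)
    return seconds / 86_400


if __name__ == "__main__":
    cap = usable_tb(DRIVES, DRIVE_TB, MIRROR_COPIES)
    print(f"usable: {cap:.0f} TB of {DRIVES * DRIVE_TB} TB raw")
    for rate in (375, 495):
        print(f"{rate} MB/s -> about {retention_days(cap, rate):.1f} days")
```

Under these assumptions, 240 TB usable holds roughly five and a half to seven and a half days of continuous footage.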
We've put a number of other virtual machines on the server to utilize more CPU.
4K video editing is something we have on the radar for these folks as they've started filming their vignettes and other recordings in 4K.
-
@biggen said in NVMe and RAID?:
I was playing around on the Dell configuration website building out an Epyc 2-socket machine with an NVMe backplane. What I noticed is that there is no RAID available for this configuration. How is this handled, then, if I want to put in two identical NVMe U.2 drives and mirror them? Is hardware RAID not an option for this configuration? Is this left to whichever OS you choose now?
We spec'd a handful of those Epyc 2 Dells with NVMe last year for a hyperconverged cluster.
Intel has VROC, which is md RAID (software RAID) behind the scenes, but it doesn't work on AMD CPUs, and you need BIOS support, etc.
But I know people who put in 8 NVMe drives and run standard md RAID with massive performance numbers.
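For the two-drive NVMe mirror asked about above, creating the md array comes down to one `mdadm --create` call. This Python sketch only assembles that command line and does not run it; the device names `/dev/nvme0n1` and `/dev/nvme1n1` and the array name `/dev/md0` are hypothetical placeholders, so check the actual device names on the box first.

```python
import shlex


def mdadm_mirror_cmd(array: str, members: list) -> list:
    """Build an mdadm command that creates a RAID 1 mirror from the given devices.

    Only constructs the argument list; actually running it (e.g. via
    subprocess.run) would overwrite whatever is on the member drives.
    """
    if len(members) < 2:
        raise ValueError("RAID 1 needs at least two member devices")
    return [
        "mdadm", "--create", array,
        "--level=1",
        f"--raid-devices={len(members)}",
        *members,
    ]


if __name__ == "__main__":
    # Hypothetical device names for two U.2 NVMe drives.
    cmd = mdadm_mirror_cmd("/dev/md0", ["/dev/nvme0n1", "/dev/nvme1n1"])
    print(shlex.join(cmd))
```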
Blind swap is not a big deal. You can handle that with a simple cron job: if the array is degraded and you put a new drive with the same or larger capacity in the old slot, it automatically starts a rebuild.
-
@PhlipElder What is the camera VMS solution you are using? Milestone? Axis?
-
@Pete-S I'll have to take another look at Intel's offerings, then. I figured AMD had Intel blown out of the water on cost per core nowadays.
-
@biggen said in NVMe and RAID?:
@Pete-S I'll have to take another look at Intel's offerings, then. I figured AMD had Intel blown out of the water on cost per core nowadays.
On a pound-for-pound basis, the AMD EPYC Rome platforms we are working with are less expensive and vastly superior in performance.
-
I don't see any mention of VROC in the system builder for any of the Intel configurations I've done. I'm guessing that's because Dell wants you to buy a PERC instead.
-
@biggen said in NVMe and RAID?:
I don't see any mention of VROC in the system builder for any of the Intel configurations I've done. I'm guessing that's because Dell wants you to buy a PERC instead.
Probably.
As long as you're using some Linux-based platform for the host, it shouldn't be an issue. All of them support booting from some sort of software RAID.
-
@travisdh1 Yeah, it would be a Debian VM providing the SMB share (via Proxmox or XCP-ng), so md RAID isn't an issue. Proxmox can use ZFS RAID 1, while XCP-ng can do standard md RAID.
Edit: Dell even has the BOSS add-in card, which provides a bootable RAID 1 volume just for the OS. The NVMe drives could be used for VM storage only if I go that route.
-
@biggen NVMe storage is indeed ridiculously fast, and when I say fast, think latency rather than throughput. In practice, it really shines with heavily used relational databases. Serving NVMe RAID over the network would require at least 25GbE with RDMA support end-to-end, and would work even better with an NVMe-oF initiator; otherwise, network latency becomes the bottleneck. For 4K video editing, however, 10GbE end-to-end with SSD storage on the server should be sufficient.
There is a better alternative to interface bonding between a single file server and its clients: SMB Multichannel, which uses multiple network interfaces for data transfers (clients need to have multiple NICs, though). This way, network bandwidth is aggregated over active-active paths rather than load balanced with active-passive ones. The downside is that while SMB Multichannel works reliably in all-Windows environments, Samba's implementation is patchy. macOS doesn't support it at all, AFAIK.
-
NVMe drives are the same price as SAS3 drives with the same write endurance from the same manufacturer.
If you go Dell because you want them holding your hand, you'll pay two to three times as much for the drives. That's just the way it is.
Consider that more than one person can access the file server at the same time. You can get away with 10GbE at the clients (bonding doesn't help at the client). That means a 100 GB video file will take about 100 seconds to transfer.
However, you need more than that on the server, and your array needs to be able to handle more than 1 GB (gigabyte) per second.
Most 10GbE switches have 40GbE ports as well, so a two-port 40GbE NIC on the server will allow 8 streams of 1 GB/s, for a total of 8 GB/s.
That means your array needs to handle 8 GB/s. Without NVMe drives, you need a lot of drives to get that kind of performance.
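The arithmetic above, written out as a sketch. It takes the post's round figures of about 1 GB/s real-world throughput per 10GbE client and 8 concurrent streams for a two-port 40GbE server NIC (80 Gb/s, i.e. 10 GB/s line rate, so 8 GB/s leaves some headroom); the ~0.55 GB/s per SATA SSD is an added assumption, and RAID overhead is ignored.

```python
import math

CLIENT_GBYTES_PER_S = 1.0     # ~real-world 10GbE throughput per client (post's round figure)
CONCURRENT_CLIENTS = 8        # streams a two-port 40GbE server NIC can feed, per the post
SATA_SSD_GBYTES_PER_S = 0.55  # assumption: sustained rate of one SATA SSD (not from the post)


def transfer_seconds(file_gb: float, rate_gbytes_per_s: float) -> float:
    """Time to move a file at a steady rate."""
    return file_gb / rate_gbytes_per_s


def array_gbytes_needed(clients: int, per_client: float) -> float:
    """Aggregate bandwidth the array must sustain for all clients at once."""
    return clients * per_client


def drives_needed(total_gbytes_per_s: float, per_drive: float) -> int:
    """Minimum drive count, ignoring RAID parity and mirroring overhead."""
    return math.ceil(total_gbytes_per_s / per_drive)


if __name__ == "__main__":
    print(f"100 GB file over 10GbE: ~{transfer_seconds(100, CLIENT_GBYTES_PER_S):.0f} s")
    need = array_gbytes_needed(CONCURRENT_CLIENTS, CLIENT_GBYTES_PER_S)
    print(f"array must sustain: {need:.0f} GB/s")
    print(f"SATA SSDs needed (no RAID overhead): {drives_needed(need, SATA_SSD_GBYTES_PER_S)}")
```

With mirroring or parity on the write path, the real drive count would be higher than this lower bound.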
If you build a file server like this, skip the hypervisor completely and run it on bare metal. You'll lose a ton of performance otherwise.
Also, latency means nothing in your application. It's all about transfer rate.
So: something like Debian on bare metal with md RAID and 4TB or larger NVMe U.2 drives.
Go for a CPU with a high base frequency. High I/O rates from NVMe drives use quite a bit of CPU power, though you don't need lots and lots of cores. Go for drives rated at 1 DWPD for the best value.
-
@taurex Thanks for that information. More to go over for me it seems!
@Pete-S I figure Dell or HPE is the way to go for him. He needs a support contract behind something like this, and it doesn't need to be me.
I hadn't considered 40GbE+ uplinks. Makes sense.
Skip the hypervisor, huh? I figured it would add a performance penalty, but it makes backups so much easier. I don't even know how to perform bare-metal backups on servers. Backing up the video files being worked on would be easy with a traditional Synology NAS (or a custom-built solution), but backing up the OS in case an update breaks it would take some thought.
I assume Samba could keep up with 8 GB/s (~8 users all transferring at the same time) as long as the underlying storage is fast enough that Samba isn't waiting?