@alboup said in Erasure Coding:
Hi all,
Erasure coding - is it safe for use in production on an all-flash array? I'm specifically talking about VMware's vSAN here, but the question is fairly broad.
The alternative is the one I'm most familiar and comfortable with: RAID1(/10) mirrored to another node(s) to provide simple, reliable fault tolerance. However, there are plenty of people talking up the new RAID5/RAID6 erasure coding features as they substantially reduce capacity overhead. Apparently there is far less risk of failure due to the much lower URE rates in flash storage.
I'm curious what you guys think. Is it risk averse? @scottalanmiller has posted some passionate threads in the past about why R5/R6 is the devil (which I totally agree with), so where do you stand with erasure coding?
Thanks
- VMware vSAN has software RAID5 and RAID6; these are XOR-based software parity RAID. I don't know why VMware decided to call them "erasure coding" (typically that means something like Reed-Solomon codes). Probably they decided "parity RAID" isn't cool and "erasure coding" is.
- Microsoft's erasure coding does indeed come from Azure, and while it's OK, it's a speed freak because it uses GLOBAL parity, which means some regions will be updated more frequently than others. The FTL will take care of that, but their E/C was never designed to run on flash, for sure!
https://www.usenix.org/conference/atc12/technical-sessions/presentation/huang
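To make "XOR-based parity" concrete, here is a minimal Python sketch, not anything taken from vSAN itself. The 3+1 stripe layout, the block contents, and the helper names `xor_blocks` and `capacity_overhead` are all assumptions made up for illustration. It shows that the parity block is just the XOR of the data blocks, that a single lost block can be rebuilt by XOR-ing the survivors, and why people like the capacity numbers compared to a mirror.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def capacity_overhead(data_blocks, parity_blocks):
    """Raw capacity consumed per unit of usable capacity."""
    return (data_blocks + parity_blocks) / data_blocks


# Hypothetical 3+1 stripe: three data blocks plus one XOR parity block.
data = [b"node", b"disk", b"vsan"]
parity = xor_blocks(data)

# Simulate losing the second data block, then rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # the lost block, recovered
print(capacity_overhead(3, 1), "for 3+1 parity vs",
      capacity_overhead(1, 1), "for a two-way mirror")
```

Note that a single XOR parity only survives one lost block per stripe; tolerating two failures needs a second, independent parity calculation, which this sketch does not attempt.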
Verdict: you can use whatever you want in production. Both solutions have many, many adopters, but neither of them was designed to run on flash (think about the Pure engine), because in that case erasure coding should be done within the FTL (flash translation layer) on so-called OpenSSDs (or their equivalent, whatever Pure is calling them).
Hope this helped