Data archive is not backup! What do you use?
-
I think that the biggest question will be around how you select what to archive and manage it.
-
@scottalanmiller said in Data archive is not backup! What do you use?:
@Francesco-Provino said in Data archive is not backup! What do you use?:
@matteo-nunziati said in Data archive is not backup! What do you use?:
+1 for tar.bz2, you can encrypt it if you want.
I agree, it's fairly common and almost every distro ships it by default. I've read on many sites that xz and other LZMA-based compressors compress even more, but there are some doubts about recoverability, the format is more complex, and it's at least 10x slower…
BZ is so good. Probably not worth pushing it farther in most cases.
Now experimenting with ZPAQ...
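For anyone who wants to try the tar.bz2 vs. tar.xz choice discussed above, here is a minimal sketch using Python's standard `tarfile` module. The directory name `docs`, the file `report.txt`, and the helper `make_archive` are made up for illustration, and encryption is not covered because the standard library does not provide it.

```python
import os
import tarfile
import tempfile


def make_archive(src_dir, dest_path, mode):
    """Pack src_dir into a compressed tarball; mode is 'w:bz2' or 'w:xz'."""
    with tarfile.open(dest_path, mode) as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    return os.path.getsize(dest_path)


def demo():
    """Build a small sample tree, archive it both ways, and read one archive back."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "docs")  # hypothetical sample directory
        os.mkdir(src)
        with open(os.path.join(src, "report.txt"), "w") as f:
            f.write("quarterly report line\n" * 5000)

        bz2_size = make_archive(src, os.path.join(tmp, "docs.tar.bz2"), "w:bz2")
        xz_size = make_archive(src, os.path.join(tmp, "docs.tar.xz"), "w:xz")

        # Listing the members is a cheap check that the archive is readable.
        with tarfile.open(os.path.join(tmp, "docs.tar.bz2"), "r:bz2") as tar:
            names = tar.getnames()
        return bz2_size, xz_size, names


if __name__ == "__main__":
    bz2_size, xz_size, names = demo()
    print("tar.bz2 bytes:", bz2_size)
    print("tar.xz bytes:", xz_size)
    print("members:", names)
```

The sample data here is highly repetitive, so the sizes say little about a real dataset; run it against your own files before drawing conclusions.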
-
@Francesco-Provino said in Data archive is not backup! What do you use?:
@scottalanmiller said in Data archive is not backup! What do you use?:
@Francesco-Provino said in Data archive is not backup! What do you use?:
@matteo-nunziati said in Data archive is not backup! What do you use?:
+1 for tar.bz2, you can encrypt it if you want.
I agree, it's fairly common and almost every distro ships it by default. I've read on many sites that xz and other LZMA-based compressors compress even more, but there are some doubts about recoverability, the format is more complex, and it's at least 10x slower…
BZ is so good. Probably not worth pushing it farther in most cases.
Now experimenting with ZPAQ...
How is it?
-
@StrongBad said in Data archive is not backup! What do you use?:
@Francesco-Provino said in Data archive is not backup! What do you use?:
@scottalanmiller said in Data archive is not backup! What do you use?:
@Francesco-Provino said in Data archive is not backup! What do you use?:
@matteo-nunziati said in Data archive is not backup! What do you use?:
+1 for tar.bz2, you can encrypt it if you want.
I agree, it's fairly common and almost every distro ships it by default. I've read on many sites that xz and other LZMA-based compressors compress even more, but there are some doubts about recoverability, the format is more complex, and it's at least 10x slower…
BZ is so good. Probably not worth pushing it farther in most cases.
Now experimenting with ZPAQ...
How is it?
This one. It's included by default in most Linux distros.
It does deduplication… but it really takes forever: 11 hours to compress 43 GB to 21 GB using 100% CPU of a 4-core Xeon E3. I think I'll stick with bzip2 or LZMA (xz) for now; I don't think higher compression ratios are really worth the price.
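The time-versus-ratio trade-off is easy to measure on your own data. Below is a rough sketch with Python's standard `bz2` and `lzma` modules. It does not cover ZPAQ, the sample data is invented, and the helper name `compare` is just for illustration. It uses xz preset 6 (the default), since higher presets need considerably more memory.

```python
import bz2
import lzma
import time


def compare(data):
    """Compress data with bzip2 and xz; report size ratio and wall-clock time."""
    codecs = (
        ("bzip2", lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
        ("xz", lambda d: lzma.compress(d, preset=6), lzma.decompress),
    )
    results = {}
    for name, compress, decompress in codecs:
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Round-trip check: a smaller archive is useless if it cannot be restored.
        assert decompress(packed) == data
        results[name] = {"ratio": len(packed) / len(data), "seconds": elapsed}
    return results


# Invented sample; replace with bytes read from a real file for a meaningful test.
sample = b"invoice 2017; customer ACME; total 100.00\n" * 20000
results = compare(sample)
for name, stats in results.items():
    print(f"{name}: ratio={stats['ratio']:.4f} time={stats['seconds']:.3f}s")
```

Numbers from a synthetic sample like this one will not match a real SMB dataset, so treat it only as a harness for your own files.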
-
Wow, that is a lot of CPU!!
-
@scottalanmiller said in Data archive is not backup! What do you use?:
Wow, that is a lot of CPU!!
Yes, maybe because I also set the chunk compression to the maximum ratio.
-
I think deduplication is not worth the CPU/RAM cost in most cases (thinking about ZFS, e.g.).
-
@Francesco-Provino said in Data archive is not backup! What do you use?:
I think deduplication is not worth the CPU/RAM cost in most cases (thinking about ZFS, e.g.).
That's generally true. Most storage vendors agree with you when the engineers are talking. Sales people, of course, love selling deduplication.
-
Deduplication tends to be good for archival data or as an offline process that runs only during idle times directly on the storage. Inline dedupe is rarely worth it.
-
@scottalanmiller said in Data archive is not backup! What do you use?:
Deduplication tends to be good for archival data or as an offline process that runs only during idle times directly on the storage. Inline dedupe is rarely worth it.
Deduplication makes the archives much more fragile. A bit flip in the right chunk can potentially blow the whole archive.
What percentage of gained space is worth the loss of recoverability?
With B2 at $0.005 and Glacier at $0.004 per GB per month, and magnetic and tape storage still getting cheaper, why add complexity and risk for a small saving? For my dataset, which is a typical SMB one, the space gained is ~10% or less compared with LZMA compression.
-
@Francesco-Provino never used those (b2, glacier) how do you access them? REST API? client? anything special required?
-
@Francesco-Provino said in Data archive is not backup! What do you use?:
Deduplication makes the archives much more fragile. A bit flip in the right chunk can potentially blow the whole archive.
Not really, it would only impact deduped data. So for data that is stored many, many times, yes, each copy would be affected, but only data that was all the same.
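A toy model makes the point concrete. This sketch uses fixed-size chunks keyed by SHA-256 and is not how ZPAQ or any particular product stores data. The file names, the 4-byte chunk size, and the helpers `dedup_store` and `files_hit_by` are all made up for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, so the example stays readable


def dedup_store(files):
    """Split each file into chunks; keep one copy per unique chunk hash."""
    chunks, recipes = {}, {}
    for name, data in files.items():
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            piece = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(piece).hexdigest()
            chunks.setdefault(digest, piece)  # stored once, however often it occurs
            recipe.append(digest)
        recipes[name] = recipe
    return chunks, recipes


def files_hit_by(recipes, bad_digest):
    """List the files whose recipe references a damaged chunk."""
    return sorted(name for name, recipe in recipes.items() if bad_digest in recipe)


files = {
    "a.txt": b"AAAABBBB",
    "b.txt": b"AAAACCCC",
    "c.txt": b"DDDDEEEE",
}
chunks, recipes = dedup_store(files)

shared = hashlib.sha256(b"AAAA").hexdigest()  # chunk referenced by two files
unique = hashlib.sha256(b"DDDD").hexdigest()  # chunk referenced by one file

print(len(chunks))                    # 5 unique chunks stored for 6 logical chunks
print(files_hit_by(recipes, shared))  # ['a.txt', 'b.txt']
print(files_hit_by(recipes, unique))  # ['c.txt']
```

Damage to a shared chunk reaches every file that references it, while damage to a unique chunk stays confined to one file. This model ignores any compression layered on top of the chunks, which can widen the damage.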
-
@matteo-nunziati said in Data archive is not backup! What do you use?:
@Francesco-Provino never used those (b2, glacier) how do you access them? REST API? client? anything special required?
They are basically the same as S3. We use B2 and we access it via the API. There is a toolkit for Linux which is super easy to use.
-
@scottalanmiller so basically, if I want to move stuff from a NAS appliance (which doesn't support the thing), I need a VM in the middle to manage the copy/move/remove operations. Right? (OK, I'll stop hijacking the thread now.)
-
@matteo-nunziati said in Data archive is not backup! What do you use?:
@scottalanmiller so basically, if I want to move stuff from a NAS appliance (which doesn't support the thing), I need a VM in the middle to manage the copy/move/remove operations. Right? (OK, I'll stop hijacking the thread now.)
That's the case with anything, really. Most NAS support it though. NetApp doesn't, but eww, avoid that. Synology, ReadyNAS, ioSafe, SAM-SD, most NAS support it.
-
@matteo-nunziati said in Data archive is not backup! What do you use?:
@Francesco-Provino never used those (b2, glacier) how do you access them? REST API? client? anything special required?
I use both via the CLI, it's very easy to script the upload of the archives :).
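As a rough idea of what such a script can look like, here is a small sketch that wraps the AWS CLI's `aws s3 cp` command with its `--storage-class GLACIER` option. It assumes the AWS CLI is installed and configured. The bucket name, key, archive path, and both helper functions are hypothetical, and the B2 side is left out.

```python
import subprocess


def glacier_upload_cmd(archive_path, bucket, key):
    """Build the AWS CLI command that copies one archive to S3 as GLACIER class."""
    return [
        "aws", "s3", "cp",
        archive_path,
        f"s3://{bucket}/{key}",
        "--storage-class", "GLACIER",
    ]


def upload(archive_path, bucket, key, dry_run=True):
    """Run the upload, or just return the command when dry_run is set."""
    cmd = glacier_upload_cmd(archive_path, bucket, key)
    if not dry_run:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical names; dry_run=True only prints the command.
    print(" ".join(upload("docs.tar.bz2", "my-archive-bucket", "2017/docs.tar.bz2")))
```

Keeping the command builder separate from the runner makes it easy to log or review what will be executed before any data leaves the machine.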
This is the official guide for the AWS CLI.
-
If you want access to Backblaze B2 on a NAS that doesn't support it, the specific tool for that is Aclouda. It's not publicly available yet, but you can always sweet talk them into being part of their private pool perhaps.
-
@matteo-nunziati said in Data archive is not backup! What do you use?:
@scottalanmiller so basically, if I want to move stuff from a NAS appliance (which doesn't support the thing), I need a VM in the middle to manage the copy/move/remove operations. Right? (OK, I'll stop hijacking the thread now.)
Any Linux VM can do it easily.
QNAP supports it, too.
In truth, you can install the AWS/B2 CLI on any Linux-based NAS.
-
I've restricted my choice to XZ vs. LZIP.
XZ is adopted by GNU, the kernel distribution and the majority of Linux flavours…
But it looks like LZIP is better designed, simpler, and better documented, though not as widespread. Any advice on that?