KVM Backing and Support

scottalanmiller

@storageninja said in KVM Backing and Support:

Veeam can run scripts in a VM before and after backups if you want to put a database in hot standby mode (what we did for some of the weirder databases), so when you recover the VM you know the database will be consistent. Worst case, you can script a service stop before the backup and a resume afterwards.

Sure, but at some point we've lost the benefits versus agent-based or just scripted backups, and we are only deploying the backup infrastructure that way to prove a point, which is not IT's job to do.
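For reference, the "stop service before backup and resume afterwards" pattern from the quoted post can be sketched roughly as below. This is only an illustration, not Veeam's pre/post script interface: the names `service_quiesced` and `run_backup` are hypothetical, and it assumes a systemd host where the database runs as a unit that `systemctl stop` / `systemctl start` can control.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def service_quiesced(service, runner=subprocess.run):
    """Stop `service` for the duration of the block, then start it again.

    `runner` defaults to subprocess.run; it is a parameter only so the
    sketch can be exercised without touching a real system.
    """
    runner(["systemctl", "stop", service], check=True)
    try:
        yield
    finally:
        # Start the service again even if the backup step raised, so a
        # failed backup does not also leave the database down.
        runner(["systemctl", "start", service], check=True)


def run_backup(service, backup_step, runner=subprocess.run):
    """Run `backup_step` (any callable) while `service` is stopped."""
    with service_quiesced(service, runner):
        backup_step()
```

Calling `run_backup("postgresql", take_snapshot)` would stop the unit, run the hypothetical `take_snapshot` callable, and start the unit again whether or not the snapshot succeeded. The `finally` block is the part that matters: the "resume afterwards" half has to run on failure too.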

DustinB3403

I can't argue the merits of DevOps Backups, but this article seems to agree with @JaredBusch and @StorageNinja that "DevOps backups" are a horrible idea.

DustinB3403

@scottalanmiller said in KVM Backing and Support:

@storageninja said in KVM Backing and Support:

Veeam can run scripts in a VM before and after backups if you want to put a database in hot standby mode (what we did for some of the weirder databases), so when you recover the VM you know the database will be consistent. Worst case, you can script a service stop before the backup and a resume afterwards.

Sure, but at some point we've lost the benefits versus agent-based or just scripted backups, and we are only deploying the backup infrastructure that way to prove a point, which is not IT's job to do.

Isn't it exclusively IT's job to deploy the backup in a way that proves what is being backed up is consistent and usable?

I mean, at the risk of sounding like a fool here, "we" have audits to meet, and certainly one of those annoying questions will be "do you produce consistent backups?"

scottalanmiller

@dustinb3403 said in KVM Backing and Support:

I can't argue the merits of DevOps Backups, but this article seems to agree with @JaredBusch and @StorageNinja that "DevOps backups" are a horrible idea.

That article says nothing of the sort. In fact, it highlights the opposite. It even points out that DevOps approaches will work where traditional backups have less or no chance.

There's no risk to DevOps style backups. If there appears to be, they are misunderstood. They might take more effort and skill, and they don't allow for the lazy "just click a button and hope for the best" approaches commonly promoted as the purpose of other approaches, but when you do them, nothing is more reliable.

scottalanmiller

@dustinb3403 said in KVM Backing and Support:

@scottalanmiller said in KVM Backing and Support:

@storageninja said in KVM Backing and Support:

Veeam can run scripts in a VM before and after backups if you want to put a database in hot standby mode (what we did for some of the weirder databases), so when you recover the VM you know the database will be consistent. Worst case, you can script a service stop before the backup and a resume afterwards.

Sure, but at some point we've lost the benefits versus agent-based or just scripted backups, and we are only deploying the backup infrastructure that way to prove a point, which is not IT's job to do.

Isn't it exclusively IT's job to deploy the backup in a way that proves what is being backed up is consistent and usable?

Once you are dedicated to doing backups well, why not take the trivial additional effort to do them "really well"? That's kind of the point. Modern DevOps style backups aren't that much harder. Agentless only gets really easy when you use it to avoid really testing.

Obsolesce

@scottalanmiller said in KVM Backing and Support:

@obsolesce said in KVM Backing and Support:

But it's so convenient and easy to be able to back up (agentless) VMs at the hypervisor level with the ability to restore files within VMs like you can with Hyper-V backup solutions.

Is it? How is "so convenient" really important in IT? Unless you can put a dollar value on that convenience, it's not relevant.

It comes at a cost, a cost of reliability and performance. I see loads of shops getting useless backups because they thought convenience trumped "working". It encourages lazy, bad backups and processes.

Once you do all the due diligence and effort to get good backups the difference in effort between agentless and agent is generally nominal.

Veeam does what I'm talking about and does reliable backups... as one example.

The time you save and what you get is worth it for an SMB. It doesn't cost much in that case. Essentially, Essentials. If the cost is higher then other options should be evaluated.

JaredBusch

@obsolesce because Scott has decided that this is his new shiny thing and you will never dissuade him.

stacksofplates

There are merits to both sides. For example we do have a lot "backed up" in Git. Things like DHCP servers, DNS servers, web servers, etc that don't have stateful data are stored in Git. Then that Git server is obviously backed up. And you get a little extra redundancy since Git is distributed by nature. We do "agent" based but only because everything is under some type of CM. So it's easy to just make sure that system has the agent's backup role applied to it and that's done automatically.

But I can also see how small shops with not much help would want to spend a small amount of money and be able to do agentless with not much extra work.

stacksofplates

It also bothers me to no end that the systems we use to store our most important data (databases) have the least backup (and redundancy) options. I try to use solutions that rely on them as little as possible (that's why I use things like Grav).

stacksofplates

@stacksofplates said in KVM Backing and Support:

It also bothers me to no end that the systems we use to store our most important data (databases) have the least backup (and redundancy) options. I try to use solutions that rely on them as little as possible (that's why I use things like Grav).

This is also why I like Elasticsearch so much. Clustering is super easy and so are snapshots/backups.
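As a rough illustration of why the snapshot side feels easy: Elasticsearch exposes snapshots through its REST API (`PUT /_snapshot/<repository>` to register a repository, `PUT /_snapshot/<repository>/<snapshot>` to take one). The sketch below only builds the two requests with the standard library and does not send them. The base URL, repository name, snapshot name, and filesystem path are made-up placeholders, and a shared-filesystem (`fs`) repository additionally assumes the location is listed under `path.repo` on the nodes.

```python
import json
from urllib.request import Request


def register_fs_repo(base_url, repo, location):
    """Build the request that registers a shared-filesystem snapshot repo."""
    body = json.dumps({"type": "fs", "settings": {"location": location}})
    return Request(
        f"{base_url}/_snapshot/{repo}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_snapshot(base_url, repo, snapshot):
    """Build the request that snapshots the cluster into `repo`.

    wait_for_completion=true makes the call block until the snapshot
    finishes instead of returning as soon as it has been started.
    """
    url = f"{base_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return Request(url, method="PUT")
```

Passing either request to `urllib.request.urlopen` would actually send it. Authentication and TLS are left out here, so treat this as a shape of the calls rather than something production ready.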

StorageNinja

@scottalanmiller said in KVM Backing and Support:

They might take more effort and skill

They tend to require more care and feeding, and by structure tend to use a lot more space. Just using application-level backup tools produces fulls on every backup job and does a ton more IO, and layering this with LVM snapshot shipping and volume shipping leads to tons of redundant copies of data vs. using something like Commvault that will dedupe everything in a pool. Because of the overhead and costs you often don't see very granular RPOs vs. something that has a journal log and can do DVR-style replay (like TimeFinder or RecoverPoint).

https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/

Saying anyone that has a problem with recovery "isn't doing them right" is kind of a no true Scotsman argument.
While plenty of large shops "do it right" (Google, etc.), I think pointing to the processes of people who have 100K server instances doing the same thing as the way an SMB should run can quickly turn into the "cargo cult of the cloud".

Large shops (like my employer) tend to employ a mix. Data that can be recreated, or lacks compliance requirements, and is large analytic cloud data I can see going down that route. Some webserver and SQL VMs? Those are going to get a traditional backup tool.
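To put rough numbers on the "fulls on every job vs. a deduped pool" point above, here is a toy comparison. It is a minimal sketch that assumes fixed-size chunks and a SHA-256 content hash; it is not how Commvault or any other product actually implements deduplication, and the function names are made up for the example.

```python
import hashlib


def chunks(data: bytes, size: int):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def full_backup_bytes(backups):
    """Space used when every job stores a complete, independent copy."""
    return sum(len(b) for b in backups)


def deduped_pool_bytes(backups, chunk_size=4):
    """Space used when identical chunks are stored once in a shared pool."""
    pool = {}
    for backup in backups:
        for chunk in chunks(backup, chunk_size):
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first occurrence of a chunk costs any space.
            pool.setdefault(digest, len(chunk))
    return sum(pool.values())


# Three nightly "dumps" where only part of the data changes each night.
nightly = [b"AAAABBBBCCCC", b"AAAABBBBDDDD", b"AAAAEEEEDDDD"]
print(full_backup_bytes(nightly))   # 36
print(deduped_pool_bytes(nightly))  # 20
```

Three 12-byte fulls cost 36 bytes, while the pool holds only the five distinct 4-byte chunks (20 bytes). The gap widens with every additional job that mostly repeats earlier data, which is the space argument being made.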

StorageNinja

@stacksofplates said in KVM Backing and Support:

@stacksofplates said in KVM Backing and Support:

It also bothers me to no end that the systems we use to store our most important data (databases) have the least backup (and redundancy) options. I try to use solutions that rely on them as little as possible (that's why I use things like Grav).

This is also why I like Elasticsearch so much. Clustering is super easy and so are snapshots/backups.

It's a bit unfair to compare cloud-native NoSQL applications that can play fast and loose with ACID consistency, on their native capabilities, against a relational database whose core engine was designed in the 1980s and has a mission to "never lose a transaction at any cost". I do think more data goes into RDBMSs than needs to. Even if I'm going to use something like Cassandra, I'd consider running a packaged build with added tools for backup/recovery operations (DataStax?) just because it simplifies the admin overhead.

If every application requires bespoke skills to back up, DR, and recover, this is going to lead to crazy opex overhead. In many enterprises you could have hundreds or thousands of applications. At a certain point you start shoving everything into square holes (Java or .NET, backed by Oracle or MS SQL) so you can manage lifecycle. It's the same reason people run Hadoop in VMs. The average Hadoop instance is only 12TB, and operationalizing a new bare metal environment and creating another management silo is far worse than the overhead of putting that in a VM, even if running 1:1. There is a balance, but I find people tend to overcorrect. Large enterprises shove everything into few platforms, and SMBs often start sprawling sooner than they should.

stacksofplates

@storageninja said in KVM Backing and Support:

@stacksofplates said in KVM Backing and Support:

@stacksofplates said in KVM Backing and Support:

It also bothers me to no end that the systems we use to store our most important data (databases) have the least backup (and redundancy) options. I try to use solutions that rely on them as little as possible (that's why I use things like Grav).

This is also why I like Elasticsearch so much. Clustering is super easy and so are snapshots/backups.

It's a bit unfair to compare cloud-native NoSQL applications that can play fast and loose with ACID consistency, on their native capabilities, against a relational database whose core engine was designed in the 1980s and has a mission to "never lose a transaction at any cost". I do think more data goes into RDBMSs than needs to. Even if I'm going to use something like Cassandra, I'd consider running a packaged build with added tools for backup/recovery operations (DataStax?) just because it simplifies the admin overhead.

If every application requires bespoke skills to back up, DR, and recover, this is going to lead to crazy opex overhead. In many enterprises you could have hundreds or thousands of applications. At a certain point you start shoving everything into square holes (Java or .NET, backed by Oracle or MS SQL) so you can manage lifecycle. It's the same reason people run Hadoop in VMs. The average Hadoop instance is only 12TB, and operationalizing a new bare metal environment and creating another management silo is far worse than the overhead of putting that in a VM, even if running 1:1. There is a balance, but I find people tend to overcorrect. Large enterprises shove everything into few platforms, and SMBs often start sprawling sooner than they should.

True, but I was more saying that we try to choose solutions that use things like Elasticsearch vs using something else. Obviously that doesn't always work, but we do that for the exact same reason as you mentioned. We can use Graylog and other tools that use Elasticsearch and get nice looking graphs from Grafana from the same tool (just a simple example). I was meaning to make this point

shove everything into few platforms

but did a bad job of it I guess.

StorageNinja

@stacksofplates said in KVM Backing and Support:

There are merits to both sides. For example we do have a lot "backed up" in Git. Things like DHCP servers, DNS servers, web servers, etc that don't have stateful data are stored in Git. Then that Git server is obviously backed up. And you get a little extra redundancy since Git is distributed by nature. We do "agent" based but only because everything is under some type of CM. So it's easy to just make sure that system has the agent's backup role applied to it and that's done automatically.

But I can also see how small shops with not much help would want to spend a small amount of money and be able to do agentless with not much extra work.

The other thing that I think people lose track of in their "war on state sprawl" is that most companies don't control the code they have deployed. Large enterprises don't own 75% of the code they run. You can do platform migrations to open source, and hire developers to do this, but if the alternative is $1000 a host for a Veeam license you will get laughed out of the meeting by anyone who's done an ERP migration.

Realistically the easiest way to get rid of backup headaches is to make them someone else's problem. Use SaaS applications, and if it makes sense use SaaS Backup products (Spanning). If the person who owns the code is delivering it, ideally they should be able to achieve enough scale to make custom protection work, or aggregate enough demand to have more leverage with the backup vendors they purchase from.

StorageNinja

@stacksofplates said in KVM Backing and Support:

shove everything into few platforms

but did a bad job of it I guess.

Anyone who uses Microsoft SQL for a log analytic platform did a bad job of it
Hilarious cost scaling issues, and backups with high change rate get fun.

For log analytic situations where data sovereignty isn't a concern, rather than having an SMB learn Elasticsearch (which isn't bad, to be fair), they could also just use a SaaS provider: SumoLogic, Log Intelligence (we just launched), Splunk (if they have lots of gold pressed latinum), etc.

stacksofplates

@storageninja said in KVM Backing and Support:

@stacksofplates said in KVM Backing and Support:

shove everything into few platforms

but did a bad job of it I guess.

Anyone who uses Microsoft SQL for a log analytic platform did a bad job of it
Hilarious cost scaling issues, and backups with high change rate get fun.

For log analytic situations where data sovereignty isn't a concern, rather than having an SMB learn Elasticsearch (which isn't bad, to be fair), they could also just use a SaaS provider: SumoLogic, Log Intelligence (we just launched), Splunk (if they have lots of gold pressed latinum), etc.

Sadly I know some people that do just that. And have nothing but problems.

The nice thing about Graylog is there isn't much to learn. Install the components and start the services. But yeah it's definitely simpler to ship off if you have that ability.

stacksofplates

@storageninja said in KVM Backing and Support:

@stacksofplates said in KVM Backing and Support:

There are merits to both sides. For example we do have a lot "backed up" in Git. Things like DHCP servers, DNS servers, web servers, etc that don't have stateful data are stored in Git. Then that Git server is obviously backed up. And you get a little extra redundancy since Git is distributed by nature. We do "agent" based but only because everything is under some type of CM. So it's easy to just make sure that system has the agent's backup role applied to it and that's done automatically.

But I can also see how small shops with not much help would want to spend a small amount of money and be able to do agentless with not much extra work.

The other thing that I think people lose track of in their "war on state sprawl" is that most companies don't control the code they have deployed. Large enterprises don't own 75% of the code they run. You can do platform migrations to open source, and hire developers to do this, but if the alternative is $1000 a host for a Veeam license you will get laughed out of the meeting by anyone who's done an ERP migration.

Realistically the easiest way to get rid of backup headaches is to make them someone else's problem. Use SaaS applications, and if it makes sense use SaaS Backup products (Spanning). If the person who owns the code is delivering it, ideally they should be able to achieve enough scale to make custom protection work, or aggregate enough demand to have more leverage with the backup vendors they purchase from.

10000%. If you have the option to use someone else's systems, do it. However while most things in our group are open source, our ERP is all tied into Oracle and a lot of that is delivered with APEX. To get out of that mess would cost astronomical amounts so it's still there.

stacksofplates

@storageninja said in KVM Backing and Support:

@scottalanmiller said in KVM Backing and Support:

They might take more effort and skill

They tend to require more care and feeding, and by structure tend to use a lot more space. Just using application-level backup tools produces fulls on every backup job and does a ton more IO, and layering this with LVM snapshot shipping and volume shipping leads to tons of redundant copies of data vs. using something like Commvault that will dedupe everything in a pool. Because of the overhead and costs you often don't see very granular RPOs vs. something that has a journal log and can do DVR-style replay (like TimeFinder or RecoverPoint).

https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/

Saying anyone that has a problem with recovery "isn't doing them right" is kind of a no true Scotsman argument.
While plenty of large shops "do it right" (Google, etc.), I think pointing to the processes of people who have 100K server instances doing the same thing as the way an SMB should run can quickly turn into the "cargo cult of the cloud".

Large shops (like my employer) tend to employ a mix. Data that can be recreated, or lacks compliance requirements, and is large analytic cloud data I can see going down that route. Some webserver and SQL VMs? Those are going to get a traditional backup tool.

Also, there isn't an "Enterprise solution", as if huge companies had one team of ops people that works on everything. If you look at larger companies there are tons of smaller teams writing their own solutions. Places like Netflix encourage that, and you can write whatever solution you want as long as it fits and meets API requirements, backup strategies, health checks, etc. There isn't a specific Enterprise way of doing things.

stacksofplates

@storageninja said in KVM Backing and Support:

@stacksofplates said in KVM Backing and Support:

@stacksofplates said in KVM Backing and Support:

It also bothers me to no end that the systems we use to store our most important data (databases) have the least backup (and redundancy) options. I try to use solutions that rely on them as little as possible (that's why I use things like Grav).

This is also why I like Elasticsearch so much. Clustering is super easy and so are snapshots/backups.

It's a bit unfair to compare cloud-native NoSQL applications that can play fast and loose with ACID consistency, on their native capabilities, against a relational database whose core engine was designed in the 1980s and has a mission to "never lose a transaction at any cost". I do think more data goes into RDBMSs than needs to. Even if I'm going to use something like Cassandra, I'd consider running a packaged build with added tools for backup/recovery operations (DataStax?) just because it simplifies the admin overhead.

That was kind of what I meant to point out. Those systems have been around for so long that they've had plenty of time to build in a native replication system (not just things like Galera). Postgres has something, but I've never tried it. It just seems that if you've been around for 30 years you could have an easier replication setup than currently exists.

StorageNinja

@stacksofplates said in KVM Backing and Support:

@storageninja said in KVM Backing and Support:

@stacksofplates said in KVM Backing and Support:

@stacksofplates said in KVM Backing and Support:

It also bothers me to no end that the systems we use to store our most important data (databases) have the least backup (and redundancy) options. I try to use solutions that rely on them as little as possible (that's why I use things like Grav).

This is also why I like Elasticsearch so much. Clustering is super easy and so are snapshots/backups.

It's a bit unfair to compare cloud-native NoSQL applications that can play fast and loose with ACID consistency, on their native capabilities, against a relational database whose core engine was designed in the 1980s and has a mission to "never lose a transaction at any cost". I do think more data goes into RDBMSs than needs to. Even if I'm going to use something like Cassandra, I'd consider running a packaged build with added tools for backup/recovery operations (DataStax?) just because it simplifies the admin overhead.

That was kind of what I meant to point out. Those systems have been around for so long that they've had plenty of time to build in a native replication system (not just things like Galera). Postgres has something, but I've never tried it. It just seems that if you've been around for 30 years you could have an easier replication setup than currently exists.

The other problem with systems like this is that their testing is very basic. Often it is simply checksums or unit testing, not testing of a group of applications and VMs that all have to function to restore and hit an RPO. If I'm using SRM or Veeam I can easily do an automated test and spin up a group of 10 VMs that make up the full dependency chain and make sure that a test can be done.

If I'm just scripting backups of a Postgres DB, I'm at the mercy of my entire build toolchain to do a full stack test (which takes a massive, non-trivial amount of IO and time vs. SureBackup labs, or linked clones triggered by SRM).
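To make the "full dependency chain" idea concrete, here is a minimal sketch of what a scripted restore test has to get right on its own: bring restored VMs up in dependency order, and count a VM as failed if anything it depends on failed. This is not how SRM or SureBackup work internally; `boot_order`, `run_restore_test`, and the `restore_and_check` callback are hypothetical names, and the actual restore and health check are left to whatever tooling you have.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def boot_order(depends_on):
    """Return VM names ordered so every VM comes after its dependencies.

    `depends_on` maps a VM name to the set of VMs it needs running first.
    """
    return list(TopologicalSorter(depends_on).static_order())


def run_restore_test(depends_on, restore_and_check):
    """Restore and check each VM in order; return {vm: passed}.

    `restore_and_check(vm)` is supplied by the caller and should return
    True when the restored VM passes its health check. A VM whose
    dependency failed is marked failed without being checked.
    """
    results = {}
    for vm in boot_order(depends_on):
        blocked = [d for d in depends_on.get(vm, ()) if not results.get(d)]
        results[vm] = False if blocked else bool(restore_and_check(vm))
    return results


# Example chain: web needs app, app needs db.
chain = {"web": {"app"}, "app": {"db"}, "db": set()}
print(boot_order(chain))  # ['db', 'app', 'web']
```

With a check that fails only for `app`, `run_restore_test(chain, check)` reports `db` as passed and both `app` and `web` as failed, which is the kind of group-level result a checksum on a single dump cannot give you.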