    Scheduling Simple Local Linux Reboots

    IT Discussion
    Tags: linux, cron, scheduling, reboot, system administration
    25 Posts 4 Posters 3.9k Views

      scottalanmiller @aaron-closed account

      @aaron said in Scheduling Simple Local Linux Reboots:

      Not the execution, but that the machine comes back up correctly? I want eyes watching every reboot, not the guy on call, but the person doing the reboot.

      Of course, but you don't want the system admins "watching" systems. Ideally you don't even want local logins; you want application teams and end users performing checkouts of the final application. You want monitoring systems watching the system itself, not people staring at consoles.

        scottalanmiller @aaron-closed account

        @aaron said in Scheduling Simple Local Linux Reboots:

        If an organization decides to reboot systems at a regular maintenance window, I'd still want it to be executed by a person and not a cron job.

        That's both extremely expensive and complicated. For example, I had 600 servers to myself at one job, 3,000 at another. I couldn't watch those reboots even if I did it every minute of every day. And there is no way that I could see anything useful. What do you expect people to watch for during a reboot? Unless they are scouring logs, which they would not be doing by watching the server anyway, what would they see?

          aaron-closed account Banned @scottalanmiller

          This post is deleted!
            scottalanmiller @aaron-closed account

            @aaron said in Scheduling Simple Local Linux Reboots:

            @scottalanmiller I think we're talking about totally different team sizes hahah

            Whether you are a team of one or one hundred, the factors remain the same. I had a team of 80, but there were tens of thousands of servers. So the per-admin time needed to watch consoles was still prohibitive.

              aaron-closed account Banned @scottalanmiller

              This post is deleted!
                scottalanmiller

                At least with VMs, the amount to watch and the time to watch are far, far less. Once upon a time I had to do this with physicals, and the fifteen-minute memory POST check was seriously painful.

                  scottalanmiller @aaron-closed account

                  @aaron said in Scheduling Simple Local Linux Reboots:

                  @scottalanmiller said in Scheduling Simple Local Linux Reboots:

                  what would they see

                  A config file borking a service, a hardware problem that manifested, etc.

                  You would not see that just by watching a reboot; that's my point. A config problem would need to be caught either from logs or from the application checkout. It would be caught without an admin watching the reboot.

                  Hardware problems would not be found in a VM, and a hypervisor failing to reboot would get caught by monitoring really quickly. So I don't see the benefit there unless you have a rush window of, say, under ten minutes, in which case you do indeed have all of your resources standing by, including the remote hands in the datacenter, not just the Linux and hypervisor admins. But that wouldn't be Linux (unless you are on KVM).

                    dafyre @aaron-closed account

                    @aaron said in Scheduling Simple Local Linux Reboots:

                    @scottalanmiller said in Scheduling Simple Local Linux Reboots:

                    what would they see

                    A config file borking a service, a hardware problem that manifested, etc.

                    Again, this is why you would use a monitoring and alerting service. Friday afternoon at 5:00 PM: 300 systems reboot... By 6:00 PM those systems should be back up... If System-247 fails... then start alerting folks.

                    Edit: If reboots are done regularly, there's much less risk of something breaking, IMO.
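
A minimal sketch of that kind of post-reboot check, for illustration only. It assumes a plain TCP connect to port 22 is an acceptable "is it back up" probe and that the grace period is one hour; the names `is_reachable` and `hosts_to_alert` are hypothetical, and a real monitoring service would do far more than this.

```python
import socket
from datetime import datetime, timedelta
from typing import Callable, Iterable, List


def is_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Assumed probe: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def hosts_to_alert(
    hosts: Iterable[str],
    reboot_started: datetime,
    now: datetime,
    grace: timedelta = timedelta(hours=1),
    probe: Callable[[str], bool] = is_reachable,
) -> List[str]:
    """Return the hosts that are still down once the grace period has passed."""
    if now < reboot_started + grace:
        # Still inside the reboot window, so nobody gets alerted yet.
        return []
    return [host for host in hosts if not probe(host)]
```

Called with the fleet list after the window closes, this returns only the machines that failed to come back, which is the list you would hand to whatever sends the alerts.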

                      scottalanmiller @dafyre

                      @dafyre said in Scheduling Simple Local Linux Reboots:

                      @aaron said in Scheduling Simple Local Linux Reboots:

                      @scottalanmiller said in Scheduling Simple Local Linux Reboots:

                      what would they see

                      A config file borking a service, a hardware problem that manifested, etc.

                      Again, this is why you would use a monitoring and alerting service. Friday afternoon at 5:00 PM: 300 systems reboot... By 6:00 PM those systems should be back up... If System-247 fails... then start alerting folks.

                      With automation, you can tighten that window, too. You can make it a five-minute or ten-minute alert suppression window. You tweak, of course. If you don't automate, you can't have that tight monitoring and are actually, I feel, more likely to miss an error than if you didn't have the human involved.
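
A rough sketch of what a suppression window could look like in code, assuming the monitoring system knows the scheduled reboot time. The function name `should_alert` and the ten-minute default are hypothetical, chosen only to illustrate the idea.

```python
from datetime import datetime, timedelta


def should_alert(
    is_down: bool,
    now: datetime,
    reboot_at: datetime,
    suppress_for: timedelta = timedelta(minutes=10),
) -> bool:
    """Alert on a down host unless we are inside the scheduled reboot window."""
    if not is_down:
        return False
    # Downtime is expected from reboot_at up to (not including) the window end.
    in_window = reboot_at <= now < reboot_at + suppress_for
    return not in_window
```

Because the reboot time is known exactly when it is automated, the window can be tuned down to whatever the slowest healthy machine needs, and anything still down after that pages someone.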

                        aaron-closed account Banned

                        This post is deleted!
                          JaredBusch @aaron-closed account

                          @aaron said in Scheduling Simple Local Linux Reboots:

                          I'm not against automation. I think y'all got the wrong idea. I am against screwing the on-call team on a Friday evening because servers are rebooting and nobody is actively watching the monitoring. I am not talking about watching them POST for goodness sakes.

                          Why would no one be watching the monitoring? That is the point of monitoring. It sends alerts, and someone should always be around to handle them.

                            aaron-closed account Banned @JaredBusch

                            This post is deleted!
                              scottalanmiller @aaron-closed account

                              @aaron said in Scheduling Simple Local Linux Reboots:

                              I'm not against automation. I think y'all got the wrong idea. I am against screwing the on-call team on a Friday evening because servers are rebooting and nobody is actively watching the monitoring. I am not talking about watching them POST for goodness sakes.

                              Better to be on call and get called once in a blue moon than to have to spend Friday night always stuck staring at console screens. I'd much prefer to get paged once in a while (and it is RARE if you are admining well) than to just give up Friday nights for something silly like that. That would be horrible.

                              And how often is the outage something for the admin? If a config file borks a service, that's for the application team to fix, not the system admin. Why make the system admin look for some non-system mistake? Let the right team handle that.

                                scottalanmiller @aaron-closed account

                                @aaron said in Scheduling Simple Local Linux Reboots:

                                @JaredBusch someone watching the monitoring dashboard and someone being paged in the middle of a Friday night dinner are very different.

                                Indeed, getting to have Friday night dinner 95% of the time is the difference 😉

                                  JaredBusch @aaron-closed account

                                  @aaron said in Scheduling Simple Local Linux Reboots:

                                  @JaredBusch someone watching the monitoring dashboard and someone being paged in the middle of a Friday night dinner are very different.

                                  First of all, as we have exhaustively discussed in the other threads, IT hours are not 9 to 5, and if you cannot get past that, you need to find a new line of work.

                                  Second, no one should even be monitoring a dashboard in the first place. You should trust your system to send alerts. You trust it because you test it under control during normal work time.

                                    scottalanmiller @JaredBusch

                                    @JaredBusch said in Scheduling Simple Local Linux Reboots:

                                    Second, no one should even be monitoring a dashboard in the first place. You should trust your system to send alerts. You trust it because you test it under control during normal work time.

                                    Plus this is a good way to test that system, too. You could leave it on and let people get alerts that there is an outage, and that it is resolved. The person having dinner with the family could watch the alert, know to watch for it to clear, and see it clear and get to enjoy Friday night knowing that the system did what it was supposed to do.

                                      scottalanmiller

                                      In larger teams, you normally have 24x7 staff, so the reboot schedule goes to the current shift to monitor. It's only shops that lack round-the-clock scheduling that have this as a real issue, and if you don't have 24x7, shouldn't you be outsourcing to a shop that does, if you really need that at all? I think that this normally (maybe not always) becomes a problem when you are dealing with layers and layers of other problems, like not having enough IT staff to properly staff a department without causing unnecessary cost and risk, and choosing not to outsource to an MSP/ITSP that could do this cost-effectively.
