
    Azure Outage... Again

    IT Discussion
    microsoftazure
    79 Posts 13 Posters 24.6k Views
      Alex Sage @scottalanmiller

      @scottalanmiller Have you tried from the US?

        scottalanmiller @Alex Sage

        @aaronstuder said in Azure Outage... Again:

        @scottalanmiller Have you tried from the US?

        Yup, NY and KY.

          scottalanmiller

          And the client sees it down from PA and MD.

            gjacobse

            [Screenshot, 2016-04-26 11:54:43: NTG - PURPLEPRINCESS - Connected]

            This is what we are seeing, and I believe we have four systems running under Azure.

              scottalanmiller

              Way more than four.

                Alex Sage

                What's the status of your subscription?

                  scottalanmiller @Alex Sage

                  @aaronstuder said in Azure Outage... Again:

                  What's the status of your subscription?

                  Can't check on it. The outage has taken out the system that shows it.

                    scottalanmiller

                    Which is what we see with most outages: they lose some core database that reports subscriptions, and the failure cascades to the console and on to the VMs. My guess (and it is only a guess) is that a database instance handling the subscription data, or some data that builds the subscription, has failed, and that all of the other outages come from dependencies on that system. We've seen exactly that, or almost exactly that, a few times, and tons of other companies (hundreds) that we have interfaced with, mostly via MS conferences, have reported the same problem as the one they see most often.
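A minimal sketch of the cascade described above, assuming a simple model in which a component is down if it has failed itself or if anything it depends on is down. The component names in `DEPENDS_ON` are hypothetical and do not describe Azure's actual internals; the point is only that one failed database can take out the console and the VMs while the VM hosts themselves stay healthy.

```python
# Hypothetical dependency map: each component lists what it needs in order to work.
# These names are illustrative only, not Azure's real internal architecture.
DEPENDS_ON = {
    "subscription_db": [],
    "subscription_service": ["subscription_db"],
    "management_console": ["subscription_service"],
    "vm_host": [],
    "customer_vm": ["vm_host", "subscription_service"],
}


def cascade(failed, deps):
    """Return every component that is down once the failures in `failed` propagate.

    Repeats passes over the map until no new component goes down (a fixed point),
    so the result does not depend on the order in which components are listed.
    """
    down = set(failed)
    changed = True
    while changed:
        changed = False
        for component, needs in deps.items():
            if component not in down and any(n in down for n in needs):
                down.add(component)
                changed = True
    return down


if __name__ == "__main__":
    # A single failed database takes out everything layered on top of it.
    print(sorted(cascade({"subscription_db"}, DEPENDS_ON)))
```

In this toy model `vm_host` stays up, yet `customer_vm` is still reported down because it also depends on the subscription layer: healthy hardware, unusable VMs.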

                      Alex Sage @scottalanmiller

                      @scottalanmiller said in Azure Outage... Again:

                      Can't check on it. The outage has taken out the system that shows it.

                      [Screenshot]

                        gjacobse

                        [Screenshot, 2016-04-26 12:05:15: NTG - PURPLEPRINCESS - Connected]

                          Alex Sage @gjacobse

                          @gjacobse That seems like an issue.

                            scottalanmiller @Alex Sage

                            @aaronstuder said in Azure Outage... Again:

                            @gjacobse That seems like an issue.

                            Yes, that's why we think that their loss of subscription data is the core of the issue. Their VMs are dependent on the subscription data but they can't keep their subscription data working.

                              wirestyle22 @scottalanmiller

                              @scottalanmiller said in Azure Outage... Again:

                              Yes, that's why we think that their loss of subscription data is the core of the issue. Their VMs are dependent on the subscription data but they can't keep their subscription data working.

                              How would they have configured this? Wouldn't any of their servers be clustered within multiple data centers? How does this happen with such a huge service?

                                scottalanmiller @wirestyle22

                                @wirestyle22 said in Azure Outage... Again:

                                How would they have configured this? Wouldn't any of their servers be clustered within multiple data centers? How does this happen with such a huge service?

                                They have several known issues in this system. My guess is that either another external system that manipulates this one feeds in bad data and causes outages that way, or the code of the system that interacts with it has bugs that cause issues. I think the former is far more likely, based on a few factors, namely that account "type" often affects this. For example, we are an MS Partner, and there have been reports that some partner system has regularly connected to Azure's database and corrupted it.

                                No amount of clustering, multiple data centers, or keeping servers up can fix this problem at all. From what we've been told, the problem comes entirely from their workflows and security. Basically, they have an unhealthy, non-working system that is given permission to control Azure and has been known to "randomly" cause Azure to fail completely.

                                  scottalanmiller

                                  This is actually a really great example of how much of a myth platform high availability is. The Azure physical platform can do some amazing HA, but it has incredibly fragile dependencies that make the HA features pointless. Who cares if the database is up and running if the data in it gets deleted by some automated process or my careless interns or whatever? Who cares if the application is running if the application itself fails? High availability just lets people see the failed application; it doesn't keep anything working.

                                  Microsoft's problem here is that their product, Azure, is itself what is failing, not the physical infrastructure or the virtualization layer that it runs on. It's the actual cloud layer, not the hypervisor or physical layer, that is experiencing the problem. They've made their cloud layer overly complex, with dependencies that they are not keeping as reliable as the rest of the stack.

                                  It shows that holistic risk understanding is very important and that the weakest link matters completely.
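The weakest-link point can be made concrete with the standard availability arithmetic for independent components: a chain in which every part must work has availability equal to the product of its parts, while n redundant replicas give 1 - (1 - a)^n. The figures below (99.9% for one infrastructure site, 99% for a single subscription/control layer) are made-up assumptions for illustration, not measured Azure numbers, and real-world failures are rarely fully independent.

```python
from math import prod


def serial(*availabilities):
    """Availability of a chain in which every component must be up.

    Assumes failures are independent, so the probabilities multiply.
    """
    return prod(availabilities)


def redundant(availability, replicas):
    """Availability of `replicas` independent copies where any single copy is enough."""
    return 1 - (1 - availability) ** replicas


# Hypothetical figures for illustration only.
INFRA = 0.999          # one physical/hypervisor site
CONTROL_LAYER = 0.99   # single, non-redundant subscription/control layer

single_site = serial(INFRA, CONTROL_LAYER)
two_sites = serial(redundant(INFRA, 2), CONTROL_LAYER)

if __name__ == "__main__":
    print(f"one site:  {single_site:.4%}")
    print(f"two sites: {two_sites:.4%}")
    # However much infrastructure redundancy is added, the total can never
    # exceed the availability of the weakest serial link.
    print(two_sites < CONTROL_LAYER)
```

Under these assumed numbers, doubling the infrastructure raises the total only from about 98.90% to just under 99%; the single control layer caps the result no matter how good the HA underneath it is.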

                                    Alex Sage

                                    Are you sure the client isn't just forgetting to pay the bill?

                                      scottalanmiller @Alex Sage

                                      @aaronstuder said in Azure Outage... Again:

                                      Are you sure the client isn't just forgetting to pay the bill?

                                      Not how it works. It's our account and we have partner credits, so even if we were not paying the bill, our subscription would not go away. The VMs might turn off, I guess, but the account would not vanish. This is 100% an MS issue and it is a recurring one. There is no question where the issue is.

                                        scottalanmiller

                                        We aren't wondering whether Azure is down; we know that it is. We know that the issue is Microsoft's and that it is the same issue they have been having over and over again with many companies. More than 50% of the companies we've interfaced with report that they have experienced this exact issue and have had MS deny it, even to our faces. What we are asking is how localized it is. Is it just one account (maybe an account manager deleted an account)? Is it regional? Does it affect people on a single database server or in a single account category?

                                          scottalanmiller

                                          MS support responded much more quickly than they said they would and... they need our subscription info to process the ticket.

                                          AAARRGGHH

                                            Minion Queen

                                            Well, I responded right away when they responded to the ticket. Not that I can give them any information 😛
