    Amazon S3 Outage shows the danger of doing things cheaply.

    • coliverC
      coliver @scottalanmiller
      last edited by

      @scottalanmiller said in Amazon S3 Outage shows the danger of doing things cheaply.:

      @coliver said in Amazon S3 Outage shows the danger of doing things cheaply.:

      I would like to see the cost comparison. How much money did these companies lose due to this outage vs. how much money they would spend on a resilient infrastructure that is geographically load balanced? Is a few hours of downtime a year due to S3 going to break a lot of these companies?

      That's missing a HUGE element, which is the risk. Just because you have a bad outage doesn't mean that it was worth protecting against. You have to consider how likely it was to happen, regardless of whether or not it did happen.

      I get that. I would like to see the numbers related to the most recent outage though, for purely academic reasons. I just think it would be interesting. Ignoring the risk entirely for a moment, my guess is that having the infrastructure for a year to protect against this unlikely event would still have cost more than the downtime itself cost.

      • scottalanmillerS
        scottalanmiller @coliver
        last edited by

        @coliver said in Amazon S3 Outage shows the danger of doing things cheaply.:

        @scottalanmiller said in Amazon S3 Outage shows the danger of doing things cheaply.:

        @coliver said in Amazon S3 Outage shows the danger of doing things cheaply.:

        I would like to see the cost comparison. How much money did these companies lose due to this outage vs. how much money they would spend on a resilient infrastructure that is geographically load balanced? Is a few hours of downtime a year due to S3 going to break a lot of these companies?

        That's missing a HUGE element, which is the risk. Just because you have a bad outage doesn't mean that it was worth protecting against. You have to consider how likely it was to happen, regardless of whether or not it did happen.

        I get that. I would like to see the numbers related to the most recent outage though, for purely academic reasons. I just think it would be interesting. Ignoring the risk entirely for a moment, my guess is that having the infrastructure for a year to protect against this unlikely event would still have cost more than the downtime itself cost.

        Very easily for sure.
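The comparison discussed above (risk-weighted cost of an outage versus the yearly cost of protecting against it) can be sketched in a few lines. This is only an illustration: every figure below is hypothetical and does not come from the thread or from AWS, and the function names are made up for this example.

```python
def expected_annual_loss(outage_probability_per_year: float,
                         cost_per_outage: float) -> float:
    """Risk-weighted cost: chance of the outage in a year times its cost if it happens."""
    return outage_probability_per_year * cost_per_outage


def protection_is_worth_it(outage_probability_per_year: float,
                           cost_per_outage: float,
                           annual_protection_cost: float) -> bool:
    """True only if the risk-weighted loss exceeds the yearly cost of protection."""
    loss = expected_annual_loss(outage_probability_per_year, cost_per_outage)
    return loss > annual_protection_cost


# Hypothetical figures for illustration only.
HYPOTHETICAL_OUTAGE_PROBABILITY = 0.2      # assume one such outage every five years
HYPOTHETICAL_COST_PER_OUTAGE = 50_000.0    # assume revenue lost in one afternoon
HYPOTHETICAL_PROTECTION_COST = 120_000.0   # assume yearly cost of multi-region hosting

if __name__ == "__main__":
    loss = expected_annual_loss(HYPOTHETICAL_OUTAGE_PROBABILITY,
                                HYPOTHETICAL_COST_PER_OUTAGE)
    print(f"expected annual loss: {loss:,.0f}")
    print("worth protecting:", protection_is_worth_it(
        HYPOTHETICAL_OUTAGE_PROBABILITY,
        HYPOTHETICAL_COST_PER_OUTAGE,
        HYPOTHETICAL_PROTECTION_COST))
```

Setting the probability to 1.0 reproduces the "ignoring the risk entirely" case: with these made-up numbers, the protection still costs more than the outage.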

        • C
          Carnival Boy @Dashrender
          last edited by

          @Dashrender said in Amazon S3 Outage shows the danger of doing things cheaply.:

          Are you saying that simply by putting a VM (or actual cloud service) in AWS, you automatically assume you have full DC failover, etc.? Why do you assume this?

          I don't know what you mean by "full DC failover"? I would assume I'd have uptime within the SLA or within published expectations of uptime, which in Amazon's case is about 100% I believe?

          • scottalanmillerS
            scottalanmiller @Carnival Boy
            last edited by

            @Carnival-Boy said in Amazon S3 Outage shows the danger of doing things cheaply.:

            @Dashrender said in Amazon S3 Outage shows the danger of doing things cheaply.:

            Are you saying that simply by putting a VM (or actual cloud service) in AWS, you automatically assume you have full DC failover, etc.? Why do you assume this?

            I don't know what you mean by "full DC failover"? I would assume I'd have uptime within the SLA or within published expectations of uptime, which in Amazon's case is about 100% I believe?

            A bit below 100%. Their uptime is from using multiple data centers.

            • travisdh1T
              travisdh1
              last edited by

              Finally, a follow-up article. Apparently just rebooted too many machines at the same time. Human error, is anyone surprised?

              • scottalanmillerS
                scottalanmiller @travisdh1
                last edited by

                @travisdh1 said in Amazon S3 Outage shows the danger of doing things cheaply.:

                Finally, a follow-up article. Apparently just rebooted too many machines at the same time. Human error, is anyone surprised?

                Always something simple.

                • DashrenderD
                  Dashrender @Carnival Boy
                  last edited by

                  @Carnival-Boy said in Amazon S3 Outage shows the danger of doing things cheaply.:

                  @Dashrender said in Amazon S3 Outage shows the danger of doing things cheaply.:

                  Are you saying that simply by putting a VM (or actual cloud service) in AWS, you automatically assume you have full DC failover, etc.? Why do you assume this?

                  I don't know what you mean by "full DC failover"? I would assume I'd have uptime within the SLA or within published expectations of uptime, which in Amazon's case is about 100% I believe?

                  DC as in datacenter failover, i.e. this DC is offline for whatever reason, so now your data/services are running in another DC?
                  Even if it is listed at 100%, the SLA just gives them an out when they don't meet it, i.e. they get to send you a check, that's it. Nothing more. I wouldn't expect them to clone your data to another DC in real time unless you're paying for that feature, at which point your SLA would be even higher, and you're paying a TON more.

                  • C
                    Carnival Boy
                    last edited by

                    Right. So why would you think that I would think that if I put data in just one DC I would have DC failover? That doesn't make any sense.

                    • J
                      Jimmy9008
                      last edited by

                      I don't blame AWS at all. If you use a service like AWS, use it properly and build for DR or take the risk! Think about the systems you use, and build properly.

                      • scottalanmillerS
                        scottalanmiller @Jimmy9008
                        last edited by

                        @Jimmy9008 said in Amazon S3 Outage shows the danger of doing things cheaply.:

                        I don't blame AWS at all. If you use a service like AWS, use it properly and build for DR or take the risk! Think about the systems you use, and build properly.

                        Oh exactly. It is what it is. Single DC dependency, and on a single service in that DC. AWS tells us what to do if we need higher reliability than that. They were within SLA, I believe. It's all good.

                        • J
                          Jimmy9008 @scottalanmiller
                          last edited by

                          @scottalanmiller

                          Exactly. We host everything at HQ on site, with a colo in Essex for DR purposes. If HQ was lost, staff are screwed until we restore from backups BUT... customers (the important part) are not really affected at all. We keep a hot copy of our websites and databases running a day out of date (which the business are fine with) in the Essex colo. We then use 'the cloud' to manage the failover process, which is a cheap solution compared to multiple cloud DCs hosting everything.

                          We have one VM in Azure and one in AWS. Both check every second or so that our websites hosted at HQ are available over HTTP/HTTPS. If the sites stop responding, the VMs use the Cloudflare API to point DNS for all our websites at the hot copies running in the colo, which are a day out of date. Pretty fast. When tested, it takes seconds and we're back online from a customer perspective. Our test: unplug our gateway firewall and see what happens... easy.

                          Yeah, it could be better, but it meets our needs, and other than Cloudflare (which does go down) we have no single point of failure... We're happy with that risk.
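The watchdog described above can be sketched roughly as follows. This is not the poster's actual code. The HTTP probe and the DNS change are passed in as callables (`is_site_up` and `point_dns_to_colo`, both hypothetical names), so no real Cloudflare call is shown. The "several consecutive failures" threshold is also an assumption added here to avoid failing over on a single dropped request; the post only says "if not responding".

```python
from typing import Callable


class FailoverMonitor:
    """Trips a one-time failover after several consecutive failed health checks."""

    def __init__(self,
                 is_site_up: Callable[[], bool],
                 point_dns_to_colo: Callable[[], None],
                 failures_before_failover: int = 3) -> None:
        self._is_site_up = is_site_up                # hypothetical HTTP/HTTPS probe
        self._point_dns_to_colo = point_dns_to_colo  # hypothetical DNS update hook
        self._threshold = failures_before_failover   # assumed value, not from the post
        self._consecutive_failures = 0
        self.failed_over = False

    def check_once(self) -> bool:
        """Run one probe; return True if this call triggered the failover."""
        if self.failed_over:
            # Already pointing at the colo; failing back is left as a manual step.
            return False
        if self._is_site_up():
            self._consecutive_failures = 0
            return False
        self._consecutive_failures += 1
        if self._consecutive_failures >= self._threshold:
            self._point_dns_to_colo()
            self.failed_over = True
            return True
        return False


if __name__ == "__main__":
    # Simulated probe results: one success, then HQ stops responding.
    probe_results = iter([True, False, False, False, False])
    dns_updates = []
    monitor = FailoverMonitor(lambda: next(probe_results),
                              lambda: dns_updates.append("colo"))
    triggered = [monitor.check_once() for _ in range(5)]
    print(triggered, dns_updates)
```

In a real deployment, `check_once` would be called on a timer from each monitoring VM, and the DNS hook would need to be safe to call from both VMs at once.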

                          • C
                            Carnival Boy @Deleted74295
                            last edited by

                            @Breffni-Potter said:

                            @Carnival-Boy said in Amazon S3 Outage shows the danger of doing things cheaply.:

                            I disagree with the article. One of the main reasons I would move to a cloud service is to outsource my redundancy and resilience.

                            But you don't buy any of that from Amazon. This is the biggest misconception about cloud computing.

                            Clearly I have a misconception. I'm not an Amazon customer, but looking at their website, they say things like:
                            Designed for 99.999999999% durability and 99.99% availability of objects over a given year.
                            Designed to sustain the concurrent loss of data in two facilities.

                            Amazon S3 redundantly stores data in multiple facilities and on multiple devices within each facility.

                            All of this suggests to me that they are selling resilience. If I read "designed for 99.99%" and then only got 90% availability, would it be fair for Amazon to say "yeah, but that's your fault, we never sold you resilience"? I don't think so.

                            If the argument we're having is "you're not paying for 100% availability" then I agree with you. If your argument is "you're not paying for resilience" then I struggle to agree with you.

                            • J
                              Jimmy9008 @Carnival Boy
                              last edited by

                              @Carnival-Boy Agree. Nice.

                              • scottalanmillerS
                                scottalanmiller @Carnival Boy
                                last edited by

                                @Carnival-Boy said in Amazon S3 Outage shows the danger of doing things cheaply.:

                                @Breffni-Potter said:

                                @Carnival-Boy said in Amazon S3 Outage shows the danger of doing things cheaply.:

                                I disagree with the article. One of the main reasons I would move to a cloud service is to outsource my redundancy and resilience.

                                But you don't buy any of that from Amazon. This is the biggest misconception about cloud computing.

                                Clearly I have a misconception. I'm not an Amazon customer, but looking at their website, they say things like:
                                Designed for 99.999999999% durability and 99.99% availability of objects over a given year.
                                Designed to sustain the concurrent loss of data in two facilities.

                                Amazon S3 redundantly stores data in multiple facilities and on multiple devices within each facility.

                                All of this suggests to me that they are selling resilience. If I read "designed for 99.99%" and then only got 90% availability, would it be fair for Amazon to say "yeah, but that's your fault, we never sold you resilience"? I don't think so.

                                99.99% is very low availability. 99.999% is "standard" availability. High availability is 99.9999%. They are selling 99.99% uptime; that can't be considered "selling reliability" as it is far too unreliable for that. It's fine for most customers, since most customers don't need much availability.

                                So I read "designed for 99.99% availability" as a direct statement making it super clear that Amazon S3, unless you do things yourself to make it highly available, is not at all designed with "availability" as a target feature. To me, they've clarified that in what you quoted to make sure we don't assume that availability is their specialty.

                                And they meet 99.99% with ease. 90% would mean that they were down for more than a month a year, not an afternoon.
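To put the percentages above side by side, here is a small conversion from an availability figure to the downtime it allows per year. It assumes a 365-day year and treats the percentage as a plain fraction of wall-clock time, which is a simplification of how any real SLA measures it.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year allowed by a given availability percentage."""
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR


if __name__ == "__main__":
    for percent in (90.0, 99.99, 99.999, 99.9999):
        minutes = downtime_minutes_per_year(percent)
        print(f"{percent}% -> {minutes:.2f} minutes/year "
              f"({minutes / (24 * 60):.3f} days)")
```

Four nines works out to roughly 52.6 minutes a year, while 90% works out to 36.5 days.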

                                • scottalanmillerS
                                  scottalanmiller @Carnival Boy
                                  last edited by

                                  @Carnival-Boy said in Amazon S3 Outage shows the danger of doing things cheaply.:

                                  If your argument is "you're not paying for resilience" then I struggle to agree with you.

                                  You are paying for a very specific level of resilience which is considered "low". So you "are paying for resilience", but not high resilience.

                                  • C
                                    Carnival Boy
                                    last edited by

                                    Yeah, I'd agree with that. You're right that 99.99% is low.

                                    • scottalanmillerS
                                      scottalanmiller @Carnival Boy
                                      last edited by

                                      @Carnival-Boy said in Amazon S3 Outage shows the danger of doing things cheaply.:

                                      Yeah, I'd agree with that. You're right that 99.99% is low.

                                      Or low-ish at least. It's four nines. It's more than I expect from an average SAN 🙂 Less than I expect from an average server.

                                      • scottalanmillerS
                                        scottalanmiller
                                        last edited by

                                        The big deal about S3 is the durability. They simply never lose data, ever. You might lose access for a few hours, but the data will always be there.
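For a sense of scale on the durability figure quoted earlier in the thread (99.999999999%), here is a rough expected-loss calculation. It assumes each object independently has that annual chance of surviving, which is a simplifying reading of a "designed for" target, not a guarantee. The object count is a hypothetical example.

```python
# 100% - 99.999999999%, written directly to avoid floating-point rounding.
ELEVEN_NINES_LOSS_PROBABILITY = 1e-11


def expected_objects_lost_per_year(object_count: int,
                                   annual_loss_probability: float) -> float:
    """Expected number of objects lost in a year, assuming independent losses."""
    return object_count * annual_loss_probability


if __name__ == "__main__":
    hypothetical_object_count = 10_000_000  # assumed bucket size, for illustration
    lost = expected_objects_lost_per_year(hypothetical_object_count,
                                          ELEVEN_NINES_LOSS_PROBABILITY)
    print(f"expected objects lost per year: {lost:.6f}")
    print(f"roughly one object every {1 / lost:,.0f} years")
```

Under these assumptions the expected loss is not literally zero, but it is small enough that "never" is a fair shorthand for most workloads.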

                                        • C
                                          Carnival Boy
                                          last edited by Carnival Boy

                                          @scottalanmiller said in Amazon S3 Outage shows the danger of doing things cheaply.:

                                          Less than I expect from an average server.

                                          It's about an hour a year, I think? We probably get roughly that from our servers because of scheduled reboots and upgrades etc etc. In terms of unplanned downtime, not sure.

                                          • scottalanmillerS
                                            scottalanmiller @Carnival Boy
                                            last edited by

                                            @Carnival-Boy said in Amazon S3 Outage shows the danger of doing things cheaply.:

                                            @scottalanmiller said in Amazon S3 Outage shows the danger of doing things cheaply.:

                                            Less than I expect from an average server.

                                            It's about an hour a year, I think? We probably get roughly that from our servers because of scheduled reboots and upgrades etc etc. In terms of unplanned downtime, not sure.

                                            Yeah, generally we don't count planned downtime. Partly because we typically discuss the server level in house, not the software level, since we often have no control over the latter. And partly because it's very different: it's downtime when downtime is approved. It is "down", but not down in the way people normally mean it.
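The planned versus unplanned distinction changes the number you report. A minimal sketch, assuming a simple outage log with a `planned` flag (the `Outage` record and the sample durations are hypothetical, not from the thread):

```python
from dataclasses import dataclass
from typing import Iterable

MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


@dataclass(frozen=True)
class Outage:
    minutes: float
    planned: bool  # True for approved maintenance windows


def availability_percent(outages: Iterable[Outage],
                         period_minutes: float,
                         count_planned: bool = False) -> float:
    """Availability over the period; planned downtime is excluded by default."""
    counted = sum(o.minutes for o in outages if count_planned or not o.planned)
    return 100.0 * (1.0 - counted / period_minutes)


if __name__ == "__main__":
    # Hypothetical year: an hour of scheduled reboots, ten minutes of real failure.
    log = [Outage(minutes=60, planned=True), Outage(minutes=10, planned=False)]
    print(f"unplanned only:  {availability_percent(log, MINUTES_PER_YEAR):.4f}%")
    print(f"all downtime:    "
          f"{availability_percent(log, MINUTES_PER_YEAR, count_planned=True):.4f}%")
```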
