CloudatCost June Outage
-
My C@C server is still up.
-
Mine has been up, although I haven't touched it much. I assume this is another storage issue with the larger servers.
-
My tweet is above.
I ended up getting my most important instance recovered after a few reboots, fsck and fixing a few things manually, but I had two others that seem to be unrecoverable. Since each instance is running an entirely different workload, my guess is they had some kind of storage outage that thrashed my virtual disks to death.
-
@sithadmin welcome to MangoLassi!
-
Welcome @sithadmin!
-
@sithadmin said:
Since each instance is running an entirely different workload, my guess is they had some kind of storage outage that thrashed my virtual disks to death.
They have that a lot. We've measured their disk IO quite often and it was atrocious. Instead of using scale-out cloud storage like Ceph, Gluster or OpenStack, they are trying to use a single SAN to do all of this. They don't know how to build cloud storage and made a newbie mistake, as if they'd never done virtualization or storage before. They don't have the architecture now to fix things.
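For anyone who wants to take a similar measurement on their own instance, here is a rough sketch in Python. It is not the tool behind the numbers mentioned above, just one simple way to get a sequential write figure. The function name and default sizes are made up for illustration, and a purpose-built benchmark will give far more trustworthy results.

```python
import os
import tempfile
import time


def sequential_write_mib_per_s(directory=None, total_mib=16, block_kib=1024):
    """Write roughly total_mib of random data, fsync it, and return MiB/s.

    Hypothetical helper for illustration only: one sequential write pass,
    no read test, no random IO, no repeated runs.
    """
    block = os.urandom(block_kib * 1024)
    n_blocks = max(1, (total_mib * 1024) // block_kib)
    written_mib = n_blocks * block_kib / 1024

    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            # Force the data to the (virtual) disk so the page cache
            # does not hide the real storage latency.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)

    return written_mib / max(elapsed, 1e-9)


if __name__ == "__main__":
    print(f"{sequential_write_mib_per_s():.1f} MiB/s sequential write")
```

Running it a few times at different hours gives a rough idea of how much the shared storage fluctuates.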
-
@scottalanmiller If that's the case, I wouldn't be surprised at all. It's difficult to think of something I've seen this provider do the right way.
-
@sithadmin said:
@scottalanmiller If that's the case, I wouldn't be surprised at all. It's difficult to think of something I've seen this provider do the right way.
No kidding. When we first ran dmidecode on one of these VMs and saw that it was VMware, we wondered what else could be wrong. Why a cloud provider would pay a massive premium for a VMware platform when Xen, KVM and Hyper-V are all free makes no sense, especially when Linux is the only workload option that CloudatCost had (check the threads as to why Windows isn't legally an option on CloudatCost). So VMware cost a fortune and crippled them in performance.
The answer we came up with then was that CloudatCost lacked the technical skills to run OpenStack or another serious cloud platform. So they were doing this to try to "buy" their way out of lacking IT skills.
It was later that we found out about the SAN issue, but it was more of the same. I'm sure they had no idea how storage for a cloud is supposed to work and, instead of learning about it, they called up a storage vendor and the salesperson designed the storage for them. By-the-book mistakes, one after another.
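If you want to repeat the hypervisor check on your own VM, the same DMI fields that dmidecode reports are usually readable without root under /sys/class/dmi/id on Linux. Below is a minimal sketch in Python. The substring table is an assumption based on commonly reported vendor and product strings, not an exhaustive or authoritative list, and the function names are made up for illustration.

```python
from pathlib import Path

# Substrings commonly seen in DMI vendor/product fields of virtual machines.
# Assumed values for illustration; real systems may report something else.
_SIGNATURES = [
    ("vmware", "VMware"),
    ("xen", "Xen"),
    ("kvm", "KVM"),
    ("qemu", "KVM/QEMU"),
    ("microsoft corporation virtual machine", "Hyper-V"),
]


def classify_hypervisor(sys_vendor, product_name):
    """Guess the hypervisor from the DMI system vendor and product name."""
    text = f"{sys_vendor} {product_name}".lower()
    for needle, label in _SIGNATURES:
        if needle in text:
            return label
    return "unknown or bare metal"


def read_dmi_field(name, base="/sys/class/dmi/id"):
    """Return one DMI field as text, or an empty string if it is unreadable."""
    try:
        return (Path(base) / name).read_text().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    vendor = read_dmi_field("sys_vendor")
    product = read_dmi_field("product_name")
    print(classify_hypervisor(vendor, product))
```

On a guest like the ones described here, this would be expected to report VMware, provided the host exposes the usual DMI strings.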
-
I am surprised they are still online at all.
-
@IRJ said:
I am surprised they are still online at all.
Why? I mean they are redundant after all lol...
-
@thecreativeone91 said:
@IRJ said:
I am surprised they are still online at all.
Why? I mean they are redundant after all lol...
The only thing that is redundant is their outages
-