Domain Controller Down (VM)
-
@wirestyle22 said in Domain Controller Down (VM):
@John-Nicholson
I am definitely interested in any education you are willing to offer but I think we are actually almost finished with this currently. Are you willing to sit with me for literally any amount of time tomorrow? I am a very eager learner.
A bit tied up with prep for VMworld Barcelona but might find a few minutes to talk.
One thing: as soon as Humpty is back together again, run RVTools against the vCenter and get an XLS dump. The health tab at the end will find all kinds of fun misconfigurations, and I can go over it with you (or if you can sanitize it and post it here I can give you a teardown).
Next up, you're on Essentials Plus. You can have 24/7 production support, so use it. GSS likes to help.
100Mbps switching is NOT supported for storage or vMotion.
Get some real storage. [email protected] can help you get a small FC DAS HUS with good support to avoid a lot of this mess.
Upgrade to the VCSA (use the migrate2VCSA tool!), and 6.x ESXi. ESXi 5.1 is not in general support anymore as of last week or so.
Before you do that, upgrade the BIOS/firmware on those UCS boxes.
Fix the NTP server config so the service starts, and make sure to have 3 (not 2!) NTP servers so you can fix drift.
Get into the UCS CIMC (set that up!) and fix the clocks if needed.
Get a pair of cheap but fast/good TOR 1Gbps switches. Brocade ICX 6450s are solid, speak proper RSTP, and are fast enough to handle iSCSI for the migration, and vMotion once done.
Get Veeam for backups. Possibly beef up the backup storage.
Replace that Gen6 HP. It's out of support, and unsupportable.
Don't worry a lot about resource balance; your CPU and memory usage are good and those Cisco hosts (M3s) are current enough. Check the smartness' on them though, and get CIMC alerting set up.
All in: FC, labor (a week at 10K), host, migration of a few VMs, Veeam setup, small host, switches, upgrade. You're looking at 50-60K?
Still got other challenges (banish 2003, get VDI deployed, replace campus switching) but I'd give Howard a ring. He's seen worse I promise...
Other people to talk to if you're in the SW: Sigma (Nigel Hickey's a good guy there). They do a lot of VDI work.
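On the "3 (not 2!) NTP servers" point above: with two sources that disagree there is no way to tell which one is wrong, while a third lets the odd one out be outvoted. A minimal sketch of that idea, assuming made-up server names and offsets and a simple median test (real NTP clients use a more involved selection algorithm; this is only an illustration):

```python
from statistics import median

def split_sources(offsets_ms, tolerance_ms=50.0):
    """Split time sources into agreeing and outlier sets around the median offset.

    offsets_ms: dict of source name -> measured clock offset in milliseconds.
    tolerance_ms is an arbitrary illustrative threshold, not an NTP default.
    """
    if len(offsets_ms) < 3:
        # With one or two sources there is no majority to outvote a bad clock.
        return None
    mid = median(offsets_ms.values())
    good = {name for name, off in offsets_ms.items() if abs(off - mid) <= tolerance_ms}
    bad = set(offsets_ms) - good
    return mid, good, bad

# Hypothetical readings: two sources agree, one has drifted by about 2 seconds.
three = {"ntp-a": 4.0, "ntp-b": -3.0, "ntp-c": 2100.0}
two = {"ntp-a": 4.0, "ntp-c": 2100.0}
print(split_sources(three))  # the drifted source ends up in the outlier set
print(split_sources(two))    # None: two sources cannot settle the disagreement
```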
-
@coliver said in Domain Controller Down (VM):
Looks like an iPod... This is going to be interesting in the long term. Those Cisco chassis can do some expanding though, so you may be able to get to a more reliable system with what you have.
Actually UCS can't really expand much from a storage perspective. They don't have any native DAS JBOD support, and the MegaRAIDs on them offer little in the way of integration or customization. I'm convinced UCS was never really designed to use local storage in RAID (at scale anyway). They are useful if you're using them in true JBOD (VSAN, which they are certified for) or with HBAs to talk to an external disk array.
-
@JaredBusch said in Domain Controller Down (VM):
@wirestyle22 said in Domain Controller Down (VM):
Thanks to @John-Nicholson for the help! A lot of great information.
The problem is now that we stepped up to help a member of our community and the bosses know nothing of how much this should have cost you to get repaired.
VMware GSS is around 24/7 to help with stuff like this.
Just call them next time. 1 (877) 486-9273
Make sure to add yourself to the authorized list ahead of time for faster service, and write down your customer support. This is why you use enterprise products, so you can get help quickly.
-
Steps everyone else missed.
vCenter came up after the DCs, so we restarted the vCenter services.
Some of the VMs were running as zombies (storage had dropped too long, OS had crashed) so the VMs had to be reset once storage came back up (note: APD detection in 5.5 and 6.x is better and this isn't as common).
Storage latency over 100Mbps iSCSI is awful: 500ms max on one VM, and an average of 45ms. This is like running your hard drive remote from Houston to Atlanta. Hence my recommendation of some proper enterprise direct FC attached storage to deal with this mess.
This stuff isn't supported (and has NEVER been supported to use vMotion or iSCSI over 100Mbps) in the past 8 years I've worked with VMware.
As we discussed, my home lab from 5 years ago was in a better state for availability, performance, and supportability.
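To put the 100Mbps complaint in numbers, here is a rough back-of-the-envelope sketch. The 8 GB payload and the 90% link efficiency are assumptions picked for illustration, not measurements from this environment:

```python
def transfer_seconds(size_gb, link_mbps, efficiency=0.9):
    """Rough time to move size_gb gigabytes over a link of link_mbps megabits/s.

    efficiency is an assumed fraction of line rate left after protocol overhead.
    """
    size_megabits = size_gb * 8 * 1000  # 1 GB = 8000 megabits (decimal units)
    return size_megabits / (link_mbps * efficiency)

# Compare Fast Ethernet with gigabit for a hypothetical 8 GB copy
# (for example, the memory of one VM during a vMotion).
for mbps in (100, 1000):
    secs = transfer_seconds(8, mbps)
    print(f"{mbps} Mbps: about {secs / 60:.1f} minutes for 8 GB")
```

Under those assumptions the copy takes roughly twelve minutes at 100Mbps versus a little over a minute at 1Gbps, and that is before any other storage traffic competes for the same link.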
-
@John-Nicholson said in Domain Controller Down (VM):
Steps everyone else missed.
We just got access about the time you became available so it worked out well.
-
@John-Nicholson said in Domain Controller Down (VM):
(and has NEVER been supported to use vMotion or iSCSI over 100Mbps) in the past 8 years I've worked with VMware.
Among all the many things wrong, this right here is a killer. Whoever set this up was nearly criminal.
-
@JaredBusch said in Domain Controller Down (VM):
@John-Nicholson said in Domain Controller Down (VM):
(and has NEVER been supported to use vMotion or iSCSI over 100Mbps) in the past 8 years I've worked with VMware.
Among all the many things wrong, this right here is a killer. Whoever set this up was nearly criminal.
Agreed. Slowly but surely changes will be made. Thank god for ML though. For real.
-
It's amazing that someone spent all that money on Cisco UCSs but didn't put in any type of 1 Gb switches.
-
@Dashrender I think I said word for word exactly what you said when I first started.
-
They complain about a slow network and they haven't approved a Gb switch yet?
-
@Dashrender said in Domain Controller Down (VM):
They complain about a slow network and they haven't approved a Gb switch yet?
WTF??!?!
-
@John-Nicholson said in Domain Controller Down (VM):
@coliver said in Domain Controller Down (VM):
Looks like an iPod... This is going to be interesting in the long term. Those Cisco chassis can do some expanding though, so you may be able to get to a more reliable system with what you have.
Actually UCS can't really expand much from a storage perspective. They don't have any native DAS JBOD support, and the MegaRAIDs on them offer little in the way of integration or customization. I'm convinced UCS was never really designed to use local storage in RAID (at scale anyway). They are useful if you're using them in true JBOD (VSAN, which they are certified for) or with HBAs to talk to an external disk array.
Definitely not designed for that use.
-
@JaredBusch said in Domain Controller Down (VM):
@wirestyle22 said in Domain Controller Down (VM):
@Dashrender I am. Thanks everyone. You guys really banded together to help me. I really really appreciate it.
You would have been here a lot sooner had you told people to go away and leave you alone. The time gaps in this thread are telling.
Yes, you should report this to management that you were hampered not only from lack of documentation and obviously having important information hidden... but that the existing people were fooling around like this was a joke to them.
-
@scottalanmiller said in Domain Controller Down (VM):
@John-Nicholson said in Domain Controller Down (VM):
@coliver said in Domain Controller Down (VM):
Looks like an iPod... This is going to be interesting in the long term. Those Cisco chassis can do some expanding though, so you may be able to get to a more reliable system with what you have.
Actually UCS can't really expand much from a storage perspective. They don't have any native DAS JBOD support, and the MegaRAIDs on them offer little in the way of integration or customization. I'm convinced UCS was never really designed to use local storage in RAID (at scale anyway). They are useful if you're using them in true JBOD (VSAN, which they are certified for) or with HBAs to talk to an external disk array.
Definitely not designed for that use.
The other thing is to get some training. There are some rookie mistakes lurking in that config that scare me about other things...
http://vmware.stanly.edu is the poor man's path to a VCP. Grab a copy of Mastering vSphere. The HA deep dive book is free now if you know where to look. And increasingly we'll have storage and availability documents lurking at storagehub.vmware.com
-
@wirestyle22 inherited that nightmare.
-
@Dashrender said in Domain Controller Down (VM):
@wirestyle22 inherited that nightmare.
This is true... But getting some training is rarely a bad thing. 8-)
-
Alright, so I'm going to need to fix some DNS issues which I think @JaredBusch informed me of yesterday. When I perform nslookup on my file server and printer server it kicks back an error stating:
DNS request timed out.
Default Server: Unknown
Address: 192.168.10.16 <-- IP of my domain controller (2003)
-
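For context on the nslookup output above: nslookup shows "Default Server: Unknown" when it cannot turn the DNS server's own IP back into a name, so a reverse (PTR) lookup of the domain controller's address is a reasonable first thing to test. A minimal sketch, assuming the check is run from the affected machine; `reverse_name` and the stub resolver are hypothetical helpers for illustration, and `socket.gethostbyaddr` goes through the system resolver rather than querying one specific DNS server:

```python
import socket

def reverse_name(ip, lookup=socket.gethostbyaddr):
    """Return the hostname a reverse (PTR) lookup gives for ip, or None on failure."""
    try:
        hostname, _aliases, _addresses = lookup(ip)
        return hostname
    except OSError:
        # socket.herror, socket.gaierror and timeouts are all OSError subclasses.
        return None

def always_fails(ip):
    """Stub resolver that simulates a missing PTR record or an unresponsive server."""
    raise socket.herror(1, "Unknown host")

# Simulated failure, so the example does not depend on a live network.
print(reverse_name("192.168.10.16", lookup=always_fails))  # prints None
```

On a real machine, calling `reverse_name("192.168.10.16")` with the default resolver and getting `None` back would point at a missing reverse lookup zone or PTR record, or at the DNS service not answering at all, which the "DNS request timed out" line also hints at.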
@dafyre said in Domain Controller Down (VM):
@Dashrender said in Domain Controller Down (VM):
@wirestyle22 inherited that nightmare.
This is true... But getting some training is rarely a bad thing. 8-)
I agree. Training and documentation are key once this fire is fully out.
-
WOW it's taken me most of the afternoon to get through all that thread! Now back to the normal threads... hold on, it's home time
-
@hobbit666 said in Domain Controller Down (VM):
WOW it's taken me most of the afternoon to get through all that thread! Now back to the normal threads... hold on, it's home time
You made it through?