XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!
-
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
We need to see the Xen logs, too. Those are the logs with "xen" in the name.
I can't upload the .gz here. Can I open the contents in WordPad? Sorry if I missed something.
-
No current logs at all?
-
It ate my text. How do I display the logs to you? Sorry. A lot of new knowledge in a short period of time, combined with stress.
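One way to get readable text out of a compressed log for pasting is to decompress it to the terminal and keep only the last lines. This is a minimal sketch, not a XenServer tool: `show_log_tail` is a hypothetical helper name, and the rotated file name in the example comment is an assumption about how the logs are named on this host.

```shell
# Print the last N lines of a log, whether or not it is gzip-compressed.
# show_log_tail is a hypothetical helper name used only for illustration.
show_log_tail() {
    file="$1"
    lines="${2:-100}"   # default to the last 100 lines
    case "$file" in
        # gzip -dc decompresses to stdout and leaves the .gz file untouched
        *.gz) gzip -dc -- "$file" | tail -n "$lines" ;;
        *)    tail -n "$lines" -- "$file" ;;
    esac
}

# Example (assumed file name): show_log_tail /var/log/SMlog.1.gz 100
```

The output is plain text, so it can be copied from the console straight into a post.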
-
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
No current logs at all?
-
Sorry, ate my text again. Unless I should be looking somewhere else...
-
Is there any way to tell why the hosts can't even attach to their own local storage, or why it was unplugged in the first place?
-
@CitrixNewbJD said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
Sorry, ate my text again. Unless I should be looking somewhere else...
Get us the SMlog tail...
tail -n 100 SMlog
-
Then let's see what is in the xen folder, too.
-
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
@CitrixNewbJD said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
Sorry, ate my text again. Unless I should be looking somewhere else...
Get us the SMlog tail...
tail -n 100 SMlog
Dec 27 13:24:41 xen2 SM: [6456] |- 5:0:0:1 sdf 8:80 failed faulty running
Dec 27 13:24:41 xen2 SM: [6456] `- 7:0:0:1 sdl 8:176 active ready running
Dec 27 13:24:41 xen2 SM: [6456] multipathd>
Dec 27 13:24:41 xen2 SM: [6456] MPATH: Set val: [1, 4]
Dec 27 13:24:41 xen2 SM: [6456] Matched SCSIid, updating 360024e800070ed06000007f54dba7bfb
Dec 27 13:24:41 xen2 SM: [6456] MPATH: Updating entry for [360024e800070ed06000007f54dba7bfb], current: [0, 4]
Dec 27 13:24:41 xen2 SM: [6456] MPATH: Update done
Dec 27 13:24:41 xen2 SM: [6456] lock: closed /var/lock/sm/mpathcount2/host
Dec 27 13:24:41 xen2 SM: [6456] lock: released /var/lock/sm/mpathcount1/host
Dec 27 13:24:41 xen2 SM: [6456] lock: closed /var/lock/sm/mpathcount1/host
Dec 27 13:24:41 xen2 SM: [6579] lock: acquired /var/lock/sm/mpathcount1/host
Dec 27 13:24:41 xen2 SM: [6579] lock: released /var/lock/sm/mpathcount2/host
Dec 27 13:24:41 xen2 SM: [6579] MPATH: I get the lock
Dec 27 13:24:41 xen2 SM: [6579] Matched SCSIid, updating 360024e80007b786a000004134a9670f3
Dec 27 13:24:41 xen2 SM: [6579] MPATH: Updating entry for [360024e80007b786a000004134a9670f3], current: [2, 4]
Dec 27 13:24:41 xen2 SM: [6579] Matched SCSIid, updating 360024e800054baef0000ddd550c85082
Dec 27 13:24:41 xen2 SM: [6579] MPATH: Updating entry for [360024e800054baef0000ddd550c85082], current: [4, 4]
Dec 27 13:24:41 xen2 SM: [6579] Matched SCSIid, updating 360024e800070ed06000004314a967309
Dec 27 13:24:41 xen2 SM: [6579] MPATH: Updating entry for [360024e800070ed06000004314a967309], current: [1, 4]
Dec 27 13:24:41 xen2 SM: [6579] mpath cmd: show map 360024e800070ed06000004314a967309 topology
Dec 27 13:24:41 xen2 SM: [6663] MPATH: Trying to acquire the lock
Dec 27 13:24:41 xen2 SM: [6663] lock: tried lock /var/lock/sm/mpathcount1/host, acquired: False (exists: True)
Dec 27 13:24:41 xen2 SM: [6663] lock: tried lock /var/lock/sm/mpathcount2/host, acquired: True (exists: True)
Dec 27 13:24:41 xen2 SM: [6663] Failed to lock /var/lock/sm/mpathcount1/host on first attempt, blocked by PID 6579
Dec 27 13:24:41 xen2 SM: [6666] MPATH: Trying to acquire the lock
Dec 27 13:24:41 xen2 SM: [6666] lock: tried lock /var/lock/sm/mpathcount1/host, acquired: False (exists: True)
Dec 27 13:24:41 xen2 SM: [6666] lock: tried lock /var/lock/sm/mpathcount2/host, acquired: False (exists: True)
Dec 27 13:24:41 xen2 SM: [6666] lock: closed /var/lock/sm/mpathcount2/host
Dec 27 13:24:41 xen2 SM: [6666] lock: closed /var/lock/sm/mpathcount1/host
Dec 27 13:24:41 xen2 SM: [6579] mpath output: multipathd> 360024e800070ed06000004314a967309 dm-0 DELL,MD3000i
Dec 27 13:24:41 xen2 SM: [6579] size=1.4T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
Dec 27 13:24:41 xen2 SM: [6579] |-+- policy='round-robin 0' prio=4 status=active
Dec 27 13:24:41 xen2 SM: [6579] | |- 6:0:0:1 sdi 8:128 active ghost running
Dec 27 13:24:41 xen2 SM: [6579] | `- 4:0:0:1 sdc 8:32 failed faulty running
Dec 27 13:24:41 xen2 SM: [6579] `-+- policy='round-robin 0' prio=3 status=enabled
Dec 27 13:24:41 xen2 SM: [6579] |- 5:0:0:1 sdf 8:80 failed faulty running
Dec 27 13:24:41 xen2 SM: [6579] `- 7:0:0:1 sdl 8:176 active ready running
Dec 27 13:24:41 xen2 SM: [6579] multipathd>
Dec 27 13:24:41 xen2 SM: [6579] mpath cmd: show map 360024e800070ed06000004314a967309 topology
Dec 27 13:24:41 xen2 SM: [6579] mpath output: multipathd> 360024e800070ed06000004314a967309 dm-0 DELL,MD3000i
Dec 27 13:24:41 xen2 SM: [6579] size=1.4T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
Dec 27 13:24:41 xen2 SM: [6579] |-+- policy='round-robin 0' prio=4 status=active
Dec 27 13:24:41 xen2 SM: [6579] | |- 6:0:0:1 sdi 8:128 active ghost running
Dec 27 13:24:41 xen2 SM: [6579] | `- 4:0:0:1 sdc 8:32 failed faulty running
Dec 27 13:24:41 xen2 SM: [6579] `-+- policy='round-robin 0' prio=3 status=enabled
Dec 27 13:24:41 xen2 SM: [6579] |- 5:0:0:1 sdf 8:80 failed faulty running
Dec 27 13:24:41 xen2 SM: [6579] `- 7:0:0:1 sdl 8:176 active ready running
Dec 27 13:24:41 xen2 SM: [6579] multipathd>
Dec 27 13:24:41 xen2 SM: [6579] MPATH: Set val: [2, 4]
Dec 27 13:24:41 xen2 SM: [6579] Matched SCSIid, updating 360024e800070ed06000007f54dba7bfb
Dec 27 13:24:41 xen2 SM: [6579] MPATH: Updating entry for [360024e800070ed06000007f54dba7bfb], current: [0, 4]
Dec 27 13:24:41 xen2 SM: [6579] MPATH: Update done
Dec 27 13:24:41 xen2 SM: [6579] lock: closed /var/lock/sm/mpathcount2/host
Dec 27 13:24:41 xen2 SM: [6579] lock: released /var/lock/sm/mpathcount1/host
Dec 27 13:24:41 xen2 SM: [6579] lock: closed /var/lock/sm/mpathcount1/host
Dec 27 13:24:41 xen2 SM: [6663] lock: acquired /var/lock/sm/mpathcount1/host
Dec 27 13:24:41 xen2 SM: [6663] lock: released /var/lock/sm/mpathcount2/host
Dec 27 13:24:41 xen2 SM: [6663] MPATH: I get the lock
Dec 27 13:24:41 xen2 SM: [6663] Matched SCSIid, updating 360024e80007b786a000004134a9670f3
Dec 27 13:24:41 xen2 SM: [6663] MPATH: Updating entry for [360024e80007b786a000004134a9670f3], current: [2, 4]
Dec 27 13:24:41 xen2 SM: [6663] Matched SCSIid, updating 360024e800054baef0000ddd550c85082
Dec 27 13:24:41 xen2 SM: [6663] MPATH: Updating entry for [360024e800054baef0000ddd550c85082], current: [4, 4]
Dec 27 13:24:41 xen2 SM: [6663] Matched SCSIid, updating 360024e800070ed06000004314a967309
Dec 27 13:24:41 xen2 SM: [6663] MPATH: Updating entry for [360024e800070ed06000004314a967309], current: [2, 4]
Dec 27 13:24:41 xen2 SM: [6663] mpath cmd: show map 360024e800070ed06000004314a967309 topology
Dec 27 13:24:41 xen2 SM: [6663] mpath output: multipathd> 360024e800070ed06000004314a967309 dm-0 DELL,MD3000i
Dec 27 13:24:41 xen2 SM: [6663] size=1.4T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
Dec 27 13:24:41 xen2 SM: [6663] |-+- policy='round-robin 0' prio=4 status=active
Dec 27 13:24:41 xen2 SM: [6663] | |- 6:0:0:1 sdi 8:128 active ghost running
Dec 27 13:24:41 xen2 SM: [6663] | `- 4:0:0:1 sdc 8:32 failed faulty running
Dec 27 13:24:41 xen2 SM: [6663] `-+- policy='round-robin 0' prio=3 status=enabled
Dec 27 13:24:41 xen2 SM: [6663] |- 5:0:0:1 sdf 8:80 failed faulty running
Dec 27 13:24:41 xen2 SM: [6663] `- 7:0:0:1 sdl 8:176 active ready running
Dec 27 13:24:41 xen2 SM: [6663] multipathd>
Dec 27 13:24:41 xen2 SM: [6663] mpath cmd: show map 360024e800070ed06000004314a967309 topology
Dec 27 13:24:42 xen2 SM: [6663] mpath output: multipathd> 360024e800070ed06000004314a967309 dm-0 DELL,MD3000i
Dec 27 13:24:42 xen2 SM: [6663] size=1.4T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
Dec 27 13:24:42 xen2 SM: [6663] |-+- policy='round-robin 0' prio=4 status=active
Dec 27 13:24:42 xen2 SM: [6663] | |- 6:0:0:1 sdi 8:128 active ghost running
Dec 27 13:24:42 xen2 SM: [6663] | `- 4:0:0:1 sdc 8:32 failed faulty running
Dec 27 13:24:42 xen2 SM: [6663] `-+- policy='round-robin 0' prio=3 status=enabled
Dec 27 13:24:42 xen2 SM: [6663] |- 5:0:0:1 sdf 8:80 failed faulty running
Dec 27 13:24:42 xen2 SM: [6663] `- 7:0:0:1 sdl 8:176 active ready running
Dec 27 13:24:42 xen2 SM: [6663] multipathd>
Dec 27 13:24:42 xen2 SM: [6663] Matched SCSIid, updating 360024e800070ed06000007f54dba7bfb
Dec 27 13:24:42 xen2 SM: [6663] MPATH: Updating entry for [360024e800070ed06000007f54dba7bfb], current: [0, 4]
Dec 27 13:24:42 xen2 SM: [6663] MPATH: Update done
Dec 27 13:24:42 xen2 SM: [6663] lock: closed /var/lock/sm/mpathcount2/host
Dec 27 13:24:42 xen2 SM: [6663] lock: released /var/lock/sm/mpathcount1/host
Dec 27 13:24:42 xen2 SM: [6663] lock: closed /var/lock/sm/mpathcount1/host
Dec 27 13:24:42 xen2 SM: [6728] MPATH: Trying to acquire the lock
Dec 27 13:24:42 xen2 SM: [6728] lock: tried lock /var/lock/sm/mpathcount1/host, acquired: True (exists: True)
Dec 27 13:24:42 xen2 SM: [6728] MPATH: I get the lock
Dec 27 13:24:42 xen2 SM: [6728] Matched SCSIid, updating 360024e80007b786a000004134a9670f3
Dec 27 13:24:42 xen2 SM: [6728] MPATH: Updating entry for [360024e80007b786a000004134a9670f3], current: [2, 4]
Dec 27 13:24:42 xen2 SM: [6728] Matched SCSIid, updating 360024e800054baef0000ddd550c85082
Dec 27 13:24:42 xen2 SM: [6728] MPATH: Updating entry for [360024e800054baef0000ddd550c85082], current: [4, 4]
Dec 27 13:24:42 xen2 SM: [6728] Matched SCSIid, updating 360024e800070ed06000004314a967309
Dec 27 13:24:42 xen2 SM: [6728] MPATH: Updating entry for [360024e800070ed06000004314a967309], current: [2, 4]
Dec 27 13:24:42 xen2 SM: [6728] mpath cmd: show map 360024e800070ed06000004314a967309 topology
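The topology entries in this output report a state for each path to the MD3000i: sdc and sdf show as failed faulty, sdi as active ghost, and sdl as active ready. A rough way to tally those states from a long log is sketched below. It assumes the state always appears as three words ending in "running", as it does in the lines above; `count_path_states` is a hypothetical helper name, not part of XenServer or multipath-tools.

```shell
# Tally multipath path states found in SMlog-style text read from stdin.
# count_path_states is a hypothetical helper name used only for illustration.
count_path_states() {
    # -o prints only the matched state, so each path line yields one token group
    grep -Eo '(active|failed) (ready|ghost|faulty) running' \
        | sort \
        | uniq -c \
        | awk '{print $1, $2, $3, $4}'   # normalize the padding uniq -c adds
}

# Example: tail -n 100 SMlog | count_path_states
```

Because the log repeats the same topology several times, the counts describe how often each state was logged, not how many distinct paths exist. To count distinct paths, filter on a single process ID (the number in brackets) first.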
-
@CitrixNewbJD okay, no need to look in the xen folder, that's not going to be a useful log
-
I'm discussing conversion options with Scale. Looks like moving the disk images over would not be too hard as long as you can pull them off of the SAN.
-
If you do this...
cd ..
It will take you back to the main logs. What is the output of this...
ls | grep -i xen
-
While you are working on that, this is a good time to discuss triage. In all honesty, this is an old, unmaintained cluster. What is the hardware of the compute nodes?
The storage is poor. Even if we fix things, the system is very old in both software and hardware. So it's only a bandaid. Which is very doable, but it is what it is.
Doing a migration, like right now, might be the logical path forward. Let's get your specs for the current cluster, concerns for a new one and discuss if that's a real possibility. Also, what is the moment to moment impact of the outage?
-
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
If you do this...
cd ..
It will take you back to the main logs. What is the output of this...
ls | grep -i xen
[root@xen2 ~]# ls | grep -i xen
xen2pooldb
[root@xen2 ~]#
-
@CitrixNewbJD said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
If you do this...
cd ..
It will take you back to the main logs. What is the output of this...
ls | grep -i xen
[root@xen2 ~]# ls | grep -i xen
xen2pooldb
[root@xen2 ~]#
What does this return...
pwd
-
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
While you are working on that, this is a good time to discuss triage. In all honesty, this is an old, unmaintained cluster. What is the hardware of the compute nodes?
The storage is poor. Even if we fix things, the system is very old in both software and hardware. So it's only a bandaid. Which is very doable, but it is what it is.
Doing a migration, like right now, might be the logical path forward. Let's get your specs for the current cluster, concerns for a new one and discuss if that's a real possibility. Also, what is the moment to moment impact of the outage?
I actually have a quote for a cluster from Scale that I had planned to try to push forward with the owner of the company when he returns from Europe next week. In the interest of properly giving you the information that you need, I want to be sure you're aware of that.
-
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
@CitrixNewbJD said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
If you do this...
cd ..
It will take you back to the main logs. What is the output of this...
ls | grep -i xen
[root@xen2 ~]# ls | grep -i xen
xen2pooldb
[root@xen2 ~]#
What does this return...
pwd
/root
-
@CitrixNewbJD said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
@CitrixNewbJD said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
If you do this...
cd ..
It will take you back to the main logs. What is the output of this...
ls | grep -i xen
[root@xen2 ~]# ls | grep -i xen
xen2pooldb
[root@xen2 ~]#
What does this return...
pwd
/root
That explains a lot. You are in root's home directory, not the log directory. I think you want to be in /var/log
cd /var/log
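The same listing can be run without depending on the current directory by passing the log directory explicitly. This is a small sketch: `list_xen_logs` is a hypothetical helper name, and defaulting to /var/log assumes the standard Linux log location applies on this host.

```shell
# List entries whose names contain "xen" (case-insensitive) in a log directory.
# list_xen_logs is a hypothetical helper; /var/log is the assumed default.
list_xen_logs() {
    dir="${1:-/var/log}"
    # ls -1 prints one name per line so grep filters whole entries
    ls -1 -- "$dir" | grep -i xen
}

# Example: list_xen_logs            (looks in /var/log)
# Example: list_xen_logs /root      (would show xen2pooldb, as in the output above)
```

An empty result means nothing in that directory has "xen" in its name, which is a sign of being in the wrong place rather than of missing logs.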
-
@CitrixNewbJD said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
@scottalanmiller said in XenServer 6.2 servers down. I have no Xen skill. Most likely networking? Help!:
While you are working on that, this is a good time to discuss triage. In all honesty, this is an old, unmaintained cluster. What is the hardware of the compute nodes?
The storage is poor. Even if we fix things, the system is very old in both software and hardware. So it's only a bandaid. Which is very doable, but it is what it is.
Doing a migration, like right now, might be the logical path forward. Let's get your specs for the current cluster, concerns for a new one and discuss if that's a real possibility. Also, what is the moment to moment impact of the outage?
I actually have a quote for a cluster from Scale that I had planned to try to push forward with the owner of the company when he returns from Europe next week. In the interest of properly giving you the information that you need, I want to be sure you're aware of that.
So the quote is already spec'd and ready to go? Maybe call him in Europe and see if things can be expedited? Would suck to have a major recovery operation happen just to hold off a discussion by one week.