Overheating NICs in SuperMicro on FreeBSD
-
OMG, they got on the IPMI and internal sensors peg the system at over 115C!! Holy crap. Never seen a server get that hot and come back from it. Amazing that those disks still spin. They've got the vendor on the phone right now getting them to look into it.
-
Holy cats!
-
@scottalanmiller said in Overheating NICs in SuperMicro on FreeBSD:
OMG, they got on the IPMI and internal sensors peg the system at over 115C!! Holy crap. Never seen a server get that hot and come back from it. Amazing that those disks still spin. They've got the vendor on the phone right now getting them to look into it.
They are pushing the limit on the smoke valve... jeez.
-
Onboard NICs? Strange
-
It could just be a problem with reporting, and not actually be that hot. I hope. Cuz 115 is melty time.
-
@momurda said in Overheating NICs in SuperMicro on FreeBSD:
It could just be a problem with reporting, and not actually be that hot. I hope. Cuz 115 is melty time.
Most ICs should survive this. AMD had a range of GPUs reaching 95°C during regular use, with no overclocking involved. But 115°C is a lot.
Broken sensors / reporting could be a reason, good point. @scottalanmiller: Got an infrared camera?
-
@thwr said in Overheating NICs in SuperMicro on FreeBSD:
Onboard NICs? Strange
Yup, onboard 10GigE. Getting more and more common these days.
-
@momurda said in Overheating NICs in SuperMicro on FreeBSD:
It could just be a problem with reporting, and not actually be that hot. I hope. Cuz 115 is melty time.
Close but not quite. And we have several reasons to believe that it is really that hot. But false reporting is still a possibility.
-
@thwr said in Overheating NICs in SuperMicro on FreeBSD:
Most ICs should survive this. AMD had a range of GPUs reaching 95°C during regular use, with no overclocking involved. But 115°C is a lot.
Broken sensors / reporting could be a reason, good point. @scottalanmiller: Got an infrared camera?
It's not holding that temp, only spiking to it once a day or less.
-
@scottalanmiller said in Overheating NICs in SuperMicro on FreeBSD:
It's not holding that temp, only spiking to it once a day or less.
Any correlation between network traffic and the temperature spikes? Does it happen at the same time every day, etc, etc?
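One rough way to put a number on that: sample NIC throughput and the NIC temperature sensor at the same instants and compute a Pearson correlation between the two series. A minimal Python sketch, assuming you've already collected both series yourself; the sample values below are made up for illustration and are not readings from this server.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same times (e.g. once a minute):
# NIC throughput in Mbit/s and the NIC temperature sensor in degrees C.
traffic_mbps = [120, 340, 900, 2500, 4100, 800, 150]
nic_temp_c = [62, 64, 70, 88, 112, 75, 63]

r = pearson(traffic_mbps, nic_temp_c)
print(f"r = {r:.2f}")
```

A value near 1 suggests the temperature climbs with traffic. It doesn't prove traffic causes the spikes, and a once-a-day spike can get drowned out by hours of quiet samples, so it's worth running it on just the windows around each spike too.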
-
@scottalanmiller said in Overheating NICs in SuperMicro on FreeBSD:
It's not holding that temp, only spiking to it once a day or less.
Just talked to a friend who is much more into soldering etc. than me. He said that 150°C is a temperature to watch, because the so-called glass transition (https://en.wikipedia.org/wiki/Glass_transition) might come into effect. As for the IC itself, there's a junction temperature (https://en.wikipedia.org/wiki/Junction_temperature) to keep an eye on. The two are more or less related.
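If you want a script to flag readings against those limits, here's a minimal Python sketch that parses pipe-separated output in the style of `ipmitool sensor` and keeps only the `degrees C` rows. The sample text, the sensor names, and the 105°C junction limit are hypothetical; the real junction maximum comes from the datasheet for the specific NIC controller, and the 150°C default is just the friend's rule of thumb above.

```python
# Hypothetical sensor dump in the pipe-separated style of `ipmitool sensor`:
# name | reading | units | status | thresholds...
SAMPLE = """\
CPU Temp         | 45.000     | degrees C  | ok    | na | na | na | na | na | na
10G NIC Temp     | 116.000    | degrees C  | cr    | na | na | na | na | na | na
FAN1             | 3400.000   | RPM        | ok    | na | na | na | na | na | na
DIMM Temp        | na         | degrees C  | na    | na | na | na | na | na | na
"""


def parse_temps(sensor_output: str) -> dict[str, float]:
    """Return {sensor name: reading} for temperature rows that have a value."""
    temps: dict[str, float] = {}
    for line in sensor_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or fields[2] != "degrees C":
            continue  # not a temperature sensor (or a blank line)
        try:
            temps[fields[0]] = float(fields[1])
        except ValueError:
            continue  # "na": the BMC has no reading for this sensor
    return temps


def classify(temp_c: float, junction_max_c: float,
             glass_transition_c: float = 150.0) -> str:
    """Bucket a reading against the two limits discussed in the thread."""
    if temp_c > glass_transition_c:
        return "above glass transition"
    if temp_c > junction_max_c:
        return "above junction max"
    return "ok"


for name, t in parse_temps(SAMPLE).items():
    print(f"{name}: {t:.0f} C -> {classify(t, junction_max_c=105.0)}")
```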
-
@dafyre said in Overheating NICs in SuperMicro on FreeBSD:
Any correlation between network traffic and the temperature spikes? Does it happen at the same time every day, etc, etc?
Yes, it appears to be loosely related.
-
Getting the entire motherboard replaced straight away.
-
Breaking the LAG to potentially reduce flips and load on the NICs. @Mike-Davis
-
It was amazing that Scott found it so fast. I was on the Windows side of things. Inside Windows, they were using the iSCSI initiator to connect to FreeNAS. All of a sudden, Windows would log a ton of iSCSI events and go down.
I looked up the events, and most people had resolved them by putting the iSCSI traffic on a separate NIC. This happened two days in a row at about the same time each day. I was checking snapshot, backup, and other scheduled job times when Scott found it in the FreeNAS logs.
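Checking whether events land at "about the same time each day" is easy to get wrong around midnight. A minimal Python sketch, assuming you've already pulled the event timestamps out of the Windows or FreeNAS logs yourself (the timestamps below are hypothetical), that compares times of day with wrap-around:

```python
from datetime import datetime

MINUTES_PER_DAY = 24 * 60


def minute_of_day(ts: datetime) -> int:
    """Minutes since midnight for a timestamp."""
    return ts.hour * 60 + ts.minute


def circular_gap(a: int, b: int) -> int:
    """Distance in minutes between two times of day, wrapping at midnight."""
    d = abs(a - b) % MINUTES_PER_DAY
    return min(d, MINUTES_PER_DAY - d)


def same_time_each_day(events: list[datetime], tolerance_min: int = 30) -> bool:
    """True if every pair of events is within tolerance_min by time of day."""
    minutes = [minute_of_day(e) for e in events]
    return all(
        circular_gap(a, b) <= tolerance_min
        for i, a in enumerate(minutes)
        for b in minutes[i + 1:]
    )


# Hypothetical outage timestamps on two consecutive days.
outages = [datetime(2024, 3, 6, 14, 5), datetime(2024, 3, 7, 14, 20)]
print(same_time_each_day(outages))  # True: 15 minutes apart by time of day
```

The wrap-around matters: 23:50 and 00:10 are only 20 minutes apart, not 23 hours 40 minutes.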