Dell Server Not Recognizing Memory
-
@NashBrydges said in Dell Server Not Recognizing Memory:
Samsung ECC RDIMMs @ 16GB 1333MHz memory
Did you notice this in the manual?
NOTE: 16 GB quad-rank RDIMMs are not supported.
Are you able to determine the specific part number for these DIMMs?
-
@Danp said in Dell Server Not Recognizing Memory:
@NashBrydges said in Dell Server Not Recognizing Memory:
Samsung ECC RDIMMs @ 16GB 1333MHz memory
Did you notice this in the manual?
NOTE: 16 GB quad-rank RDIMMs are not supported.
Are you able to determine the specific part number for these DIMMs?
I'd check all the small numbers on the DIMMs.
It's possible that someone screwed up and didn't notice.
Six 16GB DIMMs not working means a total of 96GB of RAM is missing. That's a significant share of the server's total RAM.
It's also possible that one CPU is faulty. That's extremely rare, but not impossible. I believe the DIMMs are connected directly to the CPU's internal memory controller.
It's a slightly odd memory configuration, so it's not unlikely that it has been upgraded during its lifetime. Normally it's better to use only 8 DIMMs per CPU, and if you need more than 16x16GB, use 32GB LRDIMMs instead. You can't mix RDIMMs and LRDIMMs though, which is another way to screw up.
-
@NashBrydges said in Dell Server Not Recognizing Memory:
I've also run the Dell diagnostics utility on boot-up and everything checked out ok with a PASS on everything.
The diagnostics utility can't test what the CPU can't recognize or find, so it's of limited value here.
-
@Danp I did, yeah, no quad rank dimms.
-
@Pete-S That's what I also thought. I will have to spend some more time digging all the module numbers out tomorrow once I'm back there. There has to be something mismatched somewhere. Can't imagine anything else at this point.
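For the DIMMs the system does recognize, the part numbers and ranks can be pulled from the OS rather than read off the sticks. Below is a rough sketch that assumes a Linux host where `sudo dmidecode --type memory > dimms.txt` is available (the thread doesn't say what the host runs). The `SAMPLE` text, the `EXAMPLE-PN-*` part numbers, and the function names are made up for illustration. Unrecognized modules will most likely show up as empty slots here, so those still need a physical look.

```python
from collections import defaultdict

# Made-up sample in the general shape of `dmidecode --type memory` output.
# Part numbers are placeholders, not real Samsung part numbers.
SAMPLE = """
Handle 0x1100, DMI type 17, 34 bytes
Memory Device
    Size: 16384 MB
    Locator: DIMM_A1
    Rank: 2
    Part Number: EXAMPLE-PN-1

Handle 0x1101, DMI type 17, 34 bytes
Memory Device
    Size: No Module Installed
    Locator: DIMM_B1
    Rank: Unknown
    Part Number: Not Specified

Handle 0x1102, DMI type 17, 34 bytes
Memory Device
    Size: 16384 MB
    Locator: DIMM_A2
    Rank: 4
    Part Number: EXAMPLE-PN-2
"""


def parse_memory_devices(text):
    """Return one dict of "Key: Value" fields per "Memory Device" block."""
    devices = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Memory Device":
            current = {}
            devices.append(current)
        elif not line or line.startswith("Handle "):
            current = None  # a blank line or a new handle ends the block
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return devices


def summarize(devices):
    """Split slots into empty ones and populated ones grouped by (part, rank)."""
    empty = []
    by_part = defaultdict(list)
    for dev in devices:
        slot = dev.get("Locator", "?")
        if dev.get("Size", "").startswith("No Module"):
            empty.append(slot)
        else:
            key = (dev.get("Part Number", "unknown"), dev.get("Rank", "unknown"))
            by_part[key].append(slot)
    return empty, dict(by_part)


empty_slots, groups = summarize(parse_memory_devices(SAMPLE))
print("Slots reported empty:", empty_slots)
for (part, rank), slots in groups.items():
    print(f"{part} (rank {rank}): {slots}")
```

More than one (part number, rank) group among the populated slots would point at a mismatched batch, and a rank of 4 on a 16GB module would match the quad-rank note from the manual.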
-
@NashBrydges said in Dell Server Not Recognizing Memory:
The weird thing is that the server is running "perfectly". I add the quotes because while there are no errors and all VMs are working well with no degradation in performance, there is obviously an issue.
This is what is to be expected when the CPU doesn't recognize the memory.
What you have is one CPU with full memory bandwidth and 192GB of memory, and the other CPU with 96GB of memory and probably only half the memory bandwidth. So the server is less performant than it would normally be.
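The numbers work out as follows. This is just a quick sketch of the arithmetic. The 12 slots per CPU are inferred from 192GB / 16GB per DIMM, and it assumes all six missing DIMMs belong to the same CPU. The variable and function names are only illustrative.

```python
DIMM_GB = 16          # size of each RDIMM, from the thread
SLOTS_PER_CPU = 12    # assumption: 192 GB / 16 GB per DIMM
MISSING_DIMMS = 6     # slots reported as unoccupied


def recognized_gb(populated_slots, missing=0, dimm_gb=DIMM_GB):
    """Memory one CPU can actually see, in GB."""
    return (populated_slots - missing) * dimm_gb


healthy_cpu_gb = recognized_gb(SLOTS_PER_CPU)                   # 192
affected_cpu_gb = recognized_gb(SLOTS_PER_CPU, MISSING_DIMMS)   # 96
installed_gb = 2 * SLOTS_PER_CPU * DIMM_GB                      # 384
visible_gb = healthy_cpu_gb + affected_cpu_gb                   # 288
missing_fraction = (installed_gb - visible_gb) / installed_gb   # 0.25

print(f"CPU with all DIMMs seen: {healthy_cpu_gb} GB")
print(f"CPU missing {MISSING_DIMMS} DIMMs: {affected_cpu_gb} GB")
print(f"Visible {visible_gb} GB of {installed_gb} GB installed "
      f"({missing_fraction:.0%} missing)")
```

So a quarter of the installed RAM is not usable. Whether bandwidth on the affected CPU really drops to half depends on which memory channels the six missing DIMMs sit on, which the thread doesn't say.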
-
@NashBrydges said in Dell Server Not Recognizing Memory:
@Pete-S That's what I also thought. I will have to spend some more time digging all the module numbers out tomorrow once I'm back there. There has to be something mismatched somewhere. Can't imagine anything else at this point.
If possible you should be prepared to swap the CPUs.
What kind of CPUs are in there? E5-26xx V2 something perhaps? V1 is probably more likely.
Troubleshooting time quickly adds up, so it might be time to consider what to do if the problem can't be solved with easy steps like inspecting the RAM and reseating it.
The R720 is well past its expected life span at this point. It's very much a possibility that the server is on the verge of catastrophic failure and this is the first sign.
-
@Pete-S The modules have all been reseated and swapped around to other slots, and it's still the same thing. The same 6 slots remain unidentified (or unoccupied, according to iDRAC).
The CPUs are E5-2650 v1.
I've already had the conversation with the owner. Looks like we're going to keep things as they are since everything is operating normally (apart from the obvious missing RAM). We have good, tested backups and another server we can migrate the workload to in under an hour should something fail. He's unwilling to spend the cash on a new server, and a deep diagnosis would get pretty pricey once my time is billed, so... status quo for now.
-
@NashBrydges Guess you can lead a horse to water but you can't make him drink.
-
@NashBrydges Did you try switching the positions of existing CPUs?