I'm looking for any assistance I can get, as I've been scouring Google for what to do here and I'm at a loss. We have a client with a single Dell T620 running the Dell custom ESXi 6.0 U3 image (no vCenter cluster), and it has been "randomly" rebooting every day at about 06:20.
We've been back and forth with Dell, providing logs and analysis from iDRAC, with no hardware issues found. Unfortunately we only have hardware support from Dell on this server, so they drew their line in the sand and said we'd need to continue troubleshooting on our own or get the client to pay for additional support.
There is no UPS management connection to the server (no USB/serial management cable from UPS to server), and it has two PSUs, each plugged into a different UPS. I changed the ESXi host IP, root password, and hostname, thinking there might be a PowerCLI script or similar running externally; no change.
I brought a loaner on-site to do a V-to-V, then brought the server back to work on in our depot. I ran the Lifecycle Controller updates to bring system firmware and drivers current, blew away the RAID configs and rebuilt them, then reinstalled ESXi fresh. After that I delivered it on-site again to do the V-to-V from the loaner back to the original server, and found this morning that the server had rebooted yet again. At this point I'm just baffled.
I went through the VMware documentation to the best of my ability (https://kb.vmware.com/s/article/1019238) regarding checking logs, but I cannot find anything that describes the source of the reboot. I've been searching mainly in hostd.log, syslog.log, vmkernel.log, and vpxa.log, but most of it goes over my head and nothing I've found indicates why the host rebooted.
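To cut down the noise in those logs, something like the following could pull out only the lines close to the reboot time. This is a rough sketch, not a tested tool: it assumes each log line begins with an ISO 8601 timestamp (for example `2019-03-14T06:19:58.123Z`), the function name `lines_near` and the date used below are made up for illustration, and as far as I know ESXi writes these timestamps in UTC, so the 06:20 local time would need converting first.

```python
import re
import sys
from datetime import datetime, timedelta

# Assumption: every relevant log line starts with an ISO 8601 timestamp,
# e.g. "2019-03-14T06:19:58.123Z ...". Lines without one are skipped.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def lines_near(lines, target, minutes=10):
    """Return the lines whose timestamp is within +/- `minutes` of `target`."""
    window = timedelta(minutes=minutes)
    keep = []
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # continuation line or unexpected format
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        if abs(stamp - target) <= window:
            keep.append(line.rstrip("\n"))
    return keep


if __name__ == "__main__":
    # Hypothetical usage: python near_reboot.py vmkernel.log 2019-03-14T06:20:00
    log_path, when = sys.argv[1], sys.argv[2]
    target_time = datetime.strptime(when, "%Y-%m-%dT%H:%M:%S")
    with open(log_path, errors="replace") as handle:
        for hit in lines_near(handle, target_time):
            print(hit)
```

Running it against each of the four logs with the same target time would at least give one short window to compare side by side, rather than scrolling through a full day of entries.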
I'd love to blame the UPS as the source of the issue, but even checking the iDRAC Lifecycle logs around the time of the reboot I don't see any PSU power loss. I see "SYS1001: System is turning off" and "SYS1000: System is turning on" entries, which suggests there wasn't a loss of power. Both entries also have "Comment: root", which indicates to me that the cause is something in the ESXi environment. Any further help is appreciated; this has been escalated to me, and the collective time poured into this issue is embarrassing.