I woke up to find my home server, mosearcserver, completely unresponsive. It had been offline for a week. My initial fear was a hardware failure, a kernel panic, or the dreaded "Out of Memory" (OOM) killer terminating critical processes.
I booted the machine back up manually and began the investigation to find out why it "crashed."
Step 1: Analyzing the Logs
The first step was to look at the logs from the previous boot cycle to see what happened right before the system went dark. I used journalctl to inspect the final moments:
journalctl -b -1 -e
I expected to see error messages or an abrupt end to the logs. Instead, I found this:
gen 10 02:09:04 mosearcserver systemd-logind[954]: The system will power off now!
gen 10 02:09:04 mosearcserver systemd-logind[954]: System is powering down.
gen 10 02:09:04 mosearcserver systemd[1]: Stopping apache2.service...
gen 10 02:09:04 mosearcserver systemd[1]: Stopping docker.service...
gen 10 02:09:13 mosearcserver systemd[1]: Reached target poweroff.target - System Power Off.
Analysis: This was not a crash. The logs didn't stop abruptly; the system performed a clean, graceful shutdown. It stopped services, unmounted drives, and powered off safely.
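The same check can be scripted. Below is a minimal Python sketch; the function name and the marker list are illustrative assumptions taken from the log lines above, and other distributions or systemd versions may word these messages differently.

```python
# Markers that appear in the journal during an orderly power-off.
# Assumption: these strings match the log lines quoted above.
CLEAN_SHUTDOWN_MARKERS = (
    "System is powering down",
    "Reached target poweroff.target",
)


def looks_like_clean_shutdown(journal_lines):
    """Return True if any line contains an orderly power-off marker."""
    return any(
        marker in line
        for line in journal_lines
        for marker in CLEAN_SHUTDOWN_MARKERS
    )


# Two of the lines from the previous boot, copied from the output above.
sample = [
    "gen 10 02:09:04 mosearcserver systemd-logind[954]: System is powering down.",
    "gen 10 02:09:13 mosearcserver systemd[1]: Reached target poweroff.target - System Power Off.",
]
print(looks_like_clean_shutdown(sample))  # True
```

To try it on real data, pass it the lines printed by journalctl -b -1 -n 50 --no-pager. A journal that just stops, with none of these markers, points toward a hard crash or power cut instead.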
Step 2: Finding the "Trigger"
If the OS shut down gracefully, something must have commanded it to do so. I had three suspects:
- Overheating: Did the motherboard trigger a thermal shutdown?
- User Error: Did I (or a script) issue a command?
- Power: Was it a battery issue?
I ruled out overheating by checking thermald logs and current sensors. The system runs cool (around 37°C), and there were no "Critical temperature" warnings in the logs.
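For a quick look at the current sensors without extra tools, the kernel exposes them under /sys/class/thermal. This is only a sketch, assuming the machine has thermal_zone entries there (not every board does); the function names are my own.

```python
from pathlib import Path


def millidegrees_to_celsius(raw: str) -> float:
    """sysfs thermal zones report temperature in millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: temperature_in_celsius} for every readable zone."""
    readings = {}
    for temp_file in sorted(Path(base).glob("thermal_zone*/temp")):
        try:
            readings[temp_file.parent.name] = millidegrees_to_celsius(
                temp_file.read_text()
            )
        except (OSError, ValueError):
            # Some zones refuse reads or return garbage; skip them.
            continue
    return readings


if __name__ == "__main__":
    for zone, celsius in read_thermal_zones().items():
        print(f"{zone}: {celsius:.1f} °C")
```

A raw value of 37000 in one of those files corresponds to the 37°C mentioned above.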
Next, I checked the shutdown history to see who issued the command by running last -x shutdown.
The Result:
shutdown system down 6.8.0-90-generic Sat Jan 10 02:09 - 10:17 (7+08:07)
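The entry reads "shutdown system down", but that is simply how last prints every shutdown record: shutdown is a pseudo-user and "system down" fills the terminal column, whether the command came from mosearc over SSH, from root, or from a service. So this line confirms when the system went down (02:09 on Jan 10) and for how long (7 days and 8 hours), but not who asked for it. That has to be inferred from the journal and the circumstances.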
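The value in parentheses at the end of that line is the downtime, written as days+hours:minutes. As a small worked example (the helper name is hypothetical, and the pattern assumes the format shown in the line above), it can be turned into a proper duration like this:

```python
import re
from datetime import timedelta

# Matches "(7+08:07)" as well as "(08:07)" when the downtime is under a day.
_DURATION = re.compile(r"\((?:(\d+)\+)?(\d{1,2}):(\d{2})\)")


def parse_last_duration(line: str) -> timedelta:
    """Extract the parenthesised duration from a line of `last` output."""
    match = _DURATION.search(line)
    if match is None:
        raise ValueError(f"no duration found in: {line!r}")
    days, hours, minutes = match.groups()
    return timedelta(days=int(days or 0), hours=int(hours), minutes=int(minutes))


entry = "shutdown system down 6.8.0-90-generic Sat Jan 10 02:09 - 10:17 (7+08:07)"
downtime = parse_last_duration(entry)
print(downtime)                                   # 7 days, 8:07:00
print(round(downtime.total_seconds() / 3600, 1))  # 176.1
```

So the server was off for roughly 176 hours.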
Step 3: The Root Cause
Since mosearcserver runs on laptop hardware, it has a built-in battery and runs the upower service.
Putting the clues together:
- Time: 02:09 AM (unlikely time for human intervention).
- Hardware: Laptop with upower active.
- Log Trigger: systemd-logind initiated the shutdown.
Conclusion
The server experienced a power outage (or the AC adapter was disconnected). The laptop switched to battery power and ran until the charge fell to the level at which upower takes action (a few percent by default, configurable in /etc/UPower/UPower.conf). At that point, upower asked systemd-logind to power the system off to prevent data corruption.
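The exact threshold and the action taken are set in /etc/UPower/UPower.conf, so it is worth reading them rather than guessing. Here is a minimal sketch that pulls out the two relevant keys; the sample values below are illustrative only and are not copied from mosearcserver, and the function name is my own.

```python
import configparser


def read_upower_policy(conf_text: str) -> dict:
    """Return the critical-battery threshold and action from UPower.conf text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the CamelCase key names as written
    parser.read_string(conf_text)
    section = parser["UPower"]
    return {
        "PercentageAction": section.getfloat("PercentageAction"),
        "CriticalPowerAction": section.get("CriticalPowerAction"),
    }


# Illustrative values only; check the file on your own machine.
sample_conf = """
[UPower]
# Battery level (in percent) at which the critical action is taken.
PercentageLow=20.0
PercentageCritical=5.0
PercentageAction=2.0
CriticalPowerAction=PowerOff
"""
print(read_upower_policy(sample_conf))
```

On the real machine, pass it the contents of the file, for example with open("/etc/UPower/UPower.conf").read().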
Because the laptop is not configured to restart automatically after power loss, it stayed off for 7 days until I manually powered it on.
The Fix & Prevention
To prevent this downtime in the future, I have implemented the following fixes:
- Hardware Check: Verified the power adapter connection is tight and secure.
- BIOS Configuration: I am changing the BIOS setting "Restore on AC Power Loss" to "Power On". This ensures that if the battery drains and the laptop dies, it will automatically boot up as soon as electricity is restored.
- Monitoring: I verified that the upower service is active to ensure safe shutdowns continue to happen if power is lost again.
- Take out the Battery: Another option is removing the battery, since it is very old (like the whole PC) and no longer works well. I will not have physical access to the server for at least a month, though, so I cannot try it now :(
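The upower check from the Monitoring item is easy to automate, for example from a cron job. This is a hedged sketch: it relies on systemctl is-active, which prints "active" for a running unit, and the function name and the injectable runner argument are my own additions to make it testable.

```python
import subprocess


def unit_is_active(unit: str, runner=subprocess.run) -> bool:
    """Return True if `systemctl is-active <unit>` reports "active"."""
    result = runner(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit just means "not active"
    )
    return result.stdout.strip() == "active"


if __name__ == "__main__":
    try:
        state = "active" if unit_is_active("upower") else "NOT active"
        print(f"upower is {state}")
    except FileNotFoundError:
        # systemctl is missing, e.g. when run on a non-systemd machine.
        print("systemctl not found")
```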
Summary
What looked like a server crash was actually a successful safety mechanism. The system recognized a critical power loss and shut itself down to save the file system. Case closed!