Hacker News

IgorPartola today at 1:11 AM

See, I would not reboot the server before figuring out what is happening. You lose a lot of information by doing that, and the worst thing that can happen is that the problem goes away for a little while.


Replies

gerdesj today at 1:30 AM

To be fair, turning it off and on again is unreasonably effective.

I recently diagnosed and fixed an issue with Veeam backups that suddenly stopped working partway through the usual window and kept failing from that point on. This particular setup has three sites (prod, my home and DR) and five backup proxies. Anyway, I read logs and Googled somewhat. I rebooted the backup server: no joy, even though it looked like the issue was there. I restarted the proxies and things started working again.

The error was basically: there are no available proxies, even though they were all available (they were not working, but also not giving off "not working" vibes).

I could bother with trying to look for what went wrong but life is too short. This is the first time that pattern has happened to me (I'll note it down mentally and it was logged in our incident log).

So, OK, I'll agree that a reboot should not generally be the first option. Whilst sciencing it or nerding harder is the purist approach, often a cheeky reboot gets the job done. However, do be aware that a Windows box will often decide to install updates if you are not careful 8)

galleywest200 today at 1:58 AM

My job as a DevOps engineer is to ensure customer uptime. If rebooting is the fastest fix, we do that. Figuring out the why is the primary developers' job.

This is also a good reason to log everything all the time in a human readable way. You can get services up and then triage at your own pace after.
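For what "human readable" logging could look like in practice, here is a minimal sketch using Python's standard `logging` module. The logger name `sip-gateway`, the messages, and the `make_logger` helper are all made up for illustration, and the in-memory buffer stands in for a real log file.

```python
import io
import logging


def make_logger(name, stream):
    """Return a logger that writes one readable line per event to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output confined to our handler
    handler = logging.StreamHandler(stream)
    # Timestamp, level, component, then the message in plain words.
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s: %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    ))
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()  # stand-in for a log file (assumption)
log = make_logger("sip-gateway", buffer)  # hypothetical service name
log.info("restarting service after %d failed health checks", 3)
log.warning("trunk registration still failing; escalating")
```

Each line then carries a timestamp, a level, and a sentence someone can read during triage after the service is already back up.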

My job may be different from others', as I work at an ITSP and we serve business phone lines. When business phones do not work, it is immediately clear to our customers. We have to get them back up not just for their business but so that they can dial 911.

butvacuum today at 1:22 AM

Most fail states aren't worth preserving in an SMB environment. In larger environments, or ones equipped for it, a snapshot can be taken before rebooting, should the issue repeat.

Once is chance, twice is coincidence, three times makes a pattern.
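Where a full VM snapshot is not available, a lighter stand-in for "snapshot before rebooting" is to dump the output of a few diagnostic commands into a timestamped directory first. The sketch below is an assumption about how one might do that, not anyone's actual tooling: `capture_state`, the `pre-reboot-` directory prefix, and the default command list (common Linux commands) are all hypothetical choices.

```python
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical default set; swap in whatever the platform offers.
DEFAULT_COMMANDS = {
    "uptime": ["uptime"],
    "processes": ["ps", "aux"],
    "disk": ["df", "-h"],
}


def capture_state(out_root, commands=None, timeout=30):
    """Run each command and save its output under a timestamped directory."""
    commands = commands or DEFAULT_COMMANDS
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_dir = Path(out_root) / f"pre-reboot-{stamp}"
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, argv in commands.items():
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            text = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing or hung tool should not block the reboot.
            text = f"could not run {argv}: {exc}\n"
        (out_dir / f"{name}.txt").write_text(text)
    return out_dir


# Demo with a command that exists anywhere Python does.
demo_dir = capture_state(
    tempfile.mkdtemp(),
    {"hello": [sys.executable, "-c", "print('state captured')"]},
)
```

If the issue does repeat, the directories from each occurrence can be compared side by side.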

ValdikSS today at 1:31 AM

I've debugged so many issues in my life that sometimes I'd prefer things to just work, and if a reboot helps to at least postpone the problem, I'd choose that :D
