
lanstin, yesterday at 1:37 AM

While it is fun to have your code run for 500 days without a restart, it is bad architecture. You should be able to move load from host to host or network to network without losing any work. That means gracefully draining the old instance and then shutting it down.
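To make the draining part concrete, here is a minimal sketch of what I mean, not anyone's production code. The `Drainer` class and its method names are made up for illustration. It assumes each request calls `try_begin()` before doing work and `end()` when done.

```python
import threading


class Drainer:
    """Hypothetical helper: counts in-flight requests and refuses new ones
    once draining has started."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def try_begin(self) -> bool:
        """Register a new request. Returns False if we are draining."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end(self) -> None:
        """Mark one request as finished and wake anyone waiting for drain."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def start_draining(self) -> None:
        """Stop accepting new requests; existing ones keep running."""
        with self._cond:
            self._draining = True

    def wait_drained(self, timeout: float) -> bool:
        """Block until no requests are in flight, or until the timeout.
        Returns True if fully drained, False if work is still pending."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=timeout
            )
```

In a real server you would call `start_draining()` when the orchestrator or load balancer tells the host to go away (a SIGTERM is the usual signal), then `wait_drained()` with some deadline, and only then exit. Rejected requests should get a response that tells the client or balancer to retry elsewhere.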

For impossible errors, exiting and sending the dev team as much info as possible (thread dump, memory dump, etc.) is helpful.
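A rough sketch of the fail-fast idea, under some assumptions: `build_crash_report` and `fail_fast` are names I made up, writing to stderr stands in for whatever actually gets the report to the dev team, and the exit code 70 is arbitrary. It only covers the thread dump, not a memory dump.

```python
import os
import sys
import threading
import traceback


def build_crash_report(reason: str) -> str:
    """Return the reason plus a stack trace for every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = [f"FATAL: {reason}"]
    # sys._current_frames() maps thread id -> that thread's current frame.
    for thread_id, frame in sys._current_frames().items():
        lines.append(f"--- thread {names.get(thread_id, '?')} ({thread_id}) ---")
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
    return "\n".join(lines)


def fail_fast(reason: str, exit_code: int = 70) -> None:
    """Dump the report and exit immediately.

    Assumption: stderr is collected somewhere the dev team will see it.
    os._exit skips cleanup handlers on purpose, since the process state
    is no longer trusted after an "impossible" error.
    """
    sys.stderr.write(build_crash_report(reason) + "\n")
    sys.stderr.flush()
    os._exit(exit_code)
```

You would call something like `fail_fast("invariant violated: negative queue depth")` at the spot where the impossible condition is detected, and let the supervisor restart the process.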

In my experience, logs are good for finding out what is wrong once you know something is wrong. Also, if the server is written to have enough but not too much logging, you can read the logs over and get a feel for normal operation.