How does restarting the process fix the crash? If the process crashed because a file was missing, it will still be missing when the process is restarted. Is an infinite crash-loop considered success in Erlang?
Elixir dev: It does not solve all issues. But sometimes you have a rare bug that only happens when X, Z and Y occur in a specific order. If the process is restarted, it might not happen that way again. Or it might be a temporary problem: you are calling an API and it temporarily has issues. It might not have them anymore in 50 ms.
But of course if it crashes because you are reading a file that does not exist, restarting doesn't solve the issue (but it avoids crashing the whole system).
It's not going to be missing the next time around. Usually the file is missing due to some concurrency problem where the file only comes into existence a little later. A process restart certainly fixes this.
If the problem persists, a larger part of the supervision tree is restarted. This eventually leads to a crash of the full application, if nothing can proceed without this application existing in the Erlang release.
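To make the escalation concrete, here is a rough sketch in Python rather than Erlang. It is not OTP code: `supervise`, `RestartLimitExceeded`, `make_flaky` and `always_missing` are names made up for illustration, and real OTP supervisors count restarts within a time window, which this sketch ignores.

```python
class RestartLimitExceeded(Exception):
    """Raised when a child keeps failing past the allowed restart count."""


def supervise(child, max_restarts):
    """Run child(); on failure restart it, up to max_restarts times.

    Hypothetical helper, not an OTP API. When the limit is exceeded the
    supervisor itself fails, so the error reaches whatever supervises it.
    """
    restarts = 0
    while True:
        try:
            return child()
        except Exception as exc:
            if restarts >= max_restarts:
                raise RestartLimitExceeded(repr(exc)) from exc
            restarts += 1


def make_flaky(failures):
    """Return a child that fails `failures` times, then succeeds."""
    remaining = [failures]

    def child():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RuntimeError("transient failure")
        return "ok"

    return child


attempts = []


def always_missing():
    """A child whose problem never goes away."""
    attempts.append(1)
    raise FileNotFoundError("config file missing")


# A transient problem disappears after a couple of restarts.
print(supervise(make_flaky(2), max_restarts=3))  # prints "ok"

# A persistent problem exhausts the inner supervisor, which then counts
# as a failing child of the outer one: a larger part of the tree restarts.
inner = lambda: supervise(always_missing, max_restarts=2)
try:
    supervise(inner, max_restarts=1)
except RestartLimitExceeded:
    print(len(attempts))  # prints 6 (3 attempts per inner run, 2 inner runs)
```

The outer call giving up is the analogue of the full application crashing: nothing above it is left to restart.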
The key point is that there's a very large class of errors which is due to the concurrent interaction of different parts of the system. These problems often go away on the next try, because the risk of them occurring is low.
Typically you then let the error bubble up in the supervisor tree if restarting multiple times doesn't fix it.
Of course there are still errors that can't be recovered from, in which case the whole program may finally crash.
> Is an infinite crash-loop considered success in Erlang?
Of course not, but usually that's not what happens. Instead, a process crashes because some condition was not considered, the corresponding request is aborted, and a supervisor restarts the process (or doesn't, because the acceptor spawns a process per request / client).
Or a long-running worker got into an incorrect state and crashed, and a supervisor will restart it in a known good state (that's a pretty common thing to do in hardware, BEAM makes that idiomatic in software).
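A minimal sketch of "restart in a known good state", written in Python for illustration only. `Worker` and `Supervisor` are hypothetical classes, not the OTP behaviours; the point is just that the crashed worker is thrown away and a fresh one is built from its initial state.

```python
class Worker:
    """Hypothetical long-running worker; count should equal len(items)."""

    def __init__(self):
        # Known good initial state.
        self.items = []
        self.count = 0

    def handle(self, item):
        self.count += 1
        if item is None:
            # A condition nobody considered: crash with state half-updated.
            raise ValueError("unexpected message")
        self.items.append(item)
        return self.count


class Supervisor:
    """Hypothetical supervisor that replaces a crashed worker."""

    def __init__(self, factory):
        self.factory = factory
        self.worker = factory()
        self.restarts = 0

    def send(self, item):
        try:
            return self.worker.handle(item)
        except Exception:
            # Do not try to repair the possibly corrupt worker;
            # discard it and start again from the initial state.
            self.worker = self.factory()
            self.restarts += 1
            return None


sup = Supervisor(Worker)
print(sup.send("a"))   # prints 1
print(sup.send(None))  # prints None (the worker crashed and was replaced)
print(sup.send("b"))   # prints 1 (fresh worker, earlier state is gone)
```

Note the trade-off: the restart throws away whatever the worker had accumulated, which is acceptable only if that state can be rebuilt or was not worth keeping.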
I recommend https://ferd.ca/the-zen-of-erlang.html starting from "if my configuration file is corrupted, restarting won't fix anything". The tl;dr is it helps with transient bugs.
If the rest of the program is still running while you fix it, yes?
Also, restarting endlessly is just one strategy among several.
I’m only an armchair expert on Erlang. But, having looked into it repeatedly for a couple of decades, my takeaway is that the “Let it crash” slogan is good, but it is also presented a bit out of context. Or, at least, it assumes context that most people don’t have.
Erlang is used in situations involving a zillion incoming requests. If an individual request fails… Maybe it was important. Maybe it wasn’t. If it was important, it’s expected they’ll try again. What’s most important is that the rest of the requests are not interrupted.
What makes Erlang different is that it is natural and trivial to shut down an individual request in the event of an error without worrying about putting any other part of the system into a bad state.
You can pull this off in other languages via careful attention to the details of your request-handling code. But, the creators of the Erlang language and foundational frameworks have set their users up for success via careful attention to the design of the system as a whole.
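For comparison, this is roughly what that careful attention looks like when done by hand, sketched in Python. `handle` and `serve` are hypothetical names. The sketch only contains exceptions at the request boundary; it does nothing to protect state shared between requests, which is the part Erlang's isolated processes give you for free.

```python
def handle(request):
    """Hypothetical request handler; crashes on input nobody considered."""
    return 100 // request


def serve(requests):
    """Handle each request on its own; one failure never stops the rest."""
    results = []
    for request in requests:
        try:
            results.append(("ok", handle(request)))
        except Exception as exc:
            # Abort only this request and record why it failed.
            results.append(("error", type(exc).__name__))
    return results


print(serve([5, 0, 20]))
# prints [('ok', 20), ('error', 'ZeroDivisionError'), ('ok', 5)]
```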
That’s great in the contexts in which Erlang is used. But in the context of a Java desktop app like OpenOffice, it’s more like saying “Let it throw”, with “it” being some user action. And the slogan amounts to having a language and framework with exception handling so robust and built-in that error handling becomes trivial and nearly invisible.