It is very strange that a post trying to explain the concept of "let it crash" in Elixir (which runs on the BEAM VM) does not mention the doctoral thesis of Joe Armstrong: "Making reliable distributed systems in the presence of software errors".
It should be required reading for anybody interested in reliable systems, even if they do not use the BEAM VM.
https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A104...
Some core ideas from the thesis for the impatient (failures, isolation, healing):
- Failures are inevitable, so systems must be designed to EXPECT and recover from them, NOT to AVOID them completely.
- The "let it crash" philosophy allows components to FAIL and RECOVER quickly using supervision trees.
- Processes should be ISOLATED and communicate via MESSAGE PASSING, which prevents cascading failures.
- Supervisors, arranged in supervision trees, monitor their child processes and RESTART them when they fail, creating a self-healing architecture.
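To make the restart idea more concrete, here is a minimal Python sketch of a supervisor. It is loosely modeled on the one_for_one strategy and the max_restarts option of Elixir's Supervisor, but it is not how the BEAM implements them. All names (Worker, Supervisor, deliver, RestartIntensityExceeded) are hypothetical. It is also a single-process simulation: Python objects are not isolated the way BEAM processes are, and the real restart limit is counted within a time window (max_seconds), which this sketch leaves out.

```python
class RestartIntensityExceeded(Exception):
    """Raised when the supervisor gives up and escalates to its own parent."""


class Worker:
    """Hypothetical worker with private state, reachable only via messages."""

    def __init__(self):
        self.total = 0  # private state; it is lost when the worker crashes

    def handle(self, msg):
        # No defensive checks: a bad message simply raises ("let it crash").
        self.total += int(msg)
        return self.total


class Supervisor:
    """Restarts only the crashed child (one_for_one-style), up to a limit."""

    def __init__(self, child_names, max_restarts=3):
        self.max_restarts = max_restarts
        self.restarts = 0  # simplification: total count, not per time window
        self.children = {name: Worker() for name in child_names}

    def deliver(self, name, msg):
        try:
            return self.children[name].handle(msg)
        except Exception as exc:
            self.restarts += 1
            if self.restarts > self.max_restarts:
                # Too many crashes: stop healing locally and escalate.
                raise RestartIntensityExceeded(name) from exc
            # Replace the crashed child with a fresh one; siblings are untouched.
            self.children[name] = Worker()
            return None
```

With `sup = Supervisor(["a", "b"])`, sending `"oops"` to `"a"` resets only `"a"` to a clean state, while `"b"` keeps its running total. The worker itself contains no error handling at all, which is the point: recovery lives in the supervisor, not in the code that failed.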