I think a lot of folks who have never looked at Erlang or Elixir and the BEAM misunderstand this concept because they don't understand how fine-grained processes are, or can be, in Erlang. A very important note: processes in BEAM languages are cheap, both to create and to context-switch between, compared to OS threads. While they offer similar capabilities design-wise, this cost difference leads to a substantially different approach to design in Erlang than in systems where introducing and switching between threads is more expensive.
In a more conventional language where concurrency is relatively expensive, and assuming you're not an idiot who writes 1-10k SLOC functions, you end up with functions that have a "single responsibility" (maybe not actually a single responsibility, but closer to it than having 100 duties in one function) near the bottom of your call tree, but they all execute in one thread. In a hypothetical system built in this model, if your lowest-level function is something like:
retrieve_data(db_connection, query_parameters) -> data
And the database connection fails, would you attempt to restart the database connection inside this function? Maybe, but that would be bad design. More likely you'd raise an exception, or change the signature so it can express an error return; in Rust and similar languages it would become something like: retrieve_data(db_connection, query_parameters) -> Result<data, error>
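To make that shape concrete, here's a minimal Rust sketch. The `DbError` and `DbConnection` types are hypothetical stand-ins, not a real database driver:

```rust
// Hypothetical stand-ins for a real connection and error type.
#[derive(Debug, PartialEq)]
enum DbError {
    ConnectionLost,
}

struct DbConnection {
    healthy: bool,
}

// The lowest-level function: it reports failure through its return
// type and attempts no corrective action itself.
fn retrieve_data(conn: &DbConnection, query: &str) -> Result<Vec<String>, DbError> {
    if !conn.healthy {
        return Err(DbError::ConnectionLost);
    }
    // A real implementation would run the query; fake one row here.
    Ok(vec![format!("row matching {}", query)])
}

// Mid-stack caller: does its own job and forwards the error with `?`,
// making no recovery decision of its own.
fn build_report(conn: &DbConnection) -> Result<String, DbError> {
    let rows = retrieve_data(conn, "id = 1")?;
    Ok(format!("{} rows", rows.len()))
}

fn main() {
    let up = DbConnection { healthy: true };
    let down = DbConnection { healthy: false };
    assert_eq!(build_report(&up).unwrap(), "1 rows");
    assert_eq!(build_report(&down), Err(DbError::ConnectionLost));
    println!("ok");
}
```

The `?` operator is the whole point here: each intermediate frame just passes the failure upward, exactly like an uncaught exception climbing the stack.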
Somewhere higher in the call stack you have a handler which will catch the exception or process the error and determine what to do. That is, `retrieve_data` crashes: it fails to achieve its objective and does not attempt any corrective action (beyond maybe a few retries in case the error is transient).

In Erlang, you have a supervision tree, which corresponds to this call-tree concept but for processes. The process handling data retrieval, having been given some db_connection handle and the parameters, fails for some reason. Instead of handling the error in this process, the process crashes. The failure condition is passed to the supervisor, which may or may not have a handler for this situation.
You might put the simple retry policy in the supervisor (the basic assumption of transient errors: maybe a second or third attempt will succeed). It might have other retry policies, like trying the request again with a different db_connection (the other one must be bad for some reason; perhaps the db instance it references is down). If the request continues to fail, this supervisor will either handle the error some other way (for example, signaling to another process that the db is down, so it can fix it or tell the supervisor what to do) or crash itself. This repeats all the way up the supervision tree; ultimately it could mean bringing down the whole system if the error propagates to a high enough level.
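That escalation order can be sketched even without real processes. The following Rust is a loose, sequential analogy to a supervisor's policy (not OTP, and all types are hypothetical): retry the same connection, fail over to another, then give up and propagate:

```rust
// Hypothetical stand-ins; a loose, sequential analogy to a
// supervisor's restart policy, not real OTP.
#[derive(Debug, PartialEq)]
enum DbError {
    ConnectionLost,
}

struct DbConnection {
    healthy: bool,
}

fn retrieve_data(conn: &DbConnection, query: &str) -> Result<Vec<String>, DbError> {
    if conn.healthy {
        Ok(vec![format!("row matching {}", query)])
    } else {
        Err(DbError::ConnectionLost)
    }
}

// The "supervisor": it owns the policies the worker deliberately lacks.
fn supervise_fetch(
    primary: &DbConnection,
    fallback: &DbConnection,
    query: &str,
) -> Result<Vec<String>, DbError> {
    // Policy 1: assume transient errors; retry the same connection.
    for _ in 0..3 {
        if let Ok(rows) = retrieve_data(primary, query) {
            return Ok(rows);
        }
    }
    // Policy 2: that connection must be bad; try a different one.
    for _ in 0..3 {
        if let Ok(rows) = retrieve_data(fallback, query) {
            return Ok(rows);
        }
    }
    // Out of policies: "crash" ourselves by propagating the failure
    // to whoever supervises us.
    Err(DbError::ConnectionLost)
}

fn main() {
    let down = DbConnection { healthy: false };
    let up = DbConnection { healthy: true };
    assert!(supervise_fetch(&down, &up, "id = 1").is_ok());
    assert!(supervise_fetch(&down, &down, "id = 1").is_err());
    println!("ok");
}
```

The design point survives the translation: the worker stays dumb, and each layer that runs out of policies fails upward rather than limping along.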
This is conceptually no different from how errors and exceptions are handled in sequential, non-concurrent systems. You have handlers that provide mechanisms for retrying or otherwise dealing with errors, and if you don't, the error propagates up (hopefully you don't continue running in a known-bad state) until it is handled or the program crashes entirely.
In languages where concurrency is more expensive (traditional OS threads), its cost in memory and time means you end up with a policy somewhere between Erlang's and a straight-line sequential program's. Your threads will be larger than Erlang processes, so they'll include more error handling within themselves, but they can still fail, and you'll (hopefully) have a supervisor of some sort that determines what happens next.
As more languages adopt cheap concurrency (Go's goroutines, Java's virtual threads), system designs have a chance to shift closer to Erlang's than to that straight-line sequential approach, if people are willing to take advantage of it.