Hacker News

kqr · last Thursday at 6:31 PM · 4 replies

It doesn't help that many of the popular methodologies focus entirely on failures. They ask a bunch of questions in the style of "how likely is it that this part fails?", "what happens if it fails?", "how can we reduce the risk of it failing?", etc. But software never fails[1], so that's the wrong approach to start from!

Much better to do as you say and think about the software and its role in the system. There are more and less formal ways to do this, but it's definitely better than taking a component view.


Replies

aidenn0 · yesterday at 3:31 AM

Systems containing software fail, and the cause of that failure may originate in software.

And the article you intended to link is just wrong. E.g., the Therac-25 was not designed to output high power when an operator typed quickly; it was merely built in such a way that it did so. This would be analogous to describing an airplane failure due to bolts that were too weak: "the bolt didn't fail; it broke under exactly the forces you would expect it to break from, given its size; if they wanted it not to break, they should have used a larger bolt!" Just like in the Therac example, the failure would be consistently reproducible.
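(A minimal sketch of the same shape of bug, in Python with made-up names and timings, not the actual Therac-25 code: a setup routine snapshots shared parameters while the operator can still edit them, so a fast correction is silently lost, and reliably so given the same timing.)

```python
import threading
import time

params = {"mode": "xray", "power": "high"}  # operator's first entry

def setup_and_fire(fired):
    # Snapshot taken *before* the operator's correction lands.
    snapshot = dict(params)
    time.sleep(0.05)            # hardware setup delay: the race window
    fired.append(snapshot)      # "beam" fires with the stale snapshot

def operator_corrects_quickly():
    time.sleep(0.01)            # correction arrives inside the window
    params["mode"] = "electron"
    params["power"] = "low"

fired = []
t1 = threading.Thread(target=setup_and_fire, args=(fired,))
t2 = threading.Thread(target=operator_corrects_quickly)
t1.start(); t2.start(); t1.join(); t2.join()

# Reproducible outcome: the correction was accepted on screen,
# but the device fired with {'mode': 'xray', 'power': 'high'}.
print(fired[0])
```

The machine does exactly what it was built to do, every time the timing lines up; nothing "randomly failed."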

ryandrake · last Thursday at 6:37 PM

FYI you added a [1] but didn't add the link to whatever you were going to reference!

gmueckl · last Thursday at 8:27 PM

While it is conceivable that perfect software could run flawlessly on a perfect computer forever, in reality the computer it runs on and the devices it controls will eventually fail: it's just a question of when and how, never if. A device that hasn't failed during its lifespan was simply not used long enough to fail.

In light of this, even software development has to focus on failures when you apply this standard. And that does include failures within the computer itself (faulty RAM or a faulty CPU core).
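(A toy sketch of what guarding against that can look like, with a made-up `protect`/`read_checked` pair: store a critical value next to a checksum and verify before every use, so a bit flip in faulty RAM fails loudly instead of silently. Real systems lean on ECC RAM, watchdogs, and redundant cores rather than this.)

```python
import zlib

def protect(value: bytes) -> tuple[bytes, int]:
    # Store the value alongside its CRC32 checksum.
    return value, zlib.crc32(value)

def read_checked(cell: tuple[bytes, int]) -> bytes:
    # Verify the checksum before trusting the value.
    value, crc = cell
    if zlib.crc32(value) != crc:
        raise RuntimeError("memory corruption detected; fail safe")
    return value

cell = protect(b"dose=2.0Gy")
# Simulate a single bit flip in faulty RAM:
corrupted = (bytes([cell[0][0] ^ 0x01]) + cell[0][1:], cell[1])

read_checked(cell)          # fine
# read_checked(corrupted)   # would raise instead of silently acting on bad data
```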

lo_zamoyski · last Thursday at 7:26 PM

Well, the failure in question is not the part failing to do what it is objectively defined to do; it is a failure to perform as we expect it to. Meaning, the failure is ours. Inductively, for `x` to FAIL means that either we failed to define `x` properly, or the `y` that simulates `x` (compiler, whatever...) has FAILed.

Of course, the notion of "failure" itself presupposes a purpose. It is a normative notion, and there is no normativity without an aim or a goal.

So, sure, where human artifacts are concerned, we cannot talk about a part failing per se. Unlike natural kinds (like us, where the norm is intrinsic, which is why heart failure is an objective failure), the "should" or "ought" of an artifact is a matter of external human intention and expectation.

And as it turns out, a "role in a system" is precisely a teleological view. The system has an overall purpose (one we assign to it), and the role or function of any part is defined in terms of - and in service to - the overall goal. If the system goes from `a->d`, and one part goes from `a->b`, another `b->c`, and still another `c->d`, then the composition of these gives us the system. The meaning of the part comes from the meaning of the whole.
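(The same picture as a toy Python pipeline, with invented stage names: each part's role is just its slot in the `a->d` composition, and "failure" of a part is only defined relative to that assigned slot.)

```python
def parse(a: str) -> list[str]:           # a -> b
    return a.split(",")

def validate(b: list[str]) -> list[int]:  # b -> c
    return [int(x) for x in b]

def summarize(c: list[int]) -> int:       # c -> d
    return sum(c)

def system(a: str) -> int:                # a -> d, the composition
    return summarize(validate(parse(a)))

assert system("1,2,3") == 6
```

Each function behaves exactly as written no matter what; it only "fails" if the role the whole pipeline assigns to it demanded something else.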