If you have ever read the software control category definitions in MIL-STD-882E, you know that the definitions this blog author gives are very much his own interpretation. The actual definitions in 882E are a god-awful mess: multiple contradictory definitions for the same category, plus parenthetical statements that are intended to clarify but just muddy the picture further. Yikes...
""" A second important dimension is criticality, the potential damage caused by an undetected defect: loss of comfort (C), loss of discretionary moneys (D), loss of essential moneys (E), and loss of life (L). """
(my rephrasing): he points out that the further one moves down that list, the more hardened/disciplined the development process should be, from "anything goes" at the beginning to "no exceptions whatsoever" at the end.
[1] https://www.researchgate.net/publication/234820806_Crystal_c...
I've found that a quality process that starts with "you need to comprehensively understand what you're engineering" is almost universally a non-starter for anyone not already using these things. Putting together an exhaustive list of all the ways code interacts with the outside world is hard. Even when a few engineers actually manage it, they're rarely empowered to make meaningful decisions on whether the consequences of failures are acceptable, or to fix things if they're not.
Much better to do as you say and think about the software and its role in the system. There are more and less formal ways to do this, but it's definitely better than taking a component view.
And the article you intended to link is just wrong. E.g., the Therac-25 was not designed to output high power when an operator typed quickly; it was built in such a way that it did so. This would be analogous to describing an airplane failure due to using bolts that were too weak[1]: "the bolt didn't fail; it broke under exactly the forces you would expect it to break at, given its size; if they wanted it to not break, they should have used a larger bolt!"[2] Just like in the Therac example, the failure would be consistently reproducible.
1. Generally referred to as a "failure" of the part
2. Closely analogous to many software defects that cause system failure.
“The reason is that, in other fields [than software], people have to deal with the perversity of matter. [When] you are designing circuits or cars or chemicals, you have to face the fact that these physical substances will do what they do, not what they are supposed to do. We in software don't have that problem, and that makes it tremendously easier. We are designing a collection of idealized mathematical parts which have definitions. They do exactly what they are defined to do.
And so there are many problems we [programmers] don't have. For instance, if we put an ‘if’ statement inside of a ‘while’ statement, we don't have to worry about whether the ‘if’ statement can get enough power to run at the speed it's going to run. We don't have to worry about whether it will run at a speed that generates radio frequency interference and induces wrong values in some other parts of the data. We don't have to worry about whether it will loop at a speed that causes a resonance and eventually the ‘if’ statement will vibrate against the ‘while’ statement and one of them will crack. We don't have to worry that chemicals in the environment will get into the boundary between the if statement and the while statement and corrode them, and cause a bad connection. We don't have to worry that other chemicals will get on them and cause a short-circuit. We don't have to worry about whether the heat can be dissipated from this ‘if’ statement through the surrounding ‘while’ statement. We don't have to worry about whether the ‘while’ statement would cause so much voltage drop that the ‘if’ statement won't function correctly. When you look at the value of a variable you don't have to worry about whether you've referenced that variable so many times that you exceed the fan-out limit. You don't have to worry about how much capacitance there is in a certain variable and how much time it will take to store the value in it.
All these things are defined away; the system is defined to function in a certain way, and it always does. The physical computer might malfunction, but that's not the program's fault. So, because of all these problems we don't have to deal with, our field is tremendously easier.”
— Richard Stallman, 2001: <https://www.gnu.org/philosophy/stallman-mec-india.html#conf9>
The software may need to handle hardware failures, but software that doesn't do that hasn't failed either -- it's inadequately designed.
However, that "as long as" is doing quite a bit of work. In practice, we rarely have a perfect grasp of a real-world program[0]. There is divergence between what we think a program does and what it actually does, gaps in our knowledge, and so on. Naturally, this problem also afflicts mathematical approximations of physical systems.
[0] And even this is not entirely true. Think of a concurrent program. Race conditions can produce all sorts of weird results that are unpredictable. Perfect knowledge of the program will not tell you what the result will be.
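To make the footnote concrete, here is a minimal sketch (my own Go example, not from the thread) of a data race whose outcome perfect knowledge of the source cannot predict: two goroutines increment a shared counter with no synchronization, and the final value varies from run to run.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	counter := 0 // shared state, no synchronization

	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100000; j++ {
				counter++ // data race: read-modify-write is not atomic
			}
		}()
	}
	wg.Wait()

	// "Should" print 200000, but lost updates make the result
	// nondeterministic; `go run -race` flags the race.
	fmt.Println(counter)
}
```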
In light of this, even software development has to focus on failures when you apply this standard. And that does include considerations like failures occurring within the computer itself (faulty RAM or a faulty CPU core).
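For illustration, one common mitigation for silent memory corruption is to pair critical state with a checksum, so corruption is detected rather than consumed. A minimal sketch in Go (my example; the `guarded` type and its helpers are hypothetical):

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// guarded pairs a critical value with a checksum so that silent
// corruption (e.g. a flipped bit in faulty RAM) can be detected.
type guarded struct {
	value uint64
	crc   uint32
}

func store(v uint64) guarded {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], v)
	return guarded{value: v, crc: crc32.ChecksumIEEE(buf[:])}
}

func load(g guarded) (uint64, error) {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], g.value)
	if crc32.ChecksumIEEE(buf[:]) != g.crc {
		return 0, errors.New("checksum mismatch: possible memory corruption")
	}
	return g.value, nil
}

func main() {
	g := store(42)
	v, err := load(g)
	if err != nil {
		// In a safety-critical system this would trigger a fail-safe path.
		fmt.Println("error:", err)
		return
	}
	fmt.Println("value:", v)
}
```

This only detects corruption between store and load, of course; real systems layer it with ECC memory, redundant computation, and watchdogs.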
Of course, the notion of "failure" itself presupposes a purpose. It is a normative notion, and there is no normativity without an aim or a goal.
So, sure, where human artifacts are concerned, we cannot talk about a part failing per se, because unlike natural kinds (like us, where the norm is intrinsic, which is why heart failure is an objective failure), the "should" or "ought" of an artifact is a matter of external human intention and expectation.
And as it turns out, a "role in a system" is precisely a teleological view. The system has an overall purpose (one we assign to it), and the role or function of any part is defined in terms of - and in service to - the overall goal. If the system goes from `a->d`, and one part goes from `a->b`, another `b->c`, and still another `c->d`, then the composition of these gives us the system. The meaning of the part comes from the meaning of the whole.
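The compositional picture is easy to state in code. A minimal sketch in Go (my example; the stage types and functions are hypothetical stand-ins for a, b, c, d):

```go
package main

import "fmt"

// Hypothetical stage types standing in for a, b, c, d.
type A int
type B int
type C int
type D int

func f(x A) B { return B(x + 1) } // part one: a -> b
func g(x B) C { return C(x * 2) } // part two: b -> c
func h(x C) D { return D(x - 3) } // part three: c -> d

// system is the composition h(g(f(x))): a -> d. Each part's "role"
// is defined by where it sits in this whole, not by the part alone.
func system(x A) D { return h(g(f(x))) }

func main() {
	fmt.Println(system(5)) // ((5+1)*2)-3 = 9
}
```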
Has some fun anecdotes in it. My favorite is the nuclear-certified supersonic aircraft with a latent defect discovered during integration of a new subsystem. It turned out all of the onboard flight computers crashed at the transition from subsonic to supersonic; thankfully, the aircraft had enough inertia to "ride through" all of its flight computers crashing simultaneously at the transonic boundary.
Moral of that story is your software people need to have the vocabulary to understand the physical properties of the system they're working on.
Most of it happens, as always, at the interface. So these methodologies help you manage the interfaces between people, machines, and the product.
Maintaining it over time is even harder.
They want the benefits, and are willing to do everything except the things that actually help.