If you have ever read the software control category definitions in MIL-STD-882E, you know that the definitions this blog author gives are very much his own interpretation. The actual definitions in 882E are a god-awful mess: multiple contradictory definitions for the same category, plus parenthetical statements that are intended to clarify but just muddy the picture further. Yikes...
""" A second important dimension is criticality, the potential damage caused by an undetected defect: loss of comfort (C), loss of discretionary moneys (D), loss of essential moneys (E), and loss of life (L). """
(my rephrasing): he points out that the further one moves down that list, the more hardened/disciplined the development process should be, from "anything goes" at the beginning to "no exceptions whatsoever" at the end.
[1] https://www.researchgate.net/publication/234820806_Crystal_c...
I've found that a quality process that starts with "you need to comprehensively understand what you're engineering" is almost universally a non-starter for anyone not already using these things. Putting together an exhaustive list of all the ways code interacts with the outside world is hard. Even when a few engineers actually manage it, they're rarely empowered to make meaningful decisions on whether the consequences of failures are acceptable, or to fix things if they're not.
Much better to do as you say and think about the software and its role in the system. There are more and less formal ways to do this, but it's definitely better than taking a component view.
And the article you intended to link is just wrong. E.g., the Therac-25 was not designed to output high power when an operator typed quickly; it was built in such a way that it did so. This would be analogous to describing an airplane failure due to using bolts that were too weak[1]: "the bolt didn't fail; it broke under exactly the forces you would expect it to break at, given its size; if they wanted it to not break, they should have used a larger bolt!"[2] Just like in the Therac example, the failure would be consistently reproducible.
1. Generally referred to as a "failure" of the part
2. Closely analogous to many software defects that cause system failure.
“The reason is that, in other fields [than software], people have to deal with the perversity of matter. [When] you are designing circuits or cars or chemicals, you have to face the fact that these physical substances will do what they do, not what they are supposed to do. We in software don't have that problem, and that makes it tremendously easier. We are designing a collection of idealized mathematical parts which have definitions. They do exactly what they are defined to do.
And so there are many problems we [programmers] don't have. For instance, if we put an ‘if’ statement inside of a ‘while’ statement, we don't have to worry about whether the ‘if’ statement can get enough power to run at the speed it's going to run. We don't have to worry about whether it will run at a speed that generates radio frequency interference and induces wrong values in some other parts of the data. We don't have to worry about whether it will loop at a speed that causes a resonance and eventually the ‘if’ statement will vibrate against the ‘while’ statement and one of them will crack. We don't have to worry that chemicals in the environment will get into the boundary between the if statement and the while statement and corrode them, and cause a bad connection. We don't have to worry that other chemicals will get on them and cause a short-circuit. We don't have to worry about whether the heat can be dissipated from this ‘if’ statement through the surrounding ‘while’ statement. We don't have to worry about whether the ‘while’ statement would cause so much voltage drop that the ‘if’ statement won't function correctly. When you look at the value of a variable you don't have to worry about whether you've referenced that variable so many times that you exceed the fan-out limit. You don't have to worry about how much capacitance there is in a certain variable and how much time it will take to store the value in it.
All these things are defined away; the system is defined to function in a certain way, and it always does. The physical computer might malfunction, but that's not the program's fault. So, because of all these problems we don't have to deal with, our field is tremendously easier.”
— Richard Stallman, 2001: <https://www.gnu.org/philosophy/stallman-mec-india.html#conf9>
The software may need to handle hardware failures, but software that doesn't do that hasn't failed either -- it's inadequately designed.
However, that "as long as" is doing quite a bit of work. In practice, we rarely have a perfect grasp of a real-world program[0]. There is divergence between what we think a program does and what it actually does, gaps in our knowledge, and so on. Naturally, this problem also afflicts mathematical approximations of physical systems.
[0] And even this is not entirely true. Think of a concurrent program. Race conditions can produce all sorts of weird results that are unpredictable. Perfect knowledge of the program will not tell you what the result will be.
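To make the footnote concrete, here is a minimal sketch (my own Go example, not from the thread) of a data race whose outcome perfect knowledge of the source cannot predict: two goroutines increment a shared counter with no synchronization, and the final value varies from run to run.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	counter := 0 // shared state, no synchronization

	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100000; j++ {
				counter++ // data race: read-modify-write is not atomic
			}
		}()
	}
	wg.Wait()

	// "Should" print 200000, but lost updates make the result
	// nondeterministic; `go run -race` flags the race.
	fmt.Println(counter)
}
```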
In light of this, even software development has to focus on failures when you apply this standard. And that does include considerations like failures occurring within the computer itself (faulty RAM or a faulty CPU core).
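For illustration, one common mitigation for silent memory corruption is to pair critical state with a checksum, so corruption is detected rather than consumed. A minimal sketch in Go (my example; the `guarded` type and its helpers are hypothetical):

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// guarded pairs a critical value with a checksum so that silent
// corruption (e.g. a flipped bit in faulty RAM) can be detected.
type guarded struct {
	value uint64
	crc   uint32
}

func store(v uint64) guarded {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], v)
	return guarded{value: v, crc: crc32.ChecksumIEEE(buf[:])}
}

func load(g guarded) (uint64, error) {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], g.value)
	if crc32.ChecksumIEEE(buf[:]) != g.crc {
		return 0, errors.New("checksum mismatch: possible memory corruption")
	}
	return g.value, nil
}

func main() {
	g := store(42)
	v, err := load(g)
	if err != nil {
		// In a safety-critical system this would trigger a fail-safe path.
		fmt.Println("error:", err)
		return
	}
	fmt.Println("value:", v)
}
```

This only detects corruption between store and load, of course; real systems layer it with ECC memory, redundant computation, and watchdogs.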
Of course, the notion of "failure" itself presupposes a purpose. It is a normative notion, and there is no normativity without an aim or a goal.
So, sure, where human artifacts are concerned, we cannot talk about a part failing per se, because unlike natural kinds (like us, where the norm is intrinsic, which is why heart failure is an objective failure), the "should" or "ought" of an artifact is a matter of external human intention and expectation.
And as it turns out, a "role in a system" is precisely a teleological view. The system has an overall purpose (one we assign to it), and the role or function of any part is defined in terms of - and in service to - the overall goal. If the system goes from `a->d`, and one part goes from `a->b`, another `b->c`, and still another `c->d`, then the composition of these gives us the system. The meaning of the part comes from the meaning of the whole.
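The compositional picture is easy to state in code. A minimal sketch in Go (my example; the stage types and functions are hypothetical stand-ins for a, b, c, d):

```go
package main

import "fmt"

// Hypothetical stage types standing in for a, b, c, d.
type A int
type B int
type C int
type D int

func f(x A) B { return B(x + 1) } // part one: a -> b
func g(x B) C { return C(x * 2) } // part two: b -> c
func h(x C) D { return D(x - 3) } // part three: c -> d

// system is the composition h(g(f(x))): a -> d. Each part's "role"
// is defined by where it sits in this whole, not by the part alone.
func system(x A) D { return h(g(f(x))) }

func main() {
	fmt.Println(system(5)) // ((5+1)*2)-3 = 9
}
```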
Has some fun anecdotes in it. My favorite is the nuclear-certified supersonic aircraft with a latent defect discovered during integration of a new subsystem. It turned out all of the onboard flight computers crashed at the transition from subsonic to supersonic; thankfully, the aircraft had enough inertia to "ride through" all of its flight computers crashing simultaneously at the transonic boundary.
Moral of that story is your software people need to have the vocabulary to understand the physical properties of the system they're working on.
Most of it happens, as always, at the interface. So these methodologies help you manage the interfaces between people, machines, and the product.
Maintaining it over time is even harder.
They want the benefits, and are willing to do everything except the things that actually help.