April 9th, 2010 was Apollo 13’s 40th anniversary (4 + 9 = 13!!!). I have seen the movie about this famous but unsuccessful lunar mission. One memorable scene is when a team of engineers on the ground (maybe with some scientists) had to build a device (historic image) from scratch, using only materials available on the spacecraft. Otherwise the trapped astronauts would have died from CO2 poisoning. I felt that scene truly portrays the problem-solving skills of engineers.
As we commemorate the safe return of Apollo 13 and its crew, we should reflect on the failures of the mission as well. Damaged insulation on electrical wires inside an oxygen tank in the service module led to an unexpected ignition. The resulting loss of the oxygen supply, and consequently the shutdown of the command module, forced the 4.4-billion-dollar mission to be aborted and put the lives of the three crew members in jeopardy.
Even though a potential disaster was averted in the end, what lessons can we learn from it? Perhaps the first thought is: expect the unexpected.
What that means is that, to prevent a grand failure, every possible route leading to it must be considered, and there must be solutions and procedures (a.k.a. backup plans) in place to ensure that the failure itself fails. In my opinion, a crew fatality would have been the grand failure, though others can choose their own definition of failure.
However, this approach is almost like an exhaustive search. If the “Butterfly Effect” is included as well, it might give us the safest mission of all: one where analyzing every possible failure takes forever and the spacecraft never gets launched!
Luckily, formal (and more practical) methods for determining the degree of safety and reliability already existed long before Apollo 13 flew in 1970. The idea came about during World War II. I’m guessing a major concern was the possibility of losing missiles before they hit the target, or of hitting the wrong target. The reliability and safety assessment methods were later extended to industries such as aviation, space flight and nuclear power.
In the extreme case, think of the exhaustive search above as infinite (a continuous function f(x) over all possible cases). We can assign a weight (i.e. a probability) to each case f(x’), and the cases that rarely occur (thanks to quality assurance) get weights so small that, below a certain threshold, they are no longer worth considering. The weights are basically probabilities of failure, which can come from statistical analysis of tests. In simple terms, a reliable car, for example, is one that malfunctions less often than a less reliable car.
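To make the weighting idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the failure modes, the test counts and the cut-off are my own assumptions, not real engineering data. It estimates a failure probability from test results and keeps only the cases that are likely enough to be worth analyzing.

```python
def estimate_failure_probability(failures, trials):
    """Estimate a failure probability as the observed failure rate in tests."""
    return failures / trials

# Hypothetical failure modes with invented test results (failures, trials).
failure_modes = {
    "wiring short": estimate_failure_probability(3, 1_000),
    "valve sticks": estimate_failure_probability(1, 2_000),
    "rare sensor glitch": estimate_failure_probability(1, 10_000_000),
}

# Assumed cut-off: anything less likely than this is not analyzed further.
THRESHOLD = 1e-6

def worth_analyzing(modes, threshold):
    """Keep only the failure modes whose probability is at least `threshold`."""
    return {name: p for name, p in modes.items() if p >= threshold}

shortlist = worth_analyzing(failure_modes, THRESHOLD)
for name, p in sorted(shortlist.items()):
    print(f"{name}: {p:.4f}")
```

With these invented numbers the rare sensor glitch falls below the cut-off and drops out of the analysis, while the other two stay on the list.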
But probability/reliability alone cannot be the final indication of the degree of safety. Just look at how unlikely (or even unanticipated) the ignition in the oxygen tank was, and it still led to the failure of the mission!
What we also need to know is the severity of the (bad) consequences associated with those probabilities! As I mentioned before, the direct consequence of the ignition was the loss of the oxygen supply in the service module. This loss was not itself the grand failure (a fatality), but it was severe enough to result in an unsuccessful lunar mission. Therefore, it is almost natural to express the combined effect of probability and consequence in a mathematical formula:
risk = frequency x consequence
where frequency can be calculated from the probability. A silly example to illustrate the equation: suppose the crew members got into a fight and someone was killed. This would certainly be a grand failure, and hence carry the largest (worst) consequence! But luckily the probability (and so the frequency) of such an event is absolutely zero (who would argue with that?), giving a zero-risk scenario!
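The formula is simple enough to try out in a few lines of Python. This is only a sketch: the events, the frequencies and the 0 to 100 severity scale are numbers I invented for illustration, not values from any real assessment.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    frequency: float    # assumed expected occurrences per mission
    consequence: float  # assumed severity on an arbitrary 0-100 scale

    @property
    def risk(self) -> float:
        # risk = frequency x consequence
        return self.frequency * self.consequence

# Hypothetical events with made-up numbers.
events = [
    Event("crew fight with a fatality", 0.0, 100.0),
    Event("oxygen tank failure", 1e-3, 80.0),
    Event("minor sensor glitch", 0.2, 1.0),
]

# Rank the events from highest to lowest risk.
ranked = sorted(events, key=lambda e: e.risk, reverse=True)
for e in ranked:
    print(f"{e.name}: risk = {e.risk:.3f}")
```

The crew fight ends up at the bottom with zero risk, just like the silly example says. Notice also that, with these made-up numbers, a frequent but minor glitch scores a higher risk than a rare but severe tank failure, so the ranking depends heavily on how the consequence scale is chosen.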
Now for a more realistic example. A tragic airplane crash happened one day after Apollo 13’s 40th anniversary. Not only the Polish president and his wife but also many of the country’s top military officers and church leaders were on board, and all were killed. The consequence of the crash goes beyond the loss of extraordinary lives. It also carries great social impact, because the purpose of their visit to Russia was to commemorate a massacre that had divided Poland and Russia for decades.
If we apply our risk formula, the consequence of the plane crash is definitely the highest and the worst, while the frequency/probability of a plane crash is low but not zero. Unfortunately, had a risk assessment been performed for this flight, it would have been easy to conclude that putting so many of Poland’s top officials and elites on the same plane was unwise. The weather conditions and the pilot’s misjudgment have been blamed for the accident. But I think the organizers/planners of this flight should bear the most responsibility.
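Here is a rough sketch of why putting everyone on one plane matters. The crash probability and the head count are hypothetical round numbers, and I am assuming that the flights crash independently of each other and that the officials are split evenly between planes. None of this is data about the actual flight.

```python
def expected_loss(p_crash, officials, planes):
    """Expected number of officials lost, by frequency x consequence per plane."""
    per_plane = officials / planes
    return planes * p_crash * per_plane

def p_lose_everyone(p_crash, planes):
    """Probability that every plane crashes (assumes independent flights)."""
    return p_crash ** planes

P_CRASH = 1e-6    # hypothetical probability that a single flight crashes
OFFICIALS = 100   # hypothetical number of officials travelling

for planes in (1, 2):
    loss = expected_loss(P_CRASH, OFFICIALS, planes)
    total = p_lose_everyone(P_CRASH, planes)
    print(f"{planes} plane(s): expected loss = {loss:.1e}, "
          f"P(lose everyone) = {total:.1e}")
```

Under these assumptions the expected loss is the same either way, but the chance of losing everyone at once drops from one in a million to one in a million million when the group is split across two planes. So if losing all the leaders together counts as the grand failure, the simple formula only tells the whole story when the consequence is measured with that in mind.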
Risk, reliability or safety assessment can be applied to anything from multi-million-dollar space missions to a couple of hours of air travel. There are many risky situations in life too. But before you decide to take a risk, remember to examine both the probability/frequency and the consequences!