Friday, February 22, 2008
Joe Weiss recently commented on his blog, and on the public ISA-99 email lists, about his theories on risk analysis for control systems. He makes some salient points, namely that there is not enough statistical data out there, nor is there likely to be, to create risk projections. While I agree with the general criticism of risk projections for control systems, it is clear to me from previous experience that this is not a problem unique to the control systems space.
I posted a more extended version of this to the ISA-99 list, but it would be worth others taking a look as well. Sure, risk analysis is not easy, but it’s still one of the best tools we have. Many have considered applying game theory as a possible alternative model, but I think that work is still too early in its development.
This is not a problem unique to control systems. While we do have very little statistical data on past control systems incidents to draw on, we can still reach some meaningful conclusions by using traditional risk analysis approaches. I have conducted a number of such analysis efforts, and we have found that our numbers hold fairly true over time (for example, one customer I worked for had many hundreds of plants, allowing us to see the meaningfulness of the data across a rather short period of time).
As for frequency: a predicted frequency is just that, predicted. It takes in some of the best information available and makes qualitative guesses about future events. I personally do not believe you can use past events effectively to model future behavior; you must predict based upon the other variables that currently exist in the system. Past events should usually factor in very lightly to a serious risk equation. The factors that DO predict likelihood tend to be capability to cause the problem, number of potential threat vectors, and then some educated guesses about how often an event might occur. I do not agree that risk analysis looks only at past statistics. A reasonable likelihood calculation is much more complex and sophisticated. What I think we are trying too hard to do here is to put meaning into an intentional event, when an intentional event is just as entropic as a seemingly random failure. If I have 1000 employees, and I think that 25% of all people we terminate will have the sophistication and motivation to cause harm, and I fire two people, I can calculate a reasonable predictor of how many times someone may intentionally want to do me harm. It is no more statistically significant that someone can violate a design constraint of a system and bypass a safety system than some other random event such as a lightning strike.
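The termination example can be worked through as simple arithmetic. The following is only a sketch: it assumes each terminated employee independently has the same 25% chance of being both capable and motivated, which is an assumption added here for illustration, and the function names are hypothetical.

```python
def expected_hostile_actors(terminations: int, p_hostile: float) -> float:
    """Expected number of terminated employees with the sophistication
    and motivation to cause harm (simple expectation: n * p)."""
    return terminations * p_hostile


def prob_at_least_one(terminations: int, p_hostile: float) -> float:
    """Chance that at least one terminated employee is hostile,
    assuming each termination is independent (an illustrative assumption)."""
    return 1.0 - (1.0 - p_hostile) ** terminations


# Figures from the example above: two terminations, 25% estimate.
expected = expected_hostile_actors(2, 0.25)   # 0.5 people on average
at_least_one = prob_at_least_one(2, 0.25)     # 1 - 0.75**2 = 0.4375

print(f"expected hostile actors: {expected}")
print(f"chance of at least one:  {at_least_one}")
```

Note that nothing in this calculation depends on historical incident counts; it is driven by current variables (terminations and an estimate of capability and motivation).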
While I do agree that there are challenges to the model, the likelihood should not simply be set to 1 and paired with the most serious consequence. Here is my reasoning:
- Risk managers in business are used to dealing with this model and it is meaningful to them
- Insurance companies are used to a similar model
- CFOs and controllers plan for risk based on these types of projections
- The worst case is not ALWAYS the only case; we need to model worst-case examples as well as lesser ones to gain understanding.
This is NO DIFFERENT from IT or other risk management practices. It might offend the precise senses of an engineer, but it is the best that we have when dealing with predicting random events. What if we are predicting loss due to fire? In that case, I could not care less whether it was an arsonist or a lightning strike; the effect is the same. The question is: what are you trying to model? If we can NOT do this, we can’t measure what we should effectively spend (always less than the total risk), we can’t effectively demonstrate to management what level of risk we expect a project to mitigate, and we can’t communicate effectively that we have gone through due care or due diligence to manage risk.
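To make the contrast concrete, here is a minimal sketch that compares modeling several scenarios against assuming the worst case with a likelihood of 1. Every scenario name, frequency, and loss figure below is made up for illustration, and the `Scenario` class and helper functions are hypothetical, not taken from any standard or tool.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    annual_frequency: float  # predicted events per year (an educated guess)
    loss: float              # estimated cost per event


def expected_annual_loss(scenarios: list[Scenario]) -> float:
    """Total risk: sum of predicted frequency times consequence."""
    return sum(s.annual_frequency * s.loss for s in scenarios)


def spend_is_justified(budget: float, total_risk: float) -> bool:
    """Mitigation spending should stay below the total risk it addresses."""
    return budget < total_risk


# Hypothetical figures, for illustration only.
scenarios = [
    Scenario("minor process upset", 2.0, 10_000),
    Scenario("one-day plant shutdown", 0.1, 500_000),
    Scenario("worst case: safety system bypassed", 0.01, 20_000_000),
]

total_risk = expected_annual_loss(scenarios)
# Resolving likelihood to 1 with only the most serious consequence:
worst_case_only = 1.0 * max(s.loss for s in scenarios)

print(f"modeled total risk:   {total_risk:,.0f}")
print(f"worst case at p = 1:  {worst_case_only:,.0f}")
print(f"150k budget justified? {spend_is_justified(150_000, total_risk)}")
```

With these invented numbers, the worst-case-only figure is far larger than the modeled total, which is why it gives management little guidance on what a proportionate level of spending would be.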