Process Risk Analysis & Threat Modeling: A Practical Perspective on SCADA and Process Control Cyber Security
In the not-too-distant past, cutting-edge Western medicine explained illnesses in terms of humours. If you had a cold, you had too much phlegm, so you would balance your humours by increasing your yellow bile, which was antagonistic to phlegm. Apparently this involved sitting in bed and drinking lots of wine.
Now, as comfortable as this remedy sounds, it has a drawback: it doesn’t work. The idea of humours had some correlation with reality, since it was based on observation, but it was oversimplified. Today we know that the outward symptoms of a cold are the body’s attempt to expel an invading virus. Our knowledge is still far from perfect, but we know why there is all the mucus and sneezing.
The important difference is that we now view the human body as an extremely complex system, with failure modes and compensation strategies all built in.
Our current view of network security and reliability is very much like the Hippocratic view of medicine. If you have hackers, you have too many vulnerabilities, so to counter them you get patches and firewall rules. Apparently this involves buying more software.
Now, as comfortable as this remedy sounds, it has the same drawback: it doesn’t work. Just as we once did with the body, we are largely failing to view networks as interconnected systems of purposeful parts. As a result, our ability to understand potential failure modes, and to anticipate, plan for, and prevent them, is compromised.
This is especially significant for SCADA systems and process control, because fewer failure modes are tolerable and the network is more complex. In IT, if a server goes down, people are inconvenienced, but they have the intelligence and agency to compensate and still get things done, albeit more slowly. On SCADA and process control networks, the users are PLCs, sensors, and microcontrollers, and they can no more work without the network than you could pick up a cup of coffee without the nerves connecting your eyes, brain and muscles.
In traditional control environments, we understand the failure modes of the system’s components and how those failures affect the rest of the system, and entropic fault rates have allowed us to make accurate estimates of system reliability. But control has now moved to a networked environment, which has created new complexities that aren’t fully understood and introduced new, non-entropic fault types that we don’t know how to model.
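To make the traditional calculation concrete, here is a minimal sketch, assuming that "entropic" faults can be treated as independent, random failures with constant rates, and that every component must work for the loop to function (a series system). Under those assumptions the system reliability over time t is R(t) = exp(-(λ1 + λ2 + ...) · t) and the MTBF is 1 / Σλ. The component names and failure rates below are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical failure rates (failures per hour) for components that must
# all work for the control loop to function. Values are illustrative only.
FAILURE_RATES = {
    "sensor": 2e-6,
    "plc": 5e-6,
    "actuator": 3e-6,
}


def series_reliability(rates, hours):
    """Probability that every component survives `hours`, assuming
    independent components with constant (exponential) failure rates."""
    total_rate = sum(rates.values())
    return math.exp(-total_rate * hours)


def series_mtbf(rates):
    """Mean time between failures of the series system: 1 / sum of rates."""
    return 1.0 / sum(rates.values())


if __name__ == "__main__":
    one_year = 8760  # hours
    print(f"R(1 year) = {series_reliability(FAILURE_RATES, one_year):.4f}")
    print(f"MTBF = {series_mtbf(FAILURE_RATES):,.0f} hours")
```

The whole model rests on the independence and constant-rate assumptions. A deliberate attack that disables several components at once, or that strikes at the worst possible moment, does not fit it, which is one way of reading the point about non-entropic faults.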
Obviously, we need to learn how these new fault types affect system reliability, and we need to understand how they are caused and how to mitigate the risks. And of course, this has to be done while accounting for the stringent availability requirements that differentiate ICS from IT.
The key to this is knowledge. If we know what produces data, where it travels and what consumes it, we can deepen our understanding of how the process will be disrupted if any part of it fails. If we know how these components behave and where their weaknesses are, we can deepen our understanding of which disruptions are most likely and choose our compensation techniques to maximize their effect on reliability and minimize their scope and cost.
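One way to capture "what produces data, where it travels and what consumes it" is a directed data-flow graph. The sketch below is a hypothetical illustration rather than a description of any particular tool: the node names, the `FLOWS` map and the producer and consumer lists are invented. For each candidate failure, it uses breadth-first search to find which producer-to-consumer flows are lost.

```python
from collections import deque

# Hypothetical data-flow map: an edge A -> B means "data moves from A to B".
FLOWS = {
    "temp_sensor": ["plc_1"],
    "pressure_sensor": ["plc_1"],
    "plc_1": ["switch_a"],
    "switch_a": ["hmi", "historian", "plc_2"],
    "plc_2": ["valve_actuator"],
    "hmi": [],
    "historian": [],
    "valve_actuator": [],
}
PRODUCERS = ["temp_sensor", "pressure_sensor"]
CONSUMERS = {"hmi", "historian", "valve_actuator"}


def reachable(graph, start, failed=frozenset()):
    """Nodes reachable from `start` via breadth-first search, skipping failed nodes."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def disrupted_flows(graph, producers, consumers, failed_node):
    """(producer, consumer) pairs that stop receiving data when `failed_node` fails."""
    broken = set()
    for src in producers:
        before = reachable(graph, src)
        after = reachable(graph, src, frozenset({failed_node}))
        for consumer in (before - after) & consumers:
            broken.add((src, consumer))
    return broken


if __name__ == "__main__":
    impact = {n: len(disrupted_flows(FLOWS, PRODUCERS, CONSUMERS, n)) for n in FLOWS}
    for node, count in sorted(impact.items(), key=lambda kv: -kv[1]):
        print(f"{node:16} breaks {count} producer->consumer flows")
```

In this toy map, failing either `plc_1` or `switch_a` breaks all six flows, which flags them as single points of failure, while failing `plc_2` cuts off only the valve actuator. A real model would also need redundant paths, timing requirements and the physical consequences of each lost flow.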
Asset management and change management are important parts of this equation, because effective judgement depends on accurate knowledge. For the same reason, it is important to know about device vulnerabilities, to understand which part of the process each device drives, and to know how the devices work together. With all of this knowledge, we could conceivably achieve the same level of certainty in networked environments that we have in traditional entropic-fault environments. But the sheer volume of information involved means we’re just as likely to choke on it and fail to identify cyber risks, which would leave us right where we are now.
At Wurldtech, we are working to automate the analysis of this information. Resilience profiles for devices are part of the equation, gathering everything we know about a device into a concise summary. We are also extending this by developing techniques for threat modeling network-based processes. These techniques will condense the information further, allowing us to predict the nature and severity of failures, analytically determine the key points of risk in a network, and find the most cost-effective strategies for dealing with them.
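The article does not describe how resilience profiles or its threat-modeling techniques work. As a generic illustration of "finding the most cost-effective strategies", the sketch below scores each device as likelihood × impact and then funds mitigations greedily, ranked by risk reduced per unit cost, until a budget runs out. This is a common textbook heuristic, not Wurldtech's method, and every device, mitigation, probability, impact score and cost is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Mitigation:
    name: str
    cost: float            # hypothetical cost units
    likelihood_cut: float  # fraction of compromise likelihood removed (0..1)


@dataclass
class Device:
    name: str
    likelihood: float      # hypothetical yearly probability of compromise
    impact: float          # hypothetical process-impact score
    mitigation: Mitigation


def risk(device):
    """Simple risk score: likelihood of compromise times process impact."""
    return device.likelihood * device.impact


def choose_mitigations(devices, budget):
    """Greedy heuristic: fund mitigations in order of risk reduced per unit
    cost while they fit in the budget. Not guaranteed optimal, because the
    underlying selection problem is a knapsack problem."""
    def value(d):
        return risk(d) * d.mitigation.likelihood_cut / d.mitigation.cost

    chosen, spent = [], 0.0
    for d in sorted(devices, key=value, reverse=True):
        if spent + d.mitigation.cost <= budget:
            chosen.append(d.mitigation.name)
            spent += d.mitigation.cost
    return chosen, spent


DEVICES = [
    Device("plc_1", 0.3, 10.0, Mitigation("segment plc_1 network", 5.0, 0.8)),
    Device("hmi", 0.5, 4.0, Mitigation("harden hmi host", 2.0, 0.6)),
    Device("historian", 0.4, 1.0, Mitigation("patch historian", 1.0, 0.9)),
    Device("switch_a", 0.1, 10.0, Mitigation("add redundant switch", 8.0, 0.5)),
]

if __name__ == "__main__":
    chosen, spent = choose_mitigations(DEVICES, budget=8.0)
    print(f"Spend {spent} on: {', '.join(chosen)}")
```

With a budget of 8, the heuristic funds the HMI hardening, the PLC segmentation and the historian patch, and leaves the expensive redundant switch unfunded. The usefulness of any such ranking depends entirely on how well the likelihood and impact estimates reflect the real process.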
- Reid Orsten