This year’s PCSF saw many productive discussions on the topic of responsible vulnerability disclosure (big hat tip to Zach and Mike who managed to keep the conversations from reducing to a bun fight). I want to take a moment to further detail a few of my own opinions on this subject matter.
Let me begin with a somewhat pragmatic definition. Device vulnerabilities (I wonder if this is where Tipping Point’s DVlabs name stemmed from) can be thought of as software, hardware, or requirements artifacts that may be used to violate the explicit or implied operational characteristics of a device (availability, key functionality, etc.). What I like about this definition, and why I’m bothering to put it forward, is that it exposes the fact that the risk posed by a vulnerability is positively correlated with the degree of information asymmetry, that is, the degree to which reality differs from expectation. That implies symmetry is key to risk reduction; ergo, disclosure of some form is necessary. Obviously the need for some type of disclosure is not new, but what is new, I think, is a definition of device vulnerability that inherently drives it.
A question I quite frequently field is “why hunt for vulnerabilities?” Let’s explore, for a moment, the utility in identifying zero-days (setting aside the obviously nefarious motives) and what it means technically. First off, the user of the equipment, the operator or asset owner, needs to identify, quantify, and mitigate business risk. A vulnerable control device that fails to operate as required (and as advertised) when a vul’s trigger is present is an example of technical risk, which quite often results in a risk to the business. It is therefore in the operator’s best interest to gain an understanding of the vulnerabilities present in their control devices. But it does not end there. For vul knowledge to translate into business value, the vul information needs to be accompanied by the functional or prescriptive measures required to mitigate the risk. Technically speaking, the operator is much more interested in failure than fault. [Recall from earlier blogs that failure refers to device behavior that deviates from the expected, whereas fault refers to a static defect in source code.] With knowledge of the failure mode and a definition of the trigger, risk reduction is then possible.
For the equipment manufacturer, or vendor, a bug found during QA is just a bug, while one found in the field is, more often than not, a vulnerability. Identifying vulnerabilities can lead to improvements in a vendor’s unit, functional, and integration testing, and sometimes in its coding standards, which together, at the end of the day, enable the deployment of higher-quality products with a reduced support cost. Technically speaking, the vendor is much more interested in faults than failures. This is an important distinction: the technical requirements for tracing a failure back to its fault differ greatly from those for devising risk reduction measures, even though both rely on vul identification.
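To make the failure/fault split a little more concrete, here is a minimal sketch of how a single vul record might be projected into an operator-facing view and a vendor-facing view. Everything in it is an assumption for illustration: the `VulnerabilityRecord` type, its field names, the two view functions, and the example device are all hypothetical and not part of any real disclosure format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VulnerabilityRecord:
    """Hypothetical record; all field names are illustrative assumptions."""
    device: str
    trigger: str                    # input or condition that sets the vul off
    failure_mode: str               # observable deviation from expected behavior
    mitigation: Optional[str]       # prescriptive measure an operator can apply
    fault_location: Optional[str]   # static defect in source code, if traced


def operator_view(rec: VulnerabilityRecord) -> dict:
    """Failure-centric view: what happens, what triggers it, how to reduce risk."""
    return {
        "device": rec.device,
        "trigger": rec.trigger,
        "failure_mode": rec.failure_mode,
        "mitigation": rec.mitigation or "none identified yet",
    }


def vendor_view(rec: VulnerabilityRecord) -> dict:
    """Fault-centric view: how to reproduce it and where the defect lives."""
    return {
        "device": rec.device,
        "trigger": rec.trigger,
        "fault_location": rec.fault_location or "not yet traced",
    }


# Invented example data, not a real finding.
example = VulnerabilityRecord(
    device="example-plc",
    trigger="malformed length field in a protocol request",
    failure_mode="controller stops scanning I/O until power-cycled",
    mitigation="filter the offending request at the network boundary",
    fault_location=None,
)

print(operator_view(example))
print(vendor_view(example))
```

Both views share the trigger, since both audiences need to know what sets the vul off. Only the operator view carries the failure mode and mitigation, and only the vendor view carries the fault location.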
In defining a vul disclosure process, one must consider who benefits most from the information and who bears the greatest risk from unwanted proliferation. The end user of the equipment, the operator, does on both counts. For a disclosure policy to be successful, this group, the operators, must be seen as the policy’s “key customer” and NOT the equipment vendor. How does this translate into building a successful policy for SCADA and process control? Well, we must consider operator constraints such as the significant overhead associated with change management, including the fact that change is, on occasion, simply impossible. We must consider the 15-25 year lifecycle of the operational devices. We must consider requirements for 99% uptime. And the list goes on. Keep in mind, the most critical considerations are operator-centric.
As one begins to identify disclosure policy requirements, one quickly learns that controlled/restricted disclosure is what is best at this time. That raises the question: what do I see as the role of the CERTs when it comes to IC vulnerability disclosure? I see their value as being a coordination center, a place where researchers such as myself can find the proper contacts within an organization to whom the vulnerabilities we discover can be effectively relayed. I also see a role for the CERTs in raising awareness and relaying trend information, but most importantly in establishing the common vernacular or domain language through which we can communicate effectively (normalizing definitions, scoring systems, etc.; more on this in a future blog). Consider what RFC 2544 did for Ethernet networking. It provided us with definitions for throughput, etc., which dictated how switch performance is now communicated in the industry, not to mention drove the manner in which switches, routers, etc. are tested. Note to all you stack testers out there: RFC 2544 (http://www.ietf.org/rfc/rfc2544.txt) is a seminal work. If you haven’t read it, well, you should!
Last thought. At the end of the day, it is the individual who discovers the vulnerability and their customer (internal or external to the company) who dictate the disclosure process followed. Do we need the equivalent of the Geneva Convention for vul dissemination (hat tip to Perry)? I’m not sure. But I would love to hear thoughts on the matter.