Evaluating and Improving Risk Analysis Methods for Critical Systems
As our dependence on IT systems grows, so does the number of reported problems caused by failures of critical IT systems. Today, almost every societal system or service, e.g., water supply, power supply, and transportation, depends on IT systems, and failures of these systems have serious negative effects on society. In general, public organizations are responsible for delivering these services to society. Risk analysis is an important activity in the development and operation of critical IT systems, but the increased complexity and size of critical systems place additional demands on the effectiveness of risk analysis methods. Although a number of methods for risk analysis of technical systems exist, the failure behavior of information systems is typically very different from that of mechanical systems. Therefore, risk analysis of IT systems requires different risk analysis techniques, or at least adaptations of traditional approaches.
The research objective of this thesis is to improve the process of analyzing risks pertaining to critical IT systems. This objective is addressed in three ways: first, by understanding the current literature and practices related to risk analysis of IT systems; second, by evaluating and comparing existing risk analysis methods; and third, by suggesting improvements to the risk analysis process and developing new, effective, and efficient risk analysis methods for IT systems.
To understand current risk analysis methods and practices, we carried out a systematic mapping study. The study found only a few empirical research papers on the evaluation of existing risk analysis methods. Its results suggest that risk analysis methods for IT systems should be investigated empirically to determine which methods are more effective than others. We then carried out a semi-structured interview study to investigate several factors regarding current practices and existing challenges of risk analysis and management, e.g., its importance, the identification of critical resources, the involvement of different stakeholders, the methods used, and follow-up analysis.
To evaluate and compare the effectiveness of risk analysis methods, we carried out a controlled experiment. In that study, we evaluated the effectiveness of risk analysis methods by counting the number of relevant and non-relevant risks identified by the experiment participants. The difficulty level of the risk analysis methods and the experiment participants’ confidence in the identified risks were also investigated. We then carried out a case study to evaluate the effectiveness and efficiency of two existing risk analysis methods, Failure Mode and Effect Analysis (FMEA) and System Theoretic Process Analysis (STPA). The case study investigates the effectiveness of the methods by comparing how each conducts a hazard analysis of the same system. It also evaluates the analysis process of the methods using a set of qualitative criteria derived from the Technology Acceptance Model (TAM). After this, another case study was carried out to evaluate and assess the resilience of critical IT systems and networks by applying a simulation method. A hybrid modeling approach was used that considers both the technical network, represented using graph theory, and the repair system, represented by a queuing model.
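The hybrid idea of combining a graph model of the network with a queuing model of the repair system can be illustrated with a small simulation. The following Python sketch is not the model used in the thesis; it is a minimal illustration under assumptions made here: time advances in discrete steps, each working node fails independently with a fixed probability, a single repair crew serves failed nodes in first-in, first-out order, every repair takes the same number of steps, and resilience is measured as the average fraction of nodes reachable from a chosen source node. All names (reachable_fraction, simulate, fail_prob, repair_time) and the example network are hypothetical.

```python
import random
from collections import deque


def reachable_fraction(adjacency, up, source):
    """Fraction of all nodes reachable from `source` through working nodes (BFS)."""
    if source not in up:
        return 0.0
    seen = {source}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for neighbor in adjacency[node]:
            if neighbor in up and neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return len(seen) / len(adjacency)


def simulate(adjacency, source, steps, fail_prob, repair_time, seed=0):
    """Return the mean reachable fraction over `steps` discrete time steps.

    Assumptions (illustrative only): independent node failures, one repair
    crew, FIFO repair queue, fixed repair duration of `repair_time` steps.
    """
    if repair_time < 1:
        raise ValueError("repair_time must be at least 1")
    rng = random.Random(seed)
    up = set(adjacency)      # nodes that are currently working
    repair_queue = deque()   # failed nodes; the head is the one under repair
    remaining = 0            # steps left on the repair at the head of the queue
    total = 0.0
    for _ in range(steps):
        # Graph side: each working node may fail in this step.
        for node in sorted(up):  # sorted copy keeps runs reproducible
            if rng.random() < fail_prob:
                up.discard(node)
                repair_queue.append(node)
        # Queue side: the single crew works on the head of the queue.
        if repair_queue:
            if remaining == 0:
                remaining = repair_time
            remaining -= 1
            if remaining == 0:
                up.add(repair_queue.popleft())
        total += reachable_fraction(adjacency, up, source)
    return total / steps


# Hypothetical four-node ring network.
network = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
}

if __name__ == "__main__":
    print(simulate(network, "a", steps=1000, fail_prob=0.02, repair_time=5))
```

Varying repair_time or fail_prob in such a sketch shows how repair capacity, and not only network topology, shapes the resulting availability, which is the reason for modeling both parts together.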
To improve the risk analysis process, this thesis also presents a new risk analysis method, Perspective Based Risk Analysis (PBRA), that uses different perspectives while analyzing IT systems. A perspective is a point of view or a specific role adopted by a risk analyst while doing risk analysis, e.g., system engineer, system tester, or system user. Based on the findings, we conclude that the use of different perspectives improves the effectiveness of the risk analysis process. To improve the process further, we carried out a data mining study on saving historical information about IT incidents for later use in risk analysis. The approach could be an important aid in building a database of IT incidents that have occurred, which can later be used as input to improve the risk analysis process. Finally, based on the findings of the studies included in this thesis, a list of suggestions for improving the risk analysis process is presented. This list was evaluated in a focus group meeting. The suggestions include raising risk analysis awareness and education, defining clear roles and responsibilities, providing risk analysis methods that are easy to use and adapt, dealing with subjectivity, carrying out risk analysis as early as possible, and using historical risk data to improve the risk analysis process. Based on the findings, it can be concluded that these suggestions are important and useful for risk practitioners who want to improve the risk analysis process.
The research presented in this thesis contributes methods for improving risk analysis and management practices. Moreover, the work is based on solid empirical studies.